linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
@ 2023-10-06 14:12 Hardik Gajjar
  2023-10-06 14:21 ` Greg KH
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Hardik Gajjar @ 2023-10-06 14:12 UTC (permalink / raw)
  To: gregkh, s.hauer, jonathanh, linux-usb, linux-kernel
  Cc: quic_linyyuan, paul, quic_eserrao, erosca, gah2hi

From: gah2hi <external.Hardik.Gajjar@de.bosch.com>

This patch replaces the usage of netif_stop_queue with netif_device_detach
in the u_ether driver. The netif_device_detach function not only stops all
tx queues by calling netif_tx_stop_all_queues but also marks the device as
removed by clearing the __LINK_STATE_PRESENT bit.

This change helps notify user space about the disconnection of the device
more effectively, compared to netif_stop_queue, which only stops a single
transmit queue.

Signed-off-by: gah2hi <external.Hardik.Gajjar@de.bosch.com>
---
 drivers/usb/gadget/function/u_ether.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 4bb0553da658..9d1c40c152d8 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -1200,7 +1200,7 @@ void gether_disconnect(struct gether *link)
 
 	DBG(dev, "%s\n", __func__);
 
-	netif_stop_queue(dev->net);
+	netif_device_detach(dev->net);
 	netif_carrier_off(dev->net);
 
 	/* disable endpoints, forcing (synchronous) completion
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2023-10-06 14:12 [PATCH] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach Hardik Gajjar
@ 2023-10-06 14:21 ` Greg KH
  2023-10-06 14:50 ` Hardik Gajjar
  2023-10-06 14:53 ` [PATCH v2] " Hardik Gajjar
  2 siblings, 0 replies; 32+ messages in thread
From: Greg KH @ 2023-10-06 14:21 UTC (permalink / raw)
  To: Hardik Gajjar
  Cc: s.hauer, jonathanh, linux-usb, linux-kernel, quic_linyyuan, paul,
	quic_eserrao, erosca, gah2hi

On Fri, Oct 06, 2023 at 04:12:31PM +0200, Hardik Gajjar wrote:
> From: gah2hi <external.Hardik.Gajjar@de.bosch.com>
> 
> This patch replaces the usage of netif_stop_queue with netif_device_detach
> in the u_ether driver. The netif_device_detach function not only stops all
> tx queues by calling netif_tx_stop_all_queues but also marks the device as
> removed by clearing the __LINK_STATE_PRESENT bit.
> 
> This change helps notify user space about the disconnection of the device
> more effectively, compared to netif_stop_queue, which only stops a single
> transmit queue.
> 
> Signed-off-by: gah2hi <external.Hardik.Gajjar@de.bosch.com>
> ---
>  drivers/usb/gadget/function/u_ether.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
> index 4bb0553da658..9d1c40c152d8 100644
> --- a/drivers/usb/gadget/function/u_ether.c
> +++ b/drivers/usb/gadget/function/u_ether.c
> @@ -1200,7 +1200,7 @@ void gether_disconnect(struct gether *link)
>  
>  	DBG(dev, "%s\n", __func__);
>  
> -	netif_stop_queue(dev->net);
> +	netif_device_detach(dev->net);
>  	netif_carrier_off(dev->net);
>  
>  	/* disable endpoints, forcing (synchronous) completion
> -- 
> 2.17.1
> 
> 

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- It looks like you did not use your "real" name for the patch on either
  the Signed-off-by: line, or the From: line (both of which have to
  match).  Please read the kernel file,
  Documentation/process/submitting-patches.rst for how to do this
  correctly.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2023-10-06 14:12 [PATCH] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach Hardik Gajjar
  2023-10-06 14:21 ` Greg KH
@ 2023-10-06 14:50 ` Hardik Gajjar
  2023-10-06 14:53 ` [PATCH v2] " Hardik Gajjar
  2 siblings, 0 replies; 32+ messages in thread
From: Hardik Gajjar @ 2023-10-06 14:50 UTC (permalink / raw)
  To: gregkh, s.hauer, jonathanh, linux-usb, linux-kernel
  Cc: quic_linyyuan, paul, quic_eserrao, erosca, gah2hi

From: gah2hi <external.Hardik.Gajjar@de.bosch.com>

This patch replaces the usage of netif_stop_queue with netif_device_detach
in the u_ether driver. The netif_device_detach function not only stops all
tx queues by calling netif_tx_stop_all_queues but also marks the device as
removed by clearing the __LINK_STATE_PRESENT bit.

This change helps notify user space about the disconnection of the device
more effectively, compared to netif_stop_queue, which only stops a single
transmit queue.

Signed-off-by: gah2hi <external.Hardik.Gajjar@de.bosch.com>
---
 drivers/usb/gadget/function/u_ether.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 4bb0553da658..9d1c40c152d8 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -1200,7 +1200,7 @@ void gether_disconnect(struct gether *link)
 
 	DBG(dev, "%s\n", __func__);
 
-	netif_stop_queue(dev->net);
+	netif_device_detach(dev->net);
 	netif_carrier_off(dev->net);
 
 	/* disable endpoints, forcing (synchronous) completion
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2023-10-06 14:12 [PATCH] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach Hardik Gajjar
  2023-10-06 14:21 ` Greg KH
  2023-10-06 14:50 ` Hardik Gajjar
@ 2023-10-06 14:53 ` Hardik Gajjar
  2023-10-06 14:59   ` Greg KH
  2023-10-06 15:38   ` [PATCH v3] " Hardik Gajjar
  2 siblings, 2 replies; 32+ messages in thread
From: Hardik Gajjar @ 2023-10-06 14:53 UTC (permalink / raw)
  To: gregkh, s.hauer, jonathanh, linux-usb, linux-kernel
  Cc: quic_linyyuan, paul, quic_eserrao, erosca, hgajjar

This patch replaces the usage of netif_stop_queue with netif_device_detach
in the u_ether driver. The netif_device_detach function not only stops all
tx queues by calling netif_tx_stop_all_queues but also marks the device as
removed by clearing the __LINK_STATE_PRESENT bit.

This change helps notify user space about the disconnection of the device
more effectively, compared to netif_stop_queue, which only stops a single
transmit queue.

Changes since version 1:
	- Correct Singed-off user name and e-mail

Signed-off-by: Hardik Gajjar <hgajjar@de.adit-jv.com>
---
 drivers/usb/gadget/function/u_ether.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 4bb0553da658..b0daee35b996 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -635,7 +635,7 @@ static int eth_stop(struct net_device *net)
 	unsigned long	flags;
 
 	VDBG(dev, "%s\n", __func__);
-	netif_stop_queue(net);
+	netif_device_detach(dev->net);
 
 	DBG(dev, "stop stats: rx/tx %ld/%ld, errs %ld/%ld\n",
 		dev->net->stats.rx_packets, dev->net->stats.tx_packets,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2023-10-06 14:53 ` [PATCH v2] " Hardik Gajjar
@ 2023-10-06 14:59   ` Greg KH
  2023-10-06 15:38   ` [PATCH v3] " Hardik Gajjar
  1 sibling, 0 replies; 32+ messages in thread
From: Greg KH @ 2023-10-06 14:59 UTC (permalink / raw)
  To: Hardik Gajjar
  Cc: s.hauer, jonathanh, linux-usb, linux-kernel, quic_linyyuan, paul,
	quic_eserrao, erosca

On Fri, Oct 06, 2023 at 04:53:32PM +0200, Hardik Gajjar wrote:
> This patch replaces the usage of netif_stop_queue with netif_device_detach
> in the u_ether driver. The netif_device_detach function not only stops all
> tx queues by calling netif_tx_stop_all_queues but also marks the device as
> removed by clearing the __LINK_STATE_PRESENT bit.
> 
> This change helps notify user space about the disconnection of the device
> more effectively, compared to netif_stop_queue, which only stops a single
> transmit queue.
> 
> Changes since version 1:
> 	- Correct Singed-off user name and e-mail

Nit, this goes below the --- line :(

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2023-10-06 14:53 ` [PATCH v2] " Hardik Gajjar
  2023-10-06 14:59   ` Greg KH
@ 2023-10-06 15:38   ` Hardik Gajjar
  2023-10-06 15:56     ` [PATCH v4] " Hardik Gajjar
  1 sibling, 1 reply; 32+ messages in thread
From: Hardik Gajjar @ 2023-10-06 15:38 UTC (permalink / raw)
  To: gregkh, s.hauer, jonathanh, linux-usb, linux-kernel
  Cc: quic_linyyuan, paul, quic_eserrao, erosca, hgajjar

This patch replaces the usage of netif_stop_queue with netif_device_detach
in the u_ether driver. The netif_device_detach function not only stops all
tx queues by calling netif_tx_stop_all_queues but also marks the device as
removed by clearing the __LINK_STATE_PRESENT bit.

This change helps notify user space about the disconnection of the device
more effectively, compared to netif_stop_queue, which only stops a single
transmit queue.

Signed-off-by: Hardik Gajjar <hgajjar@de.adit-jv.com>
---
Changes since version 1:
	- Correct Singed-off user name and e-mail

Changes since version 2:
	- Move change history below signed-off-by
---
 drivers/usb/gadget/function/u_ether.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 4bb0553da658..b0daee35b996 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -635,7 +635,7 @@ static int eth_stop(struct net_device *net)
 	unsigned long	flags;
 
 	VDBG(dev, "%s\n", __func__);
-	netif_stop_queue(net);
+	netif_device_detach(dev->net);
 
 	DBG(dev, "stop stats: rx/tx %ld/%ld, errs %ld/%ld\n",
 		dev->net->stats.rx_packets, dev->net->stats.tx_packets,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2023-10-06 15:38   ` [PATCH v3] " Hardik Gajjar
@ 2023-10-06 15:56     ` Hardik Gajjar
  2024-01-14 16:59       ` Andy Shevchenko
  0 siblings, 1 reply; 32+ messages in thread
From: Hardik Gajjar @ 2023-10-06 15:56 UTC (permalink / raw)
  To: gregkh, s.hauer, jonathanh, linux-usb, linux-kernel
  Cc: quic_linyyuan, paul, quic_eserrao, erosca, hgajjar

This patch replaces the usage of netif_stop_queue with netif_device_detach
in the u_ether driver. The netif_device_detach function not only stops all
tx queues by calling netif_tx_stop_all_queues but also marks the device as
removed by clearing the __LINK_STATE_PRESENT bit.

This change helps notify user space about the disconnection of the device
more effectively, compared to netif_stop_queue, which only stops a single
transmit queue.

Signed-off-by: Hardik Gajjar <hgajjar@de.adit-jv.com>
---
Changes since version 1:
	- Correct Singed-off user name and e-mail

Changes since version 2:
	- Move change history below signed-off-by

Changes since version 3:
	- move netif_device_detach from eth_stop to gether_disconnect.
---
 drivers/usb/gadget/function/u_ether.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 4bb0553da658..9d1c40c152d8 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -1200,7 +1200,7 @@ void gether_disconnect(struct gether *link)
 
 	DBG(dev, "%s\n", __func__);
 
-	netif_stop_queue(dev->net);
+	netif_device_detach(dev->net);
 	netif_carrier_off(dev->net);
 
 	/* disable endpoints, forcing (synchronous) completion
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2023-10-06 15:56     ` [PATCH v4] " Hardik Gajjar
@ 2024-01-14 16:59       ` Andy Shevchenko
  2024-01-15 13:27         ` Hardik Gajjar
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Shevchenko @ 2024-01-14 16:59 UTC (permalink / raw)
  To: Hardik Gajjar, Ferry Toth
  Cc: gregkh, s.hauer, jonathanh, linux-usb, linux-kernel,
	quic_linyyuan, paul, quic_eserrao, erosca

+Cc: Ferry.

On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> This patch replaces the usage of netif_stop_queue with netif_device_detach
> in the u_ether driver. The netif_device_detach function not only stops all
> tx queues by calling netif_tx_stop_all_queues but also marks the device as
> removed by clearing the __LINK_STATE_PRESENT bit.
> 
> This change helps notify user space about the disconnection of the device
> more effectively, compared to netif_stop_queue, which only stops a single
> transmit queue.

This change effectively broke my USB ether setup.

git bisect start
# status: waiting for both good and bad commits
# good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
# status: waiting for bad commit, 1 good commit known
# bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 'usb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
# bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix wrong data added to platform device
git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
# good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: add a bandwidth budget table
git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
# good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: gadget: uvc: cleanup request when not in correct state"
git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
# bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: tps6598x: Refactor tps6598x port registration
git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
# bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
# good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix stub_dev hub disconnect
git bisect good 97475763484245916735a1aa9a3310a01d46b008
# good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix a type promotion bug
git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
# first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Note, revert indeed helps. Should I send a revert?

I use configfs to setup USB EEM function and it worked till this commit.
If needed, I can share my scripts, but I believe it's not needed as here
we see a clear regression.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-01-14 16:59       ` Andy Shevchenko
@ 2024-01-15 13:27         ` Hardik Gajjar
  2024-01-15 20:10           ` Ferry Toth
  0 siblings, 1 reply; 32+ messages in thread
From: Hardik Gajjar @ 2024-01-15 13:27 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Hardik Gajjar, Ferry Toth, gregkh, s.hauer, jonathanh, linux-usb,
	linux-kernel, quic_linyyuan, paul, quic_eserrao, erosca

On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
> +Cc: Ferry.
> 
> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> > This patch replaces the usage of netif_stop_queue with netif_device_detach
> > in the u_ether driver. The netif_device_detach function not only stops all
> > tx queues by calling netif_tx_stop_all_queues but also marks the device as
> > removed by clearing the __LINK_STATE_PRESENT bit.
> > 
> > This change helps notify user space about the disconnection of the device
> > more effectively, compared to netif_stop_queue, which only stops a single
> > transmit queue.
> 
> This change effectively broke my USB ether setup.
> 
> git bisect start
> # status: waiting for both good and bad commits
> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
> # status: waiting for bad commit, 1 good commit known
> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 'usb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix wrong data added to platform device
> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: add a bandwidth budget table
> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: gadget: uvc: cleanup request when not in correct state"
> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: tps6598x: Refactor tps6598x port registration
> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix stub_dev hub disconnect
> git bisect good 97475763484245916735a1aa9a3310a01d46b008
> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix a type promotion bug
> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
> 
> Note, revert indeed helps. Should I send a revert?
> 
> I use configfs to setup USB EEM function and it worked till this commit.
> If needed, I can share my scripts, but I believe it's not needed as here
> we see a clear regression.
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 

Without this patch, there may be a potential crash in a race condition, as __LINK_STATE_PRESENT is monitored at many places in the Network stack to determine the status of the link.

Could you please provide details on how this patch affects your functionality? Are you experiencing connection problems or data transfer interruptions?
Instead of reverting this patch, consider trying the upcoming patch (soon to be available in the mainline) to see if it resolves your issue.

https://lore.kernel.org/lkml/2023122900-commence-agenda-db2c@gregkh/T/#m36a812d3f1e5d744ee32381f6ae4185940b376de

Thanks,
Hardik

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-01-15 13:27         ` Hardik Gajjar
@ 2024-01-15 20:10           ` Ferry Toth
  2024-04-03 21:01             ` Ferry Toth
  0 siblings, 1 reply; 32+ messages in thread
From: Ferry Toth @ 2024-01-15 20:10 UTC (permalink / raw)
  To: Hardik Gajjar, Andy Shevchenko
  Cc: Ferry Toth, gregkh, s.hauer, jonathanh, linux-usb, linux-kernel,
	quic_linyyuan, paul, quic_eserrao, erosca

Hi,

Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
> On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
>> +Cc: Ferry.
>>
>> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
>>> This patch replaces the usage of netif_stop_queue with netif_device_detach
>>> in the u_ether driver. The netif_device_detach function not only stops all
>>> tx queues by calling netif_tx_stop_all_queues but also marks the device as
>>> removed by clearing the __LINK_STATE_PRESENT bit.
>>>
>>> This change helps notify user space about the disconnection of the device
>>> more effectively, compared to netif_stop_queue, which only stops a single
>>> transmit queue.
>>
>> This change effectively broke my USB ether setup.
>>
>> git bisect start
>> # status: waiting for both good and bad commits
>> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
>> # status: waiting for bad commit, 1 good commit known
>> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 'usb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
>> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
>> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix wrong data added to platform device
>> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
>> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: add a bandwidth budget table
>> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
>> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: gadget: uvc: cleanup request when not in correct state"
>> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
>> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: tps6598x: Refactor tps6598x port registration
>> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
>> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
>> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
>> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix stub_dev hub disconnect
>> git bisect good 97475763484245916735a1aa9a3310a01d46b008
>> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix a type promotion bug
>> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
>> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
>>
>> Note, revert indeed helps. Should I send a revert?
>>
>> I use configfs to setup USB EEM function and it worked till this commit.
>> If needed, I can share my scripts, but I believe it's not needed as here
>> we see a clear regression.
>>
>> -- 
>> With Best Regards,
>> Andy Shevchenko
>>
>>
> 
> Without this patch, there may be a potential crash in a race condition, as __LINK_STATE_PRESENT is monitored at many places in the Network stack to determine the status of the link.
> 
> Could you please provide details on how this patch affects your functionality? Are you experiencing connection problems or data transfer interruptions?

In my case on mrfld (Intel Edison Arduino) using configfs with this 
patch no config from host through dhcp is received. Manual setting 
correct ipv4 addr / mask / gw still no connection.

> Instead of reverting this patch, consider trying the upcoming patch (soon to be available in the mainline) to see if it resolves your issue.
> 
> https://lore.kernel.org/lkml/2023122900-commence-agenda-db2c@gregkh/T/#m36a812d3f1e5d744ee32381f6ae4185940b376de

This patch works for me with v6.7.0.

> Thanks,
> Hardik


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-01-15 20:10           ` Ferry Toth
@ 2024-04-03 21:01             ` Ferry Toth
  2024-04-05 11:38               ` Hardik Gajjar
  0 siblings, 1 reply; 32+ messages in thread
From: Ferry Toth @ 2024-04-03 21:01 UTC (permalink / raw)
  To: Ferry Toth, Hardik Gajjar, Andy Shevchenko
  Cc: gregkh, s.hauer, jonathanh, linux-usb, linux-kernel,
	quic_linyyuan, paul, quic_eserrao, erosca

Hi,

Op 15-01-2024 om 21:10 schreef Ferry Toth:
> Hi,
>
> Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
>> On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
>>> +Cc: Ferry.
>>>
>>> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
>>>> This patch replaces the usage of netif_stop_queue with 
>>>> netif_device_detach
>>>> in the u_ether driver. The netif_device_detach function not only 
>>>> stops all
>>>> tx queues by calling netif_tx_stop_all_queues but also marks the 
>>>> device as
>>>> removed by clearing the __LINK_STATE_PRESENT bit.
>>>>
>>>> This change helps notify user space about the disconnection of the 
>>>> device
>>>> more effectively, compared to netif_stop_queue, which only stops a 
>>>> single
>>>> transmit queue.
>>>
>>> This change effectively broke my USB ether setup.
>>>
>>> git bisect start
>>> # status: waiting for both good and bad commits
>>> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 
>>> 'tty-6.7-rc1' of 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>>> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
>>> # status: waiting for bad commit, 1 good commit known
>>> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 
>>> 'usb-6.7-rc1' of 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
>>> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
>>> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix 
>>> wrong data added to platform device
>>> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
>>> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: 
>>> add a bandwidth budget table
>>> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
>>> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: 
>>> gadget: uvc: cleanup request when not in correct state"
>>> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
>>> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: 
>>> tps6598x: Refactor tps6598x port registration
>>> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
>>> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: 
>>> u_ether: Replace netif_stop_queue with netif_device_detach
>>> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
>>> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix 
>>> stub_dev hub disconnect
>>> git bisect good 97475763484245916735a1aa9a3310a01d46b008
>>> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix 
>>> a type promotion bug
>>> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
>>> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: 
>>> gadget: u_ether: Replace netif_stop_queue with netif_device_detach
>>>
>>> Note, revert indeed helps. Should I send a revert?
>>>
>>> I use configfs to setup USB EEM function and it worked till this 
>>> commit.
>>> If needed, I can share my scripts, but I believe it's not needed as 
>>> here
>>> we see a clear regression.
>>>
>>> -- 
>>> With Best Regards,
>>> Andy Shevchenko
>>>
>>>
>>
>> Without this patch, there may be a potential crash in a race 
>> condition, as __LINK_STATE_PRESENT is monitored at many places in the 
>> Network stack to determine the status of the link.
>>
>> Could you please provide details on how this patch affects your 
>> functionality? Are you experiencing connection problems or data 
>> transfer interruptions?
>
> In my case on mrfld (Intel Edison Arduino) using configfs with this 
> patch no config from host through dhcp is received. Manual setting 
> correct ipv4 addr / mask / gw still no connection.
>
>> Instead of reverting this patch, consider trying the upcoming patch 
>> (soon to be available in the mainline) to see if it resolves your issue.
>>
>> https://lore.kernel.org/lkml/2023122900-commence-agenda-db2c@gregkh/T/#m36a812d3f1e5d744ee32381f6ae4185940b376de 
>>
>
> This patch works for me with v6.7.0.

I need to revisit this. The patch in this topic landed in v6.7.0-rc1 
(f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and 
other platforms as well.

The mentioned fix "usb: gadget: u_ether: Re-attach netif device to 
mirror detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does 
fix: I am able to make a USB EEM function again.

However, now a hidden issue appears. With mrfld there is an external 
switch to easily switch between host and device mode.

What is not fixed:

- when in device mode and unplugging/plugging the cable when using 
`ifconfig usb0` the line "usb0: 
flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to "usb0: 
flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route table 
is updated and the dir `/sys/class/net/usb0` exists and in the dir `cat 
carrier*` shows the carrier up and down counts. This is the expected 
behavior.

- when in device mode and switching to host mode `ifconfig usb0` 
continues to show "RUNNING", the route table is not modified and the dir 
`/sys/class/net/usb0` no longer exists.

- switching to device mode again, USB EEM works fine, no changes to 
RUNNING or the route table happen and the dir `/sys/class/net/usb0` 
still is non- existing.

- unplugging/plugging the cable in device mode after this does not 
restore the original situation.

This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential 
patches on top) and bisected back to this patch in v6.70-rc1.

It seems `netif_device_detach` does not completely clean up as expected 
and `netif_device_attach` does not completely rebuild.

I am wondering if on other platforms this can be reproduced? If so, inho 
it would be best to revert the both patches until the issue is resolved.

Thanks,

Ferry

>> Thanks,
>> Hardik
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-03 21:01             ` Ferry Toth
@ 2024-04-05 11:38               ` Hardik Gajjar
  2024-04-07 20:51                 ` Ferry Toth
  0 siblings, 1 reply; 32+ messages in thread
From: Hardik Gajjar @ 2024-04-05 11:38 UTC (permalink / raw)
  To: Ferry Toth
  Cc: Hardik Gajjar, Andy Shevchenko, gregkh, s.hauer, jonathanh,
	linux-usb, linux-kernel, quic_linyyuan, paul, quic_eserrao,
	erosca

On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote:
> Hi,
> 
> Op 15-01-2024 om 21:10 schreef Ferry Toth:
> > Hi,
> > 
> > Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
> > > On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
> > > > +Cc: Ferry.
> > > > 
> > > > On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> > > > > This patch replaces the usage of netif_stop_queue with
> > > > > netif_device_detach
> > > > > in the u_ether driver. The netif_device_detach function not
> > > > > only stops all
> > > > > tx queues by calling netif_tx_stop_all_queues but also marks
> > > > > the device as
> > > > > removed by clearing the __LINK_STATE_PRESENT bit.
> > > > > 
> > > > > This change helps notify user space about the disconnection
> > > > > of the device
> > > > > more effectively, compared to netif_stop_queue, which only
> > > > > stops a single
> > > > > transmit queue.
> > > > 
> > > > This change effectively broke my USB ether setup.
> > > > 
> > > > git bisect start
> > > > # status: waiting for both good and bad commits
> > > > # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag
> > > > 'tty-6.7-rc1' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> > > > git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
> > > > # status: waiting for bad commit, 1 good commit known
> > > > # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag
> > > > 'usb-6.7-rc1' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
> > > > git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
> > > > # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix
> > > > wrong data added to platform device
> > > > git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
> > > > # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb:
> > > > xhci-mtk: add a bandwidth budget table
> > > > git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
> > > > # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb:
> > > > gadget: uvc: cleanup request when not in correct state"
> > > > git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
> > > > # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec:
> > > > tps6598x: Refactor tps6598x port registration
> > > > git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
> > > > # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget:
> > > > u_ether: Replace netif_stop_queue with netif_device_detach
> > > > git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
> > > > # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip:
> > > > fix stub_dev hub disconnect
> > > > git bisect good 97475763484245916735a1aa9a3310a01d46b008
> > > > # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest:
> > > > fix a type promotion bug
> > > > git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
> > > > # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918]
> > > > usb: gadget: u_ether: Replace netif_stop_queue with
> > > > netif_device_detach
> > > > 
> > > > Note, revert indeed helps. Should I send a revert?
> > > > 
> > > > I use configfs to setup USB EEM function and it worked till this
> > > > commit.
> > > > If needed, I can share my scripts, but I believe it's not needed
> > > > as here
> > > > we see a clear regression.
> > > > 
> > > > -- 
> > > > With Best Regards,
> > > > Andy Shevchenko
> > > > 
> > > > 
> > > 
> > > Without this patch, there may be a potential crash in a race
> > > condition, as __LINK_STATE_PRESENT is monitored at many places in
> > > the Network stack to determine the status of the link.
> > > 
> > > Could you please provide details on how this patch affects your
> > > functionality? Are you experiencing connection problems or data
> > > transfer interruptions?
> > 
> > In my case on mrfld (Intel Edison Arduino) using configfs with this
> > patch no config from host through dhcp is received. Manual setting
> > correct ipv4 addr / mask / gw still no connection.
> > 
> > > Instead of reverting this patch, consider trying the upcoming patch
> > > (soon to be available in the mainline) to see if it resolves your
> > > issue.
> > > 
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e=
> > > 
> > 
> > This patch works for me with v6.7.0.
> 
> I need to revisit this. The patch in this topic landed in v6.7.0-rc1
> (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other
> platforms as well.
> 
> The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror
> detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am
> able to make a USB EEM function again.
> 
> However, now a hidden issue appears. With mrfld there is an external switch
> to easily switch between host and device mode.
> 
> What is not fixed:
> 
> - when in device mode and unplugging/plugging the cable when using `ifconfig
> usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to
> "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route
> table is updated and the dir `/sys/class/net/usb0` exists and in the dir
> `cat carrier*` shows the carrier up and down counts. This is the expected
> behavior.
> 
> - when in device mode and switching to host mode `ifconfig usb0` continues
> to show "RUNNING", the route table is not modified and the dir
> `/sys/class/net/usb0` no longer exists.
> 
> - switching to device mode again, USB EEM works fine, no changes to RUNNING
> or the route table happen and the dir `/sys/class/net/usb0` still is non-
> existing.
> 
> - unplugging/plugging the cable in device mode after this does not restore
> the original situation.
> 
> This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential
> patches on top) and bisected back to this patch in v6.70-rc1.
> 
> It seems `netif_device_detach` does not completely clean up as expected and
> `netif_device_attach` does not completely rebuild.
> 
> I am wondering if on other platforms this can be reproduced? If so, inho it
> would be best to revert the both patches until the issue is resolved.
> 
> Thanks,
> 
> Ferry

I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available.
This behavior seems not correct.

The gether_cleanup function should remove the interface along with the associated kobject,
and this function should only be called during the unloading of the driver or deleting the gadget.
Could you please confirm whether any of your custom modifications are removing the net interface kobject?

> 
> > > Thanks,
> > > Hardik
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-05 11:38               ` Hardik Gajjar
@ 2024-04-07 20:51                 ` Ferry Toth
  2024-04-10 17:37                   ` Andy Shevchenko
  0 siblings, 1 reply; 32+ messages in thread
From: Ferry Toth @ 2024-04-07 20:51 UTC (permalink / raw)
  To: Hardik Gajjar
  Cc: Andy Shevchenko, gregkh, s.hauer, jonathanh, linux-usb,
	linux-kernel, quic_linyyuan, paul, quic_eserrao, erosca

Hi

Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote:
>> Hi,
>>
>> Op 15-01-2024 om 21:10 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
>>>> On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
>>>>> +Cc: Ferry.
>>>>>
>>>>> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
>>>>>> This patch replaces the usage of netif_stop_queue with
>>>>>> netif_device_detach
>>>>>> in the u_ether driver. The netif_device_detach function not
>>>>>> only stops all
>>>>>> tx queues by calling netif_tx_stop_all_queues but also marks
>>>>>> the device as
>>>>>> removed by clearing the __LINK_STATE_PRESENT bit.
>>>>>>
>>>>>> This change helps notify user space about the disconnection
>>>>>> of the device
>>>>>> more effectively, compared to netif_stop_queue, which only
>>>>>> stops a single
>>>>>> transmit queue.
>>>>> This change effectively broke my USB ether setup.
>>>>>
>>>>> git bisect start
>>>>> # status: waiting for both good and bad commits
>>>>> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag
>>>>> 'tty-6.7-rc1' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>>>>> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
>>>>> # status: waiting for bad commit, 1 good commit known
>>>>> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag
>>>>> 'usb-6.7-rc1' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
>>>>> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
>>>>> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix
>>>>> wrong data added to platform device
>>>>> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
>>>>> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb:
>>>>> xhci-mtk: add a bandwidth budget table
>>>>> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
>>>>> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb:
>>>>> gadget: uvc: cleanup request when not in correct state"
>>>>> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
>>>>> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec:
>>>>> tps6598x: Refactor tps6598x port registration
>>>>> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
>>>>> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget:
>>>>> u_ether: Replace netif_stop_queue with netif_device_detach
>>>>> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
>>>>> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip:
>>>>> fix stub_dev hub disconnect
>>>>> git bisect good 97475763484245916735a1aa9a3310a01d46b008
>>>>> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest:
>>>>> fix a type promotion bug
>>>>> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
>>>>> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918]
>>>>> usb: gadget: u_ether: Replace netif_stop_queue with
>>>>> netif_device_detach
>>>>>
>>>>> Note, revert indeed helps. Should I send a revert?
>>>>>
>>>>> I use configfs to setup USB EEM function and it worked till this
>>>>> commit.
>>>>> If needed, I can share my scripts, but I believe it's not needed
>>>>> as here
>>>>> we see a clear regression.
>>>>>
>>>>> -- 
>>>>> With Best Regards,
>>>>> Andy Shevchenko
>>>>>
>>>>>
>>>> Without this patch, there may be a potential crash in a race
>>>> condition, as __LINK_STATE_PRESENT is monitored at many places in
>>>> the Network stack to determine the status of the link.
>>>>
>>>> Could you please provide details on how this patch affects your
>>>> functionality? Are you experiencing connection problems or data
>>>> transfer interruptions?
>>> In my case on mrfld (Intel Edison Arduino) using configfs with this
>>> patch no config from host through dhcp is received. Manual setting
>>> correct ipv4 addr / mask / gw still no connection.
>>>
>>>> Instead of reverting this patch, consider trying the upcoming patch
>>>> (soon to be available in the mainline) to see if it resolves your
>>>> issue.
>>>>
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e=
>>>>
>>> This patch works for me with v6.7.0.
>> I need to revisit this. The patch in this topic landed in v6.7.0-rc1
>> (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other
>> platforms as well.
>>
>> The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror
>> detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am
>> able to make a USB EEM function again.
>>
>> However, now a hidden issue appears. With mrfld there is an external switch
>> to easily switch between host and device mode.
>>
>> What is not fixed:
>>
>> - when in device mode and unplugging/plugging the cable when using `ifconfig
>> usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to
>> "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route
>> table is updated and the dir `/sys/class/net/usb0` exists and in the dir
>> `cat carrier*` shows the carrier up and down counts. This is the expected
>> behavior.
>>
>> - when in device mode and switching to host mode `ifconfig usb0` continues
>> to show "RUNNING", the route table is not modified and the dir
>> `/sys/class/net/usb0` no longer exists.
>>
>> - switching to device mode again, USB EEM works fine, no changes to RUNNING
>> or the route table happen and the dir `/sys/class/net/usb0` still is non-
>> existing.
>>
>> - unplugging/plugging the cable in device mode after this does not restore
>> the original situation.
>>
>> This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential
>> patches on top) and bisected back to this patch in v6.70-rc1.
>>
>> It seems `netif_device_detach` does not completely clean up as expected and
>> `netif_device_attach` does not completely rebuild.
>>
>> I am wondering if on other platforms this can be reproduced? If so, inho it
>> would be best to revert the both patches until the issue is resolved.
>>
>> Thanks,
>>
>> Ferry
> I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available.
> This behavior seems not correct.

Exactly. And this didn't happen before the 2 patches.

To be precise: /sys/class/net/usb0 is not removed and it is a link, the 
link target 
/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no 
longer exists

> The gether_cleanup function should remove the interface along with the associated kobject,
> and this function should only be called during the unloading of the driver or deleting the gadget.
> Could you please confirm whether any of your custom modifications are removing the net interface kobject?

As far as know not. I have one tusb1210 (usb phy) and 2 dwc3 patches, 
nothing related to gadgets or net.

For reference, patches are here: 
https://github.com/htot/meta-intel-edison/tree/nanbield/meta-intel-edison-bsp/recipes-kernel/linux/files

0001-phy-ti-tusb1210-write-to-scratch-on-power-on.patch

0001a-usb-dwc3-core-Fix-dwc3_core_soft_reset-before-anythi.patch

044-REVERTME-usb-dwc3-gadget-skip-endpoints-ep-18-in-out.patch

Seems (just guessing) gether_cleanup is being called due to the switch 
to host mode, but cleanup only partly succeeds. Now I'm finding I can 
make the net interface disappear with `ip link set dev usb0 down` and 
the broken link goes away by destroying the gadget in configfs.

Is that intentional? Do I need to tear down the gadgets in reverse order 
as on create when switching to host mode? That would be new.

What happens when you trigger a switch to host mode without actually 
unplugging your cable?

>>>> Thanks,
>>>> Hardik

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-07 20:51                 ` Ferry Toth
@ 2024-04-10 17:37                   ` Andy Shevchenko
  2024-04-11 14:26                     ` Hardik Gajjar
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Shevchenko @ 2024-04-10 17:37 UTC (permalink / raw)
  To: Ferry Toth
  Cc: Hardik Gajjar, gregkh, s.hauer, jonathanh, linux-usb,
	linux-kernel, quic_linyyuan, paul, quic_eserrao, erosca

On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote:
> > > Op 15-01-2024 om 21:10 schreef Ferry Toth:
> > > > Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
> > > > > On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
> > > > > > On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> > > > > > > This patch replaces the usage of netif_stop_queue with
> > > > > > > netif_device_detach
> > > > > > > in the u_ether driver. The netif_device_detach function not
> > > > > > > only stops all
> > > > > > > tx queues by calling netif_tx_stop_all_queues but also marks
> > > > > > > the device as
> > > > > > > removed by clearing the __LINK_STATE_PRESENT bit.
> > > > > > > 
> > > > > > > This change helps notify user space about the disconnection
> > > > > > > of the device
> > > > > > > more effectively, compared to netif_stop_queue, which only
> > > > > > > stops a single
> > > > > > > transmit queue.
> > > > > > This change effectively broke my USB ether setup.
> > > > > > 
> > > > > > git bisect start
> > > > > > # status: waiting for both good and bad commits
> > > > > > # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag
> > > > > > 'tty-6.7-rc1' of
> > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> > > > > > git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
> > > > > > # status: waiting for bad commit, 1 good commit known
> > > > > > # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag
> > > > > > 'usb-6.7-rc1' of
> > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
> > > > > > git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
> > > > > > # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix
> > > > > > wrong data added to platform device
> > > > > > git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
> > > > > > # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb:
> > > > > > xhci-mtk: add a bandwidth budget table
> > > > > > git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
> > > > > > # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb:
> > > > > > gadget: uvc: cleanup request when not in correct state"
> > > > > > git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
> > > > > > # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec:
> > > > > > tps6598x: Refactor tps6598x port registration
> > > > > > git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
> > > > > > # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget:
> > > > > > u_ether: Replace netif_stop_queue with netif_device_detach
> > > > > > git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
> > > > > > # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip:
> > > > > > fix stub_dev hub disconnect
> > > > > > git bisect good 97475763484245916735a1aa9a3310a01d46b008
> > > > > > # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest:
> > > > > > fix a type promotion bug
> > > > > > git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
> > > > > > # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918]
> > > > > > usb: gadget: u_ether: Replace netif_stop_queue with
> > > > > > netif_device_detach
> > > > > > 
> > > > > > Note, revert indeed helps. Should I send a revert?
> > > > > > 
> > > > > > I use configfs to setup USB EEM function and it worked till this
> > > > > > commit.
> > > > > > If needed, I can share my scripts, but I believe it's not needed
> > > > > > as here
> > > > > > we see a clear regression.
> > > > > > 
> > > > > Without this patch, there may be a potential crash in a race
> > > > > condition, as __LINK_STATE_PRESENT is monitored at many places in
> > > > > the Network stack to determine the status of the link.
> > > > > 
> > > > > Could you please provide details on how this patch affects your
> > > > > functionality? Are you experiencing connection problems or data
> > > > > transfer interruptions?
> > > > In my case on mrfld (Intel Edison Arduino) using configfs with this
> > > > patch no config from host through dhcp is received. Manual setting
> > > > correct ipv4 addr / mask / gw still no connection.
> > > > 
> > > > > Instead of reverting this patch, consider trying the upcoming patch
> > > > > (soon to be available in the mainline) to see if it resolves your
> > > > > issue.
> > > > > 
> > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e=
> > > > > 
> > > > This patch works for me with v6.7.0.
> > > I need to revisit this. The patch in this topic landed in v6.7.0-rc1
> > > (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other
> > > platforms as well.
> > > 
> > > The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror
> > > detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am
> > > able to make a USB EEM function again.
> > > 
> > > However, now a hidden issue appears. With mrfld there is an external switch
> > > to easily switch between host and device mode.
> > > 
> > > What is not fixed:
> > > 
> > > - when in device mode and unplugging/plugging the cable when using `ifconfig
> > > usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to
> > > "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route
> > > table is updated and the dir `/sys/class/net/usb0` exists and in the dir
> > > `cat carrier*` shows the carrier up and down counts. This is the expected
> > > behavior.
> > > 
> > > - when in device mode and switching to host mode `ifconfig usb0` continues
> > > to show "RUNNING", the route table is not modified and the dir
> > > `/sys/class/net/usb0` no longer exists.
> > > 
> > > - switching to device mode again, USB EEM works fine, no changes to RUNNING
> > > or the route table happen and the dir `/sys/class/net/usb0` still is non-
> > > existing.
> > > 
> > > - unplugging/plugging the cable in device mode after this does not restore
> > > the original situation.
> > > 
> > > This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential
> > > patches on top) and bisected back to this patch in v6.70-rc1.
> > > 
> > > It seems `netif_device_detach` does not completely clean up as expected and
> > > `netif_device_attach` does not completely rebuild.
> > > 
> > > I am wondering if on other platforms this can be reproduced? If so, inho it
> > > would be best to revert the both patches until the issue is resolved.
> > > 
> > > Thanks,
> > > 
> > > Ferry
> > I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available.
> > This behavior seems not correct.
> 
> Exactly. And this didn't happen before the 2 patches.
> 
> To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> longer exists
> 
> > The gether_cleanup function should remove the interface along with the associated kobject,
> > and this function should only be called during the unloading of the driver or deleting the gadget.
> > Could you please confirm whether any of your custom modifications are removing the net interface kobject?
> 
> As far as know not. I have one tusb1210 (usb phy) and 2 dwc3 patches,
> nothing related to gadgets or net.
> 
> For reference, patches are here: https://github.com/htot/meta-intel-edison/tree/nanbield/meta-intel-edison-bsp/recipes-kernel/linux/files
> 
> 0001-phy-ti-tusb1210-write-to-scratch-on-power-on.patch
> 
> 0001a-usb-dwc3-core-Fix-dwc3_core_soft_reset-before-anythi.patch
> 
> 044-REVERTME-usb-dwc3-gadget-skip-endpoints-ep-18-in-out.patch
> 
> Seems (just guessing) gether_cleanup is being called due to the switch to
> host mode, but cleanup only partly succeeds. Now I'm finding I can make the
> net interface disappear with `ip link set dev usb0 down` and the broken link
> goes away by destroying the gadget in configfs.
> 
> Is that intentional? Do I need to tear down the gadgets in reverse order as
> on create when switching to host mode? That would be new.
> 
> What happens when you trigger a switch to host mode without actually
> unplugging your cable?

Since there is no response, should we actually prepare revert next week?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-10 17:37                   ` Andy Shevchenko
@ 2024-04-11 14:26                     ` Hardik Gajjar
  2024-04-11 16:39                       ` Andy Shevchenko
  0 siblings, 1 reply; 32+ messages in thread
From: Hardik Gajjar @ 2024-04-11 14:26 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Ferry Toth, Hardik Gajjar, gregkh, s.hauer, jonathanh, linux-usb,
	linux-kernel, quic_linyyuan, paul, quic_eserrao, erosca

On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > > On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote:
> > > > Op 15-01-2024 om 21:10 schreef Ferry Toth:
> > > > > Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
> > > > > > On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
> > > > > > > On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> > > > > > > > This patch replaces the usage of netif_stop_queue with
> > > > > > > > netif_device_detach
> > > > > > > > in the u_ether driver. The netif_device_detach function not
> > > > > > > > only stops all
> > > > > > > > tx queues by calling netif_tx_stop_all_queues but also marks
> > > > > > > > the device as
> > > > > > > > removed by clearing the __LINK_STATE_PRESENT bit.
> > > > > > > > 
> > > > > > > > This change helps notify user space about the disconnection
> > > > > > > > of the device
> > > > > > > > more effectively, compared to netif_stop_queue, which only
> > > > > > > > stops a single
> > > > > > > > transmit queue.
> > > > > > > This change effectively broke my USB ether setup.
> > > > > > > 
> > > > > > > git bisect start
> > > > > > > # status: waiting for both good and bad commits
> > > > > > > # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag
> > > > > > > 'tty-6.7-rc1' of
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> > > > > > > git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
> > > > > > > # status: waiting for bad commit, 1 good commit known
> > > > > > > # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag
> > > > > > > 'usb-6.7-rc1' of
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
> > > > > > > git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
> > > > > > > # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix
> > > > > > > wrong data added to platform device
> > > > > > > git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
> > > > > > > # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb:
> > > > > > > xhci-mtk: add a bandwidth budget table
> > > > > > > git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
> > > > > > > # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb:
> > > > > > > gadget: uvc: cleanup request when not in correct state"
> > > > > > > git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
> > > > > > > # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec:
> > > > > > > tps6598x: Refactor tps6598x port registration
> > > > > > > git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
> > > > > > > # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget:
> > > > > > > u_ether: Replace netif_stop_queue with netif_device_detach
> > > > > > > git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
> > > > > > > # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip:
> > > > > > > fix stub_dev hub disconnect
> > > > > > > git bisect good 97475763484245916735a1aa9a3310a01d46b008
> > > > > > > # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest:
> > > > > > > fix a type promotion bug
> > > > > > > git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
> > > > > > > # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918]
> > > > > > > usb: gadget: u_ether: Replace netif_stop_queue with
> > > > > > > netif_device_detach
> > > > > > > 
> > > > > > > Note, revert indeed helps. Should I send a revert?
> > > > > > > 
> > > > > > > I use configfs to setup USB EEM function and it worked till this
> > > > > > > commit.
> > > > > > > If needed, I can share my scripts, but I believe it's not needed
> > > > > > > as here
> > > > > > > we see a clear regression.
> > > > > > > 
> > > > > > Without this patch, there may be a potential crash in a race
> > > > > > condition, as __LINK_STATE_PRESENT is monitored at many places in
> > > > > > the Network stack to determine the status of the link.
> > > > > > 
> > > > > > Could you please provide details on how this patch affects your
> > > > > > functionality? Are you experiencing connection problems or data
> > > > > > transfer interruptions?
> > > > > In my case on mrfld (Intel Edison Arduino) using configfs with this
> > > > > patch no config from host through dhcp is received. Manual setting
> > > > > correct ipv4 addr / mask / gw still no connection.
> > > > > 
> > > > > > Instead of reverting this patch, consider trying the upcoming patch
> > > > > > (soon to be available in the mainline) to see if it resolves your
> > > > > > issue.
> > > > > > 
> > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e=
> > > > > > 
> > > > > This patch works for me with v6.7.0.
> > > > I need to revisit this. The patch in this topic landed in v6.7.0-rc1
> > > > (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other
> > > > platforms as well.
> > > > 
> > > > The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror
> > > > detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am
> > > > able to make a USB EEM function again.
> > > > 
> > > > However, now a hidden issue appears. With mrfld there is an external switch
> > > > to easily switch between host and device mode.
> > > > 
> > > > What is not fixed:
> > > > 
> > > > - when in device mode and unplugging/plugging the cable when using `ifconfig
> > > > usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to
> > > > "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route
> > > > table is updated and the dir `/sys/class/net/usb0` exists and in the dir
> > > > `cat carrier*` shows the carrier up and down counts. This is the expected
> > > > behavior.
> > > > 
> > > > - when in device mode and switching to host mode `ifconfig usb0` continues
> > > > to show "RUNNING", the route table is not modified and the dir
> > > > `/sys/class/net/usb0` no longer exists.
> > > > 
> > > > - switching to device mode again, USB EEM works fine, no changes to RUNNING
> > > > or the route table happen and the dir `/sys/class/net/usb0` still is non-
> > > > existing.
> > > > 
> > > > - unplugging/plugging the cable in device mode after this does not restore
> > > > the original situation.
> > > > 
> > > > This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential
> > > > patches on top) and bisected back to this patch in v6.70-rc1.
> > > > 
> > > > It seems `netif_device_detach` does not completely clean up as expected and
> > > > `netif_device_attach` does not completely rebuild.
> > > > 
> > > > I am wondering if on other platforms this can be reproduced? If so, inho it
> > > > would be best to revert the both patches until the issue is resolved.
> > > > 
> > > > Thanks,
> > > > 
> > > > Ferry
> > > I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available.
> > > This behavior seems not correct.
> > 
> > Exactly. And this didn't happen before the 2 patches.
> > 
> > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> > longer exists

So, it means that the /sys/class/net/usb0 is present, but the symlink is broken. In that case, the dwc3 driver should recreate the device, and the symlink should become active again
I have the dwc3 IP base usb controller, Let me check with this patch and share result here.
May be we need some fix in dwc3

> > 
> > > The gether_cleanup function should remove the interface along with the associated kobject,
> > > and this function should only be called during the unloading of the driver or deleting the gadget.
> > > Could you please confirm whether any of your custom modifications are removing the net interface kobject?
> > 
> > As far as know not. I have one tusb1210 (usb phy) and 2 dwc3 patches,
> > nothing related to gadgets or net.
> > 
> > For reference, patches are here: https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_htot_meta-2Dintel-2Dedison_tree_nanbield_meta-2Dintel-2Dedison-2Dbsp_recipes-2Dkernel_linux_files&d=DwICAg&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=b4sDyiL5E5Uvzote95pWzkQE_qFYYrvfF766keBAL41H9ZrXG_fF_I7mAnRnI32b&s=1t88BV-wyxvDFZcsmuO0wtanAnpRPP9TSw3ysu6ZUgs&e=
> > 
> > 0001-phy-ti-tusb1210-write-to-scratch-on-power-on.patch
> > 
> > 0001a-usb-dwc3-core-Fix-dwc3_core_soft_reset-before-anythi.patch
> > 
> > 044-REVERTME-usb-dwc3-gadget-skip-endpoints-ep-18-in-out.patch
> > 
> > Seems (just guessing) gether_cleanup is being called due to the switch to
> > host mode, but cleanup only partly succeeds. Now I'm finding I can make the
> > net interface disappear with `ip link set dev usb0 down` and the broken link
> > goes away by destroying the gadget in configfs.
> > 
> > Is that intentional? Do I need to tear down the gadgets in reverse order as
> > on create when switching to host mode? That would be new.
> > 
> > What happens when you trigger a switch to host mode without actually
> > unplugging your cable?
> 
> Since there is no response, should we actually prepare revert next week?
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-11 14:26                     ` Hardik Gajjar
@ 2024-04-11 16:39                       ` Andy Shevchenko
  2024-04-11 20:52                         ` Ferry Toth
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Shevchenko @ 2024-04-11 16:39 UTC (permalink / raw)
  To: Hardik Gajjar
  Cc: Ferry Toth, gregkh, s.hauer, jonathanh, linux-usb, linux-kernel,
	quic_linyyuan, paul, quic_eserrao, erosca

On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:

...

> > > Exactly. And this didn't happen before the 2 patches.
> > > 
> > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> > > longer exists
> 
> So, it means that the /sys/class/net/usb0 is present, but the symlink is
> broken. In that case, the dwc3 driver should recreate the device, and the
> symlink should become active again
> I have the dwc3 IP base usb controller, Let me check with this patch and
> share result here.  May be we need some fix in dwc3

It's quite possible, please test on your side.
We are happy to test any fixes if you come up with.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-11 16:39                       ` Andy Shevchenko
@ 2024-04-11 20:52                         ` Ferry Toth
  2024-04-16 13:48                           ` Andy Shevchenko
  0 siblings, 1 reply; 32+ messages in thread
From: Ferry Toth @ 2024-04-11 20:52 UTC (permalink / raw)
  To: Andy Shevchenko, Hardik Gajjar
  Cc: gregkh, s.hauer, jonathanh, linux-usb, linux-kernel,
	quic_linyyuan, paul, quic_eserrao, erosca

Hi

Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> ...
>
>>>> Exactly. And this didn't happen before the 2 patches.
>>>>
>>>> To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
>>>> longer exists
>> So, it means that the /sys/class/net/usb0 is present, but the symlink is
>> broken. In that case, the dwc3 driver should recreate the device, and the
>> symlink should become active again

Yes, on first enabling gadget (when device mode is activated):

root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
driver  net  power  sound  subsystem  suspended  uevent

Then switching to host mode:

root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
ls: cannot access 
'/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such 
file or directory

Then back to device mode:

root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
driver  power  sound  subsystem  suspended  uevent

net is missing. But, network functions:

root@yuna:~# ping 10.42.0.1
PING 10.42.0.1 (10.42.0.1): 56 data bytes

Mass storage device is created and removed each time as expected.

>> I have the dwc3 IP base usb controller, Let me check with this patch and
>> share result here.  May be we need some fix in dwc3
Would have been nice if someone could test on other controller as well. 
But another instance of dwc3 is also very welcome.
> It's quite possible, please test on your side.
> We are happy to test any fixes if you come up with.
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-11 20:52                         ` Ferry Toth
@ 2024-04-16 13:48                           ` Andy Shevchenko
  2024-04-17 15:13                             ` Hardik Gajjar
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Shevchenko @ 2024-04-16 13:48 UTC (permalink / raw)
  To: Ferry Toth
  Cc: Hardik Gajjar, gregkh, s.hauer, jonathanh, linux-usb,
	linux-kernel, quic_linyyuan, paul, quic_eserrao, erosca

On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:

...

> > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > 
> > > > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> > > > > longer exists
> > > So, it means that the /sys/class/net/usb0 is present, but the symlink is
> > > broken. In that case, the dwc3 driver should recreate the device, and the
> > > symlink should become active again
> 
> Yes, on first enabling gadget (when device mode is activated):
> 
> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> driver  net  power  sound  subsystem  suspended  uevent
> 
> Then switching to host mode:
> 
> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> ls: cannot access
> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file
> or directory
> 
> Then back to device mode:
> 
> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> driver  power  sound  subsystem  suspended  uevent
> 
> net is missing. But, network functions:
> 
> root@yuna:~# ping 10.42.0.1
> PING 10.42.0.1 (10.42.0.1): 56 data bytes
> 
> Mass storage device is created and removed each time as expected.

So, what's the conclusion? Shall we move towards revert of those two changes?

> > > I have the dwc3 IP base usb controller, Let me check with this patch and
> > > share result here.  May be we need some fix in dwc3
> Would have been nice if someone could test on other controller as well. But
> another instance of dwc3 is also very welcome.
> > It's quite possible, please test on your side.
> > We are happy to test any fixes if you come up with.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-16 13:48                           ` Andy Shevchenko
@ 2024-04-17 15:13                             ` Hardik Gajjar
  2024-04-25 21:27                               ` Ferry Toth
  0 siblings, 1 reply; 32+ messages in thread
From: Hardik Gajjar @ 2024-04-17 15:13 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Ferry Toth, Hardik Gajjar, gregkh, s.hauer, jonathanh, linux-usb,
	linux-kernel, quic_linyyuan, paul, quic_eserrao, erosca

On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> 
> ...
> 
> > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > 
> > > > > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> > > > > > longer exists
> > > > So, it means that the /sys/class/net/usb0 is present, but the symlink is
> > > > broken. In that case, the dwc3 driver should recreate the device, and the
> > > > symlink should become active again
> > 
> > Yes, on first enabling gadget (when device mode is activated):
> > 
> > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > driver  net  power  sound  subsystem  suspended  uevent
> > 
> > Then switching to host mode:
> > 
> > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > ls: cannot access
> > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file
> > or directory
> > 
> > Then back to device mode:
> > 
> > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > driver  power  sound  subsystem  suspended  uevent
> > 
> > net is missing. But, network functions:
> > 
> > root@yuna:~# ping 10.42.0.1
> > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > 
> > Mass storage device is created and removed each time as expected.
> 
> So, what's the conclusion? Shall we move towards revert of those two changes?


As promised, I have the tested the this patch with the dwc3 gadget. I could not reproduce
the issue. 

I can see the usb0 exist all the time and accessible regardless of the role switching of the USB mode (peripheral <-> host)

Following are the logs:
//Host to device

console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > mode
console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
usb0

//device to host
console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
usb0
s a800000.dwc3/gadget/net/usb0                                                <
addr_assign_type    duplex             phys_port_name
addr_len            flags              phys_switch_id
address             gro_flush_timeout  power
broadcast           ifalias            proto_down
carrier             ifindex            queues
carrier_changes     iflink             speed
carrier_down_count  link_mode          statistics
carrier_up_count    mtu                subsystem
dev_id              name_assign_type   tx_queue_len
dev_port            netdev_group       type
device              operstate          uevent
dormant             phys_port_id       waiting_for_supplier
console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 TX bytes:0

console:/sys/bus/platform/devices/a800000.ssusb #

I strongly advise against reverting the patch solely based on the observed issue of removing the /sys/class/net/usb0 directory while the usb0 interface remains available. 
Instead, I recommend enabling FTRACE to trace the functions involved and identify which faulty call is responsible for removing usb0.

According to current kernel architecture of u_ether driver, only gether_cleanup should remove the usb0 interface along with its kobject and sysfs interface.
I suggest sharing the analysis here to understand why this practice is not followed in your use case or driver ?

I am curious why the driver was developed without adhering to the kernel's gadget architecture.

> 
> > > > I have the dwc3 IP base usb controller, Let me check with this patch and
> > > > share result here.  May be we need some fix in dwc3
> > Would have been nice if someone could test on other controller as well. But
> > another instance of dwc3 is also very welcome.
> > > It's quite possible, please test on your side.
> > > We are happy to test any fixes if you come up with.
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-17 15:13                             ` Hardik Gajjar
@ 2024-04-25 21:27                               ` Ferry Toth
  2024-04-28 21:07                                 ` Ferry Toth
  0 siblings, 1 reply; 32+ messages in thread
From: Ferry Toth @ 2024-04-25 21:27 UTC (permalink / raw)
  To: Hardik Gajjar, Andy Shevchenko
  Cc: gregkh, s.hauer, jonathanh, linux-usb, linux-kernel,
	quic_linyyuan, paul, quic_eserrao, erosca

Hi,

Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>
>> ...
>>
>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>
>>>>>>> To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
>>>>>>> longer exists
>>>>> So, it means that the /sys/class/net/usb0 is present, but the symlink is
>>>>> broken. In that case, the dwc3 driver should recreate the device, and the
>>>>> symlink should become active again
>>>
>>> Yes, on first enabling gadget (when device mode is activated):
>>>
>>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>> driver  net  power  sound  subsystem  suspended  uevent
>>>
>>> Then switching to host mode:
>>>
>>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>> ls: cannot access
>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file
>>> or directory
>>>
>>> Then back to device mode:
>>>
>>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>> driver  power  sound  subsystem  suspended  uevent
>>>
>>> net is missing. But, network functions:
>>>
>>> root@yuna:~# ping 10.42.0.1
>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>
>>> Mass storage device is created and removed each time as expected.
>>
>> So, what's the conclusion? Shall we move towards revert of those two changes?
> 
> 
> As promised, I have the tested the this patch with the dwc3 gadget. I could not reproduce
> the issue.
> 
> I can see the usb0 exist all the time and accessible regardless of the role switching of the USB mode (peripheral <-> host)
> 
> Following are the logs:
> //Host to device
> 
> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > mode
> console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
> usb0
> 
> //device to host
> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
> usb0

That is weird. When I switch to host mode (using the physical switch), 
the whole gadget directory is removed (now testing 6.9.0-rc5)

Switching back to device mode, that gadget directory is recreated. And 
gadget/sound as well, but not gadget/net.

> s a800000.dwc3/gadget/net/usb0                                                <
> addr_assign_type    duplex             phys_port_name
> addr_len            flags              phys_switch_id
> address             gro_flush_timeout  power
> broadcast           ifalias            proto_down
> carrier             ifindex            queues
> carrier_changes     iflink             speed
> carrier_down_count  link_mode          statistics
> carrier_up_count    mtu                subsystem
> dev_id              name_assign_type   tx_queue_len
> dev_port            netdev_group       type
> device              operstate          uevent
> dormant             phys_port_id       waiting_for_supplier
> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>            BROADCAST MULTICAST  MTU:1500  Metric:1
>            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>            collisions:0 txqueuelen:1000
>            RX bytes:0 TX bytes:0
> 
> console:/sys/bus/platform/devices/a800000.ssusb #
> 
> I strongly advise against reverting the patch solely based on the observed issue of removing the /sys/class/net/usb0 directory while the usb0 interface remains available.

There's more to it. I also mentioned that switching the role or 
unplugging the cable leaves the usb0 connection.

I have while in host mode:
root@yuna:~# ifconfig -a usb0
usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
         inet 10.42.0.221  netmask 255.255.255.0  broadcast 10.42.0.255
         inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 0x20<link>


You don't see that because you didn't create a connection at all.

> Instead, I recommend enabling FTRACE to trace the functions involved and identify which faulty call is responsible for removing usb0.

Switching from device -> host -> device:

root@yuna:~# trace-cmd record -p function_graph -l *gether_*
   plugin 'function_graph'
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x1c8000
     188 bytes in size (4096 uncompressed)
CPU1 data recorded at offset=0x1c9000
     0 bytes in size (0 uncompressed)
root@yuna:~# trace-cmd report
cpus=2
      irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      # 
2079.480 us |  gether_disconnect();
      irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      + 
11.640 us  |  gether_disconnect();
      irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      ! 
116.520 us |  gether_connect();
      irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      # 
2057.260 us |  gether_disconnect();
      irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      ! 
109.000 us |  gether_connect();


> According to current kernel architecture of u_ether driver, only gether_cleanup should remove the usb0 interface along with its kobject and sysfs interface.
> I suggest sharing the analysis here to understand why this practice is not followed in your use case or driver ?

Yes, I'll try to trace where that happens.

Nevertheless, the disappearance of the net/usb0 directory seems 
harmless? But the usb: net device remaining after disconnect or role 
switch is not good, as the route remains.

May be they are 2 separate problems. Could you try to reproduce what 
happens if you make eem connection and then unplug?

> I am curious why the driver was developed without adhering to the kernel's gadget architecture.
> 
>>
>>>>> I have the dwc3 IP base usb controller, Let me check with this patch and
>>>>> share result here.  May be we need some fix in dwc3
>>> Would have been nice if someone could test on other controller as well. But
>>> another instance of dwc3 is also very welcome.
>>>> It's quite possible, please test on your side.
>>>> We are happy to test any fixes if you come up with.
>>
>> -- 
>> With Best Regards,
>> Andy Shevchenko
>>
>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-25 21:27                               ` Ferry Toth
@ 2024-04-28 21:07                                 ` Ferry Toth
  2024-04-30 15:32                                   ` Hardik Gajjar
  0 siblings, 1 reply; 32+ messages in thread
From: Ferry Toth @ 2024-04-28 21:07 UTC (permalink / raw)
  To: Hardik Gajjar, Andy Shevchenko
  Cc: gregkh, s.hauer, jonathanh, linux-usb, linux-kernel,
	quic_linyyuan, paul, quic_eserrao, erosca

Hi,

Op 25-04-2024 om 23:27 schreef Ferry Toth:
> Hi,
> 
> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>
>>> ...
>>>
>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>
>>>>>>>> To be precise: /sys/class/net/usb0 is not removed and it is a 
>>>>>>>> link, the link
>>>>>>>> target 
>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
>>>>>>>> longer exists
>>>>>> So, it means that the /sys/class/net/usb0 is present, but the 
>>>>>> symlink is
>>>>>> broken. In that case, the dwc3 driver should recreate the device, 
>>>>>> and the
>>>>>> symlink should become active again
>>>>
>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>
>>>> root@yuna:~# ls 
>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>
>>>> Then switching to host mode:
>>>>
>>>> root@yuna:~# ls 
>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>> ls: cannot access
>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No 
>>>> such file
>>>> or directory
>>>>
>>>> Then back to device mode:
>>>>
>>>> root@yuna:~# ls 
>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>> driver  power  sound  subsystem  suspended  uevent
>>>>
>>>> net is missing. But, network functions:
>>>>
>>>> root@yuna:~# ping 10.42.0.1
>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>
>>>> Mass storage device is created and removed each time as expected.
>>>
>>> So, what's the conclusion? Shall we move towards revert of those two 
>>> changes?
>>
>>
>> As promised, I have the tested the this patch with the dwc3 gadget. I 
>> could not reproduce
>> the issue.
>>
>> I can see the usb0 exist all the time and accessible regardless of the 
>> role switching of the USB mode (peripheral <-> host)
>>
>> Following are the logs:
>> //Host to device
>>
>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > 
>> mode
>> console:/sys/bus/platform/devices/a800000.ssusb # ls 
>> a800000.dwc3/gadget/net/
>> usb0
>>
>> //device to host
>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>> console:/sys/bus/platform/devices/a800000.ssusb # ls 
>> a800000.dwc3/gadget/net/
>> usb0
> 
> That is weird. When I switch to host mode (using the physical switch), 
> the whole gadget directory is removed (now testing 6.9.0-rc5)
> 
> Switching back to device mode, that gadget directory is recreated. And 
> gadget/sound as well, but not gadget/net.
> 
>> s 
>> a800000.dwc3/gadget/net/usb0                                                <
>> addr_assign_type    duplex             phys_port_name
>> addr_len            flags              phys_switch_id
>> address             gro_flush_timeout  power
>> broadcast           ifalias            proto_down
>> carrier             ifindex            queues
>> carrier_changes     iflink             speed
>> carrier_down_count  link_mode          statistics
>> carrier_up_count    mtu                subsystem
>> dev_id              name_assign_type   tx_queue_len
>> dev_port            netdev_group       type
>> device              operstate          uevent
>> dormant             phys_port_id       waiting_for_supplier
>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>            BROADCAST MULTICAST  MTU:1500  Metric:1
>>            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>            collisions:0 txqueuelen:1000
>>            RX bytes:0 TX bytes:0
>>
>> console:/sys/bus/platform/devices/a800000.ssusb #
>>
>> I strongly advise against reverting the patch solely based on the 
>> observed issue of removing the /sys/class/net/usb0 directory while the 
>> usb0 interface remains available.
> 
> There's more to it. I also mentioned that switching the role or 
> unplugging the cable leaves the usb0 connection.
> 
> I have while in host mode:
> root@yuna:~# ifconfig -a usb0
> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>          inet 10.42.0.221  netmask 255.255.255.0  broadcast 10.42.0.255
>          inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 0x20<link>
> 
> 
> You don't see that because you didn't create a connection at all.
> 
>> Instead, I recommend enabling FTRACE to trace the functions involved 
>> and identify which faulty call is responsible for removing usb0.
> 
> Switching from device -> host -> device:
> 
> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>    plugin 'function_graph'
> Hit Ctrl^C to stop recording
> ^CCPU0 data recorded at offset=0x1c8000
>      188 bytes in size (4096 uncompressed)
> CPU1 data recorded at offset=0x1c9000
>      0 bytes in size (0 uncompressed)
> root@yuna:~# trace-cmd report
> cpus=2
>       irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      # 
> 2079.480 us |  gether_disconnect();
>       irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      + 
> 11.640 us  |  gether_disconnect();
>       irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      ! 
> 116.520 us |  gether_connect();
>       irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      # 
> 2057.260 us |  gether_disconnect();
>       irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      ! 
> 109.000 us |  gether_connect();

I tried to get a more useful trace:
root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
root@yuna:/sys/kernel/tracing# echo function > current_tracer
root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
-> switch to host mode then back to device
root@yuna:/sys/kernel/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 53/53   #P:2
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
      irq/68-dwc3-523     [000] D..3.   133.990254: reset_config 
<-__composite_disconnect
      irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable 
<-reset_config
      irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect 
<-reset_config
      kworker/1:3-443     [001] ...1.   134.022453: eem_unbind 
<-purge_configs_funcs

-> to device mode

      kworker/1:3-443     [001] ...1.   148.630773: eem_bind 
<-usb_add_function
      irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt 
<-composite_setup
      irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect 
<-eem_set_alt
      irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect 
<-eem_set_alt
      irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt 
<-composite_setup
      irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect 
<-eem_set_alt
      irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect 
<-eem_set_alt
      irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap <-rx_complete
...

I don't know where to look exactly. Any hints?

> 
>> According to current kernel architecture of u_ether driver, only 
>> gether_cleanup should remove the usb0 interface along with its kobject 
>> and sysfs interface.
>> I suggest sharing the analysis here to understand why this practice is 
>> not followed in your use case or driver ?
> 
> Yes, I'll try to trace where that happens.
> 
> Nevertheless, the disappearance of the net/usb0 directory seems 
> harmless? But the usb: net device remaining after disconnect or role 
> switch is not good, as the route remains.
> 
> May be they are 2 separate problems. Could you try to reproduce what 
> happens if you make eem connection and then unplug?
> 
>> I am curious why the driver was developed without adhering to the 
>> kernel's gadget architecture.

I don't know what you mean here. Which driver do you mean?

>>>
>>>>>> I have the dwc3 IP base usb controller, Let me check with this 
>>>>>> patch and
>>>>>> share result here.  May be we need some fix in dwc3
>>>> Would have been nice if someone could test on other controller as 
>>>> well. But
>>>> another instance of dwc3 is also very welcome.
>>>>> It's quite possible, please test on your side.
>>>>> We are happy to test any fixes if you come up with.
>>>
>>> -- 
>>> With Best Regards,
>>> Andy Shevchenko
>>>
>>>
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-28 21:07                                 ` Ferry Toth
@ 2024-04-30 15:32                                   ` Hardik Gajjar
  2024-04-30 19:40                                     ` Ferry Toth
  0 siblings, 1 reply; 32+ messages in thread
From: Hardik Gajjar @ 2024-04-30 15:32 UTC (permalink / raw)
  To: Ferry Toth
  Cc: Hardik Gajjar, Andy Shevchenko, gregkh, s.hauer, jonathanh,
	linux-usb, linux-kernel, quic_linyyuan, paul, quic_eserrao,
	erosca

On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> Hi,
> 
> Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > Hi,
> > 
> > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > > > 
> > > > ...
> > > > 
> > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > 
> > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > removed and it is a link, the link
> > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > no
> > > > > > > > > longer exists
> > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > present, but the symlink is
> > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > recreate the device, and the
> > > > > > > symlink should become active again
> > > > > 
> > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > 
> > > > > root@yuna:~# ls
> > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > > 
> > > > > Then switching to host mode:
> > > > > 
> > > > > root@yuna:~# ls
> > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > ls: cannot access
> > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > No such file
> > > > > or directory
> > > > > 
> > > > > Then back to device mode:
> > > > > 
> > > > > root@yuna:~# ls
> > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > driver  power  sound  subsystem  suspended  uevent
> > > > > 
> > > > > net is missing. But, network functions:
> > > > > 
> > > > > root@yuna:~# ping 10.42.0.1
> > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > 
> > > > > Mass storage device is created and removed each time as expected.
> > > > 
> > > > So, what's the conclusion? Shall we move towards revert of those
> > > > two changes?
> > > 
> > > 
> > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > I could not reproduce
> > > the issue.
> > > 
> > > I can see the usb0 exist all the time and accessible regardless of
> > > the role switching of the USB mode (peripheral <-> host)
> > > 
> > > Following are the logs:
> > > //Host to device
> > > 
> > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > mode
> > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > a800000.dwc3/gadget/net/
> > > usb0
> > > 
> > > //device to host
> > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > a800000.dwc3/gadget/net/
> > > usb0
> > 
> > That is weird. When I switch to host mode (using the physical switch),
> > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > 
> > Switching back to device mode, that gadget directory is recreated. And
> > gadget/sound as well, but not gadget/net.
> > 
> > > s a800000.dwc3/gadget/net/usb0                                               
> > > <
> > > addr_assign_type    duplex             phys_port_name
> > > addr_len            flags              phys_switch_id
> > > address             gro_flush_timeout  power
> > > broadcast           ifalias            proto_down
> > > carrier             ifindex            queues
> > > carrier_changes     iflink             speed
> > > carrier_down_count  link_mode          statistics
> > > carrier_up_count    mtu                subsystem
> > > dev_id              name_assign_type   tx_queue_len
> > > dev_port            netdev_group       type
> > > device              operstate          uevent
> > > dormant             phys_port_id       waiting_for_supplier
> > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > >            BROADCAST MULTICAST  MTU:1500  Metric:1
> > >            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > >            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > >            collisions:0 txqueuelen:1000
> > >            RX bytes:0 TX bytes:0
> > > 
> > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > 
> > > I strongly advise against reverting the patch solely based on the
> > > observed issue of removing the /sys/class/net/usb0 directory while
> > > the usb0 interface remains available.
> > 
> > There's more to it. I also mentioned that switching the role or
> > unplugging the cable leaves the usb0 connection.
> > 
> > I have while in host mode:
> > root@yuna:~# ifconfig -a usb0
> > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> >          inet 10.42.0.221  netmask 255.255.255.0  broadcast 10.42.0.255
> >          inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 0x20<link>
> > 
> > 
> > You don't see that because you didn't create a connection at all.
> > 
> > > Instead, I recommend enabling FTRACE to trace the functions involved
> > > and identify which faulty call is responsible for removing usb0.
> > 
> > Switching from device -> host -> device:
> > 
> > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> >    plugin 'function_graph'
> > Hit Ctrl^C to stop recording
> > ^CCPU0 data recorded at offset=0x1c8000
> >      188 bytes in size (4096 uncompressed)
> > CPU1 data recorded at offset=0x1c9000
> >      0 bytes in size (0 uncompressed)
> > root@yuna:~# trace-cmd report
> > cpus=2
> >       irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> > 2079.480 us |  gether_disconnect();
> >       irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> > 11.640 us  |  gether_disconnect();
> >       irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> > 116.520 us |  gether_connect();
> >       irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> > 2057.260 us |  gether_disconnect();
> >       irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> > 109.000 us |  gether_connect();
> 
> I tried to get a more useful trace:
> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> root@yuna:/sys/kernel/tracing# echo function > current_tracer
> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> -> switch to host mode then back to device
> root@yuna:/sys/kernel/tracing# cat trace
> # tracer: function
> #
> # entries-in-buffer/entries-written: 53/53   #P:2
> #
> #                                _-----=> irqs-off/BH-disabled
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>      irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> <-__composite_disconnect
>      irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> <-reset_config
>      irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
> <-reset_config
>      kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> <-purge_configs_funcs
> 
> -> to device mode
> 
>      kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> <-usb_add_function
>      irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> <-composite_setup
>      irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
> <-eem_set_alt
>      irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> <-eem_set_alt
>      irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> <-composite_setup
>      irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
> <-eem_set_alt
>      irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> <-eem_set_alt
>      irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap <-rx_complete
> ...
> 
> I don't know where to look exactly. Any hints?

do you see anything related to gether_cleanup() after eem_unbind() ?
If not then, you may try to enable tracing of TCP/IP stack and network side to check who deleting the sysfs entry

Hardik


> 
> > 
> > > According to current kernel architecture of u_ether driver, only
> > > gether_cleanup should remove the usb0 interface along with its
> > > kobject and sysfs interface.
> > > I suggest sharing the analysis here to understand why this practice
> > > is not followed in your use case or driver ?
> > 
> > Yes, I'll try to trace where that happens.
> > 
> > Nevertheless, the disappearance of the net/usb0 directory seems
> > harmless? But the usb: net device remaining after disconnect or role
> > switch is not good, as the route remains.
> > 
> > May be they are 2 separate problems. Could you try to reproduce what
> > happens if you make eem connection and then unplug?
> > 
> > > I am curious why the driver was developed without adhering to the
> > > kernel's gadget architecture.
> 
> I don't know what you mean here. Which driver do you mean?
> 
> > > > 
> > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > with this patch and
> > > > > > > share result here.  May be we need some fix in dwc3
> > > > > Would have been nice if someone could test on other
> > > > > controller as well. But
> > > > > another instance of dwc3 is also very welcome.
> > > > > > It's quite possible, please test on your side.
> > > > > > We are happy to test any fixes if you come up with.
> > > > 
> > > > -- 
> > > > With Best Regards,
> > > > Andy Shevchenko
> > > > 
> > > > 
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-30 15:32                                   ` Hardik Gajjar
@ 2024-04-30 19:40                                     ` Ferry Toth
  2024-04-30 21:12                                       ` Ferry Toth
  0 siblings, 1 reply; 32+ messages in thread
From: Ferry Toth @ 2024-04-30 19:40 UTC (permalink / raw)
  To: Hardik Gajjar
  Cc: Andy Shevchenko, gregkh, s.hauer, jonathanh, linux-usb,
	linux-kernel, quic_linyyuan, paul, quic_eserrao, erosca

Hi,

Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>> Hi,
>>
>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>
>>>>> ...
>>>>>
>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>
>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>> no
>>>>>>>>>> longer exists
>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>> present, but the symlink is
>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>> recreate the device, and the
>>>>>>>> symlink should become active again
>>>>>>
>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>
>>>>>> root@yuna:~# ls
>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>
>>>>>> Then switching to host mode:
>>>>>>
>>>>>> root@yuna:~# ls
>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>> ls: cannot access
>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>> No such file
>>>>>> or directory
>>>>>>
>>>>>> Then back to device mode:
>>>>>>
>>>>>> root@yuna:~# ls
>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>
>>>>>> net is missing. But, network functions:
>>>>>>
>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>
>>>>>> Mass storage device is created and removed each time as expected.
>>>>>
>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>> two changes?
>>>>
>>>>
>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>> I could not reproduce
>>>> the issue.
>>>>
>>>> I can see the usb0 exist all the time and accessible regardless of
>>>> the role switching of the USB mode (peripheral <-> host)
>>>>
>>>> Following are the logs:
>>>> //Host to device
>>>>
>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>> mode
>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>> a800000.dwc3/gadget/net/
>>>> usb0
>>>>
>>>> //device to host
>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>> a800000.dwc3/gadget/net/
>>>> usb0
>>>
>>> That is weird. When I switch to host mode (using the physical switch),
>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>
>>> Switching back to device mode, that gadget directory is recreated. And
>>> gadget/sound as well, but not gadget/net.
>>>
>>>> s a800000.dwc3/gadget/net/usb0
>>>> <
>>>> addr_assign_type    duplex             phys_port_name
>>>> addr_len            flags              phys_switch_id
>>>> address             gro_flush_timeout  power
>>>> broadcast           ifalias            proto_down
>>>> carrier             ifindex            queues
>>>> carrier_changes     iflink             speed
>>>> carrier_down_count  link_mode          statistics
>>>> carrier_up_count    mtu                subsystem
>>>> dev_id              name_assign_type   tx_queue_len
>>>> dev_port            netdev_group       type
>>>> device              operstate          uevent
>>>> dormant             phys_port_id       waiting_for_supplier
>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>             collisions:0 txqueuelen:1000
>>>>             RX bytes:0 TX bytes:0
>>>>
>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>
>>>> I strongly advise against reverting the patch solely based on the
>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>> the usb0 interface remains available.
>>>
>>> There's more to it. I also mentioned that switching the role or
>>> unplugging the cable leaves the usb0 connection.
>>>
>>> I have while in host mode:
>>> root@yuna:~# ifconfig -a usb0
>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast 10.42.0.255
>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 0x20<link>
>>>
>>>
>>> You don't see that because you didn't create a connection at all.
>>>
>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>> and identify which faulty call is responsible for removing usb0.
>>>
>>> Switching from device -> host -> device:
>>>
>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>     plugin 'function_graph'
>>> Hit Ctrl^C to stop recording
>>> ^CCPU0 data recorded at offset=0x1c8000
>>>       188 bytes in size (4096 uncompressed)
>>> CPU1 data recorded at offset=0x1c9000
>>>       0 bytes in size (0 uncompressed)
>>> root@yuna:~# trace-cmd report
>>> cpus=2
>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>> 2079.480 us |  gether_disconnect();
>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>> 11.640 us  |  gether_disconnect();
>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>> 116.520 us |  gether_connect();
>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>> 2057.260 us |  gether_disconnect();
>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>> 109.000 us |  gether_connect();
>>
>> I tried to get a more useful trace:
>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>> -> switch to host mode then back to device
>> root@yuna:/sys/kernel/tracing# cat trace
>> # tracer: function
>> #
>> # entries-in-buffer/entries-written: 53/53   #P:2
>> #
>> #                                _-----=> irqs-off/BH-disabled
>> #                               / _----=> need-resched
>> #                              | / _---=> hardirq/softirq
>> #                              || / _--=> preempt-depth
>> #                              ||| / _-=> migrate-disable
>> #                              |||| /     delay
>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>> #              | |         |   |||||     |         |
>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>> <-__composite_disconnect
>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>> <-reset_config
>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>> <-reset_config
>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>> <-purge_configs_funcs
>>
>> -> to device mode
>>
>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>> <-usb_add_function
>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>> <-composite_setup
>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>> <-eem_set_alt
>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>> <-eem_set_alt
>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>> <-composite_setup
>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>> <-eem_set_alt
>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>> <-eem_set_alt
>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap <-rx_complete
>> ...
>>
>> I don't know where to look exactly. Any hints?
> 
> do you see anything related to gether_cleanup() after eem_unbind() ?

Nope. It's a pitty that the trace formatting got messed up above. But as 
you can see I traced gether_* and eem_*. After eem_unbind no traced 
function is called, until I flip the switch to device mode.
The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete 
and eem_wrap <-eth_start_xmit lines.

> If not then, you may try to enable tracing of TCP/IP stack and network side to check who deleting the sysfs entry

Yes, that's a vast amount of functions to trace. And I don't see how 
that would be related to the patch we're discussing here. I was hoping 
for a little more targeted hint.

You may recall the whole issue did not occur before this patch got applied.

> Hardik
> 
> 
>>
>>>
>>>> According to current kernel architecture of u_ether driver, only
>>>> gether_cleanup should remove the usb0 interface along with its
>>>> kobject and sysfs interface.
>>>> I suggest sharing the analysis here to understand why this practice
>>>> is not followed in your use case or driver ?
>>>
>>> Yes, I'll try to trace where that happens.
>>>
>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>> harmless? But the usb: net device remaining after disconnect or role
>>> switch is not good, as the route remains.
>>>
>>> May be they are 2 separate problems. Could you try to reproduce what
>>> happens if you make eem connection and then unplug?
>>>
>>>> I am curious why the driver was developed without adhering to the
>>>> kernel's gadget architecture.
>>
>> I don't know what you mean here. Which driver do you mean?
>>
>>>>>
>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>> with this patch and
>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>> Would have been nice if someone could test on other
>>>>>> controller as well. But
>>>>>> another instance of dwc3 is also very welcome.
>>>>>>> It's quite possible, please test on your side.
>>>>>>> We are happy to test any fixes if you come up with.
>>>>>
>>>>> -- 
>>>>> With Best Regards,
>>>>> Andy Shevchenko
>>>>>
>>>>>
>>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-30 19:40                                     ` Ferry Toth
@ 2024-04-30 21:12                                       ` Ferry Toth
  2024-05-02 15:29                                         ` Hardik Gajjar
  0 siblings, 1 reply; 32+ messages in thread
From: Ferry Toth @ 2024-04-30 21:12 UTC (permalink / raw)
  To: Hardik Gajjar
  Cc: Andy Shevchenko, gregkh, s.hauer, jonathanh, linux-usb,
	linux-kernel, quic_linyyuan, paul, quic_eserrao, erosca

Hi,

Op 30-04-2024 om 21:40 schreef Ferry Toth:
> Hi,
> 
> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>> Hi,
>>>
>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>> Hi,
>>>>
>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>
>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>> target 
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>> no
>>>>>>>>>>> longer exists
>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>> present, but the symlink is
>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>> recreate the device, and the
>>>>>>>>> symlink should become active again
>>>>>>>
>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>
>>>>>>> root@yuna:~# ls
>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>
>>>>>>> Then switching to host mode:
>>>>>>>
>>>>>>> root@yuna:~# ls
>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>> ls: cannot access
>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>> No such file
>>>>>>> or directory
>>>>>>>
>>>>>>> Then back to device mode:
>>>>>>>
>>>>>>> root@yuna:~# ls
>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>
>>>>>>> net is missing. But, network functions:
>>>>>>>
>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>
>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>
>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>> two changes?
>>>>>
>>>>>
>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>> I could not reproduce
>>>>> the issue.
>>>>>
>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>
>>>>> Following are the logs:
>>>>> //Host to device
>>>>>
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>> mode
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>> a800000.dwc3/gadget/net/
>>>>> usb0
>>>>>
>>>>> //device to host
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>> a800000.dwc3/gadget/net/
>>>>> usb0
>>>>
>>>> That is weird. When I switch to host mode (using the physical switch),
>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>
>>>> Switching back to device mode, that gadget directory is recreated. And
>>>> gadget/sound as well, but not gadget/net.
>>>>
>>>>> s a800000.dwc3/gadget/net/usb0
>>>>> <
>>>>> addr_assign_type    duplex             phys_port_name
>>>>> addr_len            flags              phys_switch_id
>>>>> address             gro_flush_timeout  power
>>>>> broadcast           ifalias            proto_down
>>>>> carrier             ifindex            queues
>>>>> carrier_changes     iflink             speed
>>>>> carrier_down_count  link_mode          statistics
>>>>> carrier_up_count    mtu                subsystem
>>>>> dev_id              name_assign_type   tx_queue_len
>>>>> dev_port            netdev_group       type
>>>>> device              operstate          uevent
>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>             collisions:0 txqueuelen:1000
>>>>>             RX bytes:0 TX bytes:0
>>>>>
>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>
>>>>> I strongly advise against reverting the patch solely based on the
>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>> the usb0 interface remains available.
>>>>
>>>> There's more to it. I also mentioned that switching the role or
>>>> unplugging the cable leaves the usb0 connection.
>>>>
>>>> I have while in host mode:
>>>> root@yuna:~# ifconfig -a usb0
>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast 
>>>> 10.42.0.255
>>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 
>>>> 0x20<link>
>>>>
>>>>
>>>> You don't see that because you didn't create a connection at all.
>>>>
>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>> and identify which faulty call is responsible for removing usb0.
>>>>
>>>> Switching from device -> host -> device:
>>>>
>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>     plugin 'function_graph'
>>>> Hit Ctrl^C to stop recording
>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>       188 bytes in size (4096 uncompressed)
>>>> CPU1 data recorded at offset=0x1c9000
>>>>       0 bytes in size (0 uncompressed)
>>>> root@yuna:~# trace-cmd report
>>>> cpus=2
>>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>>> 2079.480 us |  gether_disconnect();
>>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>>> 11.640 us  |  gether_disconnect();
>>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>>> 116.520 us |  gether_connect();
>>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>>> 2057.260 us |  gether_disconnect();
>>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>>> 109.000 us |  gether_connect();
>>>
>>> I tried to get a more useful trace:
>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>> -> switch to host mode then back to device
>>> root@yuna:/sys/kernel/tracing# cat trace
>>> # tracer: function
>>> #
>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>> #
>>> #                                _-----=> irqs-off/BH-disabled
>>> #                               / _----=> need-resched
>>> #                              | / _---=> hardirq/softirq
>>> #                              || / _--=> preempt-depth
>>> #                              ||| / _-=> migrate-disable
>>> #                              |||| /     delay
>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>> #              | |         |   |||||     |         |
>>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>> <-__composite_disconnect
>>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>> <-reset_config
>>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>>> <-reset_config
>>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>> <-purge_configs_funcs
>>>
>>> -> to device mode
>>>
>>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>> <-usb_add_function
>>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>> <-composite_setup
>>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>> <-composite_setup
>>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap 
>>> <-rx_complete
>>> ...
>>>
>>> I don't know where to look exactly. Any hints?
>>
>> do you see anything related to gether_cleanup() after eem_unbind() ?
> 
> Nope. It's a pitty that the trace formatting got messed up above. But as 
> you can see I traced gether_* and eem_*. After eem_unbind no traced 
> function is called, until I flip the switch to device mode.
> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete 
> and eem_wrap <-eth_start_xmit lines.
> 
>> If not then, you may try to enable tracing of TCP/IP stack and network 
>> side to check who deleting the sysfs entry
> 
> Yes, that's a vast amount of functions to trace. And I don't see how 
> that would be related to the patch we're discussing here. I was hoping 
> for a little more targeted hint.

Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 
'usb_fun*', 'reset_config' and 'device_remove_file' leads me to:

TIMESTAMP  FUNCTION
    |         |
   49.952477: eem_wrap <-eth_start_xmit
   55.072455: eem_wrap <-eth_start_xmit
   55.072621: eem_unwrap <-rx_complete
   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
   59.011545: composite_reset <-configfs_composite_reset
   59.011548: reset_config <-__composite_disconnect
   59.013565: eem_disable <-reset_config
   59.013567: gether_disconnect <-reset_config
   59.049560: device_remove_file <-device_remove
   59.051185: configfs_composite_disconnect <-usb_gadget_disco
   59.051189: composite_disconnect <-configfs_composite_discon
   59.051195: configfs_composite_unbind <-gadget_unbind_driver
   59.052519: eem_unbind <-purge_configs_funcs
   59.052529: composite_dev_cleanup <-configfs_composite_unbin
   59.052537: device_remove_file <-composite_dev_cleanup

device_remove_file gets called twice, once by device_remove after 
gether_disconnect (that the one). The 2nd time by composite_dev_cleanup 
(removing the gadget)

> You may recall the whole issue did not occur before this patch got applied.
> 
>> Hardik
>>
>>
>>>
>>>>
>>>>> According to current kernel architecture of u_ether driver, only
>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>> kobject and sysfs interface.
>>>>> I suggest sharing the analysis here to understand why this practice
>>>>> is not followed in your use case or driver ?
>>>>
>>>> Yes, I'll try to trace where that happens.
>>>>
>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>> harmless? But the usb: net device remaining after disconnect or role
>>>> switch is not good, as the route remains.
>>>>
>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>> happens if you make eem connection and then unplug?
>>>>
>>>>> I am curious why the driver was developed without adhering to the
>>>>> kernel's gadget architecture.
>>>
>>> I don't know what you mean here. Which driver do you mean?
>>>
>>>>>>
>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>> with this patch and
>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>> Would have been nice if someone could test on other
>>>>>>> controller as well. But
>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>> It's quite possible, please test on your side.
>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>
>>>>>> -- 
>>>>>> With Best Regards,
>>>>>> Andy Shevchenko
>>>>>>
>>>>>>
>>>>
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-04-30 21:12                                       ` Ferry Toth
@ 2024-05-02 15:29                                         ` Hardik Gajjar
  2024-05-02 15:31                                           ` Andy Shevchenko
  2024-05-02 20:13                                           ` Ferry Toth
  0 siblings, 2 replies; 32+ messages in thread
From: Hardik Gajjar @ 2024-05-02 15:29 UTC (permalink / raw)
  To: Ferry Toth
  Cc: Hardik Gajjar, Andy Shevchenko, gregkh, s.hauer, jonathanh,
	linux-usb, linux-kernel, quic_linyyuan, paul, quic_eserrao,
	erosca

On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> Hi,
> 
> Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > Hi,
> > 
> > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > Hi,
> > > > 
> > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > Hi,
> > > > > 
> > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > > > > > > 
> > > > > > > ...
> > > > > > > 
> > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > > 
> > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > no
> > > > > > > > > > > > longer exists
> > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > present, but the symlink is
> > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > recreate the device, and the
> > > > > > > > > > symlink should become active again
> > > > > > > > 
> > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > > 
> > > > > > > > root@yuna:~# ls
> > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > > > > > 
> > > > > > > > Then switching to host mode:
> > > > > > > > 
> > > > > > > > root@yuna:~# ls
> > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > ls: cannot access
> > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > No such file
> > > > > > > > or directory
> > > > > > > > 
> > > > > > > > Then back to device mode:
> > > > > > > > 
> > > > > > > > root@yuna:~# ls
> > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > driver  power  sound  subsystem  suspended  uevent
> > > > > > > > 
> > > > > > > > net is missing. But, network functions:
> > > > > > > > 
> > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > > 
> > > > > > > > Mass storage device is created and removed each time as expected.
> > > > > > > 
> > > > > > > So, what's the conclusion? Shall we move towards revert of those
> > > > > > > two changes?
> > > > > > 
> > > > > > 
> > > > > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > > > > I could not reproduce
> > > > > > the issue.
> > > > > > 
> > > > > > I can see the usb0 exist all the time and accessible regardless of
> > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > > 
> > > > > > Following are the logs:
> > > > > > //Host to device
> > > > > > 
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > > > > mode
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > a800000.dwc3/gadget/net/
> > > > > > usb0
> > > > > > 
> > > > > > //device to host
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > a800000.dwc3/gadget/net/
> > > > > > usb0
> > > > > 
> > > > > That is weird. When I switch to host mode (using the physical switch),
> > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > > 
> > > > > Switching back to device mode, that gadget directory is recreated. And
> > > > > gadget/sound as well, but not gadget/net.
> > > > > 
> > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > <
> > > > > > addr_assign_type    duplex             phys_port_name
> > > > > > addr_len            flags              phys_switch_id
> > > > > > address             gro_flush_timeout  power
> > > > > > broadcast           ifalias            proto_down
> > > > > > carrier             ifindex            queues
> > > > > > carrier_changes     iflink             speed
> > > > > > carrier_down_count  link_mode          statistics
> > > > > > carrier_up_count    mtu                subsystem
> > > > > > dev_id              name_assign_type   tx_queue_len
> > > > > > dev_port            netdev_group       type
> > > > > > device              operstate          uevent
> > > > > > dormant             phys_port_id       waiting_for_supplier
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > > > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > > > > >             BROADCAST MULTICAST  MTU:1500  Metric:1
> > > > > >             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > >             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > > > > >             collisions:0 txqueuelen:1000
> > > > > >             RX bytes:0 TX bytes:0
> > > > > > 
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > > 
> > > > > > I strongly advise against reverting the patch solely based on the
> > > > > > observed issue of removing the /sys/class/net/usb0 directory while
> > > > > > the usb0 interface remains available.
> > > > > 
> > > > > There's more to it. I also mentioned that switching the role or
> > > > > unplugging the cable leaves the usb0 connection.
> > > > > 
> > > > > I have while in host mode:
> > > > > root@yuna:~# ifconfig -a usb0
> > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> > > > >           inet 10.42.0.221  netmask 255.255.255.0  broadcast
> > > > > 10.42.0.255
> > > > >           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
> > > > > scopeid 0x20<link>
> > > > > 
> > > > > 
> > > > > You don't see that because you didn't create a connection at all.
> > > > > 
> > > > > > Instead, I recommend enabling FTRACE to trace the functions involved
> > > > > > and identify which faulty call is responsible for removing usb0.
> > > > > 
> > > > > Switching from device -> host -> device:
> > > > > 
> > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > >     plugin 'function_graph'
> > > > > Hit Ctrl^C to stop recording
> > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > >       188 bytes in size (4096 uncompressed)
> > > > > CPU1 data recorded at offset=0x1c9000
> > > > >       0 bytes in size (0 uncompressed)
> > > > > root@yuna:~# trace-cmd report
> > > > > cpus=2
> > > > >        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> > > > > 2079.480 us |  gether_disconnect();
> > > > >        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> > > > > 11.640 us  |  gether_disconnect();
> > > > >        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> > > > > 116.520 us |  gether_connect();
> > > > >        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> > > > > 2057.260 us |  gether_disconnect();
> > > > >        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> > > > > 109.000 us |  gether_connect();
> > > > 
> > > > I tried to get a more useful trace:
> > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> > > > -> switch to host mode then back to device
> > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > # tracer: function
> > > > #
> > > > # entries-in-buffer/entries-written: 53/53   #P:2
> > > > #
> > > > #                                _-----=> irqs-off/BH-disabled
> > > > #                               / _----=> need-resched
> > > > #                              | / _---=> hardirq/softirq
> > > > #                              || / _--=> preempt-depth
> > > > #                              ||| / _-=> migrate-disable
> > > > #                              |||| /     delay
> > > > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > > > #              | |         |   |||||     |         |
> > > >       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> > > > <-__composite_disconnect
> > > >       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> > > > <-reset_config
> > > >       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
> > > > <-reset_config
> > > >       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> > > > <-purge_configs_funcs
> > > > 
> > > > -> to device mode
> > > > 
> > > >       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> > > > <-usb_add_function
> > > >       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> > > > <-composite_setup
> > > >       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> > > > <-composite_setup
> > > >       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
> > > > <-rx_complete
> > > > ...
> > > > 
> > > > I don't know where to look exactly. Any hints?
> > > 
> > > do you see anything related to gether_cleanup() after eem_unbind() ?
> > 
> > Nope. It's a pitty that the trace formatting got messed up above. But as
> > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > function is called, until I flip the switch to device mode.
> > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> > and eem_wrap <-eth_start_xmit lines.
> > 
> > > If not then, you may try to enable tracing of TCP/IP stack and
> > > network side to check who deleting the sysfs entry
> > 
> > Yes, that's a vast amount of functions to trace. And I don't see how
> > that would be related to the patch we're discussing here. I was hoping
> > for a little more targeted hint.
> 
> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
> 'reset_config' and 'device_remove_file' leads me to:
> 
> TIMESTAMP  FUNCTION
>    |         |
>   49.952477: eem_wrap <-eth_start_xmit
>   55.072455: eem_wrap <-eth_start_xmit
>   55.072621: eem_unwrap <-rx_complete
>   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>   59.011545: composite_reset <-configfs_composite_reset
>   59.011548: reset_config <-__composite_disconnect
>   59.013565: eem_disable <-reset_config
>   59.013567: gether_disconnect <-reset_config
>   59.049560: device_remove_file <-device_remove
>   59.051185: configfs_composite_disconnect <-usb_gadget_disco
>   59.051189: composite_disconnect <-configfs_composite_discon
>   59.051195: configfs_composite_unbind <-gadget_unbind_driver
>   59.052519: eem_unbind <-purge_configs_funcs
>   59.052529: composite_dev_cleanup <-configfs_composite_unbin
>   59.052537: device_remove_file <-composite_dev_cleanup
> 
> device_remove_file gets called twice, once by device_remove after
> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> (removing the gadget)

I believe that the device_remove_file function is only removing suspend-specific attributes, not the complete gadget. 
Typically, when you perform the role switch, the Gadget start/stop function in your UDC driver is called. These functions should not delete the gadget

To investigate further, could you please enable the DWC3 functions in ftrace and check who is removing the gadget? 
I can also enable this on my system and compare the logs with yours, but I will be in PI planning for 1.5 weeks and may not be able to provide immediate support.

Additionally, please check if you have any customized DWC patches that may be causing this problem.

> 
> > You may recall the whole issue did not occur before this patch got applied.
> > 
> > > Hardik
> > > 
> > > 
> > > > 
> > > > > 
> > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > kobject and sysfs interface.
> > > > > > I suggest sharing the analysis here to understand why this practice
> > > > > > is not followed in your use case or driver ?
> > > > > 
> > > > > Yes, I'll try to trace where that happens.
> > > > > 
> > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > harmless? But the usb: net device remaining after disconnect or role
> > > > > switch is not good, as the route remains.
> > > > > 
> > > > > May be they are 2 separate problems. Could you try to reproduce what
> > > > > happens if you make eem connection and then unplug?
> > > > > 
> > > > > > I am curious why the driver was developed without adhering to the
> > > > > > kernel's gadget architecture.
> > > > 
> > > > I don't know what you mean here. Which driver do you mean?
> > > > 
> > > > > > > 
> > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > with this patch and
> > > > > > > > > > share result here.  May be we need some fix in dwc3
> > > > > > > > Would have been nice if someone could test on other
> > > > > > > > controller as well. But
> > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > We are happy to test any fixes if you come up with.
> > > > > > > 
> > > > > > > -- 
> > > > > > > With Best Regards,
> > > > > > > Andy Shevchenko
> > > > > > > 
> > > > > > > 
> > > > > 
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-05-02 15:29                                         ` Hardik Gajjar
@ 2024-05-02 15:31                                           ` Andy Shevchenko
  2024-05-02 16:16                                             ` Hardik Gajjar
  2024-05-02 20:13                                           ` Ferry Toth
  1 sibling, 1 reply; 32+ messages in thread
From: Andy Shevchenko @ 2024-05-02 15:31 UTC (permalink / raw)
  To: Hardik Gajjar
  Cc: Ferry Toth, gregkh, s.hauer, jonathanh, linux-usb, linux-kernel,
	quic_linyyuan, paul, quic_eserrao, erosca

On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> > Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:

...

> > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > > no
> > > > > > > > > > > > > longer exists
> > > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > > present, but the symlink is
> > > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > > recreate the device, and the
> > > > > > > > > > > symlink should become active again
> > > > > > > > > 
> > > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > > > 
> > > > > > > > > root@yuna:~# ls
> > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > > > > > > 
> > > > > > > > > Then switching to host mode:
> > > > > > > > > 
> > > > > > > > > root@yuna:~# ls
> > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > ls: cannot access
> > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > > No such file
> > > > > > > > > or directory
> > > > > > > > > 
> > > > > > > > > Then back to device mode:
> > > > > > > > > 
> > > > > > > > > root@yuna:~# ls
> > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > driver  power  sound  subsystem  suspended  uevent
> > > > > > > > > 
> > > > > > > > > net is missing. But, network functions:
> > > > > > > > > 
> > > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > > > 
> > > > > > > > > Mass storage device is created and removed each time as expected.
> > > > > > > > 
> > > > > > > > So, what's the conclusion? Shall we move towards revert of those
> > > > > > > > two changes?
> > > > > > > 
> > > > > > > 
> > > > > > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > > > > > I could not reproduce
> > > > > > > the issue.
> > > > > > > 
> > > > > > > I can see the usb0 exist all the time and accessible regardless of
> > > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > > > 
> > > > > > > Following are the logs:
> > > > > > > //Host to device
> > > > > > > 
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > > > > > mode
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > a800000.dwc3/gadget/net/
> > > > > > > usb0
> > > > > > > 
> > > > > > > //device to host
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > a800000.dwc3/gadget/net/
> > > > > > > usb0
> > > > > > 
> > > > > > That is weird. When I switch to host mode (using the physical switch),
> > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > > > 
> > > > > > Switching back to device mode, that gadget directory is recreated. And
> > > > > > gadget/sound as well, but not gadget/net.
> > > > > > 
> > > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > > <
> > > > > > > addr_assign_type    duplex             phys_port_name
> > > > > > > addr_len            flags              phys_switch_id
> > > > > > > address             gro_flush_timeout  power
> > > > > > > broadcast           ifalias            proto_down
> > > > > > > carrier             ifindex            queues
> > > > > > > carrier_changes     iflink             speed
> > > > > > > carrier_down_count  link_mode          statistics
> > > > > > > carrier_up_count    mtu                subsystem
> > > > > > > dev_id              name_assign_type   tx_queue_len
> > > > > > > dev_port            netdev_group       type
> > > > > > > device              operstate          uevent
> > > > > > > dormant             phys_port_id       waiting_for_supplier
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > > > > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > > > > > >             BROADCAST MULTICAST  MTU:1500  Metric:1
> > > > > > >             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > > >             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > > > > > >             collisions:0 txqueuelen:1000
> > > > > > >             RX bytes:0 TX bytes:0
> > > > > > > 
> > > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > > > 
> > > > > > > I strongly advise against reverting the patch solely based on the
> > > > > > > observed issue of removing the /sys/class/net/usb0 directory while
> > > > > > > the usb0 interface remains available.
> > > > > > 
> > > > > > There's more to it. I also mentioned that switching the role or
> > > > > > unplugging the cable leaves the usb0 connection.
> > > > > > 
> > > > > > I have while in host mode:
> > > > > > root@yuna:~# ifconfig -a usb0
> > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> > > > > >           inet 10.42.0.221  netmask 255.255.255.0  broadcast
> > > > > > 10.42.0.255
> > > > > >           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
> > > > > > scopeid 0x20<link>
> > > > > > 
> > > > > > 
> > > > > > You don't see that because you didn't create a connection at all.
> > > > > > 
> > > > > > > Instead, I recommend enabling FTRACE to trace the functions involved
> > > > > > > and identify which faulty call is responsible for removing usb0.
> > > > > > 
> > > > > > Switching from device -> host -> device:
> > > > > > 
> > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > > >     plugin 'function_graph'
> > > > > > Hit Ctrl^C to stop recording
> > > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > > >       188 bytes in size (4096 uncompressed)
> > > > > > CPU1 data recorded at offset=0x1c9000
> > > > > >       0 bytes in size (0 uncompressed)
> > > > > > root@yuna:~# trace-cmd report
> > > > > > cpus=2
> > > > > >        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> > > > > > 2079.480 us |  gether_disconnect();
> > > > > >        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> > > > > > 11.640 us  |  gether_disconnect();
> > > > > >        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> > > > > > 116.520 us |  gether_connect();
> > > > > >        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> > > > > > 2057.260 us |  gether_disconnect();
> > > > > >        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> > > > > > 109.000 us |  gether_connect();
> > > > > 
> > > > > I tried to get a more useful trace:
> > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> > > > > -> switch to host mode then back to device
> > > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > > # tracer: function
> > > > > #
> > > > > # entries-in-buffer/entries-written: 53/53   #P:2
> > > > > #
> > > > > #                                _-----=> irqs-off/BH-disabled
> > > > > #                               / _----=> need-resched
> > > > > #                              | / _---=> hardirq/softirq
> > > > > #                              || / _--=> preempt-depth
> > > > > #                              ||| / _-=> migrate-disable
> > > > > #                              |||| /     delay
> > > > > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > > > > #              | |         |   |||||     |         |
> > > > >       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> > > > > <-__composite_disconnect
> > > > >       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> > > > > <-reset_config
> > > > >       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
> > > > > <-reset_config
> > > > >       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> > > > > <-purge_configs_funcs
> > > > > 
> > > > > -> to device mode
> > > > > 
> > > > >       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> > > > > <-usb_add_function
> > > > >       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> > > > > <-composite_setup
> > > > >       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
> > > > > <-eem_set_alt
> > > > >       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> > > > > <-eem_set_alt
> > > > >       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> > > > > <-composite_setup
> > > > >       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
> > > > > <-eem_set_alt
> > > > >       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> > > > > <-eem_set_alt
> > > > >       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
> > > > > <-rx_complete
> > > > > ...
> > > > > 
> > > > > I don't know where to look exactly. Any hints?
> > > > 
> > > > do you see anything related to gether_cleanup() after eem_unbind() ?
> > > 
> > > Nope. It's a pitty that the trace formatting got messed up above. But as
> > > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > > function is called, until I flip the switch to device mode.
> > > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> > > and eem_wrap <-eth_start_xmit lines.
> > > 
> > > > If not then, you may try to enable tracing of TCP/IP stack and
> > > > network side to check who deleting the sysfs entry
> > > 
> > > Yes, that's a vast amount of functions to trace. And I don't see how
> > > that would be related to the patch we're discussing here. I was hoping
> > > for a little more targeted hint.
> > 
> > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
> > 'reset_config' and 'device_remove_file' leads me to:
> > 
> > TIMESTAMP  FUNCTION
> >    |         |
> >   49.952477: eem_wrap <-eth_start_xmit
> >   55.072455: eem_wrap <-eth_start_xmit
> >   55.072621: eem_unwrap <-rx_complete
> >   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> >   59.011545: composite_reset <-configfs_composite_reset
> >   59.011548: reset_config <-__composite_disconnect
> >   59.013565: eem_disable <-reset_config
> >   59.013567: gether_disconnect <-reset_config
> >   59.049560: device_remove_file <-device_remove
> >   59.051185: configfs_composite_disconnect <-usb_gadget_disco
> >   59.051189: composite_disconnect <-configfs_composite_discon
> >   59.051195: configfs_composite_unbind <-gadget_unbind_driver
> >   59.052519: eem_unbind <-purge_configs_funcs
> >   59.052529: composite_dev_cleanup <-configfs_composite_unbin
> >   59.052537: device_remove_file <-composite_dev_cleanup
> > 
> > device_remove_file gets called twice, once by device_remove after
> > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> > (removing the gadget)
> 
> I believe that the device_remove_file function is only removing
> suspend-specific attributes, not the complete gadget.  Typically, when you
> perform the role switch, the Gadget start/stop function in your UDC driver is
> called. These functions should not delete the gadget
> 
> To investigate further, could you please enable the DWC3 functions in ftrace
> and check who is removing the gadget?  I can also enable this on my system
> and compare the logs with yours, but I will be in PI planning for 1.5 weeks
> and may not be able to provide immediate support.

Since we are almost at -rc7, I propose to revert and try again next cycle.

> Additionally, please check if you have any customized DWC patches that may be causing this problem.
> 
> > > You may recall the whole issue did not occur before this patch got applied.
> > > 
> > > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > > kobject and sysfs interface.
> > > > > > > I suggest sharing the analysis here to understand why this practice
> > > > > > > is not followed in your use case or driver ?
> > > > > > 
> > > > > > Yes, I'll try to trace where that happens.
> > > > > > 
> > > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > > harmless? But the usb: net device remaining after disconnect or role
> > > > > > switch is not good, as the route remains.
> > > > > > 
> > > > > > May be they are 2 separate problems. Could you try to reproduce what
> > > > > > happens if you make eem connection and then unplug?
> > > > > > 
> > > > > > > I am curious why the driver was developed without adhering to the
> > > > > > > kernel's gadget architecture.
> > > > > 
> > > > > I don't know what you mean here. Which driver do you mean?
> > > > > 
> > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > > with this patch and
> > > > > > > > > > > share result here.  May be we need some fix in dwc3
> > > > > > > > > Would have been nice if someone could test on other
> > > > > > > > > controller as well. But
> > > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > > We are happy to test any fixes if you come up with.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-05-02 15:31                                           ` Andy Shevchenko
@ 2024-05-02 16:16                                             ` Hardik Gajjar
  2024-05-03  7:24                                               ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 32+ messages in thread
From: Hardik Gajjar @ 2024-05-02 16:16 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Hardik Gajjar, Ferry Toth, gregkh, s.hauer, jonathanh, linux-usb,
	linux-kernel, quic_linyyuan, paul, quic_eserrao, erosca

On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
> > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> > > Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> 
> ...
> 
> > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > > > no
> > > > > > > > > > > > > > longer exists
> > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > > > present, but the symlink is
> > > > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > > > recreate the device, and the
> > > > > > > > > > > > symlink should become active again
> > > > > > > > > > 
> > > > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > > > > 
> > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > > > > > > > 
> > > > > > > > > > Then switching to host mode:
> > > > > > > > > > 
> > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > ls: cannot access
> > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > > > No such file
> > > > > > > > > > or directory
> > > > > > > > > > 
> > > > > > > > > > Then back to device mode:
> > > > > > > > > > 
> > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > driver  power  sound  subsystem  suspended  uevent
> > > > > > > > > > 
> > > > > > > > > > net is missing. But, network functions:
> > > > > > > > > > 
> > > > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > > > > 
> > > > > > > > > > Mass storage device is created and removed each time as expected.
> > > > > > > > > 
> > > > > > > > > So, what's the conclusion? Shall we move towards revert of those
> > > > > > > > > two changes?
> > > > > > > > 
> > > > > > > > 
> > > > > > > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > > > > > > I could not reproduce
> > > > > > > > the issue.
> > > > > > > > 
> > > > > > > > I can see the usb0 exist all the time and accessible regardless of
> > > > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > > > > 
> > > > > > > > Following are the logs:
> > > > > > > > //Host to device
> > > > > > > > 
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > > > > > > mode
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > usb0
> > > > > > > > 
> > > > > > > > //device to host
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > usb0
> > > > > > > 
> > > > > > > That is weird. When I switch to host mode (using the physical switch),
> > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > > > > 
> > > > > > > Switching back to device mode, that gadget directory is recreated. And
> > > > > > > gadget/sound as well, but not gadget/net.
> > > > > > > 
> > > > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > > > <
> > > > > > > > addr_assign_type    duplex             phys_port_name
> > > > > > > > addr_len            flags              phys_switch_id
> > > > > > > > address             gro_flush_timeout  power
> > > > > > > > broadcast           ifalias            proto_down
> > > > > > > > carrier             ifindex            queues
> > > > > > > > carrier_changes     iflink             speed
> > > > > > > > carrier_down_count  link_mode          statistics
> > > > > > > > carrier_up_count    mtu                subsystem
> > > > > > > > dev_id              name_assign_type   tx_queue_len
> > > > > > > > dev_port            netdev_group       type
> > > > > > > > device              operstate          uevent
> > > > > > > > dormant             phys_port_id       waiting_for_supplier
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > > > > > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > > > > > > >             BROADCAST MULTICAST  MTU:1500  Metric:1
> > > > > > > >             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > > > >             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > > > > > > >             collisions:0 txqueuelen:1000
> > > > > > > >             RX bytes:0 TX bytes:0
> > > > > > > > 
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > > > > 
> > > > > > > > I strongly advise against reverting the patch solely based on the
> > > > > > > > observed issue of removing the /sys/class/net/usb0 directory while
> > > > > > > > the usb0 interface remains available.
> > > > > > > 
> > > > > > > There's more to it. I also mentioned that switching the role or
> > > > > > > unplugging the cable leaves the usb0 connection.
> > > > > > > 
> > > > > > > I have while in host mode:
> > > > > > > root@yuna:~# ifconfig -a usb0
> > > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> > > > > > >           inet 10.42.0.221  netmask 255.255.255.0  broadcast
> > > > > > > 10.42.0.255
> > > > > > >           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
> > > > > > > scopeid 0x20<link>
> > > > > > > 
> > > > > > > 
> > > > > > > You don't see that because you didn't create a connection at all.
> > > > > > > 
> > > > > > > > Instead, I recommend enabling FTRACE to trace the functions involved
> > > > > > > > and identify which faulty call is responsible for removing usb0.
> > > > > > > 
> > > > > > > Switching from device -> host -> device:
> > > > > > > 
> > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > > > >     plugin 'function_graph'
> > > > > > > Hit Ctrl^C to stop recording
> > > > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > > > >       188 bytes in size (4096 uncompressed)
> > > > > > > CPU1 data recorded at offset=0x1c9000
> > > > > > >       0 bytes in size (0 uncompressed)
> > > > > > > root@yuna:~# trace-cmd report
> > > > > > > cpus=2
> > > > > > >        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> > > > > > > 2079.480 us |  gether_disconnect();
> > > > > > >        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> > > > > > > 11.640 us  |  gether_disconnect();
> > > > > > >        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> > > > > > > 116.520 us |  gether_connect();
> > > > > > >        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> > > > > > > 2057.260 us |  gether_disconnect();
> > > > > > >        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> > > > > > > 109.000 us |  gether_connect();
> > > > > > 
> > > > > > I tried to get a more useful trace:
> > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> > > > > > -> switch to host mode then back to device
> > > > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > > > # tracer: function
> > > > > > #
> > > > > > # entries-in-buffer/entries-written: 53/53   #P:2
> > > > > > #
> > > > > > #                                _-----=> irqs-off/BH-disabled
> > > > > > #                               / _----=> need-resched
> > > > > > #                              | / _---=> hardirq/softirq
> > > > > > #                              || / _--=> preempt-depth
> > > > > > #                              ||| / _-=> migrate-disable
> > > > > > #                              |||| /     delay
> > > > > > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > > > > > #              | |         |   |||||     |         |
> > > > > >       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> > > > > > <-__composite_disconnect
> > > > > >       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> > > > > > <-reset_config
> > > > > >       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
> > > > > > <-reset_config
> > > > > >       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> > > > > > <-purge_configs_funcs
> > > > > > 
> > > > > > -> to device mode
> > > > > > 
> > > > > >       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> > > > > > <-usb_add_function
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> > > > > > <-composite_setup
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> > > > > > <-composite_setup
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
> > > > > > <-rx_complete
> > > > > > ...
> > > > > > 
> > > > > > I don't know where to look exactly. Any hints?
> > > > > 
> > > > > do you see anything related to gether_cleanup() after eem_unbind() ?
> > > > 
> > > > Nope. It's a pitty that the trace formatting got messed up above. But as
> > > > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > > > function is called, until I flip the switch to device mode.
> > > > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> > > > and eem_wrap <-eth_start_xmit lines.
> > > > 
> > > > > If not then, you may try to enable tracing of TCP/IP stack and
> > > > > network side to check who deleting the sysfs entry
> > > > 
> > > > Yes, that's a vast amount of functions to trace. And I don't see how
> > > > that would be related to the patch we're discussing here. I was hoping
> > > > for a little more targeted hint.
> > > 
> > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
> > > 'reset_config' and 'device_remove_file' leads me to:
> > > 
> > > TIMESTAMP  FUNCTION
> > >    |         |
> > >   49.952477: eem_wrap <-eth_start_xmit
> > >   55.072455: eem_wrap <-eth_start_xmit
> > >   55.072621: eem_unwrap <-rx_complete
> > >   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> > >   59.011545: composite_reset <-configfs_composite_reset
> > >   59.011548: reset_config <-__composite_disconnect
> > >   59.013565: eem_disable <-reset_config
> > >   59.013567: gether_disconnect <-reset_config
> > >   59.049560: device_remove_file <-device_remove
> > >   59.051185: configfs_composite_disconnect <-usb_gadget_disco
> > >   59.051189: composite_disconnect <-configfs_composite_discon
> > >   59.051195: configfs_composite_unbind <-gadget_unbind_driver
> > >   59.052519: eem_unbind <-purge_configs_funcs
> > >   59.052529: composite_dev_cleanup <-configfs_composite_unbin
> > >   59.052537: device_remove_file <-composite_dev_cleanup
> > > 
> > > device_remove_file gets called twice, once by device_remove after
> > > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> > > (removing the gadget)
> > 
> > I believe that the device_remove_file function is only removing
> > suspend-specific attributes, not the complete gadget.  Typically, when you
> > perform the role switch, the Gadget start/stop function in your UDC driver is
> > called. These functions should not delete the gadget
> > 
> > To investigate further, could you please enable the DWC3 functions in ftrace
> > and check who is removing the gadget?  I can also enable this on my system
> > and compare the logs with yours, but I will be in PI planning for 1.5 weeks
> > and may not be able to provide immediate support.
> 
> Since we are almost at -rc7, I propose to revert and try again next cycle.

Why? There is no problem with this patch!

I don't know why you are so excited to revert the patch instead of investigating the original issue. 
Even though I have proved that the problem is caused by usb0 being removed just by role-switching the port by the UDC driver, this behavior is incorrect.

There is no behavior in the Linux kernel that keeps the network interface and removes the related sysfs entry. THIS IS WRONG and has been FIXED without reverting any mainline patch.

Please don't encourage reverting any mainstream patches to avoid investigating a (probably) faulty driver. This is not how the community should work.

Also, I'm a bit wondering: the patch has been in the mainline for some time, and we haven't had any issues except this one, which is due to the wrong behavior of the UDC driver.

> 
> > Additionally, please check if you have any customized DWC patches that may be causing this problem.
> > 
> > > > You may recall the whole issue did not occur before this patch got applied.
> > > > 
> > > > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > > > kobject and sysfs interface.
> > > > > > > > I suggest sharing the analysis here to understand why this practice
> > > > > > > > is not followed in your use case or driver ?
> > > > > > > 
> > > > > > > Yes, I'll try to trace where that happens.
> > > > > > > 
> > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > > > harmless? But the usb: net device remaining after disconnect or role
> > > > > > > switch is not good, as the route remains.
> > > > > > > 
> > > > > > > May be they are 2 separate problems. Could you try to reproduce what
> > > > > > > happens if you make eem connection and then unplug?
> > > > > > > 
> > > > > > > > I am curious why the driver was developed without adhering to the
> > > > > > > > kernel's gadget architecture.
> > > > > > 
> > > > > > I don't know what you mean here. Which driver do you mean?
> > > > > > 
> > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > > > with this patch and
> > > > > > > > > > > > share result here.  May be we need some fix in dwc3
> > > > > > > > > > Would have been nice if someone could test on other
> > > > > > > > > > controller as well. But
> > > > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > > > We are happy to test any fixes if you come up with.
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-05-02 15:29                                         ` Hardik Gajjar
  2024-05-02 15:31                                           ` Andy Shevchenko
@ 2024-05-02 20:13                                           ` Ferry Toth
  2024-05-02 20:32                                             ` Ferry Toth
  1 sibling, 1 reply; 32+ messages in thread
From: Ferry Toth @ 2024-05-02 20:13 UTC (permalink / raw)
  To: Hardik Gajjar
  Cc: Andy Shevchenko, gregkh, s.hauer, jonathanh, linux-usb,
	linux-kernel, quic_linyyuan, paul, quic_eserrao, erosca

[-- Attachment #1: Type: text/plain, Size: 14342 bytes --]

Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>> Hi,
>>
>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>> Hi,
>>>>>
>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>> Hi,
>>>>>>
>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>> no
>>>>>>>>>>>>> longer exists
>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>> symlink should become active again
>>>>>>>>>
>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>
>>>>>>>>> root@yuna:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>
>>>>>>>>> Then switching to host mode:
>>>>>>>>>
>>>>>>>>> root@yuna:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> ls: cannot access
>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>> No such file
>>>>>>>>> or directory
>>>>>>>>>
>>>>>>>>> Then back to device mode:
>>>>>>>>>
>>>>>>>>> root@yuna:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>
>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>
>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>
>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>
>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>> two changes?
>>>>>>>
>>>>>>>
>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>> I could not reproduce
>>>>>>> the issue.
>>>>>>>
>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>
>>>>>>> Following are the logs:
>>>>>>> //Host to device
>>>>>>>
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>> mode
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>> a800000.dwc3/gadget/net/
>>>>>>> usb0
>>>>>>>
>>>>>>> //device to host
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>> a800000.dwc3/gadget/net/
>>>>>>> usb0
>>>>>>
>>>>>> That is weird. When I switch to host mode (using the physical switch),
>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>
>>>>>> Switching back to device mode, that gadget directory is recreated. And
>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>
>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>> <
>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>> addr_len            flags              phys_switch_id
>>>>>>> address             gro_flush_timeout  power
>>>>>>> broadcast           ifalias            proto_down
>>>>>>> carrier             ifindex            queues
>>>>>>> carrier_changes     iflink             speed
>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>> dev_port            netdev_group       type
>>>>>>> device              operstate          uevent
>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>              BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>              collisions:0 txqueuelen:1000
>>>>>>>              RX bytes:0 TX bytes:0
>>>>>>>
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>
>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>> the usb0 interface remains available.
>>>>>>
>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>
>>>>>> I have while in host mode:
>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>            inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>> 10.42.0.255
>>>>>>            inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>> scopeid 0x20<link>
>>>>>>
>>>>>>
>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>
>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>
>>>>>> Switching from device -> host -> device:
>>>>>>
>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>      plugin 'function_graph'
>>>>>> Hit Ctrl^C to stop recording
>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>        188 bytes in size (4096 uncompressed)
>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>        0 bytes in size (0 uncompressed)
>>>>>> root@yuna:~# trace-cmd report
>>>>>> cpus=2
>>>>>>         irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>         irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>         irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>>>>> 116.520 us |  gether_connect();
>>>>>>         irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>         irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>>>>> 109.000 us |  gether_connect();
>>>>>
>>>>> I tried to get a more useful trace:
>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>>>> -> switch to host mode then back to device
>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>> # tracer: function
>>>>> #
>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>> #
>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>> #                               / _----=> need-resched
>>>>> #                              | / _---=> hardirq/softirq
>>>>> #                              || / _--=> preempt-depth
>>>>> #                              ||| / _-=> migrate-disable
>>>>> #                              |||| /     delay
>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>> #              | |         |   |||||     |         |
>>>>>        irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>> <-__composite_disconnect
>>>>>        irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>> <-reset_config
>>>>>        irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>>>>> <-reset_config
>>>>>        kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>> <-purge_configs_funcs
>>>>>
>>>>> -> to device mode
>>>>>
>>>>>        kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>> <-usb_add_function
>>>>>        irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>> <-composite_setup
>>>>>        irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>>>>> <-eem_set_alt
>>>>>        irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>> <-eem_set_alt
>>>>>        irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>> <-composite_setup
>>>>>        irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>>>>> <-eem_set_alt
>>>>>        irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>> <-eem_set_alt
>>>>>        irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>> <-rx_complete
>>>>> ...
>>>>>
>>>>> I don't know where to look exactly. Any hints?
>>>>
>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>
>>> Nope. It's a pitty that the trace formatting got messed up above. But as
>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>> function is called, until I flip the switch to device mode.
>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
>>> and eem_wrap <-eth_start_xmit lines.
>>>
>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>> network side to check who deleting the sysfs entry
>>>
>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>> that would be related to the patch we're discussing here. I was hoping
>>> for a little more targeted hint.
>>
>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
>> 'reset_config' and 'device_remove_file' leads me to:
>>
>> TIMESTAMP  FUNCTION
>>     |         |
>>    49.952477: eem_wrap <-eth_start_xmit
>>    55.072455: eem_wrap <-eth_start_xmit
>>    55.072621: eem_unwrap <-rx_complete
>>    59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>    59.011545: composite_reset <-configfs_composite_reset
>>    59.011548: reset_config <-__composite_disconnect
>>    59.013565: eem_disable <-reset_config
>>    59.013567: gether_disconnect <-reset_config
>>    59.049560: device_remove_file <-device_remove
>>    59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>    59.051189: composite_disconnect <-configfs_composite_discon
>>    59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>    59.052519: eem_unbind <-purge_configs_funcs
>>    59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>    59.052537: device_remove_file <-composite_dev_cleanup
>>
>> device_remove_file gets called twice, once by device_remove after
>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>> (removing the gadget)
> 
> I believe that the device_remove_file function is only removing suspend-specific attributes, not the complete gadget.
> Typically, when you perform the role switch, the Gadget start/stop function in your UDC driver is called. These functions should not delete the gadget
> 
> To investigate further, could you please enable the DWC3 functions in ftrace and check who is removing the gadget?
> I can also enable this on my system and compare the logs with yours, but I will be in PI planning for 1.5 weeks and may not be able to provide immediate support.

Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so 
trace is 600kb. I cut uninteresting stuff before 
configfs_composite_reset <-usb_gadget_udc_reset and after 
__dwc3_set_mode, < 300 lines remain. See attached tar.gz for you to compare.

> Additionally, please check if you have any customized DWC patches that may be causing this problem.
> 
>>
>>> You may recall the whole issue did not occur before this patch got applied.
>>>
>>>> Hardik
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>> kobject and sysfs interface.
>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>> is not followed in your use case or driver ?
>>>>>>
>>>>>> Yes, I'll try to trace where that happens.
>>>>>>
>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>> switch is not good, as the route remains.
>>>>>>
>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>> happens if you make eem connection and then unplug?
>>>>>>
>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>> kernel's gadget architecture.
>>>>>
>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>
>>>>>>>>
>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>> with this patch and
>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>> controller as well. But
>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> With Best Regards,
>>>>>>>> Andy Shevchenko
>>>>>>>>
>>>>>>>>
>>>>>>
>>>

[-- Attachment #2: trace-u_ether.tar.gz --]
[-- Type: application/gzip, Size: 1484 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-05-02 20:13                                           ` Ferry Toth
@ 2024-05-02 20:32                                             ` Ferry Toth
  0 siblings, 0 replies; 32+ messages in thread
From: Ferry Toth @ 2024-05-02 20:32 UTC (permalink / raw)
  To: Hardik Gajjar
  Cc: Andy Shevchenko, gregkh, s.hauer, jonathanh, linux-usb,
	linux-kernel, quic_linyyuan, paul, quic_eserrao, erosca

[-- Attachment #1: Type: text/plain, Size: 15022 bytes --]

Oops, sorry, wrong file attached . Now correct one.

Op 02-05-2024 om 22:13 schreef Ferry Toth:
> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>> Hi,
>>>
>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>> Hi,
>>>>
>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>> target 
>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>> no
>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>
>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>
>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> ls: cannot access
>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>> No such file
>>>>>>>>>> or directory
>>>>>>>>>>
>>>>>>>>>> Then back to device mode:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>
>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>
>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>
>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>> two changes?
>>>>>>>>
>>>>>>>>
>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>> I could not reproduce
>>>>>>>> the issue.
>>>>>>>>
>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>
>>>>>>>> Following are the logs:
>>>>>>>> //Host to device
>>>>>>>>
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>> mode
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>> usb0
>>>>>>>>
>>>>>>>> //device to host
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > 
>>>>>>>> mode
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>> usb0
>>>>>>>
>>>>>>> That is weird. When I switch to host mode (using the physical 
>>>>>>> switch),
>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>
>>>>>>> Switching back to device mode, that gadget directory is 
>>>>>>> recreated. And
>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>
>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>> <
>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>> address             gro_flush_timeout  power
>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>> carrier             ifindex            queues
>>>>>>>> carrier_changes     iflink             speed
>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>> dev_port            netdev_group       type
>>>>>>>> device              operstate          uevent
>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>              BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>              collisions:0 txqueuelen:1000
>>>>>>>>              RX bytes:0 TX bytes:0
>>>>>>>>
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>
>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>> the usb0 interface remains available.
>>>>>>>
>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>
>>>>>>> I have while in host mode:
>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>>            inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>> 10.42.0.255
>>>>>>>            inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>> scopeid 0x20<link>
>>>>>>>
>>>>>>>
>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>
>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions 
>>>>>>>> involved
>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>
>>>>>>> Switching from device -> host -> device:
>>>>>>>
>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>      plugin 'function_graph'
>>>>>>> Hit Ctrl^C to stop recording
>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>        188 bytes in size (4096 uncompressed)
>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>        0 bytes in size (0 uncompressed)
>>>>>>> root@yuna:~# trace-cmd report
>>>>>>> cpus=2
>>>>>>>         irq/68-dwc3-725   [000]   514.575337: 
>>>>>>> funcgraph_entry:      #
>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.263731: 
>>>>>>> funcgraph_entry:      +
>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.263743: 
>>>>>>> funcgraph_entry:      !
>>>>>>> 116.520 us |  gether_connect();
>>>>>>>         irq/68-dwc3-946   [000]   524.268029: 
>>>>>>> funcgraph_entry:      #
>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.270089: 
>>>>>>> funcgraph_entry:      !
>>>>>>> 109.000 us |  gether_connect();
>>>>>>
>>>>>> I tried to get a more useful trace:
>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> 
>>>>>> set_ftrace_filter
>>>>>> -> switch to host mode then back to device
>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>> # tracer: function
>>>>>> #
>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>> #
>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>> #                               / _----=> need-resched
>>>>>> #                              | / _---=> hardirq/softirq
>>>>>> #                              || / _--=> preempt-depth
>>>>>> #                              ||| / _-=> migrate-disable
>>>>>> #                              |||| /     delay
>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>> #              | |         |   |||||     |         |
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>> <-__composite_disconnect
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>> <-reset_config
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.992276: 
>>>>>> gether_disconnect
>>>>>> <-reset_config
>>>>>>        kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>> <-purge_configs_funcs
>>>>>>
>>>>>> -> to device mode
>>>>>>
>>>>>>        kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>> <-usb_add_function
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>> <-composite_setup
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155215: 
>>>>>> gether_disconnect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>> <-composite_setup
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.157292: 
>>>>>> gether_disconnect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>> <-rx_complete
>>>>>> ...
>>>>>>
>>>>>> I don't know where to look exactly. Any hints?
>>>>>
>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>
>>>> Nope. It's a pitty that the trace formatting got messed up above. 
>>>> But as
>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>> function is called, until I flip the switch to device mode.
>>>> The ... at the end is where I cut uninteresting eem_unwrap 
>>>> <-rx_complete
>>>> and eem_wrap <-eth_start_xmit lines.
>>>>
>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>> network side to check who deleting the sysfs entry
>>>>
>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>> that would be related to the patch we're discussing here. I was hoping
>>>> for a little more targeted hint.
>>>
>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 
>>> 'usb_fun*',
>>> 'reset_config' and 'device_remove_file' leads me to:
>>>
>>> TIMESTAMP  FUNCTION
>>>     |         |
>>>    49.952477: eem_wrap <-eth_start_xmit
>>>    55.072455: eem_wrap <-eth_start_xmit
>>>    55.072621: eem_unwrap <-rx_complete
>>>    59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>    59.011545: composite_reset <-configfs_composite_reset
>>>    59.011548: reset_config <-__composite_disconnect
>>>    59.013565: eem_disable <-reset_config
>>>    59.013567: gether_disconnect <-reset_config
>>>    59.049560: device_remove_file <-device_remove
>>>    59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>    59.051189: composite_disconnect <-configfs_composite_discon
>>>    59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>    59.052519: eem_unbind <-purge_configs_funcs
>>>    59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>    59.052537: device_remove_file <-composite_dev_cleanup
>>>
>>> device_remove_file gets called twice, once by device_remove after
>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>> (removing the gadget)
>>
>> I believe that the device_remove_file function is only removing 
>> suspend-specific attributes, not the complete gadget.
>> Typically, when you perform the role switch, the Gadget start/stop 
>> function in your UDC driver is called. These functions should not 
>> delete the gadget
>>
>> To investigate further, could you please enable the DWC3 functions in 
>> ftrace and check who is removing the gadget?
>> I can also enable this on my system and compare the logs with yours, 
>> but I will be in PI planning for 1.5 weeks and may not be able to 
>> provide immediate support.
> 
> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so 
> trace is 600kb. I cut uninteresting stuff before 
> configfs_composite_reset <-usb_gadget_udc_reset and after 
> __dwc3_set_mode, < 300 lines remain. See attached tar.gz for you to 
> compare.
> 
>> Additionally, please check if you have any customized DWC patches that 
>> may be causing this problem.
>>
>>>
>>>> You may recall the whole issue did not occur before this patch got 
>>>> applied.
>>>>
>>>>> Hardik
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>> kobject and sysfs interface.
>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>> is not followed in your use case or driver ?
>>>>>>>
>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>
>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>> switch is not good, as the route remains.
>>>>>>>
>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>
>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>> kernel's gadget architecture.
>>>>>>
>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>
>>>>>>>>>
>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>> with this patch and
>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>> controller as well. But
>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> With Best Regards,
>>>>>>>>> Andy Shevchenko
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>

[-- Attachment #2: trace_u_ether.tar.gz --]
[-- Type: application/gzip, Size: 2501 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-05-02 16:16                                             ` Hardik Gajjar
@ 2024-05-03  7:24                                               ` Linux regression tracking (Thorsten Leemhuis)
  2024-05-03  9:15                                                 ` Hardik Gajjar
  0 siblings, 1 reply; 32+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-05-03  7:24 UTC (permalink / raw)
  To: gregkh
  Cc: Ferry Toth, s.hauer, jonathanh, linux-usb, linux-kernel,
	quic_linyyuan, paul, quic_eserrao, erosca,
	Linux kernel regressions list, Andy Shevchenko, Hardik Gajjar

Greg, not totally sure, but this might be something where you might need
to make a judgement call. See below:

On 02.05.24 18:16, Hardik Gajjar wrote:
> On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
>> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>> 

>> ...
>> 
>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>
>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>
>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>> ls: cannot access
>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>> No such file
>>>>>>>>>>> or directory
>>>>>>>>>>>
>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>
>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>
>>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>>
>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>>> two changes?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>>> I could not reproduce
>>>>>>>>> the issue.
>>>>>>>>>
>>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>
>>>>>>>>> Following are the logs:
>>>>>>>>> //Host to device
>>>>>>>>>
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>>> mode
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>> usb0
>>>>>>>>>
>>>>>>>>> //device to host
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>> usb0
>>>>>>>>
>>>>>>>> That is weird. When I switch to host mode (using the physical switch),
>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>
>>>>>>>> Switching back to device mode, that gadget directory is recreated. And
>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>
>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>> <
>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>> carrier             ifindex            queues
>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>> device              operstate          uevent
>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>>             collisions:0 txqueuelen:1000
>>>>>>>>>             RX bytes:0 TX bytes:0
>>>>>>>>>
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>
>>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>>> the usb0 interface remains available.
>>>>>>>>
>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>
>>>>>>>> I have while in host mode:
>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>> 10.42.0.255
>>>>>>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
>>>>>>>> scopeid 0x20<link>
>>>>>>>>
>>>>>>>>
>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>
>>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>
>>>>>>>> Switching from device -> host -> device:
>>>>>>>>
>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>     plugin 'function_graph'
>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>       188 bytes in size (4096 uncompressed)
>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>       0 bytes in size (0 uncompressed)
>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>> cpus=2
>>>>>>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>
>>>>>>> I tried to get a more useful trace:
>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>>>>>> -> switch to host mode then back to device
>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>> # tracer: function
>>>>>>> #
>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>> #
>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>> #                               / _----=> need-resched
>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>> #                              || / _--=> preempt-depth
>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>> #                              |||| /     delay
>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>> #              | |         |   |||||     |         |
>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>>> <-__composite_disconnect
>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>> <-reset_config
>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>>>>>>> <-reset_config
>>>>>>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>> <-purge_configs_funcs
>>>>>>>
>>>>>>> -> to device mode
>>>>>>>
>>>>>>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>> <-usb_add_function
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>> <-composite_setup
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>> <-composite_setup
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>> <-rx_complete
>>>>>>> ...
>>>>>>>
>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>
>>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>>
>>>>> Nope. It's a pitty that the trace formatting got messed up above. But as
>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>> function is called, until I flip the switch to device mode.
>>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>
>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>> network side to check who deleting the sysfs entry
>>>>>
>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>> that would be related to the patch we're discussing here. I was hoping
>>>>> for a little more targeted hint.
>>>>
>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>
>>>> TIMESTAMP  FUNCTION
>>>>    |         |
>>>>   49.952477: eem_wrap <-eth_start_xmit
>>>>   55.072455: eem_wrap <-eth_start_xmit
>>>>   55.072621: eem_unwrap <-rx_complete
>>>>   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>   59.011545: composite_reset <-configfs_composite_reset
>>>>   59.011548: reset_config <-__composite_disconnect
>>>>   59.013565: eem_disable <-reset_config
>>>>   59.013567: gether_disconnect <-reset_config
>>>>   59.049560: device_remove_file <-device_remove
>>>>   59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>   59.051189: composite_disconnect <-configfs_composite_discon
>>>>   59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>   59.052519: eem_unbind <-purge_configs_funcs
>>>>   59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>   59.052537: device_remove_file <-composite_dev_cleanup
>>>>
>>>> device_remove_file gets called twice, once by device_remove after
>>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>>> (removing the gadget)
>>>
>>> I believe that the device_remove_file function is only removing
>>> suspend-specific attributes, not the complete gadget.  Typically, when you
>>> perform the role switch, the Gadget start/stop function in your UDC driver is
>>> called. These functions should not delete the gadget
>>>
>>> To investigate further, could you please enable the DWC3 functions in ftrace
>>> and check who is removing the gadget?  I can also enable this on my system
>>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks
>>> and may not be able to provide immediate support.
>>
>> Since we are almost at -rc7, I propose to revert and try again next cycle.
> 
> Why? There is no problem with this patch!
> 
> I don't know why you are so excited to revert the patch instead of
> investigating the original issue. Even though I have proved that the
> problem is caused by usb0 being removed just by role-switching the
> port by the UDC driver, this behavior is incorrect.
> 
> There is no behavior in the Linux kernel that keeps the network
> interface and removes the related sysfs entry. THIS IS WRONG and has
> been FIXED without reverting any mainline patch.
> 
> Please don't encourage reverting any mainstream patches to avoid
> investigating a (probably) faulty driver. This is not how the
> community should work.

I only skimmed the thread and this is not my area of expertise, so
correct me if I'm wrong anywhere here:

But from what I see I tend to disagree: the patch mentioned in the
subject apparently regressed something. This afaics was reported 16
weeks ago during the 6.7 cycle[1]. Due to how we apply the "no
regressions" rule[2] the change causing this ideally should have been
reverted before 6.8 was release -- it does not matter that it just
exposed an existing problem rooted somewhere else. That revert did not
happen. Then even a few more weeks went by and this is still not fixed.
Then I'd guess reverting this might be the right course of action to
motivate someone to fix the exposed problem. Unless of course we know or
fear that a revert causes regressions for users of 6.8.y -- is that the
case?

[1] https://lore.kernel.org/all/ZaQS5x-XK08Jre6I@smile.fi.intel.com/

[2] https://docs.kernel.org/process/handling-regressions.html

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> Also, I'm a bit wondering: the patch has been in the mainline for
> some time, and we haven't had any issues except this one, which is
> due to the wrong behavior of the UDC driver.
>>
>>> Additionally, please check if you have any customized DWC patches that may be causing this problem.
>>>
>>>>> You may recall the whole issue did not occur before this patch got applied.
>>>>>
>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>> kobject and sysfs interface.
>>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>
>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>
>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>>> switch is not good, as the route remains.
>>>>>>>>
>>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>
>>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>>> kernel's gadget architecture.
>>>>>>>
>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>
>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>> controller as well. But
>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>
>> -- 
>> With Best Regards,
>> Andy Shevchenko
>>
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-05-03  7:24                                               ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-05-03  9:15                                                 ` Hardik Gajjar
  2024-05-03 12:39                                                   ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 32+ messages in thread
From: Hardik Gajjar @ 2024-05-03  9:15 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: gregkh, Ferry Toth, s.hauer, jonathanh, linux-usb, linux-kernel,
	quic_linyyuan, paul, quic_eserrao, erosca, Andy Shevchenko,
	Hardik Gajjar

On Fri, May 03, 2024 at 09:24:16AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> Greg, not totally sure, but this might be something where you might need
> to make a judgement call. See below:
> 
> On 02.05.24 18:16, Hardik Gajjar wrote:
> > On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
> >> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
> >>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> >>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
> >>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> >>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> >>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
> >>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> >>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> >>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> >>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> >>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> >>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> >>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> >>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> >> 
> 
> >> ...
> >> 
> >>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
> >>>>>>>>>>>>>>> removed and it is a link, the link
> >>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> >>>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>> longer exists
> >>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
> >>>>>>>>>>>>> present, but the symlink is
> >>>>>>>>>>>>> broken. In that case, the dwc3 driver should
> >>>>>>>>>>>>> recreate the device, and the
> >>>>>>>>>>>>> symlink should become active again
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ls
> >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> >>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
> >>>>>>>>>>>
> >>>>>>>>>>> Then switching to host mode:
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ls
> >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> >>>>>>>>>>> ls: cannot access
> >>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> >>>>>>>>>>> No such file
> >>>>>>>>>>> or directory
> >>>>>>>>>>>
> >>>>>>>>>>> Then back to device mode:
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ls
> >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> >>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
> >>>>>>>>>>>
> >>>>>>>>>>> net is missing. But, network functions:
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ping 10.42.0.1
> >>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
> >>>>>>>>>>>
> >>>>>>>>>>> Mass storage device is created and removed each time as expected.
> >>>>>>>>>>
> >>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
> >>>>>>>>>> two changes?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
> >>>>>>>>> I could not reproduce
> >>>>>>>>> the issue.
> >>>>>>>>>
> >>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
> >>>>>>>>> the role switching of the USB mode (peripheral <-> host)
> >>>>>>>>>
> >>>>>>>>> Following are the logs:
> >>>>>>>>> //Host to device
> >>>>>>>>>
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> >>>>>>>>>> mode
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
> >>>>>>>>> a800000.dwc3/gadget/net/
> >>>>>>>>> usb0
> >>>>>>>>>
> >>>>>>>>> //device to host
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
> >>>>>>>>> a800000.dwc3/gadget/net/
> >>>>>>>>> usb0
> >>>>>>>>
> >>>>>>>> That is weird. When I switch to host mode (using the physical switch),
> >>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
> >>>>>>>>
> >>>>>>>> Switching back to device mode, that gadget directory is recreated. And
> >>>>>>>> gadget/sound as well, but not gadget/net.
> >>>>>>>>
> >>>>>>>>> s a800000.dwc3/gadget/net/usb0
> >>>>>>>>> <
> >>>>>>>>> addr_assign_type    duplex             phys_port_name
> >>>>>>>>> addr_len            flags              phys_switch_id
> >>>>>>>>> address             gro_flush_timeout  power
> >>>>>>>>> broadcast           ifalias            proto_down
> >>>>>>>>> carrier             ifindex            queues
> >>>>>>>>> carrier_changes     iflink             speed
> >>>>>>>>> carrier_down_count  link_mode          statistics
> >>>>>>>>> carrier_up_count    mtu                subsystem
> >>>>>>>>> dev_id              name_assign_type   tx_queue_len
> >>>>>>>>> dev_port            netdev_group       type
> >>>>>>>>> device              operstate          uevent
> >>>>>>>>> dormant             phys_port_id       waiting_for_supplier
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> >>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> >>>>>>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
> >>>>>>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> >>>>>>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> >>>>>>>>>             collisions:0 txqueuelen:1000
> >>>>>>>>>             RX bytes:0 TX bytes:0
> >>>>>>>>>
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
> >>>>>>>>>
> >>>>>>>>> I strongly advise against reverting the patch solely based on the
> >>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
> >>>>>>>>> the usb0 interface remains available.
> >>>>>>>>
> >>>>>>>> There's more to it. I also mentioned that switching the role or
> >>>>>>>> unplugging the cable leaves the usb0 connection.
> >>>>>>>>
> >>>>>>>> I have while in host mode:
> >>>>>>>> root@yuna:~# ifconfig -a usb0
> >>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> >>>>>>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast
> >>>>>>>> 10.42.0.255
> >>>>>>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
> >>>>>>>> scopeid 0x20<link>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> You don't see that because you didn't create a connection at all.
> >>>>>>>>
> >>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
> >>>>>>>>> and identify which faulty call is responsible for removing usb0.
> >>>>>>>>
> >>>>>>>> Switching from device -> host -> device:
> >>>>>>>>
> >>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> >>>>>>>>     plugin 'function_graph'
> >>>>>>>> Hit Ctrl^C to stop recording
> >>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
> >>>>>>>>       188 bytes in size (4096 uncompressed)
> >>>>>>>> CPU1 data recorded at offset=0x1c9000
> >>>>>>>>       0 bytes in size (0 uncompressed)
> >>>>>>>> root@yuna:~# trace-cmd report
> >>>>>>>> cpus=2
> >>>>>>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> >>>>>>>> 2079.480 us |  gether_disconnect();
> >>>>>>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> >>>>>>>> 11.640 us  |  gether_disconnect();
> >>>>>>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> >>>>>>>> 116.520 us |  gether_connect();
> >>>>>>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> >>>>>>>> 2057.260 us |  gether_disconnect();
> >>>>>>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> >>>>>>>> 109.000 us |  gether_connect();
> >>>>>>>
> >>>>>>> I tried to get a more useful trace:
> >>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> >>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> >>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
> >>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> >>>>>>> -> switch to host mode then back to device
> >>>>>>> root@yuna:/sys/kernel/tracing# cat trace
> >>>>>>> # tracer: function
> >>>>>>> #
> >>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
> >>>>>>> #
> >>>>>>> #                                _-----=> irqs-off/BH-disabled
> >>>>>>> #                               / _----=> need-resched
> >>>>>>> #                              | / _---=> hardirq/softirq
> >>>>>>> #                              || / _--=> preempt-depth
> >>>>>>> #                              ||| / _-=> migrate-disable
> >>>>>>> #                              |||| /     delay
> >>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> >>>>>>> #              | |         |   |||||     |         |
> >>>>>>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> >>>>>>> <-__composite_disconnect
> >>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> >>>>>>> <-reset_config
> >>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
> >>>>>>> <-reset_config
> >>>>>>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> >>>>>>> <-purge_configs_funcs
> >>>>>>>
> >>>>>>> -> to device mode
> >>>>>>>
> >>>>>>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> >>>>>>> <-usb_add_function
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> >>>>>>> <-composite_setup
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
> >>>>>>> <-eem_set_alt
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> >>>>>>> <-eem_set_alt
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> >>>>>>> <-composite_setup
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
> >>>>>>> <-eem_set_alt
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> >>>>>>> <-eem_set_alt
> >>>>>>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
> >>>>>>> <-rx_complete
> >>>>>>> ...
> >>>>>>>
> >>>>>>> I don't know where to look exactly. Any hints?
> >>>>>>
> >>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
> >>>>>
> >>>>> Nope. It's a pitty that the trace formatting got messed up above. But as
> >>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
> >>>>> function is called, until I flip the switch to device mode.
> >>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> >>>>> and eem_wrap <-eth_start_xmit lines.
> >>>>>
> >>>>>> If not then, you may try to enable tracing of TCP/IP stack and
> >>>>>> network side to check who deleting the sysfs entry
> >>>>>
> >>>>> Yes, that's a vast amount of functions to trace. And I don't see how
> >>>>> that would be related to the patch we're discussing here. I was hoping
> >>>>> for a little more targeted hint.
> >>>>
> >>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
> >>>> 'reset_config' and 'device_remove_file' leads me to:
> >>>>
> >>>> TIMESTAMP  FUNCTION
> >>>>    |         |
> >>>>   49.952477: eem_wrap <-eth_start_xmit
> >>>>   55.072455: eem_wrap <-eth_start_xmit
> >>>>   55.072621: eem_unwrap <-rx_complete
> >>>>   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> >>>>   59.011545: composite_reset <-configfs_composite_reset
> >>>>   59.011548: reset_config <-__composite_disconnect
> >>>>   59.013565: eem_disable <-reset_config
> >>>>   59.013567: gether_disconnect <-reset_config
> >>>>   59.049560: device_remove_file <-device_remove
> >>>>   59.051185: configfs_composite_disconnect <-usb_gadget_disco
> >>>>   59.051189: composite_disconnect <-configfs_composite_discon
> >>>>   59.051195: configfs_composite_unbind <-gadget_unbind_driver
> >>>>   59.052519: eem_unbind <-purge_configs_funcs
> >>>>   59.052529: composite_dev_cleanup <-configfs_composite_unbin
> >>>>   59.052537: device_remove_file <-composite_dev_cleanup
> >>>>
> >>>> device_remove_file gets called twice, once by device_remove after
> >>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> >>>> (removing the gadget)
> >>>
> >>> I believe that the device_remove_file function is only removing
> >>> suspend-specific attributes, not the complete gadget.  Typically, when you
> >>> perform the role switch, the Gadget start/stop function in your UDC driver is
> >>> called. These functions should not delete the gadget
> >>>
> >>> To investigate further, could you please enable the DWC3 functions in ftrace
> >>> and check who is removing the gadget?  I can also enable this on my system
> >>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks
> >>> and may not be able to provide immediate support.
> >>
> >> Since we are almost at -rc7, I propose to revert and try again next cycle.
> > 
> > Why? There is no problem with this patch!
> > 
> > I don't know why you are so excited to revert the patch instead of
> > investigating the original issue. Even though I have proved that the
> > problem is caused by usb0 being removed just by role-switching the
> > port by the UDC driver, this behavior is incorrect.
> > 
> > There is no behavior in the Linux kernel that keeps the network
> > interface and removes the related sysfs entry. THIS IS WRONG and has
> > been FIXED without reverting any mainline patch.
> > 
> > Please don't encourage reverting any mainstream patches to avoid
> > investigating a (probably) faulty driver. This is not how the
> > community should work.
> 
> I only skimmed the thread and this is not my area of expertise, so
> correct me if I'm wrong anywhere here:
> 
> But from what I see I tend to disagree: the patch mentioned in the
> subject apparently regressed something. This afaics was reported 16
> weeks ago during the 6.7 cycle[1]. Due to how we apply the "no
> regressions" rule[2] the change causing this ideally should have been
> reverted before 6.8 was release -- it does not matter that it just
> exposed an existing problem rooted somewhere else. That revert did not
> happen. Then even a few more weeks went by and this is still not fixed.
> Then I'd guess reverting this might be the right course of action to
> motivate someone to fix the exposed problem. Unless of course we know or
> fear that a revert causes regressions for users of 6.8.y -- is that the
> case?
> 
> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_ZaQS5x-2DXK08Jre6I-40smile.fi.intel.com_&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=HBokvujOYQdyCxk3eMV6xtwLLNdCxjsnZ4nTzbhbLXM&e=
> 
> [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.kernel.org_process_handling-2Dregressions.html&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=lOh1a5Vzu2VNEaspoNgEipxRU9UDDPcibRLoOVrGbIU&e=
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__linux-2Dregtracking.leemhuis.info_about_-23tldr&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=MuVuulVZqYAjIqCVpNc01ve8yD6DFt2wOZEH4p9khTc&e=
> If I did something stupid, please tell me, as explained on that page.

I understand your perspective. Do you have documentation or guidelines available to ensure that the existing problem is resolved and the correct patch is reinstated?

As far as I know, the issue is currently only reproducible by the reporter. I also have a similar setup, but I have not been able to reproduce the issue myself (please refer to our previous discussions). This discrepancy could be due to some OEM modifications.

If the problem is only reproducible by the reporter, how do we motivate others to support and upstream the fix? Isn't it demotivating to revert the correct patch and retain the wrong behavior of the driver?

Additionally, is it possible to determine how many correct patches have been reverted in the past to address this particular problem?

Please correct me if I am wrong ?

> 
> > Also, I'm a bit wondering: the patch has been in the mainline for
> > some time, and we haven't had any issues except this one, which is
> > due to the wrong behavior of the UDC driver.
> >>
> >>> Additionally, please check if you have any customized DWC patches that may be causing this problem.
> >>>
> >>>>> You may recall the whole issue did not occur before this patch got applied.
> >>>>>
> >>>>>>>>> According to current kernel architecture of u_ether driver, only
> >>>>>>>>> gether_cleanup should remove the usb0 interface along with its
> >>>>>>>>> kobject and sysfs interface.
> >>>>>>>>> I suggest sharing the analysis here to understand why this practice
> >>>>>>>>> is not followed in your use case or driver ?
> >>>>>>>>
> >>>>>>>> Yes, I'll try to trace where that happens.
> >>>>>>>>
> >>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
> >>>>>>>> harmless? But the usb: net device remaining after disconnect or role
> >>>>>>>> switch is not good, as the route remains.
> >>>>>>>>
> >>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
> >>>>>>>> happens if you make eem connection and then unplug?
> >>>>>>>>
> >>>>>>>>> I am curious why the driver was developed without adhering to the
> >>>>>>>>> kernel's gadget architecture.
> >>>>>>>
> >>>>>>> I don't know what you mean here. Which driver do you mean?
> >>>>>>>
> >>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
> >>>>>>>>>>>>> with this patch and
> >>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
> >>>>>>>>>>> Would have been nice if someone could test on other
> >>>>>>>>>>> controller as well. But
> >>>>>>>>>>> another instance of dwc3 is also very welcome.
> >>>>>>>>>>>> It's quite possible, please test on your side.
> >>>>>>>>>>>> We are happy to test any fixes if you come up with.
> >>
> >> -- 
> >> With Best Regards,
> >> Andy Shevchenko
> >>
> >>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
  2024-05-03  9:15                                                 ` Hardik Gajjar
@ 2024-05-03 12:39                                                   ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 0 replies; 32+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-05-03 12:39 UTC (permalink / raw)
  To: Hardik Gajjar, Linux regressions mailing list
  Cc: gregkh, Ferry Toth, s.hauer, jonathanh, linux-usb, linux-kernel,
	quic_linyyuan, paul, quic_eserrao, erosca, Andy Shevchenko

On 03.05.24 11:15, Hardik Gajjar wrote:
> On Fri, May 03, 2024 at 09:24:16AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> Greg, not totally sure, but this might be something where you might need
>> to make a judgement call. See below:
>>
>> On 02.05.24 18:16, Hardik Gajjar wrote:
>>> On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
>>>> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
>>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>
>>
>>>> ...
>>>>
>>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>>> No such file
>>>>>>>>>>>>> or directory
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>
>>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>>>>
>>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>>>>> two changes?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>>>>> I could not reproduce
>>>>>>>>>>> the issue.
>>>>>>>>>>>
>>>>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>>
>>>>>>>>>>> Following are the logs:
>>>>>>>>>>> //Host to device
>>>>>>>>>>>
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>>>>> mode
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>> usb0
>>>>>>>>>>>
>>>>>>>>>>> //device to host
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>> usb0
>>>>>>>>>>
>>>>>>>>>> That is weird. When I switch to host mode (using the physical switch),
>>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>>
>>>>>>>>>> Switching back to device mode, that gadget directory is recreated. And
>>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>>
>>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>>> <
>>>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>>>> carrier             ifindex            queues
>>>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>>>> device              operstate          uevent
>>>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>>>>             collisions:0 txqueuelen:1000
>>>>>>>>>>>             RX bytes:0 TX bytes:0
>>>>>>>>>>>
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>>
>>>>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>>
>>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>>
>>>>>>>>>> I have while in host mode:
>>>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>>>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>>>> 10.42.0.255
>>>>>>>>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
>>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>>>
>>>>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>>>
>>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>>>     plugin 'function_graph'
>>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>>>       188 bytes in size (4096 uncompressed)
>>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>>>       0 bytes in size (0 uncompressed)
>>>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>>>> cpus=2
>>>>>>>>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>>>
>>>>>>>>> I tried to get a more useful trace:
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>>>>>>>> -> switch to host mode then back to device
>>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>>>> # tracer: function
>>>>>>>>> #
>>>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>>>> #
>>>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>>>> #                               / _----=> need-resched
>>>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>>>> #                              || / _--=> preempt-depth
>>>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>>>> #                              |||| /     delay
>>>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>>>> #              | |         |   |||||     |         |
>>>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>>>>> <-__composite_disconnect
>>>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>>>> <-reset_config
>>>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>>>>>>>>> <-reset_config
>>>>>>>>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>>>> <-purge_configs_funcs
>>>>>>>>>
>>>>>>>>> -> to device mode
>>>>>>>>>
>>>>>>>>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>>>> <-usb_add_function
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>>>> <-composite_setup
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>>>> <-composite_setup
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>>>> <-rx_complete
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>>
>>>>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>>>>
>>>>>>> Nope. It's a pitty that the trace formatting got messed up above. But as
>>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>>> function is called, until I flip the switch to device mode.
>>>>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
>>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>>
>>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>>> network side to check who deleting the sysfs entry
>>>>>>>
>>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>>>> that would be related to the patch we're discussing here. I was hoping
>>>>>>> for a little more targeted hint.
>>>>>>
>>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
>>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>>
>>>>>> TIMESTAMP  FUNCTION
>>>>>>    |         |
>>>>>>   49.952477: eem_wrap <-eth_start_xmit
>>>>>>   55.072455: eem_wrap <-eth_start_xmit
>>>>>>   55.072621: eem_unwrap <-rx_complete
>>>>>>   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>>>   59.011545: composite_reset <-configfs_composite_reset
>>>>>>   59.011548: reset_config <-__composite_disconnect
>>>>>>   59.013565: eem_disable <-reset_config
>>>>>>   59.013567: gether_disconnect <-reset_config
>>>>>>   59.049560: device_remove_file <-device_remove
>>>>>>   59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>>>   59.051189: composite_disconnect <-configfs_composite_discon
>>>>>>   59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>>>   59.052519: eem_unbind <-purge_configs_funcs
>>>>>>   59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>>>   59.052537: device_remove_file <-composite_dev_cleanup
>>>>>>
>>>>>> device_remove_file gets called twice, once by device_remove after
>>>>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>>>>> (removing the gadget)
>>>>>
>>>>> I believe that the device_remove_file function is only removing
>>>>> suspend-specific attributes, not the complete gadget.  Typically, when you
>>>>> perform the role switch, the Gadget start/stop function in your UDC driver is
>>>>> called. These functions should not delete the gadget
>>>>>
>>>>> To investigate further, could you please enable the DWC3 functions in ftrace
>>>>> and check who is removing the gadget?  I can also enable this on my system
>>>>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks
>>>>> and may not be able to provide immediate support.
>>>>
>>>> Since we are almost at -rc7, I propose to revert and try again next cycle.
>>>
>>> Why? There is no problem with this patch!
>>>
>>> I don't know why you are so excited to revert the patch instead of
>>> investigating the original issue. Even though I have proved that the
>>> problem is caused by usb0 being removed just by role-switching the
>>> port by the UDC driver, this behavior is incorrect.
>>>
>>> There is no behavior in the Linux kernel that keeps the network
>>> interface and removes the related sysfs entry. THIS IS WRONG and has
>>> been FIXED without reverting any mainline patch.
>>>
>>> Please don't encourage reverting any mainstream patches to avoid
>>> investigating a (probably) faulty driver. This is not how the
>>> community should work.
>>
>> I only skimmed the thread and this is not my area of expertise, so
>> correct me if I'm wrong anywhere here:
>>
>> But from what I see I tend to disagree: the patch mentioned in the
>> subject apparently regressed something. This afaics was reported 16
>> weeks ago during the 6.7 cycle[1]. Due to how we apply the "no
>> regressions" rule[2] the change causing this ideally should have been
>> reverted before 6.8 was release -- it does not matter that it just
>> exposed an existing problem rooted somewhere else. That revert did not
>> happen. Then even a few more weeks went by and this is still not fixed.
>> Then I'd guess reverting this might be the right course of action to
>> motivate someone to fix the exposed problem. Unless of course we know or
>> fear that a revert causes regressions for users of 6.8.y -- is that the
>> case?
>>
>> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_ZaQS5x-2DXK08Jre6I-40smile.fi.intel.com_&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=HBokvujOYQdyCxk3eMV6xtwLLNdCxjsnZ4nTzbhbLXM&e=
>>
>> [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.kernel.org_process_handling-2Dregressions.html&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=lOh1a5Vzu2VNEaspoNgEipxRU9UDDPcibRLoOVrGbIU&e=
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__linux-2Dregtracking.leemhuis.info_about_-23tldr&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=MuVuulVZqYAjIqCVpNc01ve8yD6DFt2wOZEH4p9khTc&e=
>> If I did something stupid, please tell me, as explained on that page.
> 
> I understand your perspective. Do you have documentation or
> guidelines available to ensure that the existing problem is resolved
> and the correct patch is reinstated?

The second link I gave (e.g. to a rendered version of
Documentation/process/handling-regressions.rst ) has quite a few quotes
from Linus which iirc should support this. Apart from this there is only
the lore archive and the the git log afaik.

> As far as I know, the issue is currently only reproducible by the
> reporter. I also have a similar setup, but I have not been able to
> reproduce the issue myself (please refer to our previous
> discussions). This discrepancy could be due to some OEM
> modifications.

To the board/machine in question or the kernel used beforehand? If the
former: that does not matter much. In the latter: than of course it's
not a regression.

> If the problem is only reproducible by the reporter, how do we
> motivate others to support and upstream the fix?

The general idea afaics is: there was someone that for some reason was
motivated to bring in the patch that caused the regression (indirectly
in this case); with a bit of luck that someone after the revert wants it
to be included again and thus is motivated to fix it (or search for the
underlying issue and fix that in this case).

Of course that does not always work out. OTOH with a bit of luck the
other developers will help with that, too.

And there are gray areas, but they are not that big. Not sure if this is
one, that in the end is up to Greg or Linus to decide.

> Isn't it
> demotivating to revert the correct patch and retain the wrong
> behavior of the driver?

Sure, but I'd say it's even more demotivating if things break for users
when they update their kernel, as then they won't update the next time
and stay on old versions with bugs and vulnerabilities. As that's afaics
what Linus wants to prevent -- and why he made the rule and established
current practices.

> Additionally, is it possible to determine how many correct patches
> have been reverted in the past to address this particular problem?

git log?

> Please correct me if I am wrong ?

HTH, Ciao, Thorsten

>>> Also, I'm a bit wondering: the patch has been in the mainline for
>>> some time, and we haven't had any issues except this one, which is
>>> due to the wrong behavior of the UDC driver.
>>>>
>>>>> Additionally, please check if you have any customized DWC patches that may be causing this problem.
>>>>>
>>>>>>> You may recall the whole issue did not occur before this patch got applied.
>>>>>>>
>>>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>>
>>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>>
>>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>>
>>>>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>>
>>>>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>>
>>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>>
>>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>
>>>> -- 
>>>> With Best Regards,
>>>> Andy Shevchenko
>>>>
>>>>
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2024-05-03 12:39 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-06 14:12 [PATCH] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach Hardik Gajjar
2023-10-06 14:21 ` Greg KH
2023-10-06 14:50 ` Hardik Gajjar
2023-10-06 14:53 ` [PATCH v2] " Hardik Gajjar
2023-10-06 14:59   ` Greg KH
2023-10-06 15:38   ` [PATCH v3] " Hardik Gajjar
2023-10-06 15:56     ` [PATCH v4] " Hardik Gajjar
2024-01-14 16:59       ` Andy Shevchenko
2024-01-15 13:27         ` Hardik Gajjar
2024-01-15 20:10           ` Ferry Toth
2024-04-03 21:01             ` Ferry Toth
2024-04-05 11:38               ` Hardik Gajjar
2024-04-07 20:51                 ` Ferry Toth
2024-04-10 17:37                   ` Andy Shevchenko
2024-04-11 14:26                     ` Hardik Gajjar
2024-04-11 16:39                       ` Andy Shevchenko
2024-04-11 20:52                         ` Ferry Toth
2024-04-16 13:48                           ` Andy Shevchenko
2024-04-17 15:13                             ` Hardik Gajjar
2024-04-25 21:27                               ` Ferry Toth
2024-04-28 21:07                                 ` Ferry Toth
2024-04-30 15:32                                   ` Hardik Gajjar
2024-04-30 19:40                                     ` Ferry Toth
2024-04-30 21:12                                       ` Ferry Toth
2024-05-02 15:29                                         ` Hardik Gajjar
2024-05-02 15:31                                           ` Andy Shevchenko
2024-05-02 16:16                                             ` Hardik Gajjar
2024-05-03  7:24                                               ` Linux regression tracking (Thorsten Leemhuis)
2024-05-03  9:15                                                 ` Hardik Gajjar
2024-05-03 12:39                                                   ` Linux regression tracking (Thorsten Leemhuis)
2024-05-02 20:13                                           ` Ferry Toth
2024-05-02 20:32                                             ` Ferry Toth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).