linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work
@ 2021-10-08  8:43 Arun Ramadoss
  2021-10-08 13:58 ` Andrew Lunn
  0 siblings, 1 reply; 5+ messages in thread
From: Arun Ramadoss @ 2021-10-08  8:43 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: George McCollister, Jakub Kicinski, David S. Miller,
	Vladimir Oltean, Florian Fainelli, Vivien Didelot, Andrew Lunn,
	UNGLinuxDriver, Woojung Huh

When the ksz module is installed and removed using rmmod, kernel crashes
with null pointer dereferrence error. During rmmod, ksz_switch_remove
function tries to cancel the mib_read_workqueue using
cancel_delayed_work_sync routine.

At the end of  mib_read_workqueue execution, it again reschedule the
workqueue unconditionally. Due to which queue rescheduled after
mib_interval, during this execution it tries to access dp->slave. But
the slave is unregistered in the ksz_switch_remove function. Hence
kernel crashes.

To avoid this crash, before canceling the workqueue, resetted the
mib_interval to 0. In the work queue execution, it schedules the
workqueue next time only if the mib_interval is non zero.

Fixes: 469b390e1ba3 ("net: dsa: microchip: use delayed_work instead of timer + work")
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
---
 drivers/net/dsa/microchip/ksz_common.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 1542bfb8b5e5..ffc8e6fb300a 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -94,7 +94,8 @@ static void ksz_mib_read_work(struct work_struct *work)
 		mutex_unlock(&mib->cnt_mutex);
 	}
 
-	schedule_delayed_work(&dev->mib_read, dev->mib_read_interval);
+	if (dev->mib_read_interval)
+		schedule_delayed_work(&dev->mib_read, dev->mib_read_interval);
 }
 
 void ksz_init_mib_timer(struct ksz_device *dev)
@@ -449,8 +450,10 @@ EXPORT_SYMBOL(ksz_switch_register);
 void ksz_switch_remove(struct ksz_device *dev)
 {
 	/* timer started */
-	if (dev->mib_read_interval)
+	if (dev->mib_read_interval) {
+		dev->mib_read_interval = 0;
 		cancel_delayed_work_sync(&dev->mib_read);
+	}
 
 	dev->dev_ops->exit(dev);
 	dsa_unregister_switch(dev->ds);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net] net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work
  2021-10-08  8:43 [PATCH net] net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work Arun Ramadoss
@ 2021-10-08 13:58 ` Andrew Lunn
  2021-10-08 18:34   ` Jakub Kicinski
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Lunn @ 2021-10-08 13:58 UTC (permalink / raw)
  To: Arun Ramadoss
  Cc: linux-kernel, netdev, George McCollister, Jakub Kicinski,
	David S. Miller, Vladimir Oltean, Florian Fainelli,
	Vivien Didelot, UNGLinuxDriver, Woojung Huh

On Fri, Oct 08, 2021 at 02:13:48PM +0530, Arun Ramadoss wrote:
> When the ksz module is installed and removed using rmmod, kernel crashes
> with null pointer dereferrence error. During rmmod, ksz_switch_remove
> function tries to cancel the mib_read_workqueue using
> cancel_delayed_work_sync routine.
> 
> At the end of  mib_read_workqueue execution, it again reschedule the
> workqueue unconditionally. Due to which queue rescheduled after
> mib_interval, during this execution it tries to access dp->slave. But
> the slave is unregistered in the ksz_switch_remove function. Hence
> kernel crashes.

Something not correct here.

https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_delayed_work_sync

This is cancel_work_sync() for delayed works.

and

https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_work_sync

This function can be used even if the work re-queues itself or
migrates to another workqueue.

Maybe the real problem is a missing call to destroy_worker()?

      Andrew

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work
  2021-10-08 13:58 ` Andrew Lunn
@ 2021-10-08 18:34   ` Jakub Kicinski
  2021-10-11  9:41     ` Arun.Ramadoss
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2021-10-08 18:34 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Arun Ramadoss, linux-kernel, netdev, George McCollister,
	David S. Miller, Vladimir Oltean, Florian Fainelli,
	Vivien Didelot, UNGLinuxDriver, Woojung Huh

On Fri, 8 Oct 2021 15:58:16 +0200 Andrew Lunn wrote:
> On Fri, Oct 08, 2021 at 02:13:48PM +0530, Arun Ramadoss wrote:
> > When the ksz module is installed and removed using rmmod, kernel crashes
> > with null pointer dereferrence error. During rmmod, ksz_switch_remove
> > function tries to cancel the mib_read_workqueue using
> > cancel_delayed_work_sync routine.
> > 
> > At the end of  mib_read_workqueue execution, it again reschedule the
> > workqueue unconditionally. Due to which queue rescheduled after
> > mib_interval, during this execution it tries to access dp->slave. But
> > the slave is unregistered in the ksz_switch_remove function. Hence
> > kernel crashes.  
> 
> Something not correct here.
> 
> https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_delayed_work_sync
> 
> This is cancel_work_sync() for delayed works.
> 
> and
> 
> https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_work_sync
> 
> This function can be used even if the work re-queues itself or
> migrates to another workqueue.
> 
> Maybe the real problem is a missing call to destroy_worker()?

Also the cancel_delayed_work_sync() is suspiciously early in the remove
flow. There is a schedule_work call in ksz_mac_link_down() which may 
schedule the work back in. That'd also explain why the patch helps since
ksz_mac_link_down() only schedules if (dev->mib_read_interval).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work
  2021-10-08 18:34   ` Jakub Kicinski
@ 2021-10-11  9:41     ` Arun.Ramadoss
  2021-10-11 13:45       ` Jakub Kicinski
  0 siblings, 1 reply; 5+ messages in thread
From: Arun.Ramadoss @ 2021-10-11  9:41 UTC (permalink / raw)
  To: kuba, andrew
  Cc: olteanv, linux-kernel, george.mccollister, vivien.didelot,
	UNGLinuxDriver, f.fainelli, netdev, Woojung.Huh, davem

On Fri, 2021-10-08 at 11:34 -0700, Jakub Kicinski wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
> 
> On Fri, 8 Oct 2021 15:58:16 +0200 Andrew Lunn wrote:
> > On Fri, Oct 08, 2021 at 02:13:48PM +0530, Arun Ramadoss wrote:
> > > When the ksz module is installed and removed using rmmod, kernel
> > > crashes
> > > with null pointer dereferrence error. During rmmod,
> > > ksz_switch_remove
> > > function tries to cancel the mib_read_workqueue using
> > > cancel_delayed_work_sync routine.
> > > 
> > > At the end of  mib_read_workqueue execution, it again reschedule
> > > the
> > > workqueue unconditionally. Due to which queue rescheduled after
> > > mib_interval, during this execution it tries to access dp->slave. 
> > > But
> > > the slave is unregistered in the ksz_switch_remove function.
> > > Hence
> > > kernel crashes.
> > 
> > Something not correct here.
> > 
> > 
https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_delayed_work_sync
> > 
> > This is cancel_work_sync() for delayed works.
> > 
> > and
> > 
> > 
https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_work_sync
> > 
> > This function can be used even if the work re-queues itself or
> > migrates to another workqueue.
> > 
> > Maybe the real problem is a missing call to destroy_worker()?
> 
> Also the cancel_delayed_work_sync() is suspiciously early in the
> remove
> flow. There is a schedule_work call in ksz_mac_link_down() which may
> schedule the work back in. That'd also explain why the patch helps
> since
> ksz_mac_link_down() only schedules if (dev->mib_read_interval).
In this patch, I did two things. Added the if condition for
rescheduling the queue and other is resetted the mib_read_interval to
zero.
As per the cancel_delayed_queue_sync() functionality, Now I tried rmod
after removing the if condition for resheduling the queue,kernel didn't
crash. So, concluded that it is not rearm in ksz_mib_read_work  is
causing the problem but it is due to scheduling in the
ksz_mac_link_down function. This function is called due to the
dsa_unregister_switch. Due to resetting of the mib_read_interval to
zero in switch_remove, the queue is not scheduled in mac_link_down, so
kernel didn't crash.

And also, as per suggestion on cancel_delayed_work_sync() is
suspiciously placed in switch_remove. I undo this patch, and tried just
by moving the canceling of delayed_work after the dsa_unregister_switch
function. As expected dsa_unregister_switch calls the
ksz_mac_link_down, which schedules the mib_read_work. Now, when
cancel_delayed_work_sync is called, it cancels all the workqueue. As a
result, module is removed successfully without kernel crash.

Can I send the updated patch as v1 or new patch with updated commit
message and description.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work
  2021-10-11  9:41     ` Arun.Ramadoss
@ 2021-10-11 13:45       ` Jakub Kicinski
  0 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2021-10-11 13:45 UTC (permalink / raw)
  To: Arun.Ramadoss
  Cc: andrew, olteanv, linux-kernel, george.mccollister,
	vivien.didelot, UNGLinuxDriver, f.fainelli, netdev, Woojung.Huh,
	davem

On Mon, 11 Oct 2021 09:41:55 +0000 Arun.Ramadoss@microchip.com wrote:
> On Fri, 2021-10-08 at 11:34 -0700, Jakub Kicinski wrote:
> > Also the cancel_delayed_work_sync() is suspiciously early in the
> > remove
> > flow. There is a schedule_work call in ksz_mac_link_down() which may
> > schedule the work back in. That'd also explain why the patch helps
> > since
> > ksz_mac_link_down() only schedules if (dev->mib_read_interval).  
> In this patch, I did two things. Added the if condition for
> rescheduling the queue and other is resetted the mib_read_interval to
> zero.
> As per the cancel_delayed_queue_sync() functionality, Now I tried rmod
> after removing the if condition for resheduling the queue,kernel didn't
> crash. So, concluded that it is not rearm in ksz_mib_read_work  is
> causing the problem but it is due to scheduling in the
> ksz_mac_link_down function. This function is called due to the
> dsa_unregister_switch. Due to resetting of the mib_read_interval to
> zero in switch_remove, the queue is not scheduled in mac_link_down, so
> kernel didn't crash.
> 
> And also, as per suggestion on cancel_delayed_work_sync() is
> suspiciously placed in switch_remove. I undo this patch, and tried just
> by moving the canceling of delayed_work after the dsa_unregister_switch
> function. As expected dsa_unregister_switch calls the
> ksz_mac_link_down, which schedules the mib_read_work. Now, when
> cancel_delayed_work_sync is called, it cancels all the workqueue. As a
> result, module is removed successfully without kernel crash.
> 
> Can I send the updated patch as v1 or new patch with updated commit
> message and description.

Please send a patch with just the second chunk (zeroing
mib_read_interval), you can mark it as v2.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-10-11 13:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-08  8:43 [PATCH net] net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work Arun Ramadoss
2021-10-08 13:58 ` Andrew Lunn
2021-10-08 18:34   ` Jakub Kicinski
2021-10-11  9:41     ` Arun.Ramadoss
2021-10-11 13:45       ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).