* [PATCH net] ice: Fix race during aux device (un)plugging
From: Ivan Vecera @ 2022-04-14 16:39 UTC
  To: netdev
  Cc: poros, mschmidt, Jesse Brandeburg, Tony Nguyen, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Shiraz Saleem, Dave Ertman,
	moderated list:INTEL ETHERNET DRIVERS, open list

Function ice_plug_aux_dev() assigns the pf->adev field too early, prior
to aux device initialization, and on the other side ice_unplug_aux_dev()
starts aux device deinit and only at the end assigns NULL to pf->adev.
This is wrong and can cause a crash when ice_send_event_to_aux()
is called during these operations, because that function depends
on a non-NULL value of pf->adev and does not expect the aux device
to be half-initialized or half-destroyed.

Modify the affected functions so that pf->adev is set after aux device
init and cleared prior to aux device destroy.
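A minimal userspace sketch of the ordering this patch aims for: fully initialize the device, then publish the pointer; unpublish the pointer, then tear the device down. Everything here is hypothetical: `struct aux_dev_model` and `struct pf_model` only stand in for `struct auxiliary_device` and `struct ice_pf`, the `initialized` flag stands in for the auxiliary_device_init()/auxiliary_device_add() and auxiliary_device_delete()/auxiliary_device_uninit() steps, and C11 release/acquire atomics are used to make the publication order explicit, whereas the patch itself uses plain assignments.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct auxiliary_device. */
struct aux_dev_model {
	bool initialized;	/* true only between "init" and "uninit" */
	int events_seen;	/* number of events delivered to this device */
};

/* Hypothetical stand-in for struct ice_pf: senders test this pointer. */
struct pf_model {
	_Atomic(struct aux_dev_model *) adev;
};

/* Plug: initialize the device completely, publish the pointer last. */
static int model_plug(struct pf_model *pf)
{
	struct aux_dev_model *adev = calloc(1, sizeof(*adev));

	if (!adev)
		return -1;
	adev->initialized = true;	/* models init + add succeeding */
	/* Release store: a sender that sees the pointer also sees the init. */
	atomic_store_explicit(&pf->adev, adev, memory_order_release);
	return 0;
}

/* Unplug: unpublish the pointer first, then tear the device down. */
static void model_unplug(struct pf_model *pf)
{
	struct aux_dev_model *adev =
		atomic_exchange_explicit(&pf->adev, NULL, memory_order_acq_rel);

	if (!adev)
		return;
	adev->initialized = false;	/* models delete + uninit */
	free(adev);
}

/* Sender: at the moment of the load it sees either NULL or a fully
 * initialized device, never a half-initialized one. */
static bool model_send_event(struct pf_model *pf)
{
	struct aux_dev_model *adev =
		atomic_load_explicit(&pf->adev, memory_order_acquire);

	if (!adev)
		return false;
	assert(adev->initialized);
	adev->events_seen++;
	return true;
}
```

Ordering alone only guarantees that a sender never observes a half-initialized device at the moment it loads the pointer; it does not keep the device alive while the sender is still using it.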

Reproducer:
cycle=1
while :;do
        echo "#### Cycle: $cycle"

        ip link set ens7f0 mtu 9000
        ip link add bond0 type bond mode 1 miimon 100
        ip link set bond0 up
        ifenslave bond0 ens7f0
        ip link set bond0 mtu 9000
        ethtool -L ens7f0 combined 1
        ip link del bond0
        ip link set ens7f0 mtu 1500
        sleep 1

        let cycle++
done

In short, when the device is added to or removed from the bond, the aux
device is unplugged or plugged, respectively. When the MTU of the device
is changed, an event is sent to the aux device asynchronously. This can
race with the (un)plugging operation, and because pf->adev is set too
early (plug) or cleared too late (unplug), ice_send_event_to_aux() can
touch uninitialized or destroyed fields (e.g. pf->adev->dev.mutex, as in
the following crash):

[   53.372066] bond0: (slave ens7f0): making interface the new active one
[   53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an up link
[   53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[   53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link
[   54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!
[   54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!
[   54.248204] bond0: (slave ens7f0): Releasing backup interface
[   54.253955] bond0: (slave ens7f1): making interface the new active one
[   54.274875] bond0: (slave ens7f1): Releasing backup interface
[   54.289153] bond0 (unregistering): Released all slaves
[   55.383179] MII link monitoring set to 100 ms
[   55.398696] bond0: (slave ens7f0): making interface the new active one
[   55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080
[   55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an up link
[   55.412198] #PF: supervisor write access in kernel mode
[   55.412200] #PF: error_code(0x0002) - not-present page
[   55.412201] PGD 25d2ad067 P4D 0
[   55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI
[   55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S                5.17.0-13579-g57f2d6540f03 #1
[   55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link
[   55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/2021
[   55.430226] Workqueue: ice ice_service_task [ice]
[   55.468169] RIP: 0010:mutex_unlock+0x10/0x20
[   55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48
[   55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246
[   55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001
[   55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080
[   55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041
[   55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0
[   55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000
[   55.532076] FS:  0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000
[   55.540163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0
[   55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   55.567305] PKRU: 55555554
[   55.570018] Call Trace:
[   55.572474]  <TASK>
[   55.574579]  ice_service_task+0xaab/0xef0 [ice]
[   55.579130]  process_one_work+0x1c5/0x390
[   55.583141]  ? process_one_work+0x390/0x390
[   55.587326]  worker_thread+0x30/0x360
[   55.590994]  ? process_one_work+0x390/0x390
[   55.595180]  kthread+0xe6/0x110
[   55.598325]  ? kthread_complete_and_exit+0x20/0x20
[   55.603116]  ret_from_fork+0x1f/0x30
[   55.606698]  </TASK>

Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/ethernet/intel/ice/ice_idc.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
index 25a436d342c2..4d889049231b 100644
--- a/drivers/net/ethernet/intel/ice/ice_idc.c
+++ b/drivers/net/ethernet/intel/ice/ice_idc.c
@@ -290,7 +290,6 @@ int ice_plug_aux_dev(struct ice_pf *pf)
 		return -ENOMEM;
 
 	adev = &iadev->adev;
-	pf->adev = adev;
 	iadev->pf = pf;
 
 	adev->id = pf->aux_idx;
@@ -300,18 +299,18 @@ int ice_plug_aux_dev(struct ice_pf *pf)
 
 	ret = auxiliary_device_init(adev);
 	if (ret) {
-		pf->adev = NULL;
 		kfree(iadev);
 		return ret;
 	}
 
 	ret = auxiliary_device_add(adev);
 	if (ret) {
-		pf->adev = NULL;
 		auxiliary_device_uninit(adev);
 		return ret;
 	}
 
+	pf->adev = adev;
+
 	return 0;
 }
 
@@ -320,12 +319,14 @@ int ice_plug_aux_dev(struct ice_pf *pf)
  */
 void ice_unplug_aux_dev(struct ice_pf *pf)
 {
-	if (!pf->adev)
+	struct auxiliary_device *adev = pf->adev;
+
+	if (!adev)
 		return;
 
-	auxiliary_device_delete(pf->adev);
-	auxiliary_device_uninit(pf->adev);
 	pf->adev = NULL;
+	auxiliary_device_delete(adev);
+	auxiliary_device_uninit(adev);
 }
 
 /**
-- 
2.35.1


* [Intel-wired-lan] [PATCH net] ice: Fix race during aux device (un)plugging
From: Michal Schmidt @ 2022-04-15 11:12 UTC
  To: intel-wired-lan

On Thu, Apr 14, 2022 at 6:39 PM Ivan Vecera <ivecera@redhat.com> wrote:

> Function ice_plug_aux_dev() assigns the pf->adev field too early, prior
> to aux device initialization, and on the other side ice_unplug_aux_dev()
> starts aux device deinit and only at the end assigns NULL to pf->adev.
> This is wrong and can cause a crash when ice_send_event_to_aux()
> is called during these operations, because that function depends
> on a non-NULL value of pf->adev and does not expect the aux device
> to be half-initialized or half-destroyed.
>
> Modify the affected functions so that pf->adev is set after aux device
> init and cleared prior to aux device destroy.
>
[...]

> @@ -320,12 +319,14 @@ int ice_plug_aux_dev(struct ice_pf *pf)
>   */
>  void ice_unplug_aux_dev(struct ice_pf *pf)
>  {
> -       if (!pf->adev)
> +       struct auxiliary_device *adev = pf->adev;
> +
> +       if (!adev)
>                 return;
>
> -       auxiliary_device_delete(pf->adev);
> -       auxiliary_device_uninit(pf->adev);
>         pf->adev = NULL;
> +       auxiliary_device_delete(adev);
> +       auxiliary_device_uninit(adev);
>  }
>

Hi Ivan,
What prevents ice_unplug_aux_dev() from running immediately after
ice_send_event_to_aux() gets past its "if (!pf->adev)" test?
Michal

* Re: [PATCH net] ice: Fix race during aux device (un)plugging
From: Ivan Vecera @ 2022-04-15 15:49 UTC
  To: Michal Schmidt
  Cc: netdev, Petr Oros, Jesse Brandeburg, Tony Nguyen,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Shiraz Saleem,
	Dave Ertman, moderated list:INTEL ETHERNET DRIVERS, open list

On Fri, 15 Apr 2022 13:12:03 +0200
Michal Schmidt <mschmidt@redhat.com> wrote:

> On Thu, Apr 14, 2022 at 6:39 PM Ivan Vecera <ivecera@redhat.com> wrote:
> 
> > Function ice_plug_aux_dev() assigns the pf->adev field too early, prior
> > to aux device initialization, and on the other side ice_unplug_aux_dev()
> > starts aux device deinit and only at the end assigns NULL to pf->adev.
> > This is wrong and can cause a crash when ice_send_event_to_aux()
> > is called during these operations, because that function depends
> > on a non-NULL value of pf->adev and does not expect the aux device
> > to be half-initialized or half-destroyed.
> >
> > Modify the affected functions so that pf->adev is set after aux device
> > init and cleared prior to aux device destroy.
> >  
> [...]
> 
> > @@ -320,12 +319,14 @@ int ice_plug_aux_dev(struct ice_pf *pf)
> >   */
> >  void ice_unplug_aux_dev(struct ice_pf *pf)
> >  {
> > -       if (!pf->adev)
> > +       struct auxiliary_device *adev = pf->adev;
> > +
> > +       if (!adev)
> >                 return;
> >
> > -       auxiliary_device_delete(pf->adev);
> > -       auxiliary_device_uninit(pf->adev);
> >         pf->adev = NULL;
> > +       auxiliary_device_delete(adev);
> > +       auxiliary_device_uninit(adev);
> >  }
> >  
> 
> Hi Ivan,
> What prevents ice_unplug_aux_dev() from running immediately after
> ice_send_event_to_aux() gets past its "if (!pf->adev)" test?
> Michal

ice_send_event_to_aux() takes the aux device lock. ice_unplug_aux_dev()
calls auxiliary_device_delete(), which calls device_del(). device_del()
takes device_lock() prior to kill_device(). So if ice_send_event_to_aux()
is in progress, device_del() waits for its completion.

Thanks,
Ivan
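The serialization described above can be modelled in userspace with a per-device mutex: the sender holds the lock for the whole delivery, and the delete path takes the same lock before marking the device dead, so it cannot finish while a delivery is in flight. This is a hypothetical sketch: `struct locked_dev`, its `dead` flag and the function names are illustrative stand-ins for device_lock() and the state changed around kill_device(), not the driver core's real code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with its own lock, loosely modelling device_lock(). */
struct locked_dev {
	pthread_mutex_t lock;
	bool dead;		/* set by the delete path under the lock */
	int delivered;		/* events delivered so far */
};

/* Sender: holds the device lock for the whole delivery. */
static bool locked_send_event(struct locked_dev *dev)
{
	bool ok = false;

	pthread_mutex_lock(&dev->lock);
	if (!dev->dead) {
		dev->delivered++;
		ok = true;
	}
	pthread_mutex_unlock(&dev->lock);
	return ok;
}

/* Delete: takes the same lock before marking the device dead, so it
 * blocks until any in-flight locked_send_event() has returned. */
static void locked_device_del(struct locked_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->dead = true;
	pthread_mutex_unlock(&dev->lock);
}

/* Thread entry point so the delete path can run concurrently. */
static void *locked_del_thread(void *arg)
{
	locked_device_del(arg);
	return NULL;
}
```

The model assumes the sender already holds a valid pointer to the device before it takes the lock; it does not protect the read of pf->adev itself, which is the gap raised later in this thread.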


* RE: [PATCH net] ice: Fix race during aux device (un)plugging
From: Ertman, David M @ 2022-04-15 17:53 UTC
  To: ivecera, mschmidt
  Cc: netdev, poros, Brandeburg, Jesse, Nguyen, Anthony L,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Saleem, Shiraz,
	moderated list:INTEL ETHERNET DRIVERS, open list

> -----Original Message-----
> From: Ivan Vecera <ivecera@redhat.com>
> Sent: Friday, April 15, 2022 8:50 AM
> To: mschmidt <mschmidt@redhat.com>
> Cc: netdev@vger.kernel.org; poros <poros@redhat.com>; Brandeburg,
> Jesse <jesse.brandeburg@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; David S. Miller <davem@davemloft.net>;
> Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>;
> Saleem, Shiraz <shiraz.saleem@intel.com>; Ertman, David M
> <david.m.ertman@intel.com>; moderated list:INTEL ETHERNET DRIVERS
> <intel-wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH net] ice: Fix race during aux device (un)plugging
> 
> On Fri, 15 Apr 2022 13:12:03 +0200
> Michal Schmidt <mschmidt@redhat.com> wrote:
> 
> > On Thu, Apr 14, 2022 at 6:39 PM Ivan Vecera <ivecera@redhat.com> wrote:
> >
> > > Function ice_plug_aux_dev() assigns the pf->adev field too early, prior
> > > to aux device initialization, and on the other side ice_unplug_aux_dev()
> > > starts aux device deinit and only at the end assigns NULL to pf->adev.
> > > This is wrong and can cause a crash when ice_send_event_to_aux()
> > > is called during these operations, because that function depends
> > > on a non-NULL value of pf->adev and does not expect the aux device
> > > to be half-initialized or half-destroyed.
> > >
> > > Modify the affected functions so that pf->adev is set after aux device
> > > init and cleared prior to aux device destroy.
> > >
> > [...]
> >
> > > @@ -320,12 +319,14 @@ int ice_plug_aux_dev(struct ice_pf *pf)
> > >   */
> > >  void ice_unplug_aux_dev(struct ice_pf *pf)
> > >  {
> > > -       if (!pf->adev)
> > > +       struct auxiliary_device *adev = pf->adev;
> > > +
> > > +       if (!adev)
> > >                 return;
> > >
> > > -       auxiliary_device_delete(pf->adev);
> > > -       auxiliary_device_uninit(pf->adev);
> > >         pf->adev = NULL;
> > > +       auxiliary_device_delete(adev);
> > > +       auxiliary_device_uninit(adev);
> > >  }
> > >
> >
> > Hi Ivan,
> > What prevents ice_unplug_aux_dev() from running immediately after
> > ice_send_event_to_aux() gets past its "if (!pf->adev)" test?
> > Michal
> 
> ice_send_event_to_aux() takes the aux device lock. ice_unplug_aux_dev()
> calls auxiliary_device_delete(), which calls device_del(). device_del()
> takes device_lock() prior to kill_device(). So if ice_send_event_to_aux()
> is in progress, device_del() waits for its completion.
> 
> Thanks,
> Ivan

Thanks for the patch Ivan!

Reviewed-by: Dave Ertman <david.m.ertman@intel.com>

* Re: [PATCH net] ice: Fix race during aux device (un)plugging
From: Leon Romanovsky @ 2022-04-20  6:36 UTC
  To: Ivan Vecera
  Cc: Michal Schmidt, netdev, Petr Oros, Jesse Brandeburg, Tony Nguyen,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Shiraz Saleem,
	Dave Ertman, moderated list:INTEL ETHERNET DRIVERS, open list

On Fri, Apr 15, 2022 at 05:49:32PM +0200, Ivan Vecera wrote:
> On Fri, 15 Apr 2022 13:12:03 +0200
> Michal Schmidt <mschmidt@redhat.com> wrote:
> 
> > On Thu, Apr 14, 2022 at 6:39 PM Ivan Vecera <ivecera@redhat.com> wrote:
> > 
> > > Function ice_plug_aux_dev() assigns the pf->adev field too early, prior
> > > to aux device initialization, and on the other side ice_unplug_aux_dev()
> > > starts aux device deinit and only at the end assigns NULL to pf->adev.
> > > This is wrong and can cause a crash when ice_send_event_to_aux()
> > > is called during these operations, because that function depends
> > > on a non-NULL value of pf->adev and does not expect the aux device
> > > to be half-initialized or half-destroyed.
> > >
> > > Modify the affected functions so that pf->adev is set after aux device
> > > init and cleared prior to aux device destroy.
> > >  
> > [...]
> > 
> > > @@ -320,12 +319,14 @@ int ice_plug_aux_dev(struct ice_pf *pf)
> > >   */
> > >  void ice_unplug_aux_dev(struct ice_pf *pf)
> > >  {
> > > -       if (!pf->adev)
> > > +       struct auxiliary_device *adev = pf->adev;
> > > +
> > > +       if (!adev)
> > >                 return;
> > >
> > > -       auxiliary_device_delete(pf->adev);
> > > -       auxiliary_device_uninit(pf->adev);
> > >         pf->adev = NULL;
> > > +       auxiliary_device_delete(adev);
> > > +       auxiliary_device_uninit(adev);
> > >  }
> > >  
> > 
> > Hi Ivan,
> > What prevents ice_unplug_aux_dev() from running immediately after
> > ice_send_event_to_aux() gets past its "if (!pf->adev)" test?
> > Michal
> 
> ice_send_event_to_aux() takes the aux device lock. ice_unplug_aux_dev()
> calls auxiliary_device_delete(), which calls device_del(). device_del()
> takes device_lock() prior to kill_device(). So if ice_send_event_to_aux()
> is in progress, device_del() waits for its completion.

Not really, you nullify pf->adev without any lock protection and
ice_send_event_to_aux() will simply crash.

 CPU#1                  |   CPU#2
                        | ice_send_event_to_aux
 ice_unplug_aux_dev()   | ...
 ...                    |
 pf->adev = NULL;       |
                        | device_lock(&pf->adev->dev); <--- crash here.

Thanks


> 
> Thanks,
> Ivan
> 
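The interleaving above can be replayed deterministically in a single thread by splitting the sender into its two reads of pf->adev and running the unplug step between them. The names below are hypothetical stand-ins; in the driver, the second read is where &pf->adev->dev would be computed from the pointer that has just become NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct auxiliary_device and struct ice_pf. */
struct race_adev {
	int dev;		/* placeholder for the embedded struct device */
};

struct race_pf {
	struct race_adev *adev;
};

/* Sender, first read: the "if (!pf->adev) return;" test. */
static bool race_sender_check(const struct race_pf *pf)
{
	return pf->adev != NULL;
}

/* Sender, second read: the pointer later used for device_lock(). */
static struct race_adev *race_sender_reload(const struct race_pf *pf)
{
	return pf->adev;
}

/* Unplug with the patch applied: the pointer is cleared first,
 * without any lock that the sender also takes. */
static void race_unplug_clear(struct race_pf *pf)
{
	pf->adev = NULL;
}
```

Reading pf->adev only once into a local variable would avoid the NULL dereference but not the underlying problem: nothing in the sender keeps the device alive, so the unplug path could still delete and release it while the sender is using the stale pointer.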

* [Intel-wired-lan] [PATCH net] ice: Fix race during aux device (un)plugging
From: Ivan Vecera @ 2022-04-20 13:59 UTC
  To: intel-wired-lan

On Wed, 20 Apr 2022 09:36:43 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> > ice_send_event_to_aux() takes the aux device lock. ice_unplug_aux_dev()
> > calls auxiliary_device_delete(), which calls device_del(). device_del()
> > takes device_lock() prior to kill_device(). So if ice_send_event_to_aux()
> > is in progress, device_del() waits for its completion.
> 
> Not really, you nullify pf->adev without any lock protection and
> ice_send_event_to_aux() will simply crash.
> 
>  CPU#1                  |   CPU#2
>                         | ice_send_event_to_aux
>  ice_unplug_aux_dev()   | ...
>  ...                    |
>  pf->adev = NULL;       |
>                         | device_lock(&pf->adev->dev); <--- crash here.
> 
> Thanks

You are right; the window is very tiny, but it is still there.
Will send v2.

Thanks,
Ivan
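One way to close that window, sketched here under stated assumptions and not taken from the v2 patch, is a dedicated mutex that guards pf->adev itself: the sender holds it from the NULL test through the delivery, and plug/unplug take it only to publish or clear the pointer. `struct guarded_pf`, `adev_mutex` and the function names are hypothetical userspace stand-ins; a driver would use a kernel mutex and the auxiliary bus calls from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the aux device. */
struct guarded_adev {
	int events;		/* events delivered to this device */
};

/* Hypothetical stand-in for struct ice_pf with a lock guarding adev. */
struct guarded_pf {
	pthread_mutex_t adev_mutex;
	struct guarded_adev *adev;
};

/* Plug: build the device outside the lock, then publish under it. */
static int guarded_plug(struct guarded_pf *pf)
{
	struct guarded_adev *adev = calloc(1, sizeof(*adev));

	if (!adev)
		return -1;
	pthread_mutex_lock(&pf->adev_mutex);
	pf->adev = adev;
	pthread_mutex_unlock(&pf->adev_mutex);
	return 0;
}

/* Unplug: clear the pointer under the lock, which waits for any sender
 * still inside its critical section, then tear down outside the lock. */
static void guarded_unplug(struct guarded_pf *pf)
{
	struct guarded_adev *adev;

	pthread_mutex_lock(&pf->adev_mutex);
	adev = pf->adev;
	pf->adev = NULL;
	pthread_mutex_unlock(&pf->adev_mutex);
	free(adev);		/* no sender can reach it any more */
}

/* Sender: the NULL test and the delivery happen under the same lock. */
static bool guarded_send_event(struct guarded_pf *pf)
{
	bool ok = false;

	pthread_mutex_lock(&pf->adev_mutex);
	if (pf->adev) {
		pf->adev->events++;
		ok = true;
	}
	pthread_mutex_unlock(&pf->adev_mutex);
	return ok;
}

/* Thread entry point so unplug can run concurrently. */
static void *guarded_unplug_thread(void *arg)
{
	guarded_unplug(arg);
	return NULL;
}
```

Holding the mutex across the delivery means an event handler must not itself call plug or unplug, or it would deadlock on adev_mutex.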

