All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net:phy fix driver reference count error when attach and detach phy device
@ 2016-11-30 10:22 Mao Wenan
  2016-12-02  9:45 ` David Laight
  0 siblings, 1 reply; 11+ messages in thread
From: Mao Wenan @ 2016-11-30 10:22 UTC (permalink / raw)
  To: netdev, f.fainelli, dingtianhong

The nic in my board use the phy dev from marvell, and the system will
load the marvell phy driver automatically, but when I remove the phy
drivers, the system immediately panic:
Call trace:
[ 2582.834493] [<ffff800000715384>] phy_state_machine+0x3c/0x438 [
2582.851754] [<ffff8000000db3b8>] process_one_work+0x150/0x428 [
2582.868188] [<ffff8000000db7d4>] worker_thread+0x144/0x4b0 [
2582.883882] [<ffff8000000e1d0c>] kthread+0xfc/0x110

there should be proper reference counting in place to avoid that.
I found that phy_attach_direct() forgets to add phy device driver
reference count, and phy_detach() forgets to subtract reference count.
This patch is to fix this bug, after that panic is disappeared when remove
marvell.ko

Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
 drivers/net/phy/phy_device.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 1a4bf8a..a7ec7c2 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 		return -EIO;
 	}
 
+	if (!try_module_get(d->driver->owner)) {
+		dev_err(&dev->dev, "failed to get the device driver module\n");
+		return -EIO;
+	}
+
 	get_device(d);
 
 	/* Assume that if there is no driver, that it doesn't
@@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 
 error:
 	put_device(d);
+	module_put(d->driver->owner);
 	module_put(bus->owner);
 	return err;
 }
@@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
 	bus = phydev->mdio.bus;
 
 	put_device(&phydev->mdio.dev);
+	module_put(phydev->mdio.dev.driver->owner);
 	module_put(bus->owner);
 }
 EXPORT_SYMBOL(phy_detach);
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH] net:phy fix driver reference count error when attach and detach phy device
  2016-11-30 10:22 [PATCH] net:phy fix driver reference count error when attach and detach phy device Mao Wenan
@ 2016-12-02  9:45 ` David Laight
  2016-12-05  8:47   ` maowenan
  0 siblings, 1 reply; 11+ messages in thread
From: David Laight @ 2016-12-02  9:45 UTC (permalink / raw)
  To: 'Mao Wenan', netdev, f.fainelli, dingtianhong

From: Mao Wenan
> Sent: 30 November 2016 10:23
> The nic in my board use the phy dev from marvell, and the system will
> load the marvell phy driver automatically, but when I remove the phy
> drivers, the system immediately panic:
> Call trace:
> [ 2582.834493] [<ffff800000715384>] phy_state_machine+0x3c/0x438 [
> 2582.851754] [<ffff8000000db3b8>] process_one_work+0x150/0x428 [
> 2582.868188] [<ffff8000000db7d4>] worker_thread+0x144/0x4b0 [
> 2582.883882] [<ffff8000000e1d0c>] kthread+0xfc/0x110
> 
> there should be proper reference counting in place to avoid that.
> I found that phy_attach_direct() forgets to add phy device driver
> reference count, and phy_detach() forgets to subtract reference count.
> This patch is to fix this bug, after that panic is disappeared when remove
> marvell.ko
> 
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
>  drivers/net/phy/phy_device.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 1a4bf8a..a7ec7c2 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>  		return -EIO;
>  	}
> 
> +	if (!try_module_get(d->driver->owner)) {
> +		dev_err(&dev->dev, "failed to get the device driver module\n");
> +		return -EIO;
> +	}

If this is the phy code, what stops the phy driver being unloaded
before the try_module_get() obtains a reference.
If it isn't the phy driver then there ought to be a reference count obtained
when the phy driver is located (by whatever decides which phy driver to use).
Even if that code later releases its reference (it probably shouldn't on success)
then you can't fail to get an extra reference here.

> +
>  	get_device(d);
> 
>  	/* Assume that if there is no driver, that it doesn't
> @@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> 
>  error:
>  	put_device(d);
> +	module_put(d->driver->owner);

Are those two in the wrong order ?

>  	module_put(bus->owner);
>  	return err;
>  }
> @@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
>  	bus = phydev->mdio.bus;
> 
>  	put_device(&phydev->mdio.dev);
> +	module_put(phydev->mdio.dev.driver->owner);
>  	module_put(bus->owner);

Where is this code called from?
You can't call it from the phy driver because the driver can be unloaded
as soon as the last reference is removed.
At that point the code memory is freed.

>  }
>  EXPORT_SYMBOL(phy_detach);
> --
> 2.7.0
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device
  2016-12-02  9:45 ` David Laight
@ 2016-12-05  8:47   ` maowenan
  2016-12-12  8:49     ` maowenan
  0 siblings, 1 reply; 11+ messages in thread
From: maowenan @ 2016-12-05  8:47 UTC (permalink / raw)
  To: David Laight, netdev, f.fainelli, dingtianhong, weiyongjun (A)



On 2016/12/2 17:45, David Laight wrote:
> From: Mao Wenan
>> Sent: 30 November 2016 10:23
>> The nic in my board use the phy dev from marvell, and the system will
>> load the marvell phy driver automatically, but when I remove the phy
>> drivers, the system immediately panic:
>> Call trace:
>> [ 2582.834493] [<ffff800000715384>] phy_state_machine+0x3c/0x438 [
>> 2582.851754] [<ffff8000000db3b8>] process_one_work+0x150/0x428 [
>> 2582.868188] [<ffff8000000db7d4>] worker_thread+0x144/0x4b0 [
>> 2582.883882] [<ffff8000000e1d0c>] kthread+0xfc/0x110
>>
>> there should be proper reference counting in place to avoid that.
>> I found that phy_attach_direct() forgets to add phy device driver
>> reference count, and phy_detach() forgets to subtract reference count.
>> This patch is to fix this bug, after that panic is disappeared when remove
>> marvell.ko
>>
>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>> ---
>>  drivers/net/phy/phy_device.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 1a4bf8a..a7ec7c2 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>  		return -EIO;
>>  	}
>>
>> +	if (!try_module_get(d->driver->owner)) {
>> +		dev_err(&dev->dev, "failed to get the device driver module\n");
>> +		return -EIO;
>> +	}
> 
> If this is the phy code, what stops the phy driver being unloaded
> before the try_module_get() obtains a reference.
> If it isn't the phy driver then there ought to be a reference count obtained
> when the phy driver is located (by whatever decides which phy driver to use).
> Even if that code later releases its reference (it probably shouldn't on success)
> then you can't fail to get an extra reference here.

[Mao Wenan]Yes, this is phy code, in function phy_attach_direct(), drivers/net/phy/phy_device.c.
when one NIC driver to do probe behavior, it will attach one matched phy driver. phy_attach_direct()
is to obtain phy driver reference and bind phy driver, if try_module_get() execute on success, the reference
count is added; if failed, the driver can't be attached to this NIC, and it can't added the phy driver
reference count. So before try_module_get obtains a reference, phy driver can't can't be bound to this NIC.
when the phy driver is attached to NIC, the reference count is added, if someone remove phy driver directly,
it will be failed because reference count is not equal to 0.

An example of call trace when there is NIC driver to attch one phy driver:
hns_nic_dev_probe->hns_nic_try_get_ae->hns_nic_init_phy->of_phy_connect->phy_connect_direct->phy_attach_direct

Consider the steps of phy driver(marvell.ko) added and removed, and NIC driver(hns_enet_drv.ko) added and removed:
1)insmod marvell       ref=0
2)insmod hns_enet_drv  ref=1
3)rmmod marvell        (should not on success, ref=1)
4)rmmod hns_enet_drv   ref=0
5)rmmod marvell        (should on success, because ref=0)

if we don't add the reference count in phy_attach_direct(the second step ref=0), so the third step rmmod marvell will
be panic, because there is one user remain use marvell driver and phy_stat_machine use the NULL drv pointer.

> 
>> +
>>  	get_device(d);
>>
>>  	/* Assume that if there is no driver, that it doesn't
>> @@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>
>>  error:
>>  	put_device(d);
>> +	module_put(d->driver->owner);
> 
> Are those two in the wrong order ?
> 
>>  	module_put(bus->owner);
>>  	return err;
>>  }
>> @@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
>>  	bus = phydev->mdio.bus;
>>
>>  	put_device(&phydev->mdio.dev);
>> +	module_put(phydev->mdio.dev.driver->owner);
>>  	module_put(bus->owner);
> 
> Where is this code called from?
> You can't call it from the phy driver because the driver can be unloaded
> as soon as the last reference is removed.
> At that point the code memory is freed.

[Mao Wenan] it is called by NIC when it is removed, which aims to disconnect one bound phy driver. If this phy driver
is not used for this NIC, reference count should be subtracted, and phy driver can be removed if there is no user.
hns_nic_dev_remove->phy_disconnect->phy_detach



> 
>>  }
>>  EXPORT_SYMBOL(phy_detach);
>> --
>> 2.7.0
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device
  2016-12-05  8:47   ` maowenan
@ 2016-12-12  8:49     ` maowenan
  2016-12-12 16:33       ` Florian Fainelli
  0 siblings, 1 reply; 11+ messages in thread
From: maowenan @ 2016-12-12  8:49 UTC (permalink / raw)
  To: David Laight, netdev, f.fainelli, dingtianhong, weiyongjun (A)



On 2016/12/5 16:47, maowenan wrote:
> 
> 
> On 2016/12/2 17:45, David Laight wrote:
>> From: Mao Wenan
>>> Sent: 30 November 2016 10:23
>>> The nic in my board use the phy dev from marvell, and the system will
>>> load the marvell phy driver automatically, but when I remove the phy
>>> drivers, the system immediately panic:
>>> Call trace:
>>> [ 2582.834493] [<ffff800000715384>] phy_state_machine+0x3c/0x438 [
>>> 2582.851754] [<ffff8000000db3b8>] process_one_work+0x150/0x428 [
>>> 2582.868188] [<ffff8000000db7d4>] worker_thread+0x144/0x4b0 [
>>> 2582.883882] [<ffff8000000e1d0c>] kthread+0xfc/0x110
>>>
>>> there should be proper reference counting in place to avoid that.
>>> I found that phy_attach_direct() forgets to add phy device driver
>>> reference count, and phy_detach() forgets to subtract reference count.
>>> This patch is to fix this bug, after that panic is disappeared when remove
>>> marvell.ko
>>>
>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>> ---
>>>  drivers/net/phy/phy_device.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>> index 1a4bf8a..a7ec7c2 100644
>>> --- a/drivers/net/phy/phy_device.c
>>> +++ b/drivers/net/phy/phy_device.c
>>> @@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>  		return -EIO;
>>>  	}
>>>
>>> +	if (!try_module_get(d->driver->owner)) {
>>> +		dev_err(&dev->dev, "failed to get the device driver module\n");
>>> +		return -EIO;
>>> +	}
>>
>> If this is the phy code, what stops the phy driver being unloaded
>> before the try_module_get() obtains a reference.
>> If it isn't the phy driver then there ought to be a reference count obtained
>> when the phy driver is located (by whatever decides which phy driver to use).
>> Even if that code later releases its reference (it probably shouldn't on success)
>> then you can't fail to get an extra reference here.
> 
> [Mao Wenan]Yes, this is phy code, in function phy_attach_direct(), drivers/net/phy/phy_device.c.
> when one NIC driver to do probe behavior, it will attach one matched phy driver. phy_attach_direct()
> is to obtain phy driver reference and bind phy driver, if try_module_get() execute on success, the reference
> count is added; if failed, the driver can't be attached to this NIC, and it can't added the phy driver
> reference count. So before try_module_get obtains a reference, phy driver can't can't be bound to this NIC.
> when the phy driver is attached to NIC, the reference count is added, if someone remove phy driver directly,
> it will be failed because reference count is not equal to 0.
> 
> An example of call trace when there is NIC driver to attch one phy driver:
> hns_nic_dev_probe->hns_nic_try_get_ae->hns_nic_init_phy->of_phy_connect->phy_connect_direct->phy_attach_direct
> 
> Consider the steps of phy driver(marvell.ko) added and removed, and NIC driver(hns_enet_drv.ko) added and removed:
> 1)insmod marvell       ref=0
> 2)insmod hns_enet_drv  ref=1
> 3)rmmod marvell        (should not on success, ref=1)
> 4)rmmod hns_enet_drv   ref=0
> 5)rmmod marvell        (should on success, because ref=0)
> 
> if we don't add the reference count in phy_attach_direct(the second step ref=0), so the third step rmmod marvell will
> be panic, because there is one user remain use marvell driver and phy_stat_machine use the NULL drv pointer.
> 
>>
>>> +
>>>  	get_device(d);
>>>
>>>  	/* Assume that if there is no driver, that it doesn't
>>> @@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>
>>>  error:
>>>  	put_device(d);
>>> +	module_put(d->driver->owner);
>>
>> Are those two in the wrong order ?
>>
>>>  	module_put(bus->owner);
>>>  	return err;
>>>  }
>>> @@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
>>>  	bus = phydev->mdio.bus;
>>>
>>>  	put_device(&phydev->mdio.dev);
>>> +	module_put(phydev->mdio.dev.driver->owner);
>>>  	module_put(bus->owner);
>>
>> Where is this code called from?
>> You can't call it from the phy driver because the driver can be unloaded
>> as soon as the last reference is removed.
>> At that point the code memory is freed.
> 
> [Mao Wenan] it is called by NIC when it is removed, which aims to disconnect one bound phy driver. If this phy driver
> is not used for this NIC, reference count should be subtracted, and phy driver can be removed if there is no user.
> hns_nic_dev_remove->phy_disconnect->phy_detach
> 
> 
> 
>>
>>>  }
>>>  EXPORT_SYMBOL(phy_detach);
>>> --
>>> 2.7.0
>>>
>>
>>
>> .
>>

@Florian Fainelli, what's your comments about this patch?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device
  2016-12-12  8:49     ` maowenan
@ 2016-12-12 16:33       ` Florian Fainelli
  2017-01-06  2:29         ` maowenan
  0 siblings, 1 reply; 11+ messages in thread
From: Florian Fainelli @ 2016-12-12 16:33 UTC (permalink / raw)
  To: maowenan, David Laight, netdev, dingtianhong, weiyongjun (A)

On 12/12/2016 12:49 AM, maowenan wrote:
> 
> 
> On 2016/12/5 16:47, maowenan wrote:
>>
>>
>> On 2016/12/2 17:45, David Laight wrote:
>>> From: Mao Wenan
>>>> Sent: 30 November 2016 10:23
>>>> The nic in my board use the phy dev from marvell, and the system will
>>>> load the marvell phy driver automatically, but when I remove the phy
>>>> drivers, the system immediately panic:
>>>> Call trace:
>>>> [ 2582.834493] [<ffff800000715384>] phy_state_machine+0x3c/0x438 [
>>>> 2582.851754] [<ffff8000000db3b8>] process_one_work+0x150/0x428 [
>>>> 2582.868188] [<ffff8000000db7d4>] worker_thread+0x144/0x4b0 [
>>>> 2582.883882] [<ffff8000000e1d0c>] kthread+0xfc/0x110
>>>>
>>>> there should be proper reference counting in place to avoid that.
>>>> I found that phy_attach_direct() forgets to add phy device driver
>>>> reference count, and phy_detach() forgets to subtract reference count.
>>>> This patch is to fix this bug, after that panic is disappeared when remove
>>>> marvell.ko
>>>>
>>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>>> ---
>>>>  drivers/net/phy/phy_device.c | 7 +++++++
>>>>  1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>> index 1a4bf8a..a7ec7c2 100644
>>>> --- a/drivers/net/phy/phy_device.c
>>>> +++ b/drivers/net/phy/phy_device.c
>>>> @@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>>  		return -EIO;
>>>>  	}
>>>>
>>>> +	if (!try_module_get(d->driver->owner)) {
>>>> +		dev_err(&dev->dev, "failed to get the device driver module\n");
>>>> +		return -EIO;
>>>> +	}
>>>
>>> If this is the phy code, what stops the phy driver being unloaded
>>> before the try_module_get() obtains a reference.
>>> If it isn't the phy driver then there ought to be a reference count obtained
>>> when the phy driver is located (by whatever decides which phy driver to use).
>>> Even if that code later releases its reference (it probably shouldn't on success)
>>> then you can't fail to get an extra reference here.
>>
>> [Mao Wenan]Yes, this is phy code, in function phy_attach_direct(), drivers/net/phy/phy_device.c.
>> when one NIC driver to do probe behavior, it will attach one matched phy driver. phy_attach_direct()
>> is to obtain phy driver reference and bind phy driver, if try_module_get() execute on success, the reference
>> count is added; if failed, the driver can't be attached to this NIC, and it can't added the phy driver
>> reference count. So before try_module_get obtains a reference, phy driver can't can't be bound to this NIC.
>> when the phy driver is attached to NIC, the reference count is added, if someone remove phy driver directly,
>> it will be failed because reference count is not equal to 0.
>>
>> An example of call trace when there is NIC driver to attch one phy driver:
>> hns_nic_dev_probe->hns_nic_try_get_ae->hns_nic_init_phy->of_phy_connect->phy_connect_direct->phy_attach_direct
>>
>> Consider the steps of phy driver(marvell.ko) added and removed, and NIC driver(hns_enet_drv.ko) added and removed:
>> 1)insmod marvell       ref=0
>> 2)insmod hns_enet_drv  ref=1
>> 3)rmmod marvell        (should not on success, ref=1)
>> 4)rmmod hns_enet_drv   ref=0
>> 5)rmmod marvell        (should on success, because ref=0)
>>
>> if we don't add the reference count in phy_attach_direct(the second step ref=0), so the third step rmmod marvell will
>> be panic, because there is one user remain use marvell driver and phy_stat_machine use the NULL drv pointer.
>>
>>>
>>>> +
>>>>  	get_device(d);
>>>>
>>>>  	/* Assume that if there is no driver, that it doesn't
>>>> @@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>>
>>>>  error:
>>>>  	put_device(d);
>>>> +	module_put(d->driver->owner);
>>>
>>> Are those two in the wrong order ?
>>>
>>>>  	module_put(bus->owner);
>>>>  	return err;
>>>>  }
>>>> @@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
>>>>  	bus = phydev->mdio.bus;
>>>>
>>>>  	put_device(&phydev->mdio.dev);
>>>> +	module_put(phydev->mdio.dev.driver->owner);
>>>>  	module_put(bus->owner);
>>>
>>> Where is this code called from?
>>> You can't call it from the phy driver because the driver can be unloaded
>>> as soon as the last reference is removed.
>>> At that point the code memory is freed.
>>
>> [Mao Wenan] it is called by NIC when it is removed, which aims to disconnect one bound phy driver. If this phy driver
>> is not used for this NIC, reference count should be subtracted, and phy driver can be removed if there is no user.
>> hns_nic_dev_remove->phy_disconnect->phy_detach
>>
>>
>>
>>>
>>>>  }
>>>>  EXPORT_SYMBOL(phy_detach);
>>>> --
>>>> 2.7.0
>>>>
>>>
>>>
>>> .
>>>
> 
> @Florian Fainelli, what's your comments about this patch?

I am trying to reproduce what you are seeing, but at first glance is
looks like an appropriate solution to me. Do you mind giving me a couple
more days?

Thanks!
-- 
Florian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] net:phy fix driver reference count error when attach and detach phy device
  2016-12-12 16:33       ` Florian Fainelli
@ 2017-01-06  2:29         ` maowenan
  2017-01-06  3:21           ` Florian Fainelli
  0 siblings, 1 reply; 11+ messages in thread
From: maowenan @ 2017-01-06  2:29 UTC (permalink / raw)
  To: Florian Fainelli, David Laight, netdev, Dingtianhong, weiyongjun (A)



> -----Original Message-----
> From: Florian Fainelli [mailto:f.fainelli@gmail.com]
> Sent: Tuesday, December 13, 2016 12:33 AM
> To: maowenan; David Laight; netdev@vger.kernel.org; Dingtianhong;
> weiyongjun (A)
> Subject: Re: [PATCH] net:phy fix driver reference count error when attach and
> detach phy device
> 
> On 12/12/2016 12:49 AM, maowenan wrote:
> >
> >
> > On 2016/12/5 16:47, maowenan wrote:
> >>
> >>
> >> On 2016/12/2 17:45, David Laight wrote:
> >>> From: Mao Wenan
> >>>> Sent: 30 November 2016 10:23
> >>>> The nic in my board use the phy dev from marvell, and the system
> >>>> will load the marvell phy driver automatically, but when I remove
> >>>> the phy drivers, the system immediately panic:
> >>>> Call trace:
> >>>> [ 2582.834493] [<ffff800000715384>] phy_state_machine+0x3c/0x438 [
> >>>> 2582.851754] [<ffff8000000db3b8>] process_one_work+0x150/0x428 [
> >>>> 2582.868188] [<ffff8000000db7d4>] worker_thread+0x144/0x4b0 [
> >>>> 2582.883882] [<ffff8000000e1d0c>] kthread+0xfc/0x110
> >>>>
> >>>> there should be proper reference counting in place to avoid that.
> >>>> I found that phy_attach_direct() forgets to add phy device driver
> >>>> reference count, and phy_detach() forgets to subtract reference count.
> >>>> This patch is to fix this bug, after that panic is disappeared when
> >>>> remove marvell.ko
> >>>>
> >>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> >>>> ---
> >>>>  drivers/net/phy/phy_device.c | 7 +++++++
> >>>>  1 file changed, 7 insertions(+)
> >>>>
> >>>> diff --git a/drivers/net/phy/phy_device.c
> >>>> b/drivers/net/phy/phy_device.c index 1a4bf8a..a7ec7c2 100644
> >>>> --- a/drivers/net/phy/phy_device.c
> >>>> +++ b/drivers/net/phy/phy_device.c
> >>>> @@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev,
> struct phy_device *phydev,
> >>>>  		return -EIO;
> >>>>  	}
> >>>>
> >>>> +	if (!try_module_get(d->driver->owner)) {
> >>>> +		dev_err(&dev->dev, "failed to get the device driver module\n");
> >>>> +		return -EIO;
> >>>> +	}
> >>>
> >>> If this is the phy code, what stops the phy driver being unloaded
> >>> before the try_module_get() obtains a reference.
> >>> If it isn't the phy driver then there ought to be a reference count
> >>> obtained when the phy driver is located (by whatever decides which phy
> driver to use).
> >>> Even if that code later releases its reference (it probably
> >>> shouldn't on success) then you can't fail to get an extra reference here.
> >>
> >> [Mao Wenan]Yes, this is phy code, in function phy_attach_direct(),
> drivers/net/phy/phy_device.c.
> >> when one NIC driver to do probe behavior, it will attach one matched
> >> phy driver. phy_attach_direct() is to obtain phy driver reference and
> >> bind phy driver, if try_module_get() execute on success, the
> >> reference count is added; if failed, the driver can't be attached to this NIC,
> and it can't added the phy driver reference count. So before try_module_get
> obtains a reference, phy driver can't can't be bound to this NIC.
> >> when the phy driver is attached to NIC, the reference count is added,
> >> if someone remove phy driver directly, it will be failed because reference
> count is not equal to 0.
> >>
> >> An example of call trace when there is NIC driver to attch one phy driver:
> >> hns_nic_dev_probe->hns_nic_try_get_ae->hns_nic_init_phy->of_phy_conne
> >> ct->phy_connect_direct->phy_attach_direct
> >>
> >> Consider the steps of phy driver(marvell.ko) added and removed, and NIC
> driver(hns_enet_drv.ko) added and removed:
> >> 1)insmod marvell       ref=0
> >> 2)insmod hns_enet_drv  ref=1
> >> 3)rmmod marvell        (should not on success, ref=1)
> >> 4)rmmod hns_enet_drv   ref=0
> >> 5)rmmod marvell        (should on success, because ref=0)
> >>
> >> if we don't add the reference count in phy_attach_direct(the second
> >> step ref=0), so the third step rmmod marvell will be panic, because there is
> one user remain use marvell driver and phy_stat_machine use the NULL drv
> pointer.
> >>
> >>>
> >>>> +
> >>>>  	get_device(d);
> >>>>
> >>>>  	/* Assume that if there is no driver, that it doesn't @@ -921,6
> >>>> +926,7 @@ int phy_attach_direct(struct net_device *dev, struct
> >>>> phy_device *phydev,
> >>>>
> >>>>  error:
> >>>>  	put_device(d);
> >>>> +	module_put(d->driver->owner);
> >>>
> >>> Are those two in the wrong order ?
> >>>
> >>>>  	module_put(bus->owner);
> >>>>  	return err;
> >>>>  }
> >>>> @@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
> >>>>  	bus = phydev->mdio.bus;
> >>>>
> >>>>  	put_device(&phydev->mdio.dev);
> >>>> +	module_put(phydev->mdio.dev.driver->owner);
> >>>>  	module_put(bus->owner);
> >>>
> >>> Where is this code called from?
> >>> You can't call it from the phy driver because the driver can be
> >>> unloaded as soon as the last reference is removed.
> >>> At that point the code memory is freed.
> >>
> >> [Mao Wenan] it is called by NIC when it is removed, which aims to
> >> disconnect one bound phy driver. If this phy driver is not used for this NIC,
> reference count should be subtracted, and phy driver can be removed if there is
> no user.
> >> hns_nic_dev_remove->phy_disconnect->phy_detach
> >>
> >>
> >>
> >>>
> >>>>  }
> >>>>  EXPORT_SYMBOL(phy_detach);
> >>>> --
> >>>> 2.7.0
> >>>>
> >>>
> >>>
> >>> .
> >>>
> >
> > @Florian Fainelli, what's your comments about this patch?
> 
> I am trying to reproduce what you are seeing, but at first glance is looks like an
> appropriate solution to me. Do you mind giving me a couple more days?
> 
> Thanks!
> --
> Florian

Hi Florian, 
  Do you have any update about this patch?
  Thank you!
  









^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device
  2017-01-06  2:29         ` maowenan
@ 2017-01-06  3:21           ` Florian Fainelli
  2017-01-06  3:39             ` maowenan
  0 siblings, 1 reply; 11+ messages in thread
From: Florian Fainelli @ 2017-01-06  3:21 UTC (permalink / raw)
  To: maowenan, David Laight, netdev, Dingtianhong, weiyongjun (A),
	Andrew Lunn

+Andrew,

Le 01/05/17 à 18:29, maowenan a écrit :
>>> @Florian Fainelli, what's your comments about this patch?
>>
>> I am trying to reproduce what you are seeing, but at first glance is looks like an
>> appropriate solution to me. Do you mind giving me a couple more days?
>>
>> Thanks!
>> --
>> Florian
> 
> Hi Florian, 
>   Do you have any update about this patch?

Your patch is not complete, there are now MDIO device (which PHY devices
are a superset of) that would also need a similar fix.
-- 
Florian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device
  2017-01-06  3:21           ` Florian Fainelli
@ 2017-01-06  3:39             ` maowenan
  2017-01-06  4:48               ` Florian Fainelli
  0 siblings, 1 reply; 11+ messages in thread
From: maowenan @ 2017-01-06  3:39 UTC (permalink / raw)
  To: Florian Fainelli, David Laight, netdev, Dingtianhong,
	weiyongjun (A),
	Andrew Lunn



On 2017/1/6 11:21, Florian Fainelli wrote:
> +Andrew,
> 
> Le 01/05/17 à 18:29, maowenan a écrit :
>>>> @Florian Fainelli, what's your comments about this patch?
>>>
>>> I am trying to reproduce what you are seeing, but at first glance is looks like an
>>> appropriate solution to me. Do you mind giving me a couple more days?
>>>
>>> Thanks!
>>> --
>>> Florian
>>
>> Hi Florian, 
>>   Do you have any update about this patch?
> 
> Your patch is not complete, there are now MDIO device (which PHY devices
> are a superset of) that would also need a similar fix.
> 
ok, is there any patch to fix MDIO yet?  if not, i will verify it and give a fix patch?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device
  2017-01-06  3:39             ` maowenan
@ 2017-01-06  4:48               ` Florian Fainelli
  2017-01-23  9:33                 ` maowenan
  0 siblings, 1 reply; 11+ messages in thread
From: Florian Fainelli @ 2017-01-06  4:48 UTC (permalink / raw)
  To: maowenan, David Laight, netdev, Dingtianhong, weiyongjun (A),
	Andrew Lunn

Le 01/05/17 à 19:39, maowenan a écrit :
> 
> 
> On 2017/1/6 11:21, Florian Fainelli wrote:
>> +Andrew,
>>
>> Le 01/05/17 à 18:29, maowenan a écrit :
>>>>> @Florian Fainelli, what's your comments about this patch?
>>>>
>>>> I am trying to reproduce what you are seeing, but at first glance is looks like an
>>>> appropriate solution to me. Do you mind giving me a couple more days?
>>>>
>>>> Thanks!
>>>> --
>>>> Florian
>>>
>>> Hi Florian, 
>>>   Do you have any update about this patch?
>>
>> Your patch is not complete, there are now MDIO device (which PHY devices
>> are a superset of) that would also need a similar fix.
>>
> ok, is there any patch to fix MDIO yet?  if not, i will verify it and give a fix patch?
> 

No, there is not a patch yet, your approach looks okay, but need to be
made general and cover MDIO devices as well.

Thank you!
-- 
Florian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device
  2017-01-06  4:48               ` Florian Fainelli
@ 2017-01-23  9:33                 ` maowenan
  2017-01-31 20:05                   ` Florian Fainelli
  0 siblings, 1 reply; 11+ messages in thread
From: maowenan @ 2017-01-23  9:33 UTC (permalink / raw)
  To: Florian Fainelli, David Laight, netdev, Dingtianhong,
	weiyongjun (A),
	Andrew Lunn



On 2017/1/6 12:48, Florian Fainelli wrote:
> Le 01/05/17 à 19:39, maowenan a écrit :
>>
>>
>> On 2017/1/6 11:21, Florian Fainelli wrote:
>>> +Andrew,
>>>
>>> Le 01/05/17 à 18:29, maowenan a écrit :
>>>>>> @Florian Fainelli, what's your comments about this patch?
>>>>>
>>>>> I am trying to reproduce what you are seeing, but at first glance is looks like an
>>>>> appropriate solution to me. Do you mind giving me a couple more days?
>>>>>
>>>>> Thanks!
>>>>> --
>>>>> Florian
>>>>
>>>> Hi Florian, 
>>>>   Do you have any update about this patch?
>>>
>>> Your patch is not complete, there are now MDIO device (which PHY devices
>>> are a superset of) that would also need a similar fix.
>>>
>> ok, is there any patch to fix MDIO yet?  if not, i will verify it and give a fix patch?
>>
> 
> No, there is not a patch yet, your approach looks okay, but need to be
> made general and cover MDIO devices as well.
> 
> Thank you!
> 

Hi Florian,
Sorry I can't get you. There has already existed codes which are not originally written by me to cover MDIO device in phy_attach_direct and phy_detach in my patch .
Please help check, thank you.
phy_attach_direct:
struct device *d = &phydev->mdio.dev;
...
get_device(d);
...

phy_detach:
 	put_device(&phydev->mdio.dev);       /*--MDIO device--*/
+	module_put(phydev->mdio.dev.driver->owner);
 	module_put(bus->owner);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device
  2017-01-23  9:33                 ` maowenan
@ 2017-01-31 20:05                   ` Florian Fainelli
  0 siblings, 0 replies; 11+ messages in thread
From: Florian Fainelli @ 2017-01-31 20:05 UTC (permalink / raw)
  To: maowenan, David Laight, netdev, Dingtianhong, weiyongjun (A),
	Andrew Lunn

On 01/23/2017 01:33 AM, maowenan wrote:
> 
> 
> On 2017/1/6 12:48, Florian Fainelli wrote:
>> Le 01/05/17 à 19:39, maowenan a écrit :
>>>
>>>
>>> On 2017/1/6 11:21, Florian Fainelli wrote:
>>>> +Andrew,
>>>>
>>>> Le 01/05/17 à 18:29, maowenan a écrit :
>>>>>>> @Florian Fainelli, what's your comments about this patch?
>>>>>>
>>>>>> I am trying to reproduce what you are seeing, but at first glance is looks like an
>>>>>> appropriate solution to me. Do you mind giving me a couple more days?
>>>>>>
>>>>>> Thanks!
>>>>>> --
>>>>>> Florian
>>>>>
>>>>> Hi Florian, 
>>>>>   Do you have any update about this patch?
>>>>
>>>> Your patch is not complete, there are now MDIO device (which PHY devices
>>>> are a superset of) that would also need a similar fix.
>>>>
>>> ok, is there any patch to fix MDIO yet?  if not, i will verify it and give a fix patch?
>>>
>>
>> No, there is not a patch yet, your approach looks okay, but need to be
>> made general and cover MDIO devices as well.
>>
>> Thank you!
>>
> 
> Hi Florian,
> Sorry I can't get you. There has already existed codes which are not originally written by me to cover MDIO device in phy_attach_direct and phy_detach in my patch .
> Please help check, thank you.
> phy_attach_direct:
> struct device *d = &phydev->mdio.dev;
> ...
> get_device(d);
> ...
> 
> phy_detach:
>  	put_device(&phydev->mdio.dev);       /*--MDIO device--*/
> +	module_put(phydev->mdio.dev.driver->owner);
>  	module_put(bus->owner);

Took me a while, but I can finally reproduce this here as well, will
come up with a fix, thanks for your patience!
-- 
Florian

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-01-31 20:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-30 10:22 [PATCH] net:phy fix driver reference count error when attach and detach phy device Mao Wenan
2016-12-02  9:45 ` David Laight
2016-12-05  8:47   ` maowenan
2016-12-12  8:49     ` maowenan
2016-12-12 16:33       ` Florian Fainelli
2017-01-06  2:29         ` maowenan
2017-01-06  3:21           ` Florian Fainelli
2017-01-06  3:39             ` maowenan
2017-01-06  4:48               ` Florian Fainelli
2017-01-23  9:33                 ` maowenan
2017-01-31 20:05                   ` Florian Fainelli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.