linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC net] net: dsa: Add missing reference counting
@ 2020-05-05 21:02 Florian Fainelli
  2020-05-05 21:23 ` Vivien Didelot
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Fainelli @ 2020-05-05 21:02 UTC (permalink / raw)
  To: netdev
  Cc: olteanv, Florian Fainelli, Andrew Lunn, Vivien Didelot,
	David S. Miller, Jakub Kicinski, open list

If we are probed through platform_data we would be intentionally
dropping the reference count on master after dev_to_net_device()
incremented it. If we are probed through Device Tree,
of_find_net_device() does not do a dev_hold() at all.

Ensure that the DSA master device is properly reference counted by
holding it as soon as the CPU port is successfully initialized and later
released during dsa_switch_release_ports(). dsa_get_tag_protocol() does
a short de-reference, so we hold and release the master at that time,
too.

Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 net/dsa/dsa2.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index d90665b465b8..875231252ada 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -626,6 +626,7 @@ static enum dsa_tag_protocol dsa_get_tag_protocol(struct dsa_port *dp,
 	 * happens the switch driver may want to know if its tagging protocol
 	 * is going to work in such a configuration.
 	 */
+	dev_hold(master);
 	if (dsa_slave_dev_check(master)) {
 		mdp = dsa_slave_to_port(master);
 		mds = mdp->ds;
@@ -633,6 +634,7 @@ static enum dsa_tag_protocol dsa_get_tag_protocol(struct dsa_port *dp,
 		tag_protocol = mds->ops->get_tag_protocol(mds, mdp_upstream,
 							  DSA_TAG_PROTO_NONE);
 	}
+	dev_put(master);
 
 	/* If the master device is not itself a DSA slave in a disjoint DSA
 	 * tree, then return immediately.
@@ -657,6 +659,7 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *master)
 		return PTR_ERR(tag_ops);
 	}
 
+	dev_hold(master);
 	dp->master = master;
 	dp->type = DSA_PORT_TYPE_CPU;
 	dp->filter = tag_ops->filter;
@@ -858,6 +861,8 @@ static void dsa_switch_release_ports(struct dsa_switch *ds)
 		if (dp->ds != ds)
 			continue;
 		list_del(&dp->list);
+		if (dsa_port_is_cpu(dp))
+			dev_put(dp->master);
 		kfree(dp);
 	}
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC net] net: dsa: Add missing reference counting
  2020-05-05 21:02 [RFC net] net: dsa: Add missing reference counting Florian Fainelli
@ 2020-05-05 21:23 ` Vivien Didelot
  2020-05-06 21:24   ` Florian Fainelli
  0 siblings, 1 reply; 7+ messages in thread
From: Vivien Didelot @ 2020-05-05 21:23 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, olteanv, Florian Fainelli, Andrew Lunn, David S. Miller,
	Jakub Kicinski, open list

On Tue,  5 May 2020 14:02:53 -0700, Florian Fainelli <f.fainelli@gmail.com> wrote:
> If we are probed through platform_data we would be intentionally
> dropping the reference count on master after dev_to_net_device()
> incremented it. If we are probed through Device Tree,
> of_find_net_device() does not do a dev_hold() at all.
> 
> Ensure that the DSA master device is properly reference counted by
> holding it as soon as the CPU port is successfully initialized and later
> released during dsa_switch_release_ports(). dsa_get_tag_protocol() does
> a short de-reference, so we hold and release the master at that time,
> too.
> 
> Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC net] net: dsa: Add missing reference counting
  2020-05-05 21:23 ` Vivien Didelot
@ 2020-05-06 21:24   ` Florian Fainelli
  2020-05-06 21:40     ` Vladimir Oltean
  2020-05-07 13:02     ` Andrew Lunn
  0 siblings, 2 replies; 7+ messages in thread
From: Florian Fainelli @ 2020-05-06 21:24 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, olteanv, Andrew Lunn, David S. Miller, Jakub Kicinski, open list



On 5/5/2020 2:23 PM, Vivien Didelot wrote:
> On Tue,  5 May 2020 14:02:53 -0700, Florian Fainelli <f.fainelli@gmail.com> wrote:
>> If we are probed through platform_data we would be intentionally
>> dropping the reference count on master after dev_to_net_device()
>> incremented it. If we are probed through Device Tree,
>> of_find_net_device() does not do a dev_hold() at all.
>>
>> Ensure that the DSA master device is properly reference counted by
>> holding it as soon as the CPU port is successfully initialized and later
>> released during dsa_switch_release_ports(). dsa_get_tag_protocol() does
>> a short de-reference, so we hold and release the master at that time,
>> too.
>>
>> Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
> 
Andrew, Vladimir, any thoughts on that?
-- 
Florian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC net] net: dsa: Add missing reference counting
  2020-05-06 21:24   ` Florian Fainelli
@ 2020-05-06 21:40     ` Vladimir Oltean
  2020-05-06 22:45       ` Florian Fainelli
  2020-05-07 13:02     ` Andrew Lunn
  1 sibling, 1 reply; 7+ messages in thread
From: Vladimir Oltean @ 2020-05-06 21:40 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vivien Didelot, netdev, Andrew Lunn, David S. Miller,
	Jakub Kicinski, open list

Hi Florian,

On Thu, 7 May 2020 at 00:24, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 5/5/2020 2:23 PM, Vivien Didelot wrote:
> > On Tue,  5 May 2020 14:02:53 -0700, Florian Fainelli <f.fainelli@gmail.com> wrote:
> >> If we are probed through platform_data we would be intentionally
> >> dropping the reference count on master after dev_to_net_device()
> >> incremented it. If we are probed through Device Tree,
> >> of_find_net_device() does not do a dev_hold() at all.
> >>
> >> Ensure that the DSA master device is properly reference counted by
> >> holding it as soon as the CPU port is successfully initialized and later
> >> released during dsa_switch_release_ports(). dsa_get_tag_protocol() does
> >> a short de-reference, so we hold and release the master at that time,
> >> too.
> >>
> >> Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
> >> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> >
> > Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
> >
> Andrew, Vladimir, any thoughts on that?
> --
> Florian

I might be completely off because I guess I just don't understand what
is the goal of keeping a reference to the DSA master in this way for
the entire lifetime of the DSA switch. I think that dev_hold is for
short-term things that cannot complete atomically, but I think that
you are trying to prevent the DSA master from getting freed from under
our feet, which at the moment would fault the kernel instantaneously?

If this is correct, it certainly doesn't do what it intends to do:
echo 0000\:00\:00.5> /sys/bus/pci/drivers/mscc_felix/unbind
[   71.576333] unregister_netdevice: waiting for swp0 to become free.
Usage count = 1
(hangs there)

But if I'm right and that's indeed what you want to achieve, shouldn't
we be using device links instead?
https://www.kernel.org/doc/html/v4.14/driver-api/device_link.html

Thanks,
-Vladimir

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC net] net: dsa: Add missing reference counting
  2020-05-06 21:40     ` Vladimir Oltean
@ 2020-05-06 22:45       ` Florian Fainelli
  2020-05-06 23:32         ` Vladimir Oltean
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Fainelli @ 2020-05-06 22:45 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vivien Didelot, netdev, Andrew Lunn, David S. Miller,
	Jakub Kicinski, open list



On 5/6/2020 2:40 PM, Vladimir Oltean wrote:
> Hi Florian,
> 
> On Thu, 7 May 2020 at 00:24, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>>
>>
>> On 5/5/2020 2:23 PM, Vivien Didelot wrote:
>>> On Tue,  5 May 2020 14:02:53 -0700, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>> If we are probed through platform_data we would be intentionally
>>>> dropping the reference count on master after dev_to_net_device()
>>>> incremented it. If we are probed through Device Tree,
>>>> of_find_net_device() does not do a dev_hold() at all.
>>>>
>>>> Ensure that the DSA master device is properly reference counted by
>>>> holding it as soon as the CPU port is successfully initialized and later
>>>> released during dsa_switch_release_ports(). dsa_get_tag_protocol() does
>>>> a short de-reference, so we hold and release the master at that time,
>>>> too.
>>>>
>>>> Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
>>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>>>
>>> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
>>>
>> Andrew, Vladimir, any thoughts on that?
>> --
>> Florian
> 
> I might be completely off because I guess I just don't understand what
> is the goal of keeping a reference to the DSA master in this way for
> the entire lifetime of the DSA switch. I think that dev_hold is for
> short-term things that cannot complete atomically, but I think that
> you are trying to prevent the DSA master from getting freed from under
> our feet, which at the moment would fault the kernel instantaneously?

Yes, that's the idea, you should not be able to rmmod/unbind the DSA
master while there is a DSA switch tree hanging off of it.

> 
> If this is correct, it certainly doesn't do what it intends to do:
> echo 0000\:00\:00.5> /sys/bus/pci/drivers/mscc_felix/unbind
> [   71.576333] unregister_netdevice: waiting for swp0 to become free.
> Usage count = 1
> (hangs there)

Is this with the sja1105 switch hanging off felix? If so, is not it
working as expected because you still have sja1150 being bound to one of
those ports? If not, then I will look into why.

> 
> But if I'm right and that's indeed what you want to achieve, shouldn't
> we be using device links instead?
> https://www.kernel.org/doc/html/v4.14/driver-api/device_link.html

device links could work but given that the struct device and struct
net_device have almost the same lifetime, with the net_device being a
little bit shorter, and that is what DSA uses, I am not sure whether
device link would bring something better.
-- 
Florian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC net] net: dsa: Add missing reference counting
  2020-05-06 22:45       ` Florian Fainelli
@ 2020-05-06 23:32         ` Vladimir Oltean
  0 siblings, 0 replies; 7+ messages in thread
From: Vladimir Oltean @ 2020-05-06 23:32 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vivien Didelot, netdev, Andrew Lunn, David S. Miller,
	Jakub Kicinski, open list

On Thu, 7 May 2020 at 01:45, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 5/6/2020 2:40 PM, Vladimir Oltean wrote:
> > Hi Florian,
> >
> > On Thu, 7 May 2020 at 00:24, Florian Fainelli <f.fainelli@gmail.com> wrote:
> >>
> >>
> >>
> >> On 5/5/2020 2:23 PM, Vivien Didelot wrote:
> >>> On Tue,  5 May 2020 14:02:53 -0700, Florian Fainelli <f.fainelli@gmail.com> wrote:
> >>>> If we are probed through platform_data we would be intentionally
> >>>> dropping the reference count on master after dev_to_net_device()
> >>>> incremented it. If we are probed through Device Tree,
> >>>> of_find_net_device() does not do a dev_hold() at all.
> >>>>
> >>>> Ensure that the DSA master device is properly reference counted by
> >>>> holding it as soon as the CPU port is successfully initialized and later
> >>>> released during dsa_switch_release_ports(). dsa_get_tag_protocol() does
> >>>> a short de-reference, so we hold and release the master at that time,
> >>>> too.
> >>>>
> >>>> Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
> >>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> >>>
> >>> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
> >>>
> >> Andrew, Vladimir, any thoughts on that?
> >> --
> >> Florian
> >
> > I might be completely off because I guess I just don't understand what
> > is the goal of keeping a reference to the DSA master in this way for
> > the entire lifetime of the DSA switch. I think that dev_hold is for
> > short-term things that cannot complete atomically, but I think that
> > you are trying to prevent the DSA master from getting freed from under
> > our feet, which at the moment would fault the kernel instantaneously?
>
> Yes, that's the idea, you should not be able to rmmod/unbind the DSA
> master while there is a DSA switch tree hanging off of it.
>
> >
> > If this is correct, it certainly doesn't do what it intends to do:
> > echo 0000\:00\:00.5> /sys/bus/pci/drivers/mscc_felix/unbind
> > [   71.576333] unregister_netdevice: waiting for swp0 to become free.
> > Usage count = 1
> > (hangs there)
>
> Is this with the sja1105 switch hanging off felix?

Yes, but it actually doesn't matter that the DSA master is a DSA slave too.

> If so, is not it
> working as expected because you still have sja1150 being bound to one of
> those ports? If not, then I will look into why.
>

I just unbound the driver for the DSA master and the shell got stuck
in kernel process context telling me that it's waiting for the
reference to be freed. So I think it's just that my "expected" is not
the same as yours - it looks like what I'm doing would qualify as
"incorrect usage".

> >
> > But if I'm right and that's indeed what you want to achieve, shouldn't
> > we be using device links instead?
> > https://www.kernel.org/doc/html/v4.14/driver-api/device_link.html
>
> device links could work but given that the struct device and struct
> net_device have almost the same lifetime, with the net_device being a
> little bit shorter, and that is what DSA uses, I am not sure whether
> device link would bring something better.

At the very least, I think they would bring us graceful teardown of
consumers of the DSA master device.

> --
> Florian

Thanks,
-Vladimir

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC net] net: dsa: Add missing reference counting
  2020-05-06 21:24   ` Florian Fainelli
  2020-05-06 21:40     ` Vladimir Oltean
@ 2020-05-07 13:02     ` Andrew Lunn
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew Lunn @ 2020-05-07 13:02 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vivien Didelot, netdev, olteanv, David S. Miller, Jakub Kicinski,
	open list

On Wed, May 06, 2020 at 02:24:14PM -0700, Florian Fainelli wrote:
> 
> 
> On 5/5/2020 2:23 PM, Vivien Didelot wrote:
> > On Tue,  5 May 2020 14:02:53 -0700, Florian Fainelli <f.fainelli@gmail.com> wrote:
> >> If we are probed through platform_data we would be intentionally
> >> dropping the reference count on master after dev_to_net_device()
> >> incremented it. If we are probed through Device Tree,
> >> of_find_net_device() does not do a dev_hold() at all.
> >>
> >> Ensure that the DSA master device is properly reference counted by
> >> holding it as soon as the CPU port is successfully initialized and later
> >> released during dsa_switch_release_ports(). dsa_get_tag_protocol() does
> >> a short de-reference, so we hold and release the master at that time,
> >> too.
> >>
> >> Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
> >> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> > 
> > Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
> > 
> Andrew, Vladimir, any thoughts on that?

Hi Florian

Have you looked at how other stacked drivers do this? bond/team, vlan,
bridge, BATMAN?

Do we maybe need to subscribe to the master devices notifier chain,
and do a tear down when the device is removed?

    Andrew

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-05-07 13:03 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-05 21:02 [RFC net] net: dsa: Add missing reference counting Florian Fainelli
2020-05-05 21:23 ` Vivien Didelot
2020-05-06 21:24   ` Florian Fainelli
2020-05-06 21:40     ` Vladimir Oltean
2020-05-06 22:45       ` Florian Fainelli
2020-05-06 23:32         ` Vladimir Oltean
2020-05-07 13:02     ` Andrew Lunn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).