From: Florian Fainelli <f.fainelli@gmail.com>
To: Vladimir Oltean <vladimir.oltean@nxp.com>,
	Andrew Lunn <andrew@lunn.ch>, Jiri Pirko <jiri@resnulli.us>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Tobias Waldekranz <tobias@waldekranz.com>
Subject: Re: [PATCH v3 net-next] net: dsa: reference count the host mdb addresses
Date: Sat, 12 Dec 2020 17:42:22 -0800
Message-ID: <e3e7311e-f205-9b91-7eaa-5f8e371d12c3@gmail.com>
In-Reply-To: <20201213004933.pbjwfltwudvokrej@skbuf>



On 12/12/2020 4:49 PM, Vladimir Oltean wrote:
> On Sun, Dec 13, 2020 at 01:34:10AM +0100, Andrew Lunn wrote:
>> On Sun, Dec 13, 2020 at 12:14:19AM +0000, Vladimir Oltean wrote:
>>> On Sun, Dec 13, 2020 at 01:08:55AM +0100, Andrew Lunn wrote:
>>>>>> And you need some way to cleanup the allocated memory when the commit
>>>>>> never happens because some other layer has said No!
>>>>>
>>>>> So this would be a fatal problem with the switchdev transactional model
>>>>> if I am not misunderstanding it. On one hand there's this nice, bubbly
>>>>> idea that you should preallocate memory in the prepare phase, so that
>>>>> there's one reason less to fail at commit time. But on the other hand,
>>>>> if "the commit phase might never happen" is even a remote possibility,
>>>>> all of that goes to trash - how are you even supposed to free the
>>>>> preallocated memory.
>>>>
>>>> It can definitely happen, that commit is never called:
>>>>
>>>> static int switchdev_port_obj_add_now(struct net_device *dev,
>>>>                                       const struct switchdev_obj *obj,
>>>>                                       struct netlink_ext_ack *extack)
>>>> {
>>>>
>>>>        /* Phase I: prepare for obj add. Driver/device should fail
>>>>          * here if there are going to be issues in the commit phase,
>>>>          * such as lack of resources or support.  The driver/device
>>>>          * should reserve resources needed for the commit phase here,
>>>>          * but should not commit the obj.
>>>>          */
>>>>
>>>>         trans.ph_prepare = true;
>>>>         err = switchdev_port_obj_notify(SWITCHDEV_PORT_OBJ_ADD,
>>>>                                         dev, obj, &trans, extack);
>>>>         if (err)
>>>>                 return err;
>>>>
>>>>         /* Phase II: commit obj add.  This cannot fail as a fault
>>>>          * of driver/device.  If it does, it's a bug in the driver/device
>>>>          * because the driver said everything was OK in phase I.
>>>>          */
>>>>
>>>>         trans.ph_prepare = false;
>>>>         err = switchdev_port_obj_notify(SWITCHDEV_PORT_OBJ_ADD,
>>>>                                         dev, obj, &trans, extack);
>>>>         WARN(err, "%s: Commit of object (id=%d) failed.\n", dev->name, obj->id);
>>>>
>>>>         return err;
>>>>
>>>> So if any notifier returns an error during prepare, the commit is
>>>> never called.
>>>>
>>>> So the memory you allocated and added to the list may never get
>>>> used. Its refcount stays zero, which is why I suggested making the
>>>> MDB remove call do a general garbage collect. It is not perfect, the
>>>> cleanup could be deferred a long time, but it should get removed
>>>> eventually.
>>>
>>> What would the garbage collection look like?
>>
>> 	struct dsa_host_addr *a, *tmp;
>>
>> 	/* The _safe variant needs a second cursor so entries can be freed mid-walk. */
>> 	list_for_each_entry_safe(a, tmp, addr_list, list) {
>> 		if (refcount_read(&a->refcount) == 0) {
>> 			list_del(&a->list);
>> 			kfree(a);
>> 		}
>> 	}
> 
> Sorry, this seems a bit absurd. The code is already absurdly complicated
> as is. I don't want to go against the current and add more unjustified
> nonsense instead of taking a step back, which I should have done earlier.
> I thought this transactional API was supposed to help. Though I scanned
> the kernel tree a bit and I fail to understand whom it helps exactly.
> What I see is that the whole 'transaction' spans only the length of the
> switchdev_port_attr_set_now function.
> 
> Am I right to say that there is no in-kernel code that depends upon the
> switchdev transaction model right now, as it's completely hidden away
> from callers? As in, we could just squash the two switchdev_port_attr_notify
> calls into one and nothing would functionally change for the callers of
> switchdev_port_attr_set?
> Why don't we do just that? I might be completely blind, but I am getting
> the feeling that this idea has not aged very well.

IIRC that was the conclusion that Ido and I had reached as well way back
when doing the commit you cited below.
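
To make the failure mode Andrew describes concrete, here is a rough
userspace sketch (not kernel code) of the two-phase flow quoted above.
Everything in it is hypothetical: struct trans stands in for
struct switchdev_trans, and drv_state, reserving_handler,
vetoing_handler, notify_all and obj_add_two_phase are illustration-only
names. One handler reserves memory in prepare, another vetoes, and the
commit that would have consumed the reservation never runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct switchdev_trans: only the phase flag. */
struct trans {
	bool ph_prepare;
};

/* Hypothetical driver state: a buffer reserved in prepare, used in commit. */
struct drv_state {
	void *reserved;		/* allocated during prepare */
	void *installed;	/* takes over the buffer during commit */
};

typedef int (*obj_handler_t)(struct trans *trans, struct drv_state *st);

/* Reserves memory in prepare and consumes the reservation in commit. */
static int reserving_handler(struct trans *trans, struct drv_state *st)
{
	if (trans->ph_prepare) {
		st->reserved = malloc(16);
		return st->reserved ? 0 : -ENOMEM;
	}

	assert(st->reserved);	/* commit without prepare would be a bug */
	st->installed = st->reserved;
	st->reserved = NULL;
	return 0;
}

/* Stands in for "some other layer" that says no during prepare. */
static int vetoing_handler(struct trans *trans, struct drv_state *st)
{
	(void)st;
	return trans->ph_prepare ? -EOPNOTSUPP : 0;
}

/* Run every handler for one phase, stopping at the first error. */
static int notify_all(obj_handler_t *h, size_t n, struct trans *trans,
		      struct drv_state *st)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		err = h[i](trans, st);
		if (err)
			return err;
	}
	return 0;
}

/* Same shape as the quoted function: prepare, then commit only on success. */
static int obj_add_two_phase(obj_handler_t *h, size_t n, struct drv_state *st)
{
	struct trans trans = { .ph_prepare = true };
	int err;

	err = notify_all(h, n, &trans, st);
	if (err)
		return err;	/* no commit: st->reserved stays allocated */

	trans.ph_prepare = false;
	return notify_all(h, n, &trans, st);
}
```

With { reserving_handler, vetoing_handler } as the handler array,
obj_add_two_phase() returns -EOPNOTSUPP and leaves st->reserved set,
with no later call in which the reserving handler could release it.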

> 
> Florian, has anything happened in the meantime since this commit of yours?

This is where I stopped, mainly because the series that had motivated
this work was the one bringing management mode to bcm_sf2, and CPU RX
filtering had me wire up yet another switchdev attribute that drivers
like b53 wanted to veto (namely the disabling of IGMP snooping). We did
not agree on the approach of using switchdev to notify drivers about
UC and MC lists, and so the series stalled.

IIRC Jiri and Ido were also keen on merging switchdev with the bridge
code, but I did not do that part, nor did I completely remove the
transaction model. Those were the next steps had I not been
sidetracked with work on other topics.

> 
> commit 91cf8eceffc131d41f098351e1b290bdaab45ea7
> Author: Florian Fainelli <f.fainelli@gmail.com>
> Date:   Wed Feb 27 16:29:16 2019 -0800
> 
>     switchdev: Remove unused transaction item queue
> 
>     There are no more in tree users of the
>     switchdev_trans_item_{dequeue,enqueue} or switchdev_trans_item structure
>     in the kernel since commit 00fc0c51e35b ("rocker: Change world_ops API
>     and implementation to be switchdev independant").
> 
>     Remove this unused code and update the documentation accordingly since.
> 
>     Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>     Acked-by: Jiri Pirko <jiri@mellanox.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> There isn't an API to hold this stuff any longer. So let's go back to
> the implementation from v2, with memory allocation in the commit phase.
> The way forward anyway is probably not to add a garbage collector in
> DSA, but to fold the prepare and commit phases into one.

Agreed.
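
For illustration only, a minimal userspace sketch of what "allocate at
commit time and reference count" could look like. This is not the DSA
code from the patch: struct host_addr, host_addr_find, host_addr_get
and host_addr_put are hypothetical names, and a plain singly linked
list with an unsigned counter stands in for list_head and refcount_t.
An entry only exists while it has at least one user, so nothing is
ever left on the list with a zero refcount and no garbage collector is
needed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical host address entry, not the structure from the patch. */
struct host_addr {
	unsigned char addr[ETH_ALEN];
	unsigned short vid;
	unsigned int refcount;
	struct host_addr *next;
};

/* Find an entry by MAC address and VLAN ID, or return NULL. */
static struct host_addr *host_addr_find(struct host_addr *head,
					const unsigned char *addr,
					unsigned short vid)
{
	struct host_addr *a;

	for (a = head; a; a = a->next)
		if (a->vid == vid && !memcmp(a->addr, addr, ETH_ALEN))
			return a;
	return NULL;
}

/*
 * Single-phase add. Returns 1 if this is the first user (the caller
 * should program the hardware), 0 if only the refcount was bumped,
 * or -ENOMEM if the allocation failed.
 */
static int host_addr_get(struct host_addr **head, const unsigned char *addr,
			 unsigned short vid)
{
	struct host_addr *a = host_addr_find(*head, addr, vid);

	if (a) {
		a->refcount++;
		return 0;
	}

	a = calloc(1, sizeof(*a));
	if (!a)
		return -ENOMEM;

	memcpy(a->addr, addr, ETH_ALEN);
	a->vid = vid;
	a->refcount = 1;
	a->next = *head;
	*head = a;
	return 1;
}

/*
 * Drop one reference. Returns 1 if the last user went away (the entry
 * is freed and the caller should remove it from hardware), 0 if other
 * users remain, or -ENOENT if the address was never added.
 */
static int host_addr_put(struct host_addr **head, const unsigned char *addr,
			 unsigned short vid)
{
	struct host_addr **pp, *a;

	for (pp = head; (a = *pp); pp = &a->next) {
		if (a->vid != vid || memcmp(a->addr, addr, ETH_ALEN))
			continue;

		assert(a->refcount > 0);
		if (--a->refcount)
			return 0;

		*pp = a->next;
		free(a);
		return 1;
	}
	return -ENOENT;
}
```

The allocation can still fail, but it fails in the same call that
would have used the memory, so there is nothing to clean up afterwards.
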
-- 
Florian
