On Thu, 16 Jan 2020 at 00:09, Denis Kenzior wrote: > >> Are you sure? netdev_connect_free doesn't invoke callbacks. It is > >> netdev_connect_failed and netdev_connect_ok that do. > > > > True, I misread something. > > > > Also note that by extension, netdev_disconnect only calls the connect_cb > with ABORTED. It doesn't do anything if the connection is 'operational'. It calls netdev_connect_free, and that wipes the callback so it does exactly what we need. > > >> > >>> pending, I believe we should call netdev_disconnect, and whenever we > >>> have a wsc_enrollee object we need to call wsc_enrollee_cancel. > >> > >> I think we should not try to lump together a 'clean-async-shutdown' and > >> 'cleanup-right-away' concepts. > > > > Ok, then we need two methods for wsc_enrollee. I also think that > > Isn't this what I keep trying to tell you? :) Oh, ok. Then we agree on this since early in this thread :) > > > station and wsc_enrollee shouldn't assume whether the netdev is going > > away or not, based on whether station and/or wsc_enrollee are going > > They should not. But in effect they're being told that they're being > removed and to clean up after themselves immediately. No async cleanup > is possible. So here you're agreeing that station_free should not be assuming that the netdev is going away --> station_free *should* call some sort of netdev cleanup function (netdev_disconnect or a new function that doesn't send a CMD_DISCONNECT, but does call netdev_connect_free)... > > > away. So rather than assume the connect callback is not getting > > happening anymore, they also should call a cleanup-right-away function > > for netdev. Or ideally they'd just call a 'cleanup' function and it > > would be netdev figuring out which cleanup it needs, but if you want > > two separate methods I can add this. > > They should not. netdev already knows everything it needs to clean up > after itself. ...But here you're saying they shouldn't call any netdev cleanup function? Perhaps I'm misunderstanding you. > > > > >> > >>>> I disagree :) I think likely what is saving station in this case is > >>>> that the Disconnect event comes first (most of the time). But > >>>> inherently we might get this event via RTNL first and then we can go *boom*. > >>> > >>> Mostly what can happen is that the CMD_DISCONNECT command returns an > >>> error so we've sent a useless command (if we'd cancelled a wrong > >>> connection attempt that would be slightly worse). On the other hand > >>> if we free station without calling netdev_disconnect, then things can > >>> actually go *boom* because we could get the connect callback with an > >>> outdated user_data pointer. > >> > >> Aha! And this is likely why this was added in the first place: > >> git show daf248e1b > >> > >> But it doesn't make it 'right' from an API perspective. I suspect that > >> netdev should be invoking connect_cb with ABORTED in this case instead. > >> In fact I'm still a bit lost what this call is accomplishing at the > >> moment. > > > > In the current code it is preventing a crash in station_connect_cb though. > > > > Hmm, I take it because we call connect_cb with an invalid station > object? In other words: > > 1. we get rtnl event that netdev is down. > 2. This results in station being removed. > 3. Then nl80211 sends us a Disconnect event. > 4. netdev calls connect_cb with an invalid object > > But again, IMO the fix should be in netdev, not station. > > netdev should cleanup whatever stale state it has when it receives 1. > It *knows* what just happened, it should not be depending on every > client it has to do the right thing instead. That will hardcode the assumption that the netdev is disappearing in station. If they have a "client" type relationship then station should take care of cleaning up the processes it has initiated. Otherwise it would need to be very well documented in the comments and I bet we'll still occasionally get it wrong, the relationship just becomes very complex. > > Maybe you want it to invoke connect_cb with ABORTED after step 1, but > looking at station code it already takes care of this in station_free. > > >> The only thing station_connect_cb does when > >> NETDEV_RESULT_ABORTED is set is to generate a dbus reply. And then we > >> do it unconditionally later in station_free. > > > > Well it's conditional and I guess the condition is always false > > because we always send the reply in station_connect_cb (but we don't > > enter the CONNECTED stated until netconfig is successful? should we > > also delay the dbus reply?) > > Right. Imprecise wording from me. We do it if connect_pending is > non-NULL, but we always check it. > > > > > So it looks like we should either not send the reply in > > station_connect_cb if the result is aborted, or not send it in > > station_free (that'd be my preference). I think we should also be > > I'd just have netdev invoke netdev_connect_free prior to > NETDEV_WATCH_EVENT_DOWN. And remove netdev_disconnect from station_free. > > > able to cancel a netdev_disconnect() if one is running asynchronously. > > Curious. For what purpose? Same as the netdev_disconnect call makes netdev forget station's connect_cb, as far as I'm reading netdev.c there's currently no place where disconnect_cb is forgotten when station gets freed so we likely have the same crash there. Best regards