All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] frame-xchg: fix invalid read
@ 2020-10-29 16:37 James Prestwood
  2020-10-29 18:05 ` Andrew Zaborowski
  0 siblings, 1 reply; 10+ messages in thread
From: James Prestwood @ 2020-10-29 16:37 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 3981 bytes --]

This seems to happen occationally with testAP (potentially others).
The invalid read appears to happen when the frame_xchg_tx_cb detects
an early status and no ACK. In this particular case there is no
retry interval so we reach the retry limit and 'done' the frame.
This frees the 'fx' data all before the destroy callback can get
called. Once we finally return and the destroy callback is called
'fx' is freed and we see the invalid write.

==206== Memcheck, a memory error detector
==206== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==206== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==206== Command: iwd -p rad1,rad2,rad3,rad4 -d
==206== Parent PID: 140
==206==
==206== Invalid write of size 4
==206==    at 0x4493A0: frame_xchg_tx_destroy (frame-xchg.c:941)
==206==    by 0x46DAF6: destroy_request (genl.c:673)
==206==    by 0x46DAF6: process_unicast (genl.c:1002)
==206==    by 0x46DAF6: received_data (genl.c:1101)
==206==    by 0x46AA4B: io_callback (io.c:118)
==206==    by 0x469D6C: l_main_iterate (main.c:477)
==206==    by 0x469E1B: l_main_run (main.c:524)
==206==    by 0x469E1B: l_main_run (main.c:506)
==206==    by 0x46A02B: l_main_run_with_signal (main.c:646)
==206==    by 0x403E78: main (main.c:490)
==206==  Address 0x4c59c6c is 172 bytes inside a block of size 176 free'd
==206==    at 0x483B9F5: free (vg_replace_malloc.c:538)
==206==    by 0x40F14C: destroy_work (wiphy.c:248)
==206==    by 0x40F14C: wiphy_radio_work_done (wiphy.c:1578)
==206==    by 0x44A916: frame_xchg_tx_cb (frame-xchg.c:930)
==206==    by 0x46DAD9: process_unicast (genl.c:993)
==206==    by 0x46DAD9: received_data (genl.c:1101)
==206==    by 0x46AA4B: io_callback (io.c:118)
==206==    by 0x469D6C: l_main_iterate (main.c:477)
==206==    by 0x469E1B: l_main_run (main.c:524)
==206==    by 0x469E1B: l_main_run (main.c:506)
==206==    by 0x46A02B: l_main_run_with_signal (main.c:646)
==206==    by 0x403E78: main (main.c:490)
==206==  Block was alloc'd at
==206==    at 0x483A809: malloc (vg_replace_malloc.c:307)
==206==    by 0x4643CD: l_malloc (util.c:61)
==206==    by 0x44AF8C: frame_xchg_startv (frame-xchg.c:1155)
==206==    by 0x44B2A4: frame_xchg_start (frame-xchg.c:1108)
==206==    by 0x42BC55: ap_send_mgmt_frame (ap.c:709)
==206==    by 0x42F513: ap_probe_req_cb (ap.c:1869)
==206==    by 0x449752: frame_watch_unicast_notify (frame-xchg.c:233)
==206==    by 0x46DA2F: dispatch_unicast_watches (genl.c:961)
==206==    by 0x46DA2F: process_unicast (genl.c:980)
==206==    by 0x46DA2F: received_data (genl.c:1101)
==206==    by 0x46AA4B: io_callback (io.c:118)
==206==    by 0x469D6C: l_main_iterate (main.c:477)
==206==    by 0x469E1B: l_main_run (main.c:524)
==206==    by 0x469E1B: l_main_run (main.c:506)
==206==    by 0x46A02B: l_main_run_with_signal (main.c:646)
==206==
---
 src/frame-xchg.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/frame-xchg.c b/src/frame-xchg.c
index 0e3e330d..21dfd706 100644
--- a/src/frame-xchg.c
+++ b/src/frame-xchg.c
@@ -901,6 +901,8 @@ static void frame_xchg_tx_cb(struct l_genl_msg *msg, void *user_data)
 	uint64_t cookie;
 	bool early_status;
 
+	fx->tx_cmd_id = 0;
+
 	l_debug("err %i", -error);
 
 	if (error < 0) {
@@ -934,13 +936,6 @@ error:
 	frame_xchg_done(fx, error);
 }
 
-static void frame_xchg_tx_destroy(void *user_data)
-{
-	struct frame_xchg_data *fx = user_data;
-
-	fx->tx_cmd_id = 0;
-}
-
 static bool frame_xchg_tx_retry(struct wiphy_radio_work_item *item)
 {
 	struct frame_xchg_data *fx = l_container_of(item,
@@ -974,7 +969,7 @@ static bool frame_xchg_tx_retry(struct wiphy_radio_work_item *item)
 					&duration);
 
 	fx->tx_cmd_id = l_genl_family_send(nl80211, msg, frame_xchg_tx_cb, fx,
-						frame_xchg_tx_destroy);
+						NULL);
 	if (!fx->tx_cmd_id) {
 		l_error("Error sending frame");
 		l_genl_msg_unref(msg);
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] frame-xchg: fix invalid read
  2020-10-29 16:37 [PATCH] frame-xchg: fix invalid read James Prestwood
@ 2020-10-29 18:05 ` Andrew Zaborowski
  2020-10-29 18:17   ` James Prestwood
  2020-10-29 18:40   ` James Prestwood
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Zaborowski @ 2020-10-29 18:05 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 876 bytes --]

Hi James!

On Thu, 29 Oct 2020 at 17:37, James Prestwood <prestwoj@gmail.com> wrote:
> This seems to happen occationally with testAP (potentially others).
> The invalid read appears to happen when the frame_xchg_tx_cb detects
> an early status and no ACK. In this particular case there is no
> retry interval so we reach the retry limit and 'done' the frame.
> This frees the 'fx' data all before the destroy callback can get
> called.

frame_xchg_done() should result in l_genl_family_cancel(fx->tx_cmd_id)
which would call the destroy callback before 'fx' is freed so there
should be some other reason this is happening.

One possibility is that we passed a wrong id to
l_genl_family_cancel(), can you check that it returns true?

By the way do you use the usb passthrough to trigger this?  I don't
think hwsim has the "early status" quirk.

Best regards

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] frame-xchg: fix invalid read
  2020-10-29 18:05 ` Andrew Zaborowski
@ 2020-10-29 18:17   ` James Prestwood
  2020-10-29 18:40   ` James Prestwood
  1 sibling, 0 replies; 10+ messages in thread
From: James Prestwood @ 2020-10-29 18:17 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 2047 bytes --]

On Thu, 2020-10-29 at 19:05 +0100, Andrew Zaborowski wrote:
> Hi James!
> 
> On Thu, 29 Oct 2020 at 17:37, James Prestwood <prestwoj@gmail.com>
> wrote:
> > This seems to happen occationally with testAP (potentially others).
> > The invalid read appears to happen when the frame_xchg_tx_cb
> > detects
> > an early status and no ACK. In this particular case there is no
> > retry interval so we reach the retry limit and 'done' the frame.
> > This frees the 'fx' data all before the destroy callback can get
> > called.
> 
> frame_xchg_done() should result in l_genl_family_cancel(fx-
> >tx_cmd_id)
> which would call the destroy callback before 'fx' is freed so there
> should be some other reason this is happening.
> 
> One possibility is that we passed a wrong id to
> l_genl_family_cancel(), can you check that it returns true?

Yeah, Ill check this.

> 
> By the way do you use the usb passthrough to trigger this?  I don't
> think hwsim has the "early status" quirk.

No I've just been running testAP in test-runner. It doesnt happen all
the time but I've seen it now a few times.

I also noticed another issue (could be related) where the STA isnt
ACK'ing the probe response:

AP Probe Request from 02:00:00:00:03:00
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 8
src/wiphy.c:wiphy_radio_work_next() Starting work item 8
src/frame-xchg.c:frame_xchg_tx_cb() err 0
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/frame-xchg.c:frame_xchg_mlme_notify() Received no ACK
src/frame-xchg.c:frame_xchg_wait_cancel() 
Frame tx retry limit reached
AP Probe Response received no ACK
src/wiphy.c:wiphy_radio_work_done() Work item 8 done

But this doesn't always result in the invalid read, and could be
completely unrelated. I have 'fixed' this by adding a response timeout
to frame_xchg_start but had to go up to 200ms to get testAP to pass
reliably. Even a 100ms timeout resulted in occational un-acked probe
responses.

Thanks,
James

> 
> Best regards

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] frame-xchg: fix invalid read
  2020-10-29 18:05 ` Andrew Zaborowski
  2020-10-29 18:17   ` James Prestwood
@ 2020-10-29 18:40   ` James Prestwood
  2020-10-29 23:33     ` Andrew Zaborowski
  1 sibling, 1 reply; 10+ messages in thread
From: James Prestwood @ 2020-10-29 18:40 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 2314 bytes --]

Hi Andrew,

On Thu, 2020-10-29 at 19:05 +0100, Andrew Zaborowski wrote:
> Hi James!
> 
> On Thu, 29 Oct 2020 at 17:37, James Prestwood <prestwoj@gmail.com>
> wrote:
> > This seems to happen occationally with testAP (potentially others).
> > The invalid read appears to happen when the frame_xchg_tx_cb
> > detects
> > an early status and no ACK. In this particular case there is no
> > retry interval so we reach the retry limit and 'done' the frame.
> > This frees the 'fx' data all before the destroy callback can get
> > called.
> 
> frame_xchg_done() should result in l_genl_family_cancel(fx-
> >tx_cmd_id)
> which would call the destroy callback before 'fx' is freed so there
> should be some other reason this is happening.
> 
> One possibility is that we passed a wrong id to
> l_genl_family_cancel(), can you check that it returns true?

So it looks like frame-xchg has got some order of operations backwards.
Every single call to l_genl_family_cancel fails. And this isn't really
surprising because in normal case we are canceling after we are already
in the callback so there is really nothing to cancel right?

Maybe you had a better idea of how to handle this but I think zero'ing
the command ID in the callback is still the right thing to do. That way
l_genl_family_cancel won't get called unless we are actually canceling
a still in-flight frame.

Here is with some prints added [*]:

[*]src/frame-xchg.c:frame_xchg_tx_retry() Sending frame with ID 83
src/station.c:station_netdev_event() Associating
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/frame-xchg.c:frame_xchg_mlme_notify() Received an ACK
src/frame-xchg.c:frame_xchg_tx_cb() err 0
AP Authentication frame 2 ACKed by STA
src/wiphy.c:wiphy_radio_work_done() Work item 10 done
[*]src/frame-xchg.c:frame_xchg_reset() Canceling ID 83
[*]src/frame-xchg.c:frame_xchg_reset() Failed to cancel 83

And this is true for every single frame from what I can tell. I was
also able to reproduce this reliably by adding 'hwsim_medium=no' to
testAP/main.conf if you care to give it a spin (with --valgrind as
well).

Thanks,
James

> 
> By the way do you use the usb passthrough to trigger this?  I don't
> think hwsim has the "early status" quirk.
> 
> Best regards

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] frame-xchg: fix invalid read
  2020-10-29 18:40   ` James Prestwood
@ 2020-10-29 23:33     ` Andrew Zaborowski
  2020-10-30  0:14       ` James Prestwood
  2020-10-30  0:47       ` Denis Kenzior
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Zaborowski @ 2020-10-29 23:33 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 2395 bytes --]

On Thu, 29 Oct 2020 at 19:40, James Prestwood <prestwoj@gmail.com> wrote:
> On Thu, 2020-10-29 at 19:05 +0100, Andrew Zaborowski wrote:
> > frame_xchg_done() should result in l_genl_family_cancel(fx-
> > >tx_cmd_id)
> > which would call the destroy callback before 'fx' is freed so there
> > should be some other reason this is happening.
> >
> > One possibility is that we passed a wrong id to
> > l_genl_family_cancel(), can you check that it returns true?
>
> So it looks like frame-xchg has got some order of operations backwards.
> Every single call to l_genl_family_cancel fails. And this isn't really
> surprising because in normal case we are canceling after we are already
> in the callback so there is really nothing to cancel right?

Oh okay, that's why it's not cancelling the destroy callback.  I
definitely think there should be a way to make ell forget the genl
command at any phase including in the callback.  So I'd be in favour
of changing l_genl_family_cancel to do this.  Denis, what do you think
is better?

>
> Maybe you had a better idea of how to handle this but I think zero'ing
> the command ID in the callback is still the right thing to do. That way
> l_genl_family_cancel won't get called unless we are actually canceling
> a still in-flight frame.

The destroy callback would still be called though.  Or if we drop it
like in the patch, we'd have to manually do: fx->tx_cmd_id = 0 every
time we cancel tx_cmd_id or free nl80211.

>
> Here is with some prints added [*]:
>
> [*]src/frame-xchg.c:frame_xchg_tx_retry() Sending frame with ID 83
> src/station.c:station_netdev_event() Associating
> src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
> src/frame-xchg.c:frame_xchg_mlme_notify() Received an ACK
> src/frame-xchg.c:frame_xchg_tx_cb() err 0
> AP Authentication frame 2 ACKed by STA
> src/wiphy.c:wiphy_radio_work_done() Work item 10 done
> [*]src/frame-xchg.c:frame_xchg_reset() Canceling ID 83
> [*]src/frame-xchg.c:frame_xchg_reset() Failed to cancel 83
>
> And this is true for every single frame from what I can tell. I was
> also able to reproduce this reliably by adding 'hwsim_medium=no' to
> testAP/main.conf if you care to give it a spin (with --valgrind as
> well).

I wasn't able to reproduce the invalid read yet, but I do get
occasional failures in test_connection_success.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] frame-xchg: fix invalid read
  2020-10-29 23:33     ` Andrew Zaborowski
@ 2020-10-30  0:14       ` James Prestwood
  2020-10-30  0:47       ` Denis Kenzior
  1 sibling, 0 replies; 10+ messages in thread
From: James Prestwood @ 2020-10-30  0:14 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 2995 bytes --]

On Fri, 2020-10-30 at 00:33 +0100, Andrew Zaborowski wrote:
> On Thu, 29 Oct 2020 at 19:40, James Prestwood <prestwoj@gmail.com>
> wrote:
> > On Thu, 2020-10-29 at 19:05 +0100, Andrew Zaborowski wrote:
> > > frame_xchg_done() should result in l_genl_family_cancel(fx-
> > > > tx_cmd_id)
> > > which would call the destroy callback before 'fx' is freed so
> > > there
> > > should be some other reason this is happening.
> > > 
> > > One possibility is that we passed a wrong id to
> > > l_genl_family_cancel(), can you check that it returns true?
> > 
> > So it looks like frame-xchg has got some order of operations
> > backwards.
> > Every single call to l_genl_family_cancel fails. And this isn't
> > really
> > surprising because in normal case we are canceling after we are
> > already
> > in the callback so there is really nothing to cancel right?
> 
> Oh okay, that's why it's not cancelling the destroy callback.  I
> definitely think there should be a way to make ell forget the genl
> command at any phase including in the callback.  So I'd be in favour
> of changing l_genl_family_cancel to do this.  Denis, what do you
> think
> is better?
> 
> > Maybe you had a better idea of how to handle this but I think
> > zero'ing
> > the command ID in the callback is still the right thing to do. That
> > way
> > l_genl_family_cancel won't get called unless we are actually
> > canceling
> > a still in-flight frame.
> 
> The destroy callback would still be called though.  Or if we drop it
> like in the patch, we'd have to manually do: fx->tx_cmd_id = 0 every
> time we cancel tx_cmd_id or free nl80211.

Yeah, but isn't this what we do basically everywhere in IWD?

> 
> > Here is with some prints added [*]:
> > 
> > [*]src/frame-xchg.c:frame_xchg_tx_retry() Sending frame with ID 83
> > src/station.c:station_netdev_event() Associating
> > src/netdev.c:netdev_mlme_notify() MLME notification Frame TX
> > Status(60)
> > src/frame-xchg.c:frame_xchg_mlme_notify() Received an ACK
> > src/frame-xchg.c:frame_xchg_tx_cb() err 0
> > AP Authentication frame 2 ACKed by STA
> > src/wiphy.c:wiphy_radio_work_done() Work item 10 done
> > [*]src/frame-xchg.c:frame_xchg_reset() Canceling ID 83
> > [*]src/frame-xchg.c:frame_xchg_reset() Failed to cancel 83
> > 
> > And this is true for every single frame from what I can tell. I was
> > also able to reproduce this reliably by adding 'hwsim_medium=no' to
> > testAP/main.conf if you care to give it a spin (with --valgrind as
> > well).
> 
> I wasn't able to reproduce the invalid read yet, but I do get
> occasional failures in test_connection_success.

Ok, this is likely the other problem of the probe responses not being
ACK'ed, or at least not receiving the ACK in time. Not sure the best
route here, I'm thinking its purely a test environment issue because
100ms+ should be plenty of time and I had to go up to 200ms for the
test to pass reliably.

Thanks,
James


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] frame-xchg: fix invalid read
  2020-10-29 23:33     ` Andrew Zaborowski
  2020-10-30  0:14       ` James Prestwood
@ 2020-10-30  0:47       ` Denis Kenzior
  2020-10-30  1:16         ` Andrew Zaborowski
  1 sibling, 1 reply; 10+ messages in thread
From: Denis Kenzior @ 2020-10-30  0:47 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 1380 bytes --]

Hi Andrew,

>> So it looks like frame-xchg has got some order of operations backwards.
>> Every single call to l_genl_family_cancel fails. And this isn't really
>> surprising because in normal case we are canceling after we are already
>> in the callback so there is really nothing to cancel right?
> 
> Oh okay, that's why it's not cancelling the destroy callback.  I
> definitely think there should be a way to make ell forget the genl
> command at any phase including in the callback.  So I'd be in favour
> of changing l_genl_family_cancel to do this.  Denis, what do you think
> is better?

I don't know the code path in question all that well... But what is it that 
you're proposing?  That l_genl_family_cancel would not call the destroy 
callback?  I don't think we should do that, since the caller who provided the 
user data might not be the same as the one calling cancel.  And that would be 
different to just about every other use of the cancel method.

We can certainly try and detect whether cancel is being called on a command that 
is already in the callback stage to prevent calling destroy twice.  But I would 
almost say that such behavior should trigger a L_WARN_ON and the proper solution 
would be to fix the logic flow.  In other words, why are you triggering a cancel 
for a command that is in the callback path?

Regards,
-Denis

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] frame-xchg: fix invalid read
  2020-10-30  0:47       ` Denis Kenzior
@ 2020-10-30  1:16         ` Andrew Zaborowski
  2020-10-30  1:18           ` Denis Kenzior
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Zaborowski @ 2020-10-30  1:16 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 1854 bytes --]

On Fri, 30 Oct 2020 at 02:01, Denis Kenzior <denkenz@gmail.com> wrote:
> >> So it looks like frame-xchg has got some order of operations backwards.
> >> Every single call to l_genl_family_cancel fails. And this isn't really
> >> surprising because in normal case we are canceling after we are already
> >> in the callback so there is really nothing to cancel right?
> >
> > Oh okay, that's why it's not cancelling the destroy callback.  I
> > definitely think there should be a way to make ell forget the genl
> > command at any phase including in the callback.  So I'd be in favour
> > of changing l_genl_family_cancel to do this.  Denis, what do you think
> > is better?
>
> I don't know the code path in question all that well... But what is it that
> you're proposing?  That l_genl_family_cancel would not call the destroy
> callback?  I don't think we should do that, since the caller who provided the
> user data might not be the same as the one calling cancel.  And that would be
> different to just about every other use of the cancel method.

I guess I propose that l_genl_family_cancel, when called from inside
the callback:
* calls the destroy function and removes any references to the genl command,
* returns true.

So same as outside of the callback.  The current behaviour is to return false.

>
> We can certainly try and detect whether cancel is being called on a command that
> is already in the callback stage to prevent calling destroy twice.  But I would
> almost say that such behavior should trigger a L_WARN_ON and the proper solution
> would be to fix the logic flow.  In other words, why are you triggering a cancel
> for a command that is in the callback path?

I think it's useful to have a way to ensure all references to the
provided callbacks/userdata are dropped right now.

Best regards

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] frame-xchg: fix invalid read
  2020-10-30  1:16         ` Andrew Zaborowski
@ 2020-10-30  1:18           ` Denis Kenzior
  2020-10-30  1:49             ` Andrew Zaborowski
  0 siblings, 1 reply; 10+ messages in thread
From: Denis Kenzior @ 2020-10-30  1:18 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 2537 bytes --]

Hi Andrew,

On 10/29/20 8:16 PM, Andrew Zaborowski wrote:
> On Fri, 30 Oct 2020 at 02:01, Denis Kenzior <denkenz@gmail.com> wrote:
>>>> So it looks like frame-xchg has got some order of operations backwards.
>>>> Every single call to l_genl_family_cancel fails. And this isn't really
>>>> surprising because in normal case we are canceling after we are already
>>>> in the callback so there is really nothing to cancel right?
>>>
>>> Oh okay, that's why it's not cancelling the destroy callback.  I
>>> definitely think there should be a way to make ell forget the genl
>>> command at any phase including in the callback.  So I'd be in favour
>>> of changing l_genl_family_cancel to do this.  Denis, what do you think
>>> is better?
>>
>> I don't know the code path in question all that well... But what is it that
>> you're proposing?  That l_genl_family_cancel would not call the destroy
>> callback?  I don't think we should do that, since the caller who provided the
>> user data might not be the same as the one calling cancel.  And that would be
>> different to just about every other use of the cancel method.
> 
> I guess I propose that l_genl_family_cancel, when called from inside
> the callback:
> * calls the destroy function and removes any references to the genl command,
> * returns true.
> 
> So same as outside of the callback.  The current behaviour is to return false.

That is sort of on purpose.  I mean sure, we can try to 'fix' this by not taking 
the request off the pending_list prior to calling the callback and adding some 
flags if cancel is called recursively.  But no other class provides any sort of 
guarantees on such behavior and up until now I would have considered behavior in 
such situations as 'undefined'.

Maybe you have a valid use case here, but I know I would never consider using 
_cancel this way.

> 
>>
>> We can certainly try and detect whether cancel is being called on a command that
>> is already in the callback stage to prevent calling destroy twice.  But I would
>> almost say that such behavior should trigger a L_WARN_ON and the proper solution
>> would be to fix the logic flow.  In other words, why are you triggering a cancel
>> for a command that is in the callback path?
> 
> I think it's useful to have a way to ensure all references to the
> provided callbacks/userdata are dropped right now.

Useful how?  Wouldn't it be easier to change the logic flow to conform to the 
general API conventions instead?

Regards,
Denis

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] frame-xchg: fix invalid read
  2020-10-30  1:18           ` Denis Kenzior
@ 2020-10-30  1:49             ` Andrew Zaborowski
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Zaborowski @ 2020-10-30  1:49 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 2801 bytes --]

On Fri, 30 Oct 2020 at 02:36, Denis Kenzior <denkenz@gmail.com> wrote:
> On 10/29/20 8:16 PM, Andrew Zaborowski wrote:
> > On Fri, 30 Oct 2020 at 02:01, Denis Kenzior <denkenz@gmail.com> wrote:
> >>>> So it looks like frame-xchg has got some order of operations backwards.
> >>>> Every single call to l_genl_family_cancel fails. And this isn't really
> >>>> surprising because in normal case we are canceling after we are already
> >>>> in the callback so there is really nothing to cancel right?
> >>>
> >>> Oh okay, that's why it's not cancelling the destroy callback.  I
> >>> definitely think there should be a way to make ell forget the genl
> >>> command at any phase including in the callback.  So I'd be in favour
> >>> of changing l_genl_family_cancel to do this.  Denis, what do you think
> >>> is better?
> >>
> >> I don't know the code path in question all that well... But what is it that
> >> you're proposing?  That l_genl_family_cancel would not call the destroy
> >> callback?  I don't think we should do that, since the caller who provided the
> >> user data might not be the same as the one calling cancel.  And that would be
> >> different to just about every other use of the cancel method.
> >
> > I guess I propose that l_genl_family_cancel, when called from inside
> > the callback:
> > * calls the destroy function and removes any references to the genl command,
> > * returns true.
> >
> > So same as outside of the callback.  The current behaviour is to return false.
>
> That is sort of on purpose.  I mean sure, we can try to 'fix' this by not taking
> the request off the pending_list prior to calling the callback and adding some
> flags if cancel is called recursively.  But no other class provides any sort of
> guarantees on such behavior and up until now I would have considered behavior in
> such situations as 'undefined'.
>
> Maybe you have a valid use case here, but I know I would never consider using
> _cancel this way.
>
> >
> >>
> >> We can certainly try and detect whether cancel is being called on a command that
> >> is already in the callback stage to prevent calling destroy twice.  But I would
> >> almost say that such behavior should trigger a L_WARN_ON and the proper solution
> >> would be to fix the logic flow.  In other words, why are you triggering a cancel
> >> for a command that is in the callback path?
> >
> > I think it's useful to have a way to ensure all references to the
> > provided callbacks/userdata are dropped right now.
>
> Useful how?  Wouldn't it be easier to change the logic flow to conform to the
> general API conventions instead?

Ok, if that's the convention then yes.  I had assumed it was the other
way and that's where the frame-xchg bug comes from.

Best regards

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-10-30  1:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-29 16:37 [PATCH] frame-xchg: fix invalid read James Prestwood
2020-10-29 18:05 ` Andrew Zaborowski
2020-10-29 18:17   ` James Prestwood
2020-10-29 18:40   ` James Prestwood
2020-10-29 23:33     ` Andrew Zaborowski
2020-10-30  0:14       ` James Prestwood
2020-10-30  0:47       ` Denis Kenzior
2020-10-30  1:16         ` Andrew Zaborowski
2020-10-30  1:18           ` Denis Kenzior
2020-10-30  1:49             ` Andrew Zaborowski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.