All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
       [not found]   ` <200902051511.31268.helmut.schaa@gmail.com>
@ 2009-02-23 10:38     ` Helmut Schaa
  2009-02-23 18:08       ` Jouni Malinen
  2009-02-24  3:34         ` Zhu Yi
  0 siblings, 2 replies; 14+ messages in thread
From: Helmut Schaa @ 2009-02-23 10:38 UTC (permalink / raw)
  To: netdev; +Cc: Zhu, Yi, ipw2100-devel, Jouni Malinen

Am Donnerstag, 5. Februar 2009 schrieb Helmut Schaa:
> Am Dienstag, 27. Januar 2009 schrieb Helmut Schaa:
> > Am Freitag, 23. Januar 2009 schrieb Helmut Schaa:
> > > Am Freitag, 23. Januar 2009 schrieb Zhu, Yi:
[...]
> > > > I see. This should be a firmware bug. I think your idea to queue packets
> > > > between ASSOCIATING and ASSOCIATED and replay them later (state becomes
> > > > ASSOCIATED) should work. 
> > > 
> > > Agreed, I'll try that (maybe today, maybe next week).
> > 
> > Ok, I've done a first try and the frame buffering/replaying works quite well
> > but I've ran into another issue now:
> > 
> > The supplicant successfully receives the EAP frame which was buffered by the
> > driver and sends the appropriate resone. However the response is not send over
> > the air. If I just add a sleep(1) before sending the frame in the supplicant
> > all works well. I have no clue yet why the frame is not send.
> 
> JFYI, got a bit further now. The driver never got the frame from the
> supplicant. It's the netdev which does not accept the frame that short
> after the queues are woken up.

Found some time again to investigate this issue again. The current state
is as follows:

After the firmware notifies the driver about the association it starts
buffering all frames. Once the delayed work is executed and moves the
driver state to ASSOCIATED the following happens:

1) netif_carrier_on
2) netif_wake_queue
3) wireless_send_event
4) replay buffered frames

Hereupon wpa_supplicant receives the buffered EAP-frame and builds the
according reply and tries to send it. The sendto call does _not_ indicate
an error. Nevertheless, the frame is not passed to the ipw2100 driver. I
was able to track that down to the following situation:


This happens when the driver moves to the associated state:
----------------------------
netif_carrier_on
  linkwatch_fire_event
    linkwatch_schedule_work
netif_wake_queue
----------------------------
At that point in time the device's tx queue has a noop_qdisc assigned.

Now wpa_supplicant sends the EAP reply:
---------------------------
packet_sendmsg
  dev_queue_xmit
    qdisc_enqueue_root
      qdisc_enqueue
        return NET_XMIT_CN
  return 0
---------------------------
Since the qdisc is still noop_qdisc, qdisc_enqueue returns NET_XMIT_CN for
every frame while packet_sendmsg translates that to 0, see netdevice.h:

#define net_xmit_errno(e) ((e) != NET_XMIT_CN ? -ENOBUFS : 0)

Hence, wpa_supplicant thinks the frame was sent out successfully.

Somewhat later when the queued linkwatch work is executed the qdisc gets
swapped to the default_qdisc which would allow frames to be send.

---------------------------
linkwatch_event
  __linkwatch_run_queue
    activate_dev
      attach_default_qdisc
---------------------------

So, how should I proceed here?

Some possibilities that come to mind:

1) let the noop_qdisc return NET_XMIT_DROP instead of NET_XMIT_CN and extend
   wpa_supplicant to retry after a short timeout. Already tried this approach
   and it works fine for me. wpa_supplicant typically needs one retry (200ms
   delay) until the frame is successfully send out.

2) Run activate_dev somehow without a delay. I guess this could be achieved by
   changing linkwatch_urgent_event. I haven't tested this yet. But I guess we
   would still have a small race here.

3) Wait until activate_dev was called in ipw2100 before replaying the cached
   frames.

Maybe, someone from the netdev people can give me a hand here?

Jouni, would you accept a patch for wpa_supplicant that adds some retries
to l2_packet_send when the network stack returns an error?

Thanks,
Helmut

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
  2009-02-23 10:38     ` [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path Helmut Schaa
@ 2009-02-23 18:08       ` Jouni Malinen
  2009-02-24  7:36         ` Helmut Schaa
  2009-02-24  3:34         ` Zhu Yi
  1 sibling, 1 reply; 14+ messages in thread
From: Jouni Malinen @ 2009-02-23 18:08 UTC (permalink / raw)
  To: Helmut Schaa; +Cc: netdev, Zhu, Yi, ipw2100-devel

On Mon, Feb 23, 2009 at 11:38:57AM +0100, Helmut Schaa wrote:

> Jouni, would you accept a patch for wpa_supplicant that adds some retries
> to l2_packet_send when the network stack returns an error?

Changing l2_packet_send() to do some magic on errors does not sound like
the correct solution here. l2_packet_send() caller might be more
interested in doing something should the send fail. If I understood the
description correctly, it sounds like kernel code is telling user space
that everything is ready and it is even delivering a received frame into
user space. If sending of the reply to this frame is now failing, it
sounds much more like a kernel side issue that should really be fixed
instead of trying to come up with workarounds in user space
applications. If the driver/network stack is not ready to accept
packets, it should not really claim to be.

I did not fully understand the details of the issue and why noop_qdisc
is used at the point when a frame is delivered to user space, so I may
be missing some details here. Anyway, I would highly prefer the kernel
to avoid delivering EAPOL frames into user space if it is not ready to
transmit a response to them.

-- 
Jouni Malinen                                            PGP id EFC895FA

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
@ 2009-02-24  3:34         ` Zhu Yi
  0 siblings, 0 replies; 14+ messages in thread
From: Zhu Yi @ 2009-02-24  3:34 UTC (permalink / raw)
  To: Helmut Schaa; +Cc: netdev, linux-wireless, Jouni Malinen

[ remove ipw2100-devel and cc linux-wireless ]

On Mon, 2009-02-23 at 18:38 +0800, Helmut Schaa wrote:
> Am Donnerstag, 5. Februar 2009 schrieb Helmut Schaa:
> > Am Dienstag, 27. Januar 2009 schrieb Helmut Schaa:
> > > Am Freitag, 23. Januar 2009 schrieb Helmut Schaa:
> > > > Am Freitag, 23. Januar 2009 schrieb Zhu, Yi:
> [...]
> > > > > I see. This should be a firmware bug. I think your idea to queue packets
> > > > > between ASSOCIATING and ASSOCIATED and replay them later (state becomes
> > > > > ASSOCIATED) should work. 
> > > > 
> > > > Agreed, I'll try that (maybe today, maybe next week).
> > > 
> > > Ok, I've done a first try and the frame buffering/replaying works quite well
> > > but I've ran into another issue now:
> > > 
> > > The supplicant successfully receives the EAP frame which was buffered by the
> > > driver and sends the appropriate resone. However the response is not send over
> > > the air. If I just add a sleep(1) before sending the frame in the supplicant
> > > all works well. I have no clue yet why the frame is not send.
> > 
> > JFYI, got a bit further now. The driver never got the frame from the
> > supplicant. It's the netdev which does not accept the frame that short
> > after the queues are woken up.
> 
> Found some time again to investigate this issue again. The current state
> is as follows:
> 
> After the firmware notifies the driver about the association it starts
> buffering all frames. Once the delayed work is executed and moves the
> driver state to ASSOCIATED the following happens:
> 
> 1) netif_carrier_on
> 2) netif_wake_queue
> 3) wireless_send_event
> 4) replay buffered frames
> 
> Hereupon wpa_supplicant receives the buffered EAP-frame and builds the
> according reply and tries to send it. The sendto call does _not_ indicate
> an error. Nevertheless, the frame is not passed to the ipw2100 driver. I
> was able to track that down to the following situation:
> 
> 
> This happens when the driver moves to the associated state:
> ----------------------------
> netif_carrier_on
>   linkwatch_fire_event
>     linkwatch_schedule_work
> netif_wake_queue
> ----------------------------
> At that point in time the device's tx queue has a noop_qdisc assigned.
> 
> Now wpa_supplicant sends the EAP reply:
> ---------------------------
> packet_sendmsg
>   dev_queue_xmit
>     qdisc_enqueue_root
>       qdisc_enqueue
>         return NET_XMIT_CN
>   return 0
> ---------------------------
> Since the qdisc is still noop_qdisc, qdisc_enqueue returns NET_XMIT_CN for
> every frame while packet_sendmsg translates that to 0, see netdevice.h:
> 
> #define net_xmit_errno(e) ((e) != NET_XMIT_CN ? -ENOBUFS : 0)
> 
> Hence, wpa_supplicant thinks the frame was sent out successfully.
> 
> Somewhat later when the queued linkwatch work is executed the qdisc gets
> swapped to the default_qdisc which would allow frames to be send.
> 
> ---------------------------
> linkwatch_event
>   __linkwatch_run_queue
>     activate_dev
>       attach_default_qdisc
> ---------------------------

Thanks for the analysis. Are you sure noop_qdisc is still used when we
are about to netif_carrier_on() after receiving the association success
response? From dev_open(), dev_activate() is called after netdev->open.
So the txq->qdisc_sleeping should be already replaced with pfifo_fast.
But the state is still DEACTIVATED. Should the packet from
wpa_supplicant be dropped by dev_queue_xmit()?

> So, how should I proceed here?
> 
> Some possibilities that come to mind:
> 
> 1) let the noop_qdisc return NET_XMIT_DROP instead of NET_XMIT_CN and extend
>    wpa_supplicant to retry after a short timeout. Already tried this approach
>    and it works fine for me. wpa_supplicant typically needs one retry (200ms
>    delay) until the frame is successfully send out.
> 
> 2) Run activate_dev somehow without a delay. I guess this could be achieved by
>    changing linkwatch_urgent_event. I haven't tested this yet. But I guess we
>    would still have a small race here.
> 
> 3) Wait until activate_dev was called in ipw2100 before replaying the cached
>    frames.

I think making a sync version of netif_carrier_on/activate_dev should be
the way to go. This could be a requirement from wireless. In wired
network, netif_carrier_on() is called after a network cable plug event
is detected. Some delay should be OK. But in wireless,
netif_carrier_on() is usually called after an association is succeeded.
The driver has already some management frames transfered with AP. Now
it's the time to open the data frame transmission. The driver requires
to get the activate_dev() result (synchronously or via callback) because
otherwise the driver has no idea when the Qdisc is ready and then it can
start to deliver data frames to network stack and user space. The real
failure example here is the one Helmut found about the wpa_supplicant
EAPOL frames lost case above.

> Maybe, someone from the netdev people can give me a hand here?

Yeah, please comment.

Thanks,
-yi


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
@ 2009-02-24  3:34         ` Zhu Yi
  0 siblings, 0 replies; 14+ messages in thread
From: Zhu Yi @ 2009-02-24  3:34 UTC (permalink / raw)
  To: Helmut Schaa
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA, Jouni Malinen

[ remove ipw2100-devel and cc linux-wireless ]

On Mon, 2009-02-23 at 18:38 +0800, Helmut Schaa wrote:
> Am Donnerstag, 5. Februar 2009 schrieb Helmut Schaa:
> > Am Dienstag, 27. Januar 2009 schrieb Helmut Schaa:
> > > Am Freitag, 23. Januar 2009 schrieb Helmut Schaa:
> > > > Am Freitag, 23. Januar 2009 schrieb Zhu, Yi:
> [...]
> > > > > I see. This should be a firmware bug. I think your idea to queue packets
> > > > > between ASSOCIATING and ASSOCIATED and replay them later (state becomes
> > > > > ASSOCIATED) should work. 
> > > > 
> > > > Agreed, I'll try that (maybe today, maybe next week).
> > > 
> > > Ok, I've done a first try and the frame buffering/replaying works quite well
> > > but I've ran into another issue now:
> > > 
> > > The supplicant successfully receives the EAP frame which was buffered by the
> > > driver and sends the appropriate resone. However the response is not send over
> > > the air. If I just add a sleep(1) before sending the frame in the supplicant
> > > all works well. I have no clue yet why the frame is not send.
> > 
> > JFYI, got a bit further now. The driver never got the frame from the
> > supplicant. It's the netdev which does not accept the frame that short
> > after the queues are woken up.
> 
> Found some time again to investigate this issue again. The current state
> is as follows:
> 
> After the firmware notifies the driver about the association it starts
> buffering all frames. Once the delayed work is executed and moves the
> driver state to ASSOCIATED the following happens:
> 
> 1) netif_carrier_on
> 2) netif_wake_queue
> 3) wireless_send_event
> 4) replay buffered frames
> 
> Hereupon wpa_supplicant receives the buffered EAP-frame and builds the
> according reply and tries to send it. The sendto call does _not_ indicate
> an error. Nevertheless, the frame is not passed to the ipw2100 driver. I
> was able to track that down to the following situation:
> 
> 
> This happens when the driver moves to the associated state:
> ----------------------------
> netif_carrier_on
>   linkwatch_fire_event
>     linkwatch_schedule_work
> netif_wake_queue
> ----------------------------
> At that point in time the device's tx queue has a noop_qdisc assigned.
> 
> Now wpa_supplicant sends the EAP reply:
> ---------------------------
> packet_sendmsg
>   dev_queue_xmit
>     qdisc_enqueue_root
>       qdisc_enqueue
>         return NET_XMIT_CN
>   return 0
> ---------------------------
> Since the qdisc is still noop_qdisc, qdisc_enqueue returns NET_XMIT_CN for
> every frame while packet_sendmsg translates that to 0, see netdevice.h:
> 
> #define net_xmit_errno(e) ((e) != NET_XMIT_CN ? -ENOBUFS : 0)
> 
> Hence, wpa_supplicant thinks the frame was sent out successfully.
> 
> Somewhat later when the queued linkwatch work is executed the qdisc gets
> swapped to the default_qdisc which would allow frames to be send.
> 
> ---------------------------
> linkwatch_event
>   __linkwatch_run_queue
>     activate_dev
>       attach_default_qdisc
> ---------------------------

Thanks for the analysis. Are you sure noop_qdisc is still used when we
are about to netif_carrier_on() after receiving the association success
response? From dev_open(), dev_activate() is called after netdev->open.
So the txq->qdisc_sleeping should be already replaced with pfifo_fast.
But the state is still DEACTIVATED. Should the packet from
wpa_supplicant be dropped by dev_queue_xmit()?

> So, how should I proceed here?
> 
> Some possibilities that come to mind:
> 
> 1) let the noop_qdisc return NET_XMIT_DROP instead of NET_XMIT_CN and extend
>    wpa_supplicant to retry after a short timeout. Already tried this approach
>    and it works fine for me. wpa_supplicant typically needs one retry (200ms
>    delay) until the frame is successfully send out.
> 
> 2) Run activate_dev somehow without a delay. I guess this could be achieved by
>    changing linkwatch_urgent_event. I haven't tested this yet. But I guess we
>    would still have a small race here.
> 
> 3) Wait until activate_dev was called in ipw2100 before replaying the cached
>    frames.

I think making a sync version of netif_carrier_on/activate_dev should be
the way to go. This could be a requirement from wireless. In wired
network, netif_carrier_on() is called after a network cable plug event
is detected. Some delay should be OK. But in wireless,
netif_carrier_on() is usually called after an association is succeeded.
The driver has already some management frames transfered with AP. Now
it's the time to open the data frame transmission. The driver requires
to get the activate_dev() result (synchronously or via callback) because
otherwise the driver has no idea when the Qdisc is ready and then it can
start to deliver data frames to network stack and user space. The real
failure example here is the one Helmut found about the wpa_supplicant
EAPOL frames lost case above.

> Maybe, someone from the netdev people can give me a hand here?

Yeah, please comment.

Thanks,
-yi

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
  2009-02-23 18:08       ` Jouni Malinen
@ 2009-02-24  7:36         ` Helmut Schaa
  0 siblings, 0 replies; 14+ messages in thread
From: Helmut Schaa @ 2009-02-24  7:36 UTC (permalink / raw)
  To: Jouni Malinen; +Cc: netdev, Zhu, Yi, ipw2100-devel

Am Montag, 23. Februar 2009 schrieb Jouni Malinen:
> On Mon, Feb 23, 2009 at 11:38:57AM +0100, Helmut Schaa wrote:
> 
> > Jouni, would you accept a patch for wpa_supplicant that adds some retries
> > to l2_packet_send when the network stack returns an error?

[...]

> I did not fully understand the details of the issue and why noop_qdisc
> is used at the point when a frame is delivered to user space, so I may
> be missing some details here. Anyway, I would highly prefer the kernel
> to avoid delivering EAPOL frames into user space if it is not ready to
> transmit a response to them.

Sounds reasonable to me.

Helmut

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
  2009-02-24  3:34         ` Zhu Yi
  (?)
@ 2009-02-24 12:15         ` Helmut Schaa
  2009-02-25  1:18             ` Zhu Yi
  -1 siblings, 1 reply; 14+ messages in thread
From: Helmut Schaa @ 2009-02-24 12:15 UTC (permalink / raw)
  To: Zhu Yi; +Cc: netdev, linux-wireless, Jouni Malinen

Am Dienstag, 24. Februar 2009 schrieb Zhu Yi:

[...]

> Thanks for the analysis. Are you sure noop_qdisc is still used when we
> are about to netif_carrier_on() after receiving the association success
> response? 

Yes, had a printk in noop_enqueue.

> From dev_open(), dev_activate() is called after netdev->open.
> So the txq->qdisc_sleeping should be already replaced with pfifo_fast.
> But the state is still DEACTIVATED.

Right. But for example when a connection cannot be established by ipw2100
it will call netif_carrier_off on disassociation which in turn leads to a
call to dev_deavtivate. Hence, it is possible that a noop_qdisc is
assigned while the device is up and carrier is off.

> Should the packet from 
> wpa_supplicant be dropped by dev_queue_xmit()?
>
> > So, how should I proceed here?
> > 
> > Some possibilities that come to mind:
> > 
> > 1) let the noop_qdisc return NET_XMIT_DROP instead of NET_XMIT_CN and extend
> >    wpa_supplicant to retry after a short timeout. Already tried this approach
> >    and it works fine for me. wpa_supplicant typically needs one retry (200ms
> >    delay) until the frame is successfully send out.
> > 
> > 2) Run activate_dev somehow without a delay. I guess this could be achieved by
> >    changing linkwatch_urgent_event. I haven't tested this yet. But I guess we
> >    would still have a small race here.
> > 
> > 3) Wait until activate_dev was called in ipw2100 before replaying the cached
> >    frames.
> 
> I think making a sync version of netif_carrier_on/activate_dev should be
> the way to go. This could be a requirement from wireless. In wired
> network, netif_carrier_on() is called after a network cable plug event
> is detected. Some delay should be OK.

Maybe, but I can imagine a similar race when using a wired 802.1x secured
network. The switch might send an EAP-request very quickly after the carrier
was detected.

> But in wireless, 
> netif_carrier_on() is usually called after an association is succeeded.
> The driver has already some management frames transfered with AP. Now
> it's the time to open the data frame transmission. The driver requires
> to get the activate_dev() result (synchronously or via callback) because
> otherwise the driver has no idea when the Qdisc is ready and then it can
> start to deliver data frames to network stack and user space.

Exactly.

[...]

Helmut


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
@ 2009-02-25  1:18             ` Zhu Yi
  0 siblings, 0 replies; 14+ messages in thread
From: Zhu Yi @ 2009-02-25  1:18 UTC (permalink / raw)
  To: Helmut Schaa; +Cc: netdev, linux-wireless, Jouni Malinen

On Tue, 2009-02-24 at 20:15 +0800, Helmut Schaa wrote:
> Am Dienstag, 24. Februar 2009 schrieb Zhu Yi:
> 
> [...]
> 
> > Thanks for the analysis. Are you sure noop_qdisc is still used when we
> > are about to netif_carrier_on() after receiving the association success
> > response? 
> 
> Yes, had a printk in noop_enqueue.
> 
> > From dev_open(), dev_activate() is called after netdev->open.
> > So the txq->qdisc_sleeping should be already replaced with pfifo_fast.
> > But the state is still DEACTIVATED.
> 
> Right. But for example when a connection cannot be established by ipw2100
> it will call netif_carrier_off on disassociation which in turn leads to a
> call to dev_deavtivate. Hence, it is possible that a noop_qdisc is
> assigned while the device is up and carrier is off.

Make sense. Whatever there is race here but either the driver or the
user space is able to control and avoid it.

BTW, does wpa_supplicant start to receive EAPOL frames after it gets the
association event?

[...]

> > But in wireless, 
> > netif_carrier_on() is usually called after an association is succeeded.
> > The driver has already some management frames transfered with AP. Now
> > it's the time to open the data frame transmission. The driver requires
> > to get the activate_dev() result (synchronously or via callback) because
> > otherwise the driver has no idea when the Qdisc is ready and then it can
> > start to deliver data frames to network stack and user space.
> 
> Exactly.

Looks like we didn't bring enough attention for netdev people on this.
Should you change the title to remove ipw2100? It should be a generic
issue for the current netif_carrier_on/dev_activate() implementation.

Thanks,
-yi


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
@ 2009-02-25  1:18             ` Zhu Yi
  0 siblings, 0 replies; 14+ messages in thread
From: Zhu Yi @ 2009-02-25  1:18 UTC (permalink / raw)
  To: Helmut Schaa
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA, Jouni Malinen

On Tue, 2009-02-24 at 20:15 +0800, Helmut Schaa wrote:
> Am Dienstag, 24. Februar 2009 schrieb Zhu Yi:
> 
> [...]
> 
> > Thanks for the analysis. Are you sure noop_qdisc is still used when we
> > are about to netif_carrier_on() after receiving the association success
> > response? 
> 
> Yes, had a printk in noop_enqueue.
> 
> > From dev_open(), dev_activate() is called after netdev->open.
> > So the txq->qdisc_sleeping should be already replaced with pfifo_fast.
> > But the state is still DEACTIVATED.
> 
> Right. But for example when a connection cannot be established by ipw2100
> it will call netif_carrier_off on disassociation which in turn leads to a
> call to dev_deavtivate. Hence, it is possible that a noop_qdisc is
> assigned while the device is up and carrier is off.

Make sense. Whatever there is race here but either the driver or the
user space is able to control and avoid it.

BTW, does wpa_supplicant start to receive EAPOL frames after it gets the
association event?

[...]

> > But in wireless, 
> > netif_carrier_on() is usually called after an association is succeeded.
> > The driver has already some management frames transfered with AP. Now
> > it's the time to open the data frame transmission. The driver requires
> > to get the activate_dev() result (synchronously or via callback) because
> > otherwise the driver has no idea when the Qdisc is ready and then it can
> > start to deliver data frames to network stack and user space.
> 
> Exactly.

Looks like we didn't bring enough attention for netdev people on this.
Should you change the title to remove ipw2100? It should be a generic
issue for the current netif_carrier_on/dev_activate() implementation.

Thanks,
-yi

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
  2009-02-25  1:18             ` Zhu Yi
@ 2009-02-25 12:39               ` Helmut Schaa
  -1 siblings, 0 replies; 14+ messages in thread
From: Helmut Schaa @ 2009-02-25 12:39 UTC (permalink / raw)
  To: Zhu Yi; +Cc: netdev, linux-wireless, Jouni Malinen

Am Mittwoch, 25. Februar 2009 schrieb Zhu Yi:
> Make sense. Whatever there is race here but either the driver or the
> user space is able to control and avoid it.
> 
> BTW, does wpa_supplicant start to receive EAPOL frames after it gets the
> association event?

Yep.

> [...]
> 
> > > But in wireless, 
> > > netif_carrier_on() is usually called after an association is succeeded.
> > > The driver has already some management frames transfered with AP. Now
> > > it's the time to open the data frame transmission. The driver requires
> > > to get the activate_dev() result (synchronously or via callback) because
> > > otherwise the driver has no idea when the Qdisc is ready and then it can
> > > start to deliver data frames to network stack and user space.
> > 
> > Exactly.
> 
> Looks like we didn't bring enough attention for netdev people on this.
> Should you change the title to remove ipw2100? It should be a generic
> issue for the current netif_carrier_on/dev_activate() implementation.

Argh! Just found out why dev_activate is called such late after
netif_carrier_on:

ipw2100 calls netif_carrier_on followed by netif_wake_queue when the driver
moves from associating to associated state. netif_carrier_on will then
call linkwatch_fire_event. However the carrier_on event is not treated as
urgent and as such the event is delayed (and thus dev_activate too).

An event is considered urgent if the netdev is running, has a carrier
_and_ any of the TX qdiscs changed. Since ipw2100 first calls carrier_on,
the last condition is not met and thus the event is not considered urgent
and gets delayed.

Just changing the order to first wake up the queues followed by the
carrier_on results in an urgent event. I ran a few tests with that change
(+ frame buffering patch) and wasn't able to trigger the race again.

I'll fold that into the frame buffer patch and send it to ipw2100-devel
once I finished the tests.

Helmut

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
@ 2009-02-25 12:39               ` Helmut Schaa
  0 siblings, 0 replies; 14+ messages in thread
From: Helmut Schaa @ 2009-02-25 12:39 UTC (permalink / raw)
  To: Zhu Yi
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA, Jouni Malinen

Am Mittwoch, 25. Februar 2009 schrieb Zhu Yi:
> Make sense. Whatever there is race here but either the driver or the
> user space is able to control and avoid it.
> 
> BTW, does wpa_supplicant start to receive EAPOL frames after it gets the
> association event?

Yep.

> [...]
> 
> > > But in wireless, 
> > > netif_carrier_on() is usually called after an association is succeeded.
> > > The driver has already some management frames transfered with AP. Now
> > > it's the time to open the data frame transmission. The driver requires
> > > to get the activate_dev() result (synchronously or via callback) because
> > > otherwise the driver has no idea when the Qdisc is ready and then it can
> > > start to deliver data frames to network stack and user space.
> > 
> > Exactly.
> 
> Looks like we didn't bring enough attention for netdev people on this.
> Should you change the title to remove ipw2100? It should be a generic
> issue for the current netif_carrier_on/dev_activate() implementation.

Argh! Just found out why dev_activate is called such late after
netif_carrier_on:

ipw2100 calls netif_carrier_on followed by netif_wake_queue when the driver
moves from associating to associated state. netif_carrier_on will then
call linkwatch_fire_event. However the carrier_on event is not treated as
urgent and as such the event is delayed (and thus dev_activate too).

An event is considered urgent if the netdev is running, has a carrier
_and_ any of the TX qdiscs changed. Since ipw2100 first calls carrier_on,
the last condition is not met and thus the event is not considered urgent
and gets delayed.

Just changing the order to first wake up the queues followed by the
carrier_on results in an urgent event. I ran a few tests with that change
(+ frame buffering patch) and wasn't able to trigger the race again.

I'll fold that into the frame buffer patch and send it to ipw2100-devel
once I finished the tests.

Helmut
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
@ 2009-02-27  0:55                 ` Zhu Yi
  0 siblings, 0 replies; 14+ messages in thread
From: Zhu Yi @ 2009-02-27  0:55 UTC (permalink / raw)
  To: Helmut Schaa; +Cc: netdev, linux-wireless, Jouni Malinen

On Wed, 2009-02-25 at 20:39 +0800, Helmut Schaa wrote:
> Argh! Just found out why dev_activate is called such late after
> netif_carrier_on:
> 
> ipw2100 calls netif_carrier_on followed by netif_wake_queue when the
> driver
> moves from associating to associated state. netif_carrier_on will then
> call linkwatch_fire_event. However the carrier_on event is not treated
> as
> urgent and as such the event is delayed (and thus dev_activate too).
> 
> An event is considered urgent if the netdev is running, has a carrier
> _and_ any of the TX qdiscs changed. Since ipw2100 first calls
> carrier_on,
> the last condition is not met and thus the event is not considered
> urgent and gets delayed.

Calling netif_wake_queue() before netif_carrier_on() is not correct in
semantics. Even it works, it looks like a workaround hack. I still think
making a sync version of netif_carrier_on is the way to go.

> Just changing the order to first wake up the queues followed by the
> carrier_on results in an urgent event. I ran a few tests with that
> change
> (+ frame buffering patch) and wasn't able to trigger the race again.
> 
> I'll fold that into the frame buffer patch and send it to
> ipw2100-devel
> once I finished the tests.

Sounds good.

Thanks,
-yi


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path
@ 2009-02-27  0:55                 ` Zhu Yi
  0 siblings, 0 replies; 14+ messages in thread
From: Zhu Yi @ 2009-02-27  0:55 UTC (permalink / raw)
  To: Helmut Schaa
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA, Jouni Malinen

On Wed, 2009-02-25 at 20:39 +0800, Helmut Schaa wrote:
> Argh! Just found out why dev_activate is called such late after
> netif_carrier_on:
> 
> ipw2100 calls netif_carrier_on followed by netif_wake_queue when the
> driver
> moves from associating to associated state. netif_carrier_on will then
> call linkwatch_fire_event. However the carrier_on event is not treated
> as
> urgent and as such the event is delayed (and thus dev_activate too).
> 
> An event is considered urgent if the netdev is running, has a carrier
> _and_ any of the TX qdiscs changed. Since ipw2100 first calls
> carrier_on,
> the last condition is not met and thus the event is not considered
> urgent and gets delayed.

Calling netif_wake_queue() before netif_carrier_on() is not correct in
semantics. Even it works, it looks like a workaround hack. I still think
making a sync version of netif_carrier_on is the way to go.

> Just changing the order to first wake up the queues followed by the
> carrier_on results in an urgent event. I ran a few tests with that
> change
> (+ frame buffering patch) and wasn't able to trigger the race again.
> 
> I'll fold that into the frame buffer patch and send it to
> ipw2100-devel
> once I finished the tests.

Sounds good.

Thanks,
-yi

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipw2100: race between isr_indicate_associated and rx path
  2009-02-27  0:55                 ` Zhu Yi
@ 2009-03-03 11:33                   ` Helmut Schaa
  -1 siblings, 0 replies; 14+ messages in thread
From: Helmut Schaa @ 2009-03-03 11:33 UTC (permalink / raw)
  To: Zhu Yi; +Cc: netdev, linux-wireless, Jouni Malinen

Am Freitag, 27. Februar 2009 schrieb Zhu Yi:
> On Wed, 2009-02-25 at 20:39 +0800, Helmut Schaa wrote:
> > Argh! Just found out why dev_activate is called such late after
> > netif_carrier_on:
> > 
> > ipw2100 calls netif_carrier_on followed by netif_wake_queue when the
> > driver
> > moves from associating to associated state. netif_carrier_on will then
> > call linkwatch_fire_event. However the carrier_on event is not treated
> > as
> > urgent and as such the event is delayed (and thus dev_activate too).
> > 
> > An event is considered urgent if the netdev is running, has a carrier
> > _and_ any of the TX qdiscs changed. Since ipw2100 first calls
> > carrier_on,
> > the last condition is not met and thus the event is not considered
> > urgent and gets delayed.
> 
> Calling netif_wake_queue() before netif_carrier_on() is not correct in
> semantics. Even it works, it looks like a workaround hack. I still think
> making a sync version of netif_carrier_on is the way to go.

Hmm, we could use register_netdevice_notifier to get notified when the
carrier event is sent out and thus the tx queues are available.

The flow in ipw2100 would then be:

1) isr_indicate_associated:
- move to ASSOCIATING state
- start buffering incoming frames

2) ipw2100_ex_event_work:
- call carrier_on and wake_queues

3) netdevice_notification:
- move to ASSOCIATED state and replay buffered frames

I hope I'll find some time to try that soon.

Helmut

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipw2100: race between isr_indicate_associated and rx path
@ 2009-03-03 11:33                   ` Helmut Schaa
  0 siblings, 0 replies; 14+ messages in thread
From: Helmut Schaa @ 2009-03-03 11:33 UTC (permalink / raw)
  To: Zhu Yi
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA, Jouni Malinen

Am Freitag, 27. Februar 2009 schrieb Zhu Yi:
> On Wed, 2009-02-25 at 20:39 +0800, Helmut Schaa wrote:
> > Argh! Just found out why dev_activate is called such late after
> > netif_carrier_on:
> > 
> > ipw2100 calls netif_carrier_on followed by netif_wake_queue when the
> > driver
> > moves from associating to associated state. netif_carrier_on will then
> > call linkwatch_fire_event. However the carrier_on event is not treated
> > as
> > urgent and as such the event is delayed (and thus dev_activate too).
> > 
> > An event is considered urgent if the netdev is running, has a carrier
> > _and_ any of the TX qdiscs changed. Since ipw2100 first calls
> > carrier_on,
> > the last condition is not met and thus the event is not considered
> > urgent and gets delayed.
> 
> Calling netif_wake_queue() before netif_carrier_on() is not correct in
> semantics. Even it works, it looks like a workaround hack. I still think
> making a sync version of netif_carrier_on is the way to go.

Hmm, we could use register_netdevice_notifier to get notified when the
carrier event is sent out and thus the tx queues are available.

The flow in ipw2100 would then be:

1) isr_indicate_associated:
- move to ASSOCIATING state
- start buffering incoming frames

2) ipw2100_ex_event_work:
- call carrier_on and wake_queues

3) netdevice_notification:
- move to ASSOCIATED state and replay buffered frames

I hope I'll find some time to try that soon.

Helmut
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-03-03 11:33 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <200901211734.48625.helmut.schaa@gmail.com>
     [not found] ` <200901271521.24395.helmut.schaa@gmail.com>
     [not found]   ` <200902051511.31268.helmut.schaa@gmail.com>
2009-02-23 10:38     ` [Ipw2100-devel] ipw2100: race between isr_indicate_associated and rx path Helmut Schaa
2009-02-23 18:08       ` Jouni Malinen
2009-02-24  7:36         ` Helmut Schaa
2009-02-24  3:34       ` Zhu Yi
2009-02-24  3:34         ` Zhu Yi
2009-02-24 12:15         ` Helmut Schaa
2009-02-25  1:18           ` Zhu Yi
2009-02-25  1:18             ` Zhu Yi
2009-02-25 12:39             ` Helmut Schaa
2009-02-25 12:39               ` Helmut Schaa
2009-02-27  0:55               ` Zhu Yi
2009-02-27  0:55                 ` Zhu Yi
2009-03-03 11:33                 ` Helmut Schaa
2009-03-03 11:33                   ` Helmut Schaa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.