* wireless drivers fail to report link speed?
@ 2017-08-08 19:07 James Feeney
  2017-08-08 21:42 ` Dan Williams
  2017-08-09 12:24 ` Kalle Valo
  0 siblings, 2 replies; 15+ messages in thread
From: James Feeney @ 2017-08-08 19:07 UTC (permalink / raw)
  To: linux-wireless

Hello All

Would you please look at kernel bug report "Since 4.12 - bonding module not
working with wireless drivers", and tell me if you know why the kernel ethtool
does not receive a speed report from the wireless drivers?

 https://bugzilla.kernel.org/show_bug.cgi?id=196547

It seems that Mahesh Bandewar became annoyed that some network drivers do not
report speed and duplex to the bonding module properly, so that it becomes
impossible to make "best connection" decisions.  A patch was applied to the
bonding module in linux 4.12 which now disables any network interface that does
not successfully report its speed and duplex.  In practice, this seems to
include every wireless network driver I've tried, the ath5k, ath9k, the
rtl8192ce and RTL8188CUS.  Of course, this new behavior breaks wireless bonding!

Do you know if there is some general reason why the wireless drivers do not work
with the kernel ethtool?  Is this something that can be fixed?  Can you tell if
this reporting failure would be the fault of the kernel ethtool?  Or the
wireless driver?  Or the bonding module?

Thanks
James

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: wireless drivers fail to report link speed?
  2017-08-08 19:07 wireless drivers fail to report link speed? James Feeney
@ 2017-08-08 21:42 ` Dan Williams
  2017-08-08 21:58   ` Ben Greear
  2017-08-09 12:24 ` Kalle Valo
  1 sibling, 1 reply; 15+ messages in thread
From: Dan Williams @ 2017-08-08 21:42 UTC (permalink / raw)
  To: james, linux-wireless

On Tue, 2017-08-08 at 13:07 -0600, James Feeney wrote:
> Hello All
> 
> Would you please look at kernel bug report "Since 4.12 - bonding
> module not
> working with wireless drivers", and tell me if you know why the
> kernel ethtool
> does not receive a speed report from the wireless drivers?
> 
>  https://bugzilla.kernel.org/show_bug.cgi?id=196547
> 
> It seems that Mahesh Bandewar became annoyed that some network
> drivers do not
> report speed and duplex to the bonding module properly, so that it
> becomes
> impossible to make "best connection" decisions.  A patch was applied
> to the
> bonding module in linux 4.12 which now disables any network interface
> that does
> not successfully report its speed and duplex.  In practice, this
> seems to
> include every wireless network driver I've tried, the ath5k, ath9k,
> the
> rtl8192ce and RTL8188CUS.  Of course, this new behavior breaks
> wireless bonding!
> 
> Do you know if there is some general reason why the wireless drivers
> do not work
> with the kernel ethtool?  Is this something that can be fixed?  Can
> you tell if
> this reporting failure would be the fault of the kernel ethtool?  Or
> the
> wireless driver?  Or the bonding module?

Because the "speed" (whatever that means) can and sometimes does change
with every packet.  The driver dynamically adjusts the link rate based
on all kinds of things, but mainly the current radio environment: how
many other APs are around, how much interference there is, how many
other clients are trying to talk, that kind of thing.

So one second the wifi might be the "best" link and then when somebody
turns on a microwave oven or a baby monitor, it may be the "worst"
until the microwave's duty cycle completes a few seconds later then
it'll become the "best" again for a couple seconds then "worst" again,
repeat until your Easy Mac is nice and warm and creamy.

Furthermore, for some drivers IIRC when there isn't any traffic, they
might drop the link rate very low because there's no reason to keep
powering blocks when you're not transmitting/receiving any data.  IIRC
the Intel drivers used to do that a couple years ago.

Also, "duplex" doesn't mean anything in wireless land.  So no clue what
bonding is expecting them to say here.  I would say the modifications
to the bonding core made assumptions that simply aren't applicable to
mediums other than wired ones.

Dan

* Re: wireless drivers fail to report link speed?
  2017-08-08 21:42 ` Dan Williams
@ 2017-08-08 21:58   ` Ben Greear
  2017-08-08 22:25     ` James Feeney
  0 siblings, 1 reply; 15+ messages in thread
From: Ben Greear @ 2017-08-08 21:58 UTC (permalink / raw)
  To: Dan Williams, james, linux-wireless

On 08/08/2017 02:42 PM, Dan Williams wrote:
> On Tue, 2017-08-08 at 13:07 -0600, James Feeney wrote:
>> Hello All
>>
>> Would you please look at kernel bug report "Since 4.12 - bonding
>> module not
>> working with wireless drivers", and tell me if you know why the
>> kernel ethtool
>> does not receive a speed report from the wireless drivers?
>>
>>  https://bugzilla.kernel.org/show_bug.cgi?id=196547
>>
>> It seems that Mahesh Bandewar became annoyed that some network
>> drivers do not
>> report speed and duplex to the bonding module properly, so that it
>> becomes
>> impossible to make "best connection" decisions.  A patch was applied
>> to the
>> bonding module in linux 4.12 which now disables any network interface
>> that does
>> not successfully report its speed and duplex.  In practice, this
>> seems to
>> include every wireless network driver I've tried, the ath5k, ath9k,
>> the
>> rtl8192ce and RTL8188CUS.  Of course, this new behavior breaks
>> wireless bonding!
>>
>> Do you know if there is some general reason why the wireless drivers
>> do not work
>> with the kernel ethtool?  Is this something that can be fixed?  Can
>> you tell if
>> this reporting failure would be the fault of the kernel ethtool?  Or
>> the
>> wireless driver?  Or the bonding module?
>
> Because the "speed" (whatever that means) can and sometimes does change
> with every packet.  The driver dynamically adjusts the link rate based
> on all kinds of things.  But mainly the current radio environment; how
> many other APs are around, how much interference there is, how many
> other clients are trying to talk, that kind of thing.
>
> So one second the wifi might be the "best" link and then when somebody
> turns on a microwave oven or a baby monitor, it may be the "worst"
> until the microwave's duty cycle completes a few seconds later then
> it'll become the "best" again for a couple seconds then "worst" again,
> repeat until your Easy Mac is nice and warm and creamy.
>
> Furthermore, for some drivers IIRC when there isn't any traffic, they
> might drop the link rate very low because there's no reason keep
> powering blocks when you're not transmitting/receiving any data.  IIRC
> the Intel drivers used to do that a couple years ago.
>
> Also, "duplex" doesn't mean anything in wireless land.  So no clue what
> bonding is expecting them to say here.  I would say the modifications
> to the bonding core made assumptions that simply aren't applicable to
> mediums other than wired ones.

Well, wifi acts half-duplex in that only one side can transmit
at once.

But, no argument with the rest!

Thanks,
Ben

>
> Dan
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: wireless drivers fail to report link speed?
  2017-08-08 21:58   ` Ben Greear
@ 2017-08-08 22:25     ` James Feeney
  2017-08-08 22:49       ` Ben Greear
  2017-08-08 23:43       ` Dan Williams
  0 siblings, 2 replies; 15+ messages in thread
From: James Feeney @ 2017-08-08 22:25 UTC (permalink / raw)
  To: Ben Greear, Dan Williams, linux-wireless

Hey Dan

> ...
> So one second the wifi might be the "best" link and then when somebody
> turns on a microwave oven or a baby monitor, it may be the "worst"
> until the microwave's duty cycle completes a few seconds later then
> it'll become the "best" again for a couple seconds then "worst" again,
> repeat until your Easy Mac is nice and warm and creamy.
> 
> Furthermore, for some drivers IIRC when there isn't any traffic, they
> might drop the link rate very low because there's no reason keep
> powering blocks when you're not transmitting/receiving any data.  IIRC
> the Intel drivers used to do that a couple years ago.

Yes, thanks, but, while all of that is true, it has nothing to do with the
question asked.

The question is, regardless of how often the wireless speed changes,
why is it that the kernel ethtool returns an error on get_settings(), instead of
returning the current wireless speed, whatever that link speed might be at the
moment?

> Also, "duplex" doesn't mean anything in wireless land.  So no clue what
> bonding is expecting them to say here.  I would say the modifications
> to the bonding core made assumptions that simply aren't applicable to
> mediums other than wired ones.

Since, as Ben mentions,

> Well, wifi acts half-duplex in that only one side can transmit at once.

then the wireless drivers would be expected to simply report "half-duplex".

Again, the issue is not that wireless is half-duplex or full-duplex, but rather,
why does the kernel ethtool return an error on get_settings()?

And, why is it that it seems get_settings() returns an error with multiple
wireless drivers?  Is there some assumption, or convention, that causes the
kernel ethtool to fail with *all* the wireless drivers?

I see that, on bugzilla, Florian is suggesting that wireless network devices
*cannot* report a speed/duplex, simply because the wireless speed changes on a
per-packet basis, but that does not seem to me to be a persuasive argument.  A
wireless connection *does* always have a current speed, even if that speed
changes frequently.  The kernel ethtool get_settings() should simply report that
speed, not throw an error, yes?


Thanks
James

* Re: wireless drivers fail to report link speed?
  2017-08-08 22:25     ` James Feeney
@ 2017-08-08 22:49       ` Ben Greear
  2017-08-09  0:36         ` James Feeney
  2017-08-08 23:43       ` Dan Williams
  1 sibling, 1 reply; 15+ messages in thread
From: Ben Greear @ 2017-08-08 22:49 UTC (permalink / raw)
  To: james, Dan Williams, linux-wireless

On 08/08/2017 03:25 PM, James Feeney wrote:
> Hey Dan
>
>> ...
>> So one second the wifi might be the "best" link and then when somebody
>> turns on a microwave oven or a baby monitor, it may be the "worst"
>> until the microwave's duty cycle completes a few seconds later then
>> it'll become the "best" again for a couple seconds then "worst" again,
>> repeat until your Easy Mac is nice and warm and creamy.
>>
>> Furthermore, for some drivers IIRC when there isn't any traffic, they
>> might drop the link rate very low because there's no reason keep
>> powering blocks when you're not transmitting/receiving any data.  IIRC
>> the Intel drivers used to do that a couple years ago.
>
> Yes, thanks, but, while all of that is true, it has nothing to do with the
> question asked.
>
> The question is, regardless that the wireless speed may be constantly changing,
> why is it that the kernel ethtool returns an error on get_settings(), instead of
> returning the current wireless speed, whatever that link speed might be at the
> moment?
>
>> Also, "duplex" doesn't mean anything in wireless land.  So no clue what
>> bonding is expecting them to say here.  I would say the modifications
>> to the bonding core made assumptions that simply aren't applicable to
>> mediums other than wired ones.
>
> Since, as Ben mentions,
>
>> Well, wifi acts half-duplex in that only one side can transmit at once.
>
> then the wireless drivers would be expected to simply report "half-duplex".
>
> Again, the issue is not that wireless is half-duplex or full-duplex, but rather,
> why does the kernel ethtool return an error on get_settings()?
>
> And, why is it that it seems get_settings() returns an error with multiple
> wireless drivers?  Is there some assumption, or convention, that causes the
> kernel ethtool to fail with *all* the wireless drivers?
>
> I see that, on bugzilla, Florian is suggesting that wireless network devices
> *cannot* report a speed/duplex, simply because the wireless speed changes on a
> per-packet basis, but that does not seem to me to be a persuasive argument.  A
> wireless connection *does* always have a current speed, even if that speed
> changes frequently.  The kernel ethtool get_settings() should simply report that
> speed, not throw an error, yes?

Some time back, I added some support to ath10k to report some ethtool info.
At least most of this is upstream.  I do report tx and rx link rate, and yes,
it changes, but it does contain some useful info, at least when modest amounts
of packets are being transmitted and received (so that rate-ctrl logic
is working).  I think the stuff not prepended with d_ will show up for any
mac80211 driver.  Someone could improve ethtool to report these through a more
normal API than just getting the stats if they wanted...

[root@lf0350-7220 lanforge]# ethtool -S wlan0
NIC statistics:
      rx_packets: 3321
      rx_bytes: 338788
      rx_duplicates: 0
      rx_fragments: 1671
      rx_dropped: 0
      tx_packets: 15
      tx_bytes: 484
      tx_filtered: 0
      tx_retry_failed: 2
      tx_retries: 0
      sta_state: 4
      txrate: 13000000
      rxrate: 0
      signal: 201
      channel: 5180
      noise: 150
      ch_time: 56
      ch_time_busy: 3
      ch_time_ext_busy: 18446744073709551615
      ch_time_rx: 18446744073709551615
      ch_time_tx: 18446744073709551615
      tx_hw_reaped: 1836
      tx_pkts_nic: 157
      tx_bytes_nic: 17639
      tx_bytes_to_fw: 24626
      rx_pkts_nic: 211
      rx_bytes_nic: 17346681
      d_noise_floor: 18446744073709551510
      d_cycle_count: 4138309451
      d_tx_cycle_count: 63796069
      d_rx_cycle_count: 3941659360
      d_busy_count: 4061048026
      d_flags: 0
      d_phy_error: 0
      d_rts_bad: 0
      d_rts_good: 4
      d_tx_power: 46
      d_rx_crc_err: 1518
      d_no_beacon: 0
      d_tx_mpdus_queued: 489
      d_tx_msdu_queued: 494
      d_tx_msdu_dropped: 0
      d_local_enqued: 46
      d_local_freed: 46
      d_tx_ppdu_hw_queued: 1836
      d_tx_ppdu_reaped: 1836
      d_tx_fifo_underrun: 4
      d_tx_ppdu_abort: 0
      d_tx_mpdu_requed: 1587
      d_tx_excessive_retries: 1575
      d_tx_hw_rate: 192
      d_tx_dropped_sw_retries: 51
      d_tx_noack: 0
      d_tx_noack_bytes: 0
      d_tx_discard: 240
      d_tx_discard_bytes: 5763
      d_tx_illegal_rate: 0
      d_tx_continuous_xretries: 0
      d_tx_timeout: 0
      d_tx_mpdu_txop_limit: 0
      d_pdev_resets: 25
      d_rx_mid_ppdu_route_change: 0
      d_rx_status: 0
      d_rx_extra_frags_ring0: 84453
      d_rx_extra_frags_ring1: 0
      d_rx_extra_frags_ring2: 0
      d_rx_extra_frags_ring3: 0
      d_rx_msdu_htt: 211
      d_rx_mpdu_htt: 211
      d_rx_msdu_stack: 84242
      d_rx_mpdu_stack: 84242
      d_rx_phy_err: 0
      d_rx_phy_err_drops: 0
      d_rx_mpdu_errors: 0
      d_fw_crash_count: 0
      d_fw_warm_reset_count: 0
      d_fw_cold_reset_count: 8
      d_fw_powerup_failed: 0
      d_short_tx_retries: 4
      d_long_tx_retries: 1571
      d_fw_adc_temp: 3081417395
[root@lf0350-7220 lanforge]#
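
For anyone wanting to consume the statistics above from a script, here is a
minimal Python sketch.  It assumes only the "name: value" layout shown in the
output; the helper names parse_ethtool_stats and as_signed64 are made up for
this example.  It also assumes that the very large values are negative numbers
printed as unsigned 64-bit counters, which would make d_noise_floor read as
-106 and the ch_time_* fields read as -1.  That interpretation is a guess from
the numbers, not something confirmed in this thread.

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit one."""
    return value - (1 << 64) if value >= (1 << 63) else value


def parse_ethtool_stats(text: str) -> dict:
    """Parse 'name: value' lines from `ethtool -S` output into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # header line such as "NIC statistics:"
        try:
            stats[name] = int(value)
        except ValueError:
            continue  # skip anything that is not a plain integer
    return stats


# A few lines copied from the output above.
SAMPLE = """\
NIC statistics:
     txrate: 13000000
     rxrate: 0
     ch_time_rx: 18446744073709551615
     d_noise_floor: 18446744073709551510
"""

stats = parse_ethtool_stats(SAMPLE)
tx_mbps = stats["txrate"] / 1e6                   # bits/s to Mb/s
noise = as_signed64(stats["d_noise_floor"])       # signed reading (assumed)
```

With the sample lines, txrate works out to 13.0 Mb/s.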

Thanks,
Ben

>
>
> Thanks
> James
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: wireless drivers fail to report link speed?
  2017-08-08 22:25     ` James Feeney
  2017-08-08 22:49       ` Ben Greear
@ 2017-08-08 23:43       ` Dan Williams
  1 sibling, 0 replies; 15+ messages in thread
From: Dan Williams @ 2017-08-08 23:43 UTC (permalink / raw)
  To: james, Ben Greear, linux-wireless

On Tue, 2017-08-08 at 16:25 -0600, James Feeney wrote:
> Hey Dan
> 
> > ...
> > So one second the wifi might be the "best" link and then when
> > somebody
> > turns on a microwave oven or a baby monitor, it may be the "worst"
> > until the microwave's duty cycle completes a few seconds later then
> > it'll become the "best" again for a couple seconds then "worst"
> > again,
> > repeat until your Easy Mac is nice and warm and creamy.
> > 
> > Furthermore, for some drivers IIRC when there isn't any traffic,
> > they
> > might drop the link rate very low because there's no reason keep
> > powering blocks when you're not transmitting/receiving any
> > data.  IIRC
> > the Intel drivers used to do that a couple years ago.
> 
> Yes, thanks, but, while all of that is true, it has nothing to do
> with the
> question asked.

It's very relevant to the question.  Because if the speed is actually
not useful for the requested purpose, there is no real point in having
it reported via ethtool.  Same for duplex.  Wifi is only "half
duplex", and so the property is actually meaningless for WiFi.

The bonding driver is requiring completely irrelevant information, or
at least information that simply doesn't make sense for some
communication mechanisms.  There's no way the bonding driver can make a
useful decision if the speed *intentionally* changes regularly.  At
worst, your bonding link will flip-flop between slaves every second or
two.  At best, bonding won't do anything differently than if the speed
was just faked to some fake lowest common denominator number so that
your wired link was always faster.

Sure, somebody could do a patch (like Ben has) that plumbs all this
stuff through.  But that's not solving the *actual* problem, which is
that this bonding change makes assumptions of slave devices that simply
don't match how those devices work.

Dan

> The question is, regardless that the wireless speed may be constantly
> changing,
> why is it that the kernel ethtool returns an error on get_settings(),
> instead of
> returning the current wireless speed, whatever that link speed might
> be at the
> moment?
> 
> > Also, "duplex" doesn't mean anything in wireless land.  So no clue
> > what
> > bonding is expecting them to say here.  I would say the
> > modifications
> > to the bonding core made assumptions that simply aren't applicable
> > to
> > mediums other than wired ones.
> 
> Since, as Ben mentions,
> 
> > Well, wifi acts half-duplex in that only one side can transmit at
> > once.
> 
> then the wireless drivers would be expected to simply report "half-
> duplex".
> 
> Again, the issue is not that wireless is half-duplex or full-duplex,
> but rather,
> why does the kernel ethtool return an error on get_settings()?
> 
> And, why is it that it seems get_settings() returns an error with
> multiple
> wireless drivers?  Is there some assumption, or convention, that
> causes the
> kernel ethtool to fail with *all* the wireless drivers?
> 
> I see that, on bugzilla, Florian is suggesting that wireless network
> devices
> *cannot* report a speed/duplex, simply because the wireless speed
> changes on a
> per-packet basis, but that does not seem to me to be a persuasive
> argument.  A
> wireless connection *does* always have a current speed, even if that
> speed
> changes frequently.  The kernel ethtool get_settings() should simply
> report that
> speed, not throw an error, yes?
> 
> 
> Thanks
> James
> 

* Re: wireless drivers fail to report link speed?
  2017-08-08 22:49       ` Ben Greear
@ 2017-08-09  0:36         ` James Feeney
  2017-08-09  9:30           ` Arend van Spriel
  2017-08-09 13:43           ` Dan Williams
  0 siblings, 2 replies; 15+ messages in thread
From: James Feeney @ 2017-08-09  0:36 UTC (permalink / raw)
  To: Ben Greear, Dan Williams, linux-wireless

Hey

On 08/08/2017 04:49 PM, Ben Greear wrote:
> 
> Some time back, I added some support to ath10k to report some ethtool info.
> At least most of this is upstream.  I do report rx and rx link rate, and yes,
> it changes, but it does contain some useful info, at least when modest amounts
> of packets are being transmitted and received (so that rate-ctrl logic
> is working).  I think the stuff not prepended with d_ will show up for any
> mac80211 driver.  Someone could improve ethtool to report these through more
> normal API than just getting the stats if they wanted...

Hmm - would you then lean in the direction of saying that this failure to report
a link speed is a fault in the kernel ethtool?

And, if so, would an update be required in just the kernel ethtool, or in both
the kernel ethtool and in every wireless driver?

Should the kernel ethtool get_settings() be made to report a proper link speed
and duplex when used with the wireless drivers?

On 08/08/2017 05:43 PM, Dan Williams wrote:
>
> It's very relevant to the question.  Because if the speed is actually
> not useful for the requested purpose, there is no real point in having
> it reported it via ethtool.  Same for duplex.  Wifi is only "half
> duplex", and so the property is actually meaningless for WiFi.

That seems a little over-broad, at least certainly with respect to "half
duplex".  If the link is known to be half duplex, then the kernel ethtool can
simply report that the link is "half duplex".  I am not hearing a good
justification, or a necessity, for the kernel ethtool to return an error, instead.

> At
> worst, your bonding link will flip-flop between slaves every second or
> two.  At best, bonding won't do anything differently than if the speed
> was just faked to some fake lowest common denominator number so that
> your wired link was always faster.

Well ok, flip-flopping every second would seem a pointless and bad effect.  But
then, for instance, my rtl8192ce driver shows a stable, actual link speed:

$ iwconfig wlp2s0
...
Bit Rate=72.2 Mb/s
...

$ ethtool -S wlp2s0
...
     txrate: 72200000
     rxrate: 1000000
...

Then, I don't know if this effect is as bad as you suggest.  Is there an actual
"stable" link speed here?  Or is this an "illusion", a bit of "fluff" being
promoted by the user-space iwconfig and ethtool?

> Sure, somebody could do a patch (like Ben has) that plumbs all this
> stuff through.  But that's not solving the *actual* problem, which is
> that this bonding change makes assumptions of slave devices that simply
> don't match how those devices work.

I take it that your position would be that the bonding module people, and Mahesh
in particular, are being unreasonable in expecting the kernel ethtool to provide
anything but an error in response to get_settings()?  What do you think of
Florian's suggestion, to check for dev->ieee80211_ptr being set, and letting
these interfaces proceed with being enslaved in a bond master network device if
that's the case?
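
To make the two policies being compared concrete, here is a toy Python model.
It is not kernel code and not the actual bonding implementation; the Slave
class, the may_enslave function and the exempt_wireless flag are all invented
for illustration.  The is_wireless field merely stands in for the
dev->ieee80211_ptr check, and speed_mbps=None stands in for get_settings()
returning an error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slave:
    name: str
    speed_mbps: Optional[int]  # None models a failed speed/duplex query
    is_wireless: bool          # stands in for dev->ieee80211_ptr being set


def may_enslave(slave: Slave, exempt_wireless: bool) -> bool:
    """Decide whether a slave is usable under the modelled policy."""
    if slave.speed_mbps is not None:
        return True  # speed/duplex reported, nothing to decide
    # Speed/duplex unknown: reject, unless wireless devices are exempted.
    return exempt_wireless and slave.is_wireless


wlan = Slave("wlp2s0", None, True)
strict = may_enslave(wlan, exempt_wireless=False)   # behavior as described for 4.12
relaxed = may_enslave(wlan, exempt_wireless=True)   # with the suggested exemption
```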

Would you go so far as to say that modifying the kernel ethtool to report
wireless link speed and duplex would be entirely pointless?

Thanks
James

* Re: wireless drivers fail to report link speed?
  2017-08-09  0:36         ` James Feeney
@ 2017-08-09  9:30           ` Arend van Spriel
  2017-08-09 17:01             ` James Feeney
  2017-08-09 13:43           ` Dan Williams
  1 sibling, 1 reply; 15+ messages in thread
From: Arend van Spriel @ 2017-08-09  9:30 UTC (permalink / raw)
  To: james, Ben Greear, Dan Williams, linux-wireless

On 8/9/2017 2:36 AM, James Feeney wrote:
> Hey
>

[...]

> On 08/08/2017 05:43 PM, Dan Williams wrote:
>>
>> It's very relevant to the question.  Because if the speed is actually
>> not useful for the requested purpose, there is no real point in having
>> it reported it via ethtool.  Same for duplex.  Wifi is only "half
>> duplex", and so the property is actually meaningless for WiFi.
>
> That seems a little over-broad, at least certainly with respect to "half
> duplex".  If the link is known to be half duplex, then the kernel ethtool can
> simply report that the link is "half duplex".  I am not hearing a good
> justification, or a necessity, for the kernel ethtool to return an error, instead.

There is nothing "over-broad" about it. Why ask a question if you
already know the answer?

>> At
>> worst, your bonding link will flip-flop between slaves every second or
>> two.  At best, bonding won't do anything differently than if the speed
>> was just faked to some fake lowest common denominator number so that
>> your wired link was always faster.
>
> Well ok, flip-flopping every second would seem a pointless and bad effect.  But
> then, for instance, my rtl8192ce driver shows a stable, actual link speed:
>
> $ iwconfig wlp2s0
> ...
> Bit Rate=72.2 Mb/s
> ...
>
> $ ethtool -S wlp2s0
> ...
>       txrate: 72200000
>       rxrate:  1000000
> ...
>
> Then, I don't know if this effect is as bad as you suggest.  Is there an actual
> "stable" link speed here?  Or is this an "illusion", of bit of "fluff" being
> promoted by the user-space iwconfig and ethtool?
>
>> Sure, somebody could do a patch (like Ben has) that plumbs all this
>> stuff through.  But that's not solving the *actual* problem, which is
>> that this bonding change makes assumptions of slave devices that simply
>> don't match how those devices work.
>
> I take it that your position would be that the bonding module people, and Mahesh
> in particular, are being unreasonable in expecting the kernel ethtool to provide
> anything but an error in response to get_settings()?  What do you think of
> Florian's suggestion, to check for dev->ieee80211_ptr being set, and letting
> these interfaces proceed with being enslaved in a bond master network device if
> that's the case?
>
> Would you go so far as to say that modifying the kernel ethtool to report
> wireless link speed and duplex would be entirely pointless?

It really depends on how it is used. In case of the bonding module it
seems it needs to be pretty accurate to ensure the fastest link is used.
You say your rtl8192ce driver reports a stable speed, but what is your
connection state, and are you actually sending packets over it? Also the
rxrate reported is 1 Mb/s, which is probably the rate of the last received
packet. Apart from your data connection to the AP, the device is
receiving beacons, which are sent at a low speed, thus screwing up the rx
speed accuracy of your link.
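
A small Python sketch of why a "rate of the last received frame" counter can
mislead.  The frame list is entirely made up: twenty data frames at 72.2 Mb/s
followed by one beacon at 1 Mb/s.  The byte-weighted average is only there for
contrast; it is not a throughput estimate and not something any driver is
known to report.

```python
# Each tuple is (rate in Mb/s, frame size in bytes); all values are hypothetical.
frames = [(72.2, 1500)] * 20 + [(1.0, 200)]

# What a "last received frame" counter would show after the beacon arrives.
last_rate = frames[-1][0]

# Average rate weighted by how many bytes were received at each rate.
total_bytes = sum(size for _, size in frames)
weighted_rate = sum(rate * size for rate, size in frames) / total_bytes
```

Here the last-frame value is 1.0 while the byte-weighted figure stays close to
the data rate, so a single beacon is enough to make the link look far slower
than it is.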

Actually what the bonding module could rely on would be what is
described in section 11.46 ("Estimated throughput") of IEEE 802.11-2016,
as it seems to address exactly the bonding use-case. However, I am not
aware of any devices in the field carrying that feature (but I am not
all-knowing ;-) ).

Regards,
Arend

> Thanks
> James
>

* Re: wireless drivers fail to report link speed?
  2017-08-08 19:07 wireless drivers fail to report link speed? James Feeney
  2017-08-08 21:42 ` Dan Williams
@ 2017-08-09 12:24 ` Kalle Valo
  1 sibling, 0 replies; 15+ messages in thread
From: Kalle Valo @ 2017-08-09 12:24 UTC (permalink / raw)
  To: james; +Cc: linux-wireless

James Feeney <james@nurealm.net> writes:

> Would you please look at kernel bug report "Since 4.12 - bonding module not
> working with wireless drivers", and tell me if you know why the kernel ethtool
> does not receive a speed report from the wireless drivers?
>
>  https://bugzilla.kernel.org/show_bug.cgi?id=196547

Have you reported this on netdev (CCing linux-wireless, David Miller and
the patch authors)? I think the offending bonding patch should be
reverted but first it needs to be properly reported on the mailing list.
Most people don't really follow bugzilla.

-- 
Kalle Valo

* Re: wireless drivers fail to report link speed?
  2017-08-09  0:36         ` James Feeney
  2017-08-09  9:30           ` Arend van Spriel
@ 2017-08-09 13:43           ` Dan Williams
  1 sibling, 0 replies; 15+ messages in thread
From: Dan Williams @ 2017-08-09 13:43 UTC (permalink / raw)
  To: james, Ben Greear, linux-wireless

On Tue, 2017-08-08 at 18:36 -0600, James Feeney wrote:
> Hey
> 
> On 08/08/2017 04:49 PM, Ben Greear wrote:
> > 
> > Some time back, I added some support to ath10k to report some
> > ethtool info.
> > At least most of this is upstream.  I do report rx and rx link
> > rate, and yes,
> > it changes, but it does contain some useful info, at least when
> > modest amounts
> > of packets are being transmitted and received (so that rate-ctrl
> > logic
> > is working).  I think the stuff not prepended with d_ will show up
> > for any
> > mac80211 driver.  Someone could improve ethtool to report these
> > through more
> > normal API than just getting the stats if they wanted...
> 
> Hmm - would you then lean in the direction of saying that this
> failure to report
> a link speed is a fault in the kernel ethtool?

No, it's not a fault of ethtool.  Ethtool only reports something, it's
up to the thing that interprets that data (eg, bonding) to do the right
thing with it.

> And, if so, would an update be required in just the kernel ethtool,
> or in both
> the kernel ethtool and in every wireless driver?

Likely every wireless driver, except that for mac80211-based drivers it
would only take updating the mac80211 stack.

I'm not really arguing against updating mac80211 to report this
information if somebody actually wants to do the patch.  I'm only
saying that even with the patch, it's not going to do exactly what you
want it to do, and even if it works for you 90% of the time, it's not
going to work for others that much of the time, and thus it gives a
false sense of "correctness" which is just wrong.

> Should the kernel ethtool get_settings() be made to report a proper
> link speed
> and duplex when used with the wireless drivers?
> 
> On 08/08/2017 05:43 PM, Dan Williams wrote:
> > 
> > It's very relevant to the question.  Because if the speed is
> > actually
> > not useful for the requested purpose, there is no real point in
> > having
> > it reported it via ethtool.  Same for duplex.  Wifi is only "half
> > duplex", and so the property is actually meaningless for WiFi.
> 
> That seems a little over-broad, at least certainly with respect to
> "half
> duplex".  If the link is known to be half duplex, then the kernel
> ethtool can
> simply report that the link is "half duplex".  I am not hearing a
> good
> justification, or a necessity, for the kernel ethtool to return an
> error, instead.

> > At
> > worst, your bonding link will flip-flop between slaves every second
> > or
> > two.  At best, bonding won't do anything differently than if the
> > speed
> > was just faked to some fake lowest common denominator number so
> > that
> > your wired link was always faster.
> 
> Well ok, flip-flopping every second would seem a pointless and bad
> effect.  But
> then, for instance, my rtl8192ce driver shows a stable, actual link
> speed:
> 
> $ iwconfig wlp2s0
> ...
> Bit Rate=72.2 Mb/s
> ...

iwconfig cannot report high rates, so try 'iw dev <name> link' to make
sure.

When I did 'iw dev wlp4s0 link' with a 2.4GHz baby monitor on in the
next room, my device flipped continuously between ~70Mb/s and 130Mb/s
every couple seconds. YMMV.  It's gonna be the same anywhere near a
microwave.

> $ ethtool -S wlp2s0
> ...
>      txrate: 72200000
>      rxrate: 1000000
> ...
> 
> Then, I don't know if this effect is as bad as you suggest.  Is there
> an actual
> "stable" link speed here?  Or is this an "illusion", of bit of
> "fluff" being
> promoted by the user-space iwconfig and ethtool?

There is no "stable" link speed.  The link selects the maximum speed
that produces as few errors as possible, and adjusts that speed
continuously due to the radio environment.  Again, many external
factors that you have no control over affect link speed.

In the span of 5 seconds, 2 feet away from my 11n AP, my link went
through 65Mb, 130Mb, 78Mb, and back to 130Mb.  That's just how this
works.
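
To put numbers on the flip-flop concern, a tiny Python sketch using the four
rates just quoted.  It assumes a naive "pick whichever slave is faster right
now" policy and a hypothetical 100 Mb/s wired slave; neither is claimed to be
what the bonding driver actually does.

```python
WIRED_MBPS = 100                    # hypothetical wired slave, not from this thread
wifi_samples = [65, 130, 78, 130]   # Mb/s, the rates quoted above


def pick(wifi_mbps, wired_mbps=WIRED_MBPS):
    """Naive policy: choose whichever link reports the higher speed."""
    return "wifi" if wifi_mbps > wired_mbps else "wired"


choices = [pick(s) for s in wifi_samples]
# Count how often the chosen link changes between consecutive samples.
switches = sum(1 for a, b in zip(choices, choices[1:]) if a != b)
```

With these samples the choice changes on every step, three switches across
four readings.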

It's like if your ethernet link dynamically adjusted speed from 2Mb/s
up to 10Gb/s based on how much traffic was going through it at a given
time, or to save power, or something.

> > Sure, somebody could do a patch (like Ben has) that plumbs all this
> > stuff through.  But that's not solving the *actual* problem, which
> > is
> > that this bonding change makes assumptions of slave devices that
> > simply
> > don't match how those devices work.
> 
> I take it that your position would be that the bonding module people,
> and Mahesh
> in particular, are being unreasonable in expecting the kernel ethtool
> to provide
> anything but an error in response to get_settings()?  What do you
> think of
> Florian's suggestion, to check for dev->ieee80211_ptr being set, and
> letting
> these interfaces proceed with being enslaved in a bond master network
> device if
> that's the case?

I'm suggesting that if the bonding driver is expecting a *continuous*
stable link rate from any kind of radio device, whether that's WiFi,
WWAN, Bluetooth, or whatever, it's being unreasonable.

It's not necessarily unreasonable to add speed/duplex reporting to the
ethtool hooks for wifi drivers.  But before that happens, we should
understand what other bits will use that information, how they use it,
and if they are going to use it incorrectly and thus do something that
users don't expect and consider a bug itself.

Dan

> Would you go so far as to say that modifying the kernel ethtool to
> report
> wireless link speed and duplex would be entirely pointless?
> 
> Thanks
> James

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: wireless drivers fail to report link speed?
  2017-08-09  9:30           ` Arend van Spriel
@ 2017-08-09 17:01             ` James Feeney
  2017-08-09 18:25               ` Dan Williams
  2017-08-10  5:25               ` Kalle Valo
  0 siblings, 2 replies; 15+ messages in thread
From: James Feeney @ 2017-08-09 17:01 UTC (permalink / raw)
  To: Arend van Spriel, Ben Greear, Dan Williams, linux-wireless
  Cc: Andy Gospodarek


On 08/09/2017 03:30 AM, Arend van Spriel wrote:
>> That seems a little over-broad, at least certainly with respect to "half
>> duplex".  If the link is known to be half duplex, then the kernel ethtool can
>> simply report that the link is "half duplex".  I am not hearing a good
>> justification, or a necessity, for the kernel ethtool to return an error,
>> instead.
>
> There is nothing "over-board" about it. Why ask a question if you already
> know the answer?

Sorry - I do not understand to what "answer" you are referring.  Are you saying
that the kernel ethtool should *not* return an error?  Or are you saying that
the kernel ethtool *should* return an error, because the "wifi duplex" is
*always* half duplex?  Or are you referring to something else?  The kernel
ethtool functions need to work with *all* network interface types, wired,
wireless, and virtual.

Or, are you saying that the bonding module should not be using the kernel
ethtool functions?

> Actually what the bonding module could rely on would be what is described in
> section 11.46 ("Estimated throughput") of IEEE802.11-2016 as it seems to address
> exactly the bonding use-case. However, I am not aware of any devices in the
> field carrying that feature (but I am not all knowing ;-) ).

Ah!  That sounds like a useful focus.

I would like to discover a consensus among the wireless driver community about
what the "correct" resolution would be, with respect to the bonding module's
need to determine the link speed of an interface.

Should there be a "push" for, as you reference, proper reporting of "Estimated
throughput"?  Should there be a "wireless will never report link speed", because
- hey - it requires too much work to change all the wireless drivers?

What should the wireless group say to the bonding module group?

@ Kalle Valo

> Have you reported this on netdev (CCing linux-wireless, David Miller and
> the patch authors)? I think the offending bonding patch should be
> reverted but first it needs to be properly reported on the mailing list.
> Most people don't really follow bugzilla.

I have not.  I first contacted David Miller and the patch authors personally, to
see what sort of tack they might want to take.  They have been notified.  There
has been no response from anyone except Andy.  I can only make-up stories, based
upon no information, about why they are ignoring this issue.  I have been
following an ever-expanding sequence of suggestions about where to discuss this
issue - privately, Arch Linux, kernel bugzilla, linux-wireless - and now,
netdev.  I may do that next, but then, there may be so many different forums
where this topic is being introduced, that no one anywhere will want to track it
at all, or participate.

Really, whose responsibility is it, and who should have the authority, deciding
what functionality wireless drivers "must" provide for functionality like
"wireless bonding"?  I'd like to hear some kind of consensus on that.  So far,
no one is "owning" anything, not the wireless driver people, not the bonding
module people, not the kernel ethtool people.  So far, there is only a
developing set of "attitudes" and opinions.  I appreciate that some people are
willing to express opinions.

@ Dan Williams

> I'm not really arguing against updating mac80211 to report this
> information if somebody actually wants to do the patch.  I'm only
> saying that even with the patch, it's not going to do exactly what you
> want it to do, and even if it works for you 90% of the time, it's not
> going to work for others that much of the time, and thus it gives a
> false sense of "correctness" which is just wrong.

Hey - don't put this on me!  This is not about "what I want it to do".  I'm only
trying to make my wireless bonding work again.  But I also don't want to simply
"slap down" Mahesh, by only reverting his patch, which addressed another, real,
problem.  This needs to be a cooperative effort.  How do *we all* address the
problem that Mahesh was trying to resolve, and, at the same time, continue to
support wireless bonding?  Please, don't just "kick the can down the road".  It
seems to me that Mahesh must have been pretty upset about the wireless drivers
not reporting speed, to have written a patch that just disables the wireless
interface when the reporting fails.  Think about it.

If there is a long-standing screw-up with the wireless drivers failing to
properly support 'section 11.46 ("Estimated throughput") of IEEE802.11-2016',
then let's start-off by admitting that.  *Then* everyone can argue about what to
do about it.  And, if that's not the underlying problem, let's make that
determination.  I'm just trying to find a way forward.

> No, it's not a fault of ethtool.  Ethtool only reports something, it's
> up to the thing that interprets that data (eg, bonding) to do the right
> thing with it.

It has not yet been established that there is anything - "Estimated throughput"
- being provided universally by the wireless drivers for the kernel ethtool to
report.  So, you cannot blame this immediately upon "the thing that interprets
the data (eg, bonding)", when there *is no data* to interpret.  That was the
original question and issue.  There first *has* to be some data to interpret!

I will say that it is no more appropriate that the wireless drivers generate a
"piss-off" error on a get_settings() request than that the bonding module
respond with a "screw-you", disabling the wireless interface when it returns
that error.  This has turned into some kind of nasty lovers' quarrel.  Or like a
couple of children having temper tantrums and retaliations.

> Likely every wireless driver, except that for mac80211-based drivers it
> would only take updating the mac80211 stack.

Ok.  That sounds positive.  Then there is a possibility to both update the
mac80211 stack, to provide "Estimated throughput", and also for the bonding
module to fall-back to a work-around for those wireless drivers that do not use
the mac80211 stack.

> I'm only
> saying that even with the patch, it's not going to do exactly what you
> want it to do, and even if it works for you 90% of the time, it's not
> going to work for others that much of the time, and thus it gives a
> false sense of "correctness" which is just wrong.

Ok.  So what *is* the "right thing" to do here?

The current, actual, in-place, "solution", implemented now, in the linux kernel,
is to simply "nuke" all wireless network interfaces that try to use the bonding
module.  I'd say that is a "rude, slap in the face" solution, but it suggests to
me that there is a sense of "hopelessness" in trying to get some support from
the wireless driver people, to actually fix the wireless speed reporting issue.
We could say that a patch "nuking" all wireless network interfaces is really a
desperate cry for help.  And this patch was signed-off by David Miller.

> When I did 'iw dev wlp4s0 link' with a 2.4GHz baby monitor on in the
> next room, my device flipped continuously between ~70Mb/s and 130Mb/s
> every couple seconds. YMMV.  It's gonna be the same anywhere near a
> microwave.

It appears to me to be absolutely certain that both 70Mb/s and 130Mb/s are
greater link speeds than, for instance, 10Mb/s wired ethernet, or 54Mb/s 802.11g
wireless.  You see?

> There is no "stable" link speed.  The link selects the maximum speed
> that produces as few errors as possible, and adjusts that speed
> continuously due to the radio environment.  Again, many external
> factors that you have no control over affect link speed.

It is *not* the responsibility of the wireless driver to determine the policy of
the bonding module!  It is only the responsibility of the wireless driver to
*report* the speed of the wireless link.  Don't try to "second-guess" the
bonding module people!

I imagine that a huge step forward would be made if only the kernel ethtool did
not just report an error in response to a wireless driver get_settings() request!

> I'm suggesting that if the bonding driver is expecting a *continuous*
> stable link rate from any kind of radio device, whether that's WiFi,
> WWAN, Bluetooth, or whatever, it's being unreasonable.
>
> It's not necessarily unreasonable to add speed/duplex reporting to the
> ethtool hooks for wifi drivers.  But before that happens, we should
> understand what other bits will use that information, how they use it,
> and if they are going to use it incorrectly and thus do something that
> users don't expect and consider a bug itself.

I really appreciate that you are engaging this topic - because lots of people
have not responded at all.  So, what do you suggest, with respect to addressing
the issue that Mahesh's patch was trying to address?  Do we ridicule Mahesh for
his "slap in the face" patch?  And David Miller for signing-off on it?  Or do we
update speed reporting in the wireless drivers, and provide some support for the
bonding module people, to make the bonding module do what they want?

Because, right now, wireless bonding is absolutely, and purposefully, "broken".


James

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: wireless drivers fail to report link speed?
  2017-08-09 17:01             ` James Feeney
@ 2017-08-09 18:25               ` Dan Williams
  2017-08-10  1:20                 ` James Feeney
  2017-08-11 18:43                 ` Andy Gospodarek
  2017-08-10  5:25               ` Kalle Valo
  1 sibling, 2 replies; 15+ messages in thread
From: Dan Williams @ 2017-08-09 18:25 UTC (permalink / raw)
  To: james, Arend van Spriel, Ben Greear, linux-wireless; +Cc: Andy Gospodarek

On Wed, 2017-08-09 at 11:01 -0600, James Feeney wrote:
> @ Dan Williams
> 
> > I'm not really arguing against updating mac80211 to report this
> > information if somebody actually wants to do the patch.  I'm only
> > saying that even with the patch, it's not going to do exactly what
> > you
> > want it to do, and even if it works for you 90% of the time, it's
> > not
> > going to work for others that much of the time, and thus it gives a
> > false sense of "correctness" which is just wrong.
> 
> Hey - don't put this on me!  This is not about "what I want it to
> do".  I'm only

I had a whole email written out, but it wasn't very constructive.

To be clear: I am not putting anything on you, or blaming you for
anything :)  Sorry if my tone implied that to you.  I feel like the
tone of this thread is becoming contentious and that's not my desire.

There is clearly a problem.  That problem was exposed by a patch to the
bonding driver that newly requires information that the WiFi drivers
don't provide.

The relevant questions, in my view, are:

1) why does the bonding driver now require this information?

2) is this information reasonable to request from WiFi drivers?

3) how would this information affect the operation of the bonding
driver if it doesn't mean the same thing as it means for wired devices?

My answers are:

1) I have no idea, though to continue being constructive to this
discussion, I should probably go find out.  We should also get the
bonding module patch authors to weigh in.  But the core point is that
it used to work, and now it doesn't work, and the WiFi drivers have not
changed in this area.

2) ethtool's API doesn't say much about semantics.  It is likely
reasonable to request this information from WiFi drivers, but
unfortunately WiFi has some different semantics for the information
than ethernet devices do.

3) The bonding module is interpreting the speed in a certain way that
can hugely affect its operation, and ethernet devices don't change
speed very frequently.  But wifi devices do.  This may well cause
unexpected operation from bonding, and we should be well aware of that
before we do anything to fix this problem.
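
To make point 3 concrete, here is a toy sketch of what a naive "always
use the faster link" policy would do with a fluctuating WiFi rate.  The
WiFi samples are the rates quoted earlier in this thread (65, 130, 78,
130 Mb/s within about 5 seconds); the 100 Mb/s wired speed and the
pick_faster() policy are assumptions made up for illustration, and this
is not how the bonding driver actually selects a slave.

```python
def pick_faster(wired_mbps, wifi_mbps):
    """Hypothetical policy: prefer whichever link reports the higher speed."""
    return "wifi" if wifi_mbps > wired_mbps else "wired"


def count_switches(wired_mbps, wifi_samples):
    """Apply the policy to each WiFi sample and count active-link changes."""
    choices = [pick_faster(wired_mbps, sample) for sample in wifi_samples]
    switches = sum(1 for prev, cur in zip(choices, choices[1:]) if prev != cur)
    return choices, switches


WIFI_SAMPLES = [65, 130, 78, 130]  # Mb/s, rates quoted in this thread
WIRED_MBPS = 100                   # assumption: a 100 Mb/s ethernet link

choices, switches = count_switches(WIRED_MBPS, WIFI_SAMPLES)
print(choices, switches)
```

With these inputs the policy changes its mind on every sample, which is
the kind of flip-flopping a consumer of the reported speed would have to
guard against.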

I'd also like to point out the various virtual devices (veth, virtio,
etc) and how they report speed.  They lock the speed to a certain
value, but that doesn't actually mean anything because they are not
hardware based and their throughput is more a function of current CPU
load rather than actual wire speed.  The 'tun' driver locks the rate to
10Mb/s with no capability to change.  These are another case of
mismatch in expectations between bonding and reported speeds.
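
For anyone who wants to compare what their own interfaces report, a
small userspace sketch follows.  It assumes the per-interface "speed"
attribute under /sys/class/net, which as far as I know exposes the same
ethtool-reported value in Mb/s; the read may fail or give a non-positive
number when the driver reports nothing.  The helper name
read_link_speed() and its sysfs_root parameter are made up for this
example.

```python
from pathlib import Path


def read_link_speed(iface, sysfs_root="/sys/class/net"):
    """Return the reported speed in Mb/s, or None if nothing usable is reported."""
    try:
        text = (Path(sysfs_root) / iface / "speed").read_text().strip()
        speed = int(text)
    except (OSError, ValueError):
        # Missing interface, unsupported attribute, or unparsable content.
        return None
    # Treat zero and negative values (such as -1 for "unknown") as no data.
    return speed if speed > 0 else None


if __name__ == "__main__":
    root = Path("/sys/class/net")
    if root.is_dir():
        for entry in sorted(root.iterdir()):
            print(entry.name, read_link_speed(entry.name))
```

Running it on a machine with wired, wireless, and virtual interfaces
should show which of them give a number at all, without saying anything
about whether that number is meaningful.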

Again, there is a problem that should be solved.  I am only advocating
that instead of simply adding ethtool get_settings support to WiFi
drivers and washing our hands of it, which may have unintended
consequences, we gather all the information first, and discuss whether
the bonding driver may need to adjust its expectations before this kind
of change is made.

Dan

> trying to make my wireless bonding work again.  But I also don't want
> to simply
> "slap down" Mahesh, by only reverting his patch, which addressed
> another, real,
> problem.  This needs to be a cooperative effort.  How do *we all*
> address the
> problem that Mahesh was trying to resolve, and, at the same time,
> continue to
> support wireless bonding?  Please, don't just "kick the can down the
> road".  It
> seems to me that Mahesh must have been pretty upset about the
> wireless drivers
> not reporting speed, to have written a patch that just disables the
> wireless
> interface when the reporting fails.  Think about it.
> 
> If there is a long-standing screw-up with the wireless drivers
> failing to
> properly support 'section 11.46 ("Estimated throughput") of
> IEEE802.11-2016',
> then let's start-off by admitting that.  *Then* everyone can argue
> about what to
> do about it.  And, if that's not the underlying problem, let's make
> that
> determination.  I'm just trying to find a way forward.
> 
> > No, it's not a fault of ethtool.  Ethtool only reports something,
> > it's
> > up to the thing that interprets that data (eg, bonding) to do the
> > right
> > thing with it.
> 
> It has not yet been established that there is anything - "Estimated
> throughput"
> - being provided universally by the wireless drivers for the kernel
> ethtool to
> report.  So, you cannot blame this immediately upon "the thing that
> interprets
> the data (eg, bonding)", when there *is no data* to interpret.  That
> was the
> original question and issue.  There first *has* to be some data to
> interpret!
> 
> I will say that it is no more appropriate that the wireless drivers
> generate a
> "piss-off" error on a get_settings() request than that the bonding
> module
> respond with a "screw-you", disabling the wireless interface when it
> returns
> that error.  This has turned into some kind of nasty lovers'
> quarrel.  Or like a
> couple of children having temper tantrums and retaliations.
> 
> > Likely every wireless driver, except that for mac80211-based
> > drivers it
> > would only take updating the mac80211 stack.
> 
> Ok.  That sounds positive.  Then there is a possibility to both
> update the
> mac80211 stack, to provide "Estimated throughput", and also for the
> bonding
> module to fall-back to a work-around for those wireless drivers that
> do not use
> the mac80211 stack.
> 
> > I'm only
> > saying that even with the patch, it's not going to do exactly what
> > you
> > want it to do, and even if it works for you 90% of the time, it's
> > not
> > going to work for others that much of the time, and thus it gives a
> > false sense of "correctness" which is just wrong.
> 
> Ok.  So what *is* the "right thing" to do here?
> 
> The current, actual, in-place, "solution", implemented now, in the
> linux kernel,
> is to simply "nuke" all wireless network interfaces that try to use
> the bonding
> module.  I'd say that is a "rude, slap in the face" solution, but it
> suggests to
> me that there is a sense of "hopelessness" in trying to get some
> support from
> the wireless driver people, to actually fix the wireless speed
> reporting issue.
> We could say that a patch "nuking" all wireless network interfaces is
> really a
> desperate cry for help.  And this patch was signed-off by David
> Miller.
> 
> > When I did 'iw dev wlp4s0 link' with a 2.4GHz baby monitor on in
> > the
> > next room, my device flipped continuously between ~70Mb/s and
> > 130Mb/s
> > every couple seconds. YMMV.  It's gonna be the same anywhere near a
> > microwave.
> 
> It appears to me to be absolutely certain that both 70Mb/s and
> 130Mb/s are
> greater link speeds than, for instance, 10Mb/s wired ethernet, or
> 54Mb/s 802.11g
> wireless.  You see?
> 
> > There is no "stable" link speed.  The link selects the maximum
> > speed
> > that produces as few errors as possible, and adjusts that speed
> > continuously due to the radio environment.  Again, many external
> > factors that you have no control over affect link speed.
> 
> It is *not* the responsibility of the wireless driver to determine
> the policy of
> the bonding module!  It is only the responsibility of the wireless
> driver to
> *report* the speed of the wireless link.  Don't try to "second-guess" 
> the
> bonding module people!
> 
> I imagine that a huge step forward would be made if only the kernel
> ethtool did
> not just report an error in response to a wireless driver
> get_settings() request!
> 
> > I'm suggesting that if the bonding driver is expecting a
> > *continuous*
> > stable link rate from any kind of radio device, whether that's
> > WiFi,
> > WWAN, Bluetooth, or whatever, it's being unreasonable.
> > 
> > It's not necessarily unreasonable to add speed/duplex reporting to
> > the
> > ethtool hooks for wifi drivers.  But before that happens, we should
> > understand what other bits will use that information, how they use
> > it,
> > and if they are going to use it incorrectly and thus do something
> > that
> > users don't expect and consider a bug itself.
> 
> I really appreciate that you are engaging this topic - because lots
> of people
> have not responded at all.  So, what do you suggest, with respect to
> addressing
> the issue that Mahesh's patch was trying to address?  Do we ridicule
> Mahesh for
> his "slap in the face" patch?  And David Miller for signing-off on
> it?  Or do we
> update speed reporting in the wireless drivers, and provide some
> support for the
> bonding module people, to make the bonding module do what they want?
> 
> Because, right now, wireless bonding is absolutely, and purposefully,
> "broken".
> 
> 
> James

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: wireless drivers fail to report link speed?
  2017-08-09 18:25               ` Dan Williams
@ 2017-08-10  1:20                 ` James Feeney
  2017-08-11 18:43                 ` Andy Gospodarek
  1 sibling, 0 replies; 15+ messages in thread
From: James Feeney @ 2017-08-10  1:20 UTC (permalink / raw)
  To: Dan Williams, Arend van Spriel, Ben Greear, linux-wireless
  Cc: Andy Gospodarek

Hey Dan

On 08/09/2017 12:25 PM, Dan Williams wrote:
> The relevant questions, in my view, are:
> 
> 1) why does the bonding driver now require this information?

Well, it *always* required the information.  Just now, Mahesh has finally
decided "up with this I will not put", not being able to get the information needed.

Please see Mahesh's first patch:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=c4adfc822bf5d8e97660b6114b5a8892530ce8cb

----

bonding: make speed, duplex setting consistent with link state

bond_update_speed_duplex() retrieves speed and duplex settings. There
is a possibility of failure in retrieving these values but caller has
to assume it's always successful. This leads to having inconsistent
slave link settings. If these (speed, duplex) values cannot be
retrieved, then keeping the link UP causes problems.

The updated bond_update_speed_duplex() returns 0 on success if it
retrieves sane values for speed and duplex. On failure it returns 1
and marks the link down.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

----

Mahesh does not explain what these specific "problems" might be, though.
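
As I read that commit message, the new behavior amounts to roughly the
following.  This is only a userspace model of the description, not the
kernel code: the Slave class, update_speed_duplex(), and the two
settings callbacks are invented for illustration, and what counts as a
"sane" value here is my assumption.  The constant values mirror the
ethtool definitions of the same names.

```python
import errno

# Values mirror the ethtool UAPI constants of the same names.
SPEED_UNKNOWN = -1
DUPLEX_HALF = 0x00
DUPLEX_FULL = 0x01
DUPLEX_UNKNOWN = 0xFF


class Slave:
    """Hypothetical stand-in for a bonding slave; not the kernel struct."""

    def __init__(self, name, get_settings):
        self.name = name
        self.get_settings = get_settings  # callable returning (speed, duplex)
        self.speed = SPEED_UNKNOWN
        self.duplex = DUPLEX_UNKNOWN
        self.link_up = True


def update_speed_duplex(slave):
    """Return 0 if sane speed/duplex were retrieved, else 1 and mark link down."""
    slave.speed = SPEED_UNKNOWN
    slave.duplex = DUPLEX_UNKNOWN
    try:
        speed, duplex = slave.get_settings()
    except OSError:
        slave.link_up = False
        return 1
    # Assumption: "sane" means a known, non-zero speed and a known duplex.
    if speed in (0, SPEED_UNKNOWN) or duplex not in (DUPLEX_HALF, DUPLEX_FULL):
        slave.link_up = False
        return 1
    slave.speed, slave.duplex = speed, duplex
    return 0


def wired_settings():
    return 1000, DUPLEX_FULL  # a gigabit full-duplex link


def wireless_settings():
    # Assumption: models a driver with no speed/duplex reporting hook.
    raise OSError(errno.EOPNOTSUPP, "speed/duplex not reported")


eth0 = Slave("eth0", wired_settings)
wlan0 = Slave("wlan0", wireless_settings)
print(update_speed_duplex(eth0), eth0.link_up)
print(update_speed_duplex(wlan0), wlan0.link_up)
```

In this model the wired slave stays up, while the wireless slave is
marked down purely because its settings could not be retrieved, which
matches what wireless bonding users have been seeing since 4.12.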

Generally, the bonding module has to be able to dynamically prioritize, enable,
and disable different network interfaces, as network circumstances change.  It
constantly monitors link speeds and connectivity, comparing links and
reconfiguring interfaces.


> 2) is this information reasonable to request from WiFi drivers?
>
> 3) how would this information affect the operation of the bonding
> driver if it doesn't mean the same thing as it means for wired devices?

I don't know that that dialog, between bonding module and wireless, has ever
happened, until now, since there just was never any link information returned
from the wireless drivers.

You might want to look through "linux/Documentation/networking/bonding.txt".

I do know that the Bonding Driver Option "primary_reselect=better" does not work
with a wireless interface - because the bonding module does not receive the
wireless link speed.  That's not the bonding module's fault, it turns out.

But then, with no information, the bonding module would be forced to "punt".
For instance, it is not correct to simply assume that a wired connection is
always faster than a wireless connection, though that assumption might be needed
as a fall-back with older non-reporting wireless drivers.

> Again, there is a problem that should be solved.  I am only advocating
> that instead of simply adding ethtool get_settings support to WiFi
> drivers and washing our hands of it, which may have unintended
> consequences, we gather all the information first, and discuss whether
> the bonding driver may need to adjust its expectations before this kind
> of change is made.

Yes, please.


James

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: wireless drivers fail to report link speed?
  2017-08-09 17:01             ` James Feeney
  2017-08-09 18:25               ` Dan Williams
@ 2017-08-10  5:25               ` Kalle Valo
  1 sibling, 0 replies; 15+ messages in thread
From: Kalle Valo @ 2017-08-10  5:25 UTC (permalink / raw)
  To: james
  Cc: Arend van Spriel, Ben Greear, Dan Williams, linux-wireless,
	Andy Gospodarek

(Please don't drop me from CC)

James Feeney <james@nurealm.net> writes:

> @ Kalle Valo
>
>> Have you reported this on netdev (CCing linux-wireless, David Miller and
>> the patch authors)? I think the offending bonding patch should be
>> reverted but first it needs to be properly reported on the mailing list.
>> Most people don't really follow bugzilla.
>
> I have not. I first contacted David Miller and the patch authors
> personally, to see what sort of tack they might want to take. They
> have been notified. There has been no response from anyone except
> Andy. I can only make-up stories, based upon no information, about why
> they are ignoring this issue.

Contacting people privately is not a good idea as most people just
ignore private mail, me included. So this is not a surprise to me.

> I have been following an ever-expanding sequence of suggestions about
> where to discuss this issue - privately, Arch Linux, kernel bugzilla,
> linux-wireless - and now, netdev. 

Also bugzilla.kernel.org is not really actively followed by all
maintainers, so the best approach is to report the issue via the
corresponding mailing lists and CC the developers and maintainers.

> I may do that next, but then, there
> may be so many different forums where this topic is being introduced,
> that no one anywhere will want to track it at all, or participate.

Let's not wait any longer, I'll report this forward. v4.13 release is
getting close and if we want to get this fixed for that release we have
to be quick.

> Really, whose responsibility is it, and who should have the authority, deciding
> what functionality wireless drivers "must" provide for functionality like
> "wireless bonding"?

The MAINTAINERS file documents the responsibility. Usually the best
approach is to contact the patch authors and CC the mailing lists and
maintainers involved.

For bonding it's:

BONDING DRIVER
M:      Jay Vosburgh <j.vosburgh@gmail.com>
M:      Veaceslav Falico <vfalico@gmail.com>
M:      Andy Gospodarek <andy@greyhouse.net>
L:      netdev@vger.kernel.org
W:      http://sourceforge.net/projects/bonding/
S:      Supported
F:      drivers/net/bonding/
F:      include/uapi/linux/if_bonding.h

And for wireless:

CFG80211 and NL80211
M:      Johannes Berg <johannes@sipsolutions.net>
L:      linux-wireless@vger.kernel.org
W:      http://wireless.kernel.org/
T:      git git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git
T:      git git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git
S:      Maintained
F:      include/uapi/linux/nl80211.h
F:      include/net/cfg80211.h
F:      net/wireless/*
X:      net/wireless/wext*

MAC80211
M:      Johannes Berg <johannes@sipsolutions.net>
L:      linux-wireless@vger.kernel.org
W:      http://wireless.kernel.org/
T:      git git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git
T:      git git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git
S:      Maintained
F:      Documentation/networking/mac80211-injection.txt
F:      include/net/mac80211.h
F:      net/mac80211/
F:      drivers/net/wireless/mac80211_hwsim.[ch]


NETWORKING DRIVERS (WIRELESS)
M:      Kalle Valo <kvalo@codeaurora.org>
L:      linux-wireless@vger.kernel.org
Q:      http://patchwork.kernel.org/project/linux-wireless/list/
T:      git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git
T:      git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git
S:      Maintained
F:      Documentation/devicetree/bindings/net/wireless/
F:      drivers/net/wireless/

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: wireless drivers fail to report link speed?
  2017-08-09 18:25               ` Dan Williams
  2017-08-10  1:20                 ` James Feeney
@ 2017-08-11 18:43                 ` Andy Gospodarek
  1 sibling, 0 replies; 15+ messages in thread
From: Andy Gospodarek @ 2017-08-11 18:43 UTC (permalink / raw)
  To: Dan Williams; +Cc: James Feeney, Arend van Spriel, Ben Greear, linux-wireless

On Wed, Aug 9, 2017 at 2:25 PM, Dan Williams <dcbw@redhat.com> wrote:
> On Wed, 2017-08-09 at 11:01 -0600, James Feeney wrote:
>> @ Dan Williams
>>
>> > I'm not really arguing against updating mac80211 to report this
>> > information if somebody actually wants to do the patch.  I'm only
>> > saying that even with the patch, it's not going to do exactly what
>> > you
>> > want it to do, and even if it works for you 90% of the time, it's
>> > not
>> > going to work for others that much of the time, and thus it gives a
>> > false sense of "correctness" which is just wrong.
>>
>> Hey - don't put this on me!  This is not about "what I want it to
>> do".  I'm only
>
> I had a whole email written out, but it wasn't very constructive.
>
> To be clear: I am not putting anything on you, or blaming you for
> anything :)  Sorry if my tone implied that to you.  I feel like the
> tone of this thread is becoming contentious and that's not my desire.
>
> There is clearly a problem.  That problem was exposed by a patch to the
> bonding driver that newly requires information that the WiFi drivers
> don't provide.
>
> The relevant questions, in my view, are:
>
> 1) why does the bonding driver now require this information?
>
> 2) is this information reasonable to request from WiFi drivers?
>
> 3) how would this information affect the operation of the bonding
> driver if it doesn't mean the same thing as it means for wired devices?
>
> My answers are:
>
> 1) I have no idea, though to continue being constructive to this
> discussion, I should probably go find out.  We should also get the
> bonding module patch authors to weigh in.  But the core point is that
> it used to work, and now it doesn't work, and the WiFi drivers have not
> changed in this area.

I actually reached out to Mahesh a while ago to get more details on
what exactly he was trying to fix with c4adfc822bf5 ("bonding: make
speed, duplex setting consistent with link state") and 3f3c278c94d
("bonding: fix active-backup transition") but did not get a response.

I've been AFK for quite a bit of August, so unfortunately I haven't
had a chance to dig into this deeply.  I'll pledge to look at this in
detail before the end of next week.  I even put it on my calendar so I
do not forget!

>
> 2) ethtool's API doesn't say much about semantics.  It is likely
> reasonable to request this information from WiFi drivers, but
> unfortunately WiFi has some different semantics for the information
> than ethernet devices do.
>
> 3) The bonding module is interpreting the speed in a certain way that
> can hugely affect its operation, and ethernet devices don't change
> speed very frequently.  But wifi devices do.  This may well cause
> unexpected operation from bonding, and we should be well aware of that
> before we do anything to fix this problem.
>
> I'd also like to point out the various virtual devices (veth, virtio,
> etc) and how they report speed.  They lock the speed to a certain
> value, but that doesn't actually mean anything because they are not
> hardware based and their throughput is more a function of current CPU
> load rather than actual wire speed.  The 'tun' driver locks the rate to
> 10Mb/s with no capability to change.  These are another case of
> mismatch in expectations between bonding and reported speeds.
>
> Again, there is a problem that should be solved.  I am only advocating
> that instead of simply adding ethtool get_settings support to WiFi
> drivers and washing our hands of it, which may have unintended
> consequences, we gather all the information first, and discuss whether
> the bonding driver may need to adjust its expectations before this kind
> of change is made.

Thanks for thoughtfully weighing in on this, Dan.  I do think that if
support for this is as significant as suggested by the many concerned
users in this thread, you are correct it is worth examining to make
sure that bonding and wireless can work happily together.
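
To make the mismatch discussed above concrete, here is a minimal sketch
(not the bonding driver's actual logic) of a consumer that treats "no
speed reported" as unknown rather than as link down.  It assumes the
speed is read from the sysfs attribute /sys/class/net/<iface>/speed and
that a read error or a non-positive value means the driver has nothing
usable to report.  The names read_link_speed and pick_active are
hypothetical and exist only for this illustration.

```python
import os
from typing import Dict, Optional


def read_link_speed(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the reported link speed in Mb/s, or None if nothing usable is reported.

    Assumption: a failed read or a non-positive value means "unknown".
    """
    path = os.path.join(sysfs_root, iface, "speed")
    try:
        with open(path) as f:
            speed = int(f.read().strip())
    except (OSError, ValueError):
        # Missing attribute, read error, or non-numeric content.
        return None
    return speed if speed > 0 else None


def pick_active(speeds: Dict[str, Optional[int]]) -> Optional[str]:
    """Hypothetical policy: prefer the fastest interface with a known speed.

    Interfaces with unknown speed are kept as a fallback, not disabled.
    """
    known = {name: s for name, s in speeds.items() if s is not None}
    if known:
        return max(known, key=known.get)
    # No interface reports a speed: fall back to the first one listed.
    return next(iter(speeds), None)
```

With a policy like this, a slave that cannot report its speed is still
eligible, and a fixed value such as the 10Mb/s from 'tun' is only ever
compared against other reported values.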

>
> Dan
>
>> trying to make my wireless bonding work again.  But I also don't want
>> to simply
>> "slap down" Mahesh, by only reverting his patch, which addressed
>> another, real,
>> problem.  This needs to be a cooperative effort.  How do *we all*
>> address the
>> problem that Mahesh was trying to resolve, and, at the same time,
>> continue to
>> support wireless bonding?  Please, don't just "kick the can down the
>> road".  It
>> seems to me that Mahesh must have been pretty upset about the
>> wireless drivers
>> not reporting speed, to have written a patch that just disables the
>> wireless
>> interface when the reporting fails.  Think about it.
>>
>> If there is a long-standing screw-up with the wireless drivers
>> failing to
>> properly support 'section 11.46 ("Estimated throughput") of
>> IEEE802.11-2016',
>> then let's start-off by admitting that.  *Then* everyone can argue
>> about what to
>> do about it.  And, if that's not the underlying problem, let's make
>> that
>> determination.  I'm just trying to find a way forward.
>>
>> > No, it's not a fault of ethtool.  Ethtool only reports something,
>> > it's
>> > up to the thing that interprets that data (eg, bonding) to do the
>> > right
>> > thing with it.
>>
>> It has not yet been established that there is anything - "Estimated
>> throughput"
>> - being provided universally by the wireless drivers for the kernel
>> ethtool to
>> report.  So, you cannot blame this immediately upon "the thing that
>> interprets
>> the data (eg, bonding)", when there *is no data* to interpret.  That
>> was the
>> original question and issue.  There first *has* to be some data to
>> interpret!
>>
>> I will say that it is no more appropriate that the wireless drivers
>> generate a
>> "piss-off" error on a get_settings() request than that the bonding
>> module
>> respond with a "screw-you", disabling the wireless interface when it
>> returns
>> that error.  This has turned into some kind of nasty lovers
>> quarrel.  Or like a
>> couple of children having temper tantrums and retaliations.
>>
>> > Likely every wireless driver, except that for mac80211-based
>> > drivers it
>> > would only take updating the mac80211 stack.
>>
>> Ok.  That sounds positive.  Then there is a possibility to both
>> update the
>> mac80211 stack, to provide "Estimated throughput", and also for the
>> bonding
>> module to fall-back to a work-around for those wireless drivers that
>> do not use
>> the mac80211 stack.
>>
>> > I'm only
>> > saying that even with the patch, it's not going to do exactly what
>> > you
>> > want it to do, and even if it works for you 90% of the time, it's
>> > not
>> > going to work for others that much of the time, and thus it gives a
>> > false sense of "correctness" which is just wrong.
>>
>> Ok.  So what *is* the "right thing" to do here?
>>
>> The current, actual, in-place, "solution", implemented now, in the
>> linux kernel,
>> is to simply "nuke" all wireless network interfaces that try to use
>> the bonding
>> module.  I'd say that is a "rude, slap in the face" solution, but it
>> suggests to
>> me that there is a sense of "hopelessness" in trying to get some
>> support from
>> the wireless driver people, to actually fix the wireless speed
>> reporting issue.
>> We could say that a patch "nuking" all wireless network interfaces is
>> really a
>> desperate cry for help.  And this patch was signed-off by David
>> Miller.
>>
>> > When I did 'iw dev wlp4s0 link' with a 2.4GHz baby monitor on in
>> > the
>> > next room, my device flipped continuously between ~70Mb/s and
>> > 130Mb/s
>> > every couple seconds. YMMV.  It's gonna be the same anywhere near a
>> > microwave.
>>
>> It appears to me to be absolutely certain that both 70Mb/s and
>> 130Mb/s are
>> greater link speeds than, for instance, 10Mb/s wired ethernet, or
>> 54Mb/s 802.11g
>> wireless.  You see?
>>
>> > There is no "stable" link speed.  The link selects the maximum
>> > speed
>> > that produces as few errors as possible, and adjusts that speed
>> > continuously due to the radio environment.  Again, many external
>> > factors that you have no control over affect link speed.
>>
>> It is *not* the responsibility of the wireless driver to determine
>> the policy of
>> the bonding module!  It is only the responsibility of the wireless
>> driver to
>> *report* the speed of the wireless link.  Don't try to "second-guess"
>> the
>> bonding module people!
>>
>> I imagine that a huge step forward would be made if only the kernel
>> ethtool did
>> not just report an error in response to a wireless driver
>> get_settings() request!
>>
>> > I'm suggesting that if the bonding driver is expecting a
>> > *continuous*
>> > stable link rate from any kind of radio device, whether that's
>> > WiFi,
>> > WWAN, Bluetooth, or whatever, it's being unreasonable.
>> >
>> > It's not necessarily unreasonable to add speed/duplex reporting to
>> > the
>> > ethtool hooks for wifi drivers.  But before that happens, we should
>> > understand what other bits will use that information, how they use
>> > it,
>> > and if they are going to use it incorrectly and thus do something
>> > that
>> > users don't expect and consider a bug itself.
>>
>> I really appreciate that you are engaging this topic - because lots
>> of people
>> have not responded at all.  So, what do you suggest, with respect to
>> addressing
>> the issue that Mahesh's patch was trying to address?  Do we ridicule
>> Mahesh for
>> his "slap in the face" patch?  And David Miller for signing-off on
>> it?  Or do we
>> update speed reporting in the wireless drivers, and provide some
>> support for the
>> bonding module people, to make the bonding module do what they want?
>>
>> Because, right now, wireless bonding is absolutely, and purposefully,
>> "broken".
>>
>>
>> James


end of thread, other threads:[~2017-08-11 18:44 UTC | newest]

Thread overview: 15+ messages
2017-08-08 19:07 wireless drivers fail to report link speed? James Feeney
2017-08-08 21:42 ` Dan Williams
2017-08-08 21:58   ` Ben Greear
2017-08-08 22:25     ` James Feeney
2017-08-08 22:49       ` Ben Greear
2017-08-09  0:36         ` James Feeney
2017-08-09  9:30           ` Arend van Spriel
2017-08-09 17:01             ` James Feeney
2017-08-09 18:25               ` Dan Williams
2017-08-10  1:20                 ` James Feeney
2017-08-11 18:43                 ` Andy Gospodarek
2017-08-10  5:25               ` Kalle Valo
2017-08-09 13:43           ` Dan Williams
2017-08-08 23:43       ` Dan Williams
2017-08-09 12:24 ` Kalle Valo
