On 2018-10-12 03:16, Toke Høiland-Jørgensen wrote:
> 
> - Just loop with the smaller quantum until one of the stations go into
>   the positive (what we do now).
> 
> - Go through all active stations, find the one that is closest being in
>   the positive, and add that amount to the quantum. I.e., something
>   like (assuming no station has positive deficit; if one does, you
>   don't want to add anything anyway):
> 
>   to_add = -(max(stn.deficit) for stn in active stations)
>   for stn in active stations:
>     stn.deficit += to_add + stn.weight
> 
Toke,

Sorry for the delayed response. I ran a lot of experiments; my
observations are below. Apologies for the lengthy reply.

In the current model, next_txq() is the main routine that serves DRR,
and fairness is enforced by serving only the first txq. Here the first
node could be either newly initiated traffic or a node returned by
return_txq(). This works perfectly as long as the driver is the one
running the RR algo.
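
To make the model concrete, here is a toy Python sketch of how I
understand it. This is not the mac80211 code: the Txq class, the
weight of 256, the fixed airtime cost of 100 per transmission and
return_txq() re-adding at the head are all assumptions made only for
illustration.

```python
from collections import deque

class Txq:
    def __init__(self, name, weight=256):
        self.name = name
        self.weight = weight  # assumed refill quantum per round
        self.deficit = 0      # may transmit while deficit >= 0
        self.served = 0

def next_txq(active):
    """Return the head txq, refilling and rotating heads in deficit."""
    if not active:
        return None
    while active[0].deficit < 0:
        txq = active.popleft()
        txq.deficit += txq.weight   # refill happens here, in the hot path
        active.append(txq)          # rotate to tail so others get served
    return active.popleft()

def return_txq(active, txq):
    """Put the txq back at the head (an assumption of this sketch)."""
    active.appendleft(txq)

def simulate(rounds, airtime=100):
    a, b = Txq("sta_a"), Txq("sta_b")
    active = deque([a, b])
    for _ in range(rounds):
        txq = next_txq(active)
        txq.deficit -= airtime      # charge airtime for one transmission
        txq.served += 1
        return_txq(active, txq)
    return a.served, b.served
```

With two equal-weight stations, simulate(1000) should serve both
almost equally, because the only refill point is next_txq() and it
alternates between them.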

In ath10k, on the other hand, the firmware runs its own RR in pull mode
and builds its txq list based on the driver's hostq table. In this case
it cannot simply be assumed that the firmware always issues a fetch
request for the first node of mac80211's txq list, i.e. the two RR
algos could be out of sync.

There are two major differences between ath9k and ath10k:

1) Serving txqs

ath9k always serves txqs via next_txq(), so the txqs_list is rotated to
serve other txqs. But in ath10k (pull mode), the first node becomes
sticky until it is picked by the firmware via a fetch indication and
its deficit goes negative. The sequence followed in wake_tx_queue is:

    - dequeue first node
    - push is not allowed
    - enqueue same txq back to head
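
The three steps above can be sketched in Python as follows. Again this
is only a toy model, not the ath10k code: the function signature and
the push_allowed flag are hypothetical names used for illustration.

```python
from collections import deque

def wake_tx_queue(active, push_allowed=False):
    """Toy model of the pull-mode sequence; returns the pushed txq or None."""
    if not active:
        return None
    txq = active.popleft()        # dequeue first node
    if push_allowed:
        return txq                # push mode would transmit from this txq
    active.appendleft(txq)        # push not allowed: same txq back to head
    return None

def head_after_wakeups(names, wakeups):
    """Call wake_tx_queue() repeatedly and report the resulting order."""
    active = deque(names)
    for _ in range(wakeups):
        wake_tx_queue(active)
    return list(active)
```

However many times wake_tx_queue runs, the list order never changes, so
the head stays the same until the firmware happens to fetch from it.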

2) Refill rate of deficit

ath9k refills the deficit mostly in the hot path, via next_txq() in the
tx and ISR routines. In ath10k, due to the above problem, deficits
won't be refilled in the hot path. They have to be refilled either in
fetch_ind itself or by scheduling another task. Both approaches are
slower than the hot path when the driver is bursting aggregation. In an
idle condition a single fetch indication can dequeue ~190 msdus from
each tid of the given stn list. This drains the deficit quickly, and it
becomes too low. To speed this up, either refill the station by
multiples of the stn airtime weight, or allow txqs_list rotation so
that next_txq() is used for refilling the deficit.
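
A rough back-of-the-envelope calculation shows why the refill rate
matters. The ~190 msdus figure is from my observation above; the 50 us
of airtime per msdu, the weight of 256 and the multiple of 4 are
assumed numbers chosen only to illustrate the arithmetic.

```python
def refill_rounds(deficit, weight, multiple=1):
    """Refill rounds needed before a negative deficit is >= 0 again."""
    if deficit >= 0:
        return 0
    step = weight * multiple
    # Floor division on a negative number rounds away from zero, which
    # gives the ceiling of (-deficit / step) without using floats.
    return -(deficit // step)

MSDUS_PER_FETCH = 190   # observed order of magnitude per fetch indication
AIRTIME_PER_MSDU = 50   # assumed, in microseconds
WEIGHT = 256            # assumed station airtime weight

deficit = -(MSDUS_PER_FETCH * AIRTIME_PER_MSDU)        # -9500
rounds_single = refill_rounds(deficit, WEIGHT)          # one weight per round
rounds_multi = refill_rounds(deficit, WEIGHT, multiple=4)
```

Under these assumptions a single fetch needs 38 refill rounds at one
weight per round, but only 10 rounds when refilling by 4x the weight.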

I am attaching a return_txq() change that helps get rid of the quantum
multiple.

-Rajkumar