* [RFC] mt76: fix tx hung regression on MT7630E @ 2019-07-26 12:10 Stanislaw Gruszka 2019-07-29 12:53 ` Stanislaw Gruszka 0 siblings, 1 reply; 12+ messages in thread From: Stanislaw Gruszka @ 2019-07-26 12:10 UTC (permalink / raw) To: linux-wireless; +Cc: Felix Fietkau, Lorenzo Bianconi, Ryder Lee, Roy Luo Since 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") I can observe firmware hangs on MT7630E on station mode: tx stop functioning after minor activity (rx keep working) and on module unload device fail to stop with messages: [ 5446.141413] mt76x0e 0000:06:00.0: TX DMA did not stop [ 5449.176764] mt76x0e 0000:06:00.0: TX DMA did not stop Loading module again results in failure to associate with AP. Only machine power off / power on cycle can make device work again. I have no idea why the commit caused F/W hangs. Most likely some proper fix is needed of changing registers programming (or assuring proper order of actions), but so far I can not came up with any better fix than a partial revert of 41634aa8d6db. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> --- drivers/net/wireless/mediatek/mt76/tx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c index 5397827668b9..fefe0ee52584 100644 --- a/drivers/net/wireless/mediatek/mt76/tx.c +++ b/drivers/net/wireless/mediatek/mt76/tx.c @@ -598,7 +598,7 @@ void mt76_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) if (!test_bit(MT76_STATE_RUNNING, &dev->state)) return; - tasklet_schedule(&dev->tx_tasklet); + mt76_txq_schedule(dev, txq->ac); } EXPORT_SYMBOL_GPL(mt76_wake_tx_queue); -- 1.9.3 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC] mt76: fix tx hung regression on MT7630E 2019-07-26 12:10 [RFC] mt76: fix tx hung regression on MT7630E Stanislaw Gruszka @ 2019-07-29 12:53 ` Stanislaw Gruszka 2019-07-29 14:02 ` Lorenzo Bianconi 0 siblings, 1 reply; 12+ messages in thread From: Stanislaw Gruszka @ 2019-07-29 12:53 UTC (permalink / raw) To: linux-wireless; +Cc: Felix Fietkau, Lorenzo Bianconi, Ryder Lee, Roy Luo On Fri, Jul 26, 2019 at 02:10:56PM +0200, Stanislaw Gruszka wrote: > Since 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") > I can observe firmware hangs on MT7630E on station mode: tx stop > functioning after minor activity (rx keep working) and on module > unload device fail to stop with messages: > > [ 5446.141413] mt76x0e 0000:06:00.0: TX DMA did not stop > [ 5449.176764] mt76x0e 0000:06:00.0: TX DMA did not stop > > Loading module again results in failure to associate with AP. > Only machine power off / power on cycle can make device work again. > > I have no idea why the commit caused F/W hangs. Most likely some proper > fix is needed of changing registers programming (or assuring proper order > of actions), but so far I can not came up with any better fix than > a partial revert of 41634aa8d6db. The difference is that with 41634aa8d6db we can run mt76x02_poll_tx() and mt76x02_tx_tasklet() in parallel on 2 different CPUs. Without the commit, since tasklet is not scheduled from mt76_wake_tx_queue(), it does not happen. I'm not quite sure why, but MT7630E does not like when we poll tx status on 2 cpus at once. Change like below: diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c index 467b28379870..622251faa415 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c @@ -170,7 +170,7 @@ static int mt76x02_poll_tx(struct napi_struct *napi, int budget) mt76.tx_napi); int i; - mt76x02_mac_poll_tx_status(dev, false); + mt76x02_mac_poll_tx_status(dev, true); for (i = MT_TXQ_MCU; i >= 0; i--) mt76_queue_tx_cleanup(dev, i, false); is sufficient to avoid the hangs. > diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c > index 5397827668b9..fefe0ee52584 100644 > --- a/drivers/net/wireless/mediatek/mt76/tx.c > +++ b/drivers/net/wireless/mediatek/mt76/tx.c > @@ -598,7 +598,7 @@ void mt76_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) > if (!test_bit(MT76_STATE_RUNNING, &dev->state)) > return; > > - tasklet_schedule(&dev->tx_tasklet); > + mt76_txq_schedule(dev, txq->ac); However I'm not sure if change to tasklet_schedule() is indeed correct, since on tasklet we schedule all queues, hence queues that could possibly be still blocked? And on mt76_wake_tx_queue() we schedule only one queue. Stanislaw ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC] mt76: fix tx hung regression on MT7630E 2019-07-29 12:53 ` Stanislaw Gruszka @ 2019-07-29 14:02 ` Lorenzo Bianconi 2019-07-30 13:54 ` Stanislaw Gruszka 0 siblings, 1 reply; 12+ messages in thread From: Lorenzo Bianconi @ 2019-07-29 14:02 UTC (permalink / raw) To: Stanislaw Gruszka; +Cc: linux-wireless, Felix Fietkau, Ryder Lee, Roy Luo [-- Attachment #1: Type: text/plain, Size: 3146 bytes --] > On Fri, Jul 26, 2019 at 02:10:56PM +0200, Stanislaw Gruszka wrote: > > Since 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") > > I can observe firmware hangs on MT7630E on station mode: tx stop > > functioning after minor activity (rx keep working) and on module > > unload device fail to stop with messages: > > > > [ 5446.141413] mt76x0e 0000:06:00.0: TX DMA did not stop > > [ 5449.176764] mt76x0e 0000:06:00.0: TX DMA did not stop > > > > Loading module again results in failure to associate with AP. > > Only machine power off / power on cycle can make device work again. > > > > I have no idea why the commit caused F/W hangs. Most likely some proper > > fix is needed of changing registers programming (or assuring proper order > > of actions), but so far I can not came up with any better fix than > > a partial revert of 41634aa8d6db. > > The difference is that with 41634aa8d6db we can run mt76x02_poll_tx() > and mt76x02_tx_tasklet() in parallel on 2 different CPUs. Without > the commit, since tasklet is not scheduled from mt76_wake_tx_queue(), > it does not happen. > > I'm not quite sure why, but MT7630E does not like when we poll tx status > on 2 cpus at once. Change like below: Hi Stanislaw, have you tried to look at the commit: 6fe533378795f87 ("mt76: mt76x02: remove irqsave/restore in locking for tx status fifo")? Could it be a race between a registermap update and a stats load? (we are using a different lock now) > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > index 467b28379870..622251faa415 100644 > --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > @@ -170,7 +170,7 @@ static int mt76x02_poll_tx(struct napi_struct *napi, int budget) > mt76.tx_napi); > int i; > > - mt76x02_mac_poll_tx_status(dev, false); > + mt76x02_mac_poll_tx_status(dev, true); I am not sure if we really need mt76x02_mac_poll_tx_status() here since we run it in mt76x02_tx_complete_skb() and in mt76x02_tx_tasklet(). Anyway the only difference doing so is we do not run mt76x02_send_tx_status(). Regards, Lorenzo > > for (i = MT_TXQ_MCU; i >= 0; i--) > mt76_queue_tx_cleanup(dev, i, false); > > is sufficient to avoid the hangs. > > > diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c > > index 5397827668b9..fefe0ee52584 100644 > > --- a/drivers/net/wireless/mediatek/mt76/tx.c > > +++ b/drivers/net/wireless/mediatek/mt76/tx.c > > @@ -598,7 +598,7 @@ void mt76_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) > > if (!test_bit(MT76_STATE_RUNNING, &dev->state)) > > return; > > > > - tasklet_schedule(&dev->tx_tasklet); > > + mt76_txq_schedule(dev, txq->ac); > > However I'm not sure if change to tasklet_schedule() is indeed correct, > since on tasklet we schedule all queues, hence queues that could > possibly be still blocked? And on mt76_wake_tx_queue() we schedule only > one queue. > > Stanislaw [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mt76: fix tx hung regression on MT7630E 2019-07-29 14:02 ` Lorenzo Bianconi @ 2019-07-30 13:54 ` Stanislaw Gruszka 2019-07-30 14:55 ` Lorenzo Bianconi 0 siblings, 1 reply; 12+ messages in thread From: Stanislaw Gruszka @ 2019-07-30 13:54 UTC (permalink / raw) To: Lorenzo Bianconi; +Cc: linux-wireless, Felix Fietkau, Ryder Lee, Roy Luo On Mon, Jul 29, 2019 at 04:02:41PM +0200, Lorenzo Bianconi wrote: > > On Fri, Jul 26, 2019 at 02:10:56PM +0200, Stanislaw Gruszka wrote: > > > Since 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") > > > I can observe firmware hangs on MT7630E on station mode: tx stop > > > functioning after minor activity (rx keep working) and on module > > > unload device fail to stop with messages: > > > > > > [ 5446.141413] mt76x0e 0000:06:00.0: TX DMA did not stop > > > [ 5449.176764] mt76x0e 0000:06:00.0: TX DMA did not stop > > > > > > Loading module again results in failure to associate with AP. > > > Only machine power off / power on cycle can make device work again. > > > > > > I have no idea why the commit caused F/W hangs. Most likely some proper > > > fix is needed of changing registers programming (or assuring proper order > > > of actions), but so far I can not came up with any better fix than > > > a partial revert of 41634aa8d6db. > > > > The difference is that with 41634aa8d6db we can run mt76x02_poll_tx() > > and mt76x02_tx_tasklet() in parallel on 2 different CPUs. Without > > the commit, since tasklet is not scheduled from mt76_wake_tx_queue(), > > it does not happen. > > > > I'm not quite sure why, but MT7630E does not like when we poll tx status > > on 2 cpus at once. Change like below: > > Hi Stanislaw, Hi > have you tried to look at the commit: 6fe533378795f87 > ("mt76: mt76x02: remove irqsave/restore in locking for tx status fifo")? > Could it be a race between a registermap update and a stats load? (we are using a > different lock now) This commit seems to be fine, reverting it did not solve the issue. > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > index 467b28379870..622251faa415 100644 > > --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > @@ -170,7 +170,7 @@ static int mt76x02_poll_tx(struct napi_struct *napi, int budget) > > mt76.tx_napi); > > int i; > > > > - mt76x02_mac_poll_tx_status(dev, false); > > + mt76x02_mac_poll_tx_status(dev, true); > > I am not sure if we really need mt76x02_mac_poll_tx_status() here since we run > it in mt76x02_tx_complete_skb() and in mt76x02_tx_tasklet(). Anyway the only > difference doing so is we do not run mt76x02_send_tx_status(). I thought this is the problem, but it was my mistake during testing. I tested the above change together with mt76_txq_schedule(dev, txq->ac) change and get wrong impression it fixes the issue. But above change alone does not help. I tried to add some locking to avoid parallel execution of mt76x02_poll_tx() and mt76x02_tx_tasklet(), but it didn't help either. So far only patch originally posted here make the problem gone. Stanislaw ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mt76: fix tx hung regression on MT7630E 2019-07-30 13:54 ` Stanislaw Gruszka @ 2019-07-30 14:55 ` Lorenzo Bianconi 2019-07-31 8:19 ` Stanislaw Gruszka 0 siblings, 1 reply; 12+ messages in thread From: Lorenzo Bianconi @ 2019-07-30 14:55 UTC (permalink / raw) To: Stanislaw Gruszka; +Cc: linux-wireless, Felix Fietkau, Ryder Lee, Roy Luo [-- Attachment #1: Type: text/plain, Size: 3278 bytes --] > On Mon, Jul 29, 2019 at 04:02:41PM +0200, Lorenzo Bianconi wrote: > > > On Fri, Jul 26, 2019 at 02:10:56PM +0200, Stanislaw Gruszka wrote: > > > > Since 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") > > > > I can observe firmware hangs on MT7630E on station mode: tx stop > > > > functioning after minor activity (rx keep working) and on module > > > > unload device fail to stop with messages: > > > > > > > > [ 5446.141413] mt76x0e 0000:06:00.0: TX DMA did not stop > > > > [ 5449.176764] mt76x0e 0000:06:00.0: TX DMA did not stop > > > > > > > > Loading module again results in failure to associate with AP. > > > > Only machine power off / power on cycle can make device work again. > > > > > > > > I have no idea why the commit caused F/W hangs. Most likely some proper > > > > fix is needed of changing registers programming (or assuring proper order > > > > of actions), but so far I can not came up with any better fix than > > > > a partial revert of 41634aa8d6db. > > > > > > The difference is that with 41634aa8d6db we can run mt76x02_poll_tx() > > > and mt76x02_tx_tasklet() in parallel on 2 different CPUs. Without > > > the commit, since tasklet is not scheduled from mt76_wake_tx_queue(), > > > it does not happen. > > > > > > I'm not quite sure why, but MT7630E does not like when we poll tx status > > > on 2 cpus at once. Change like below: > > > > Hi Stanislaw, > > Hi > > > have you tried to look at the commit: 6fe533378795f87 > > ("mt76: mt76x02: remove irqsave/restore in locking for tx status fifo")? > > Could it be a race between a registermap update and a stats load? (we are using a > > different lock now) > > This commit seems to be fine, reverting it did not solve the issue. ack > > > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > index 467b28379870..622251faa415 100644 > > > --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > @@ -170,7 +170,7 @@ static int mt76x02_poll_tx(struct napi_struct *napi, int budget) > > > mt76.tx_napi); > > > int i; > > > > > > - mt76x02_mac_poll_tx_status(dev, false); > > > + mt76x02_mac_poll_tx_status(dev, true); > > > > I am not sure if we really need mt76x02_mac_poll_tx_status() here since we run > > it in mt76x02_tx_complete_skb() and in mt76x02_tx_tasklet(). Anyway the only > > difference doing so is we do not run mt76x02_send_tx_status(). > > I thought this is the problem, but it was my mistake during testing. > I tested the above change together with mt76_txq_schedule(dev, txq->ac) > change and get wrong impression it fixes the issue. But above change > alone does not help. > > I tried to add some locking to avoid parallel execution of mt76x02_poll_tx() > and mt76x02_tx_tasklet(), but it didn't help either. So far only patch > originally posted here make the problem gone. so, in order to be on the same page, if you comment out mt76x02_mac_poll_tx_status() in mt76x02_poll_tx() the issue will still occur. The only to 'fix' it is to run mt76_txq_schedule_all() in mt76x02_poll_tx(), right? Lorenzo > > Stanislaw > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mt76: fix tx hung regression on MT7630E 2019-07-30 14:55 ` Lorenzo Bianconi @ 2019-07-31 8:19 ` Stanislaw Gruszka 2019-07-31 8:51 ` Stanislaw Gruszka 0 siblings, 1 reply; 12+ messages in thread From: Stanislaw Gruszka @ 2019-07-31 8:19 UTC (permalink / raw) To: Lorenzo Bianconi; +Cc: linux-wireless, Felix Fietkau, Ryder Lee, Roy Luo On Tue, Jul 30, 2019 at 04:55:31PM +0200, Lorenzo Bianconi wrote: > > > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > index 467b28379870..622251faa415 100644 > > > > --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > @@ -170,7 +170,7 @@ static int mt76x02_poll_tx(struct napi_struct *napi, int budget) > > > > mt76.tx_napi); > > > > int i; > > > > > > > > - mt76x02_mac_poll_tx_status(dev, false); > > > > + mt76x02_mac_poll_tx_status(dev, true); > > > > > > I am not sure if we really need mt76x02_mac_poll_tx_status() here since we run > > > it in mt76x02_tx_complete_skb() and in mt76x02_tx_tasklet(). Anyway the only > > > difference doing so is we do not run mt76x02_send_tx_status(). > > > > I thought this is the problem, but it was my mistake during testing. > > I tested the above change together with mt76_txq_schedule(dev, txq->ac) > > change and get wrong impression it fixes the issue. But above change > > alone does not help. > > > > I tried to add some locking to avoid parallel execution of mt76x02_poll_tx() > > and mt76x02_tx_tasklet(), but it didn't help either. So far only patch > > originally posted here make the problem gone. > > so, in order to be on the same page, if you comment out mt76x02_mac_poll_tx_status() > in mt76x02_poll_tx() the issue will still occur. The only to 'fix' it is to run > mt76_txq_schedule_all() in mt76x02_poll_tx(), right? Yes. Stanislaw ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mt76: fix tx hung regression on MT7630E 2019-07-31 8:19 ` Stanislaw Gruszka @ 2019-07-31 8:51 ` Stanislaw Gruszka 2019-07-31 9:09 ` Lorenzo Bianconi 0 siblings, 1 reply; 12+ messages in thread From: Stanislaw Gruszka @ 2019-07-31 8:51 UTC (permalink / raw) To: Lorenzo Bianconi; +Cc: linux-wireless, Felix Fietkau, Ryder Lee, Roy Luo On Wed, Jul 31, 2019 at 10:19:58AM +0200, Stanislaw Gruszka wrote: > On Tue, Jul 30, 2019 at 04:55:31PM +0200, Lorenzo Bianconi wrote: > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > index 467b28379870..622251faa415 100644 > > > > > --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > @@ -170,7 +170,7 @@ static int mt76x02_poll_tx(struct napi_struct *napi, int budget) > > > > > mt76.tx_napi); > > > > > int i; > > > > > > > > > > - mt76x02_mac_poll_tx_status(dev, false); > > > > > + mt76x02_mac_poll_tx_status(dev, true); > > > > > > > > I am not sure if we really need mt76x02_mac_poll_tx_status() here since we run > > > > it in mt76x02_tx_complete_skb() and in mt76x02_tx_tasklet(). Anyway the only > > > > difference doing so is we do not run mt76x02_send_tx_status(). > > > > > > I thought this is the problem, but it was my mistake during testing. > > > I tested the above change together with mt76_txq_schedule(dev, txq->ac) > > > change and get wrong impression it fixes the issue. But above change > > > alone does not help. > > > > > > I tried to add some locking to avoid parallel execution of mt76x02_poll_tx() > > > and mt76x02_tx_tasklet(), but it didn't help either. So far only patch > > > originally posted here make the problem gone. > > > > so, in order to be on the same page, if you comment out mt76x02_mac_poll_tx_status() > > in mt76x02_poll_tx() the issue will still occur. The only to 'fix' it is to run > > mt76_txq_schedule_all() in mt76x02_poll_tx(), right? > > Yes. Err, no, I should read more cerfully. It is partiall revert of 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") : diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c index 5397827668b9..fefe0ee52584 100644 --- a/drivers/net/wireless/mediatek/mt76/tx.c +++ b/drivers/net/wireless/mediatek/mt76/tx.c @@ -598,7 +598,7 @@ void mt76_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) if (!test_bit(MT76_STATE_RUNNING, &dev->state)) return; - tasklet_schedule(&dev->tx_tasklet); + mt76_txq_schedule(dev, txq->ac); } EXPORT_SYMBOL_GPL(mt76_wake_tx_queue); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC] mt76: fix tx hung regression on MT7630E 2019-07-31 8:51 ` Stanislaw Gruszka @ 2019-07-31 9:09 ` Lorenzo Bianconi 2019-08-05 10:01 ` Stanislaw Gruszka 0 siblings, 1 reply; 12+ messages in thread From: Lorenzo Bianconi @ 2019-07-31 9:09 UTC (permalink / raw) To: Stanislaw Gruszka; +Cc: linux-wireless, Felix Fietkau, Ryder Lee, Roy Luo [-- Attachment #1: Type: text/plain, Size: 3253 bytes --] > On Wed, Jul 31, 2019 at 10:19:58AM +0200, Stanislaw Gruszka wrote: > > On Tue, Jul 30, 2019 at 04:55:31PM +0200, Lorenzo Bianconi wrote: > > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > > index 467b28379870..622251faa415 100644 > > > > > > --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > > +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > > @@ -170,7 +170,7 @@ static int mt76x02_poll_tx(struct napi_struct *napi, int budget) > > > > > > mt76.tx_napi); > > > > > > int i; > > > > > > > > > > > > - mt76x02_mac_poll_tx_status(dev, false); > > > > > > + mt76x02_mac_poll_tx_status(dev, true); > > > > > > > > > > I am not sure if we really need mt76x02_mac_poll_tx_status() here since we run > > > > > it in mt76x02_tx_complete_skb() and in mt76x02_tx_tasklet(). Anyway the only > > > > > difference doing so is we do not run mt76x02_send_tx_status(). > > > > > > > > I thought this is the problem, but it was my mistake during testing. > > > > I tested the above change together with mt76_txq_schedule(dev, txq->ac) > > > > change and get wrong impression it fixes the issue. But above change > > > > alone does not help. > > > > > > > > I tried to add some locking to avoid parallel execution of mt76x02_poll_tx() > > > > and mt76x02_tx_tasklet(), but it didn't help either. So far only patch > > > > originally posted here make the problem gone. > > > > > > so, in order to be on the same page, if you comment out mt76x02_mac_poll_tx_status() > > > in mt76x02_poll_tx() the issue will still occur. The only to 'fix' it is to run > > > mt76_txq_schedule_all() in mt76x02_poll_tx(), right? > > > > Yes. > > Err, no, I should read more cerfully. It is partiall revert of > 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") : > > diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c > index 5397827668b9..fefe0ee52584 100644 > --- a/drivers/net/wireless/mediatek/mt76/tx.c > +++ b/drivers/net/wireless/mediatek/mt76/tx.c > @@ -598,7 +598,7 @@ void mt76_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) > if (!test_bit(MT76_STATE_RUNNING, &dev->state)) > return; > > - tasklet_schedule(&dev->tx_tasklet); > + mt76_txq_schedule(dev, txq->ac); > } > EXPORT_SYMBOL_GPL(mt76_wake_tx_queue); reviewing the code I think: - we should not run mt76u_tx_tasklet() from mt76_wake_tx_queue() since we do not have tx_napi for usb and it will unnecessary go through tx queue checks. We should probably do in mt76_wake_tx_queue() something like: if (is_mmio()) tasklet_schedule(&dev->tx_tasklet); else mt76_txq_schedule(dev, txq->ac); Another solution would be add a status_tasklet that just goes through the tx queues receiving the usb tx completion and it schedules the tx_tasklet What do you think? - I guess it does not fix the 76x0e issue but we should just schedule tx queues in mt76x02_tx_tasklet() (like it is done for mt7603 and mt7615) and move status processing in mt76x02_poll_tx() Regards, Lorenzo [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mt76: fix tx hung regression on MT7630E 2019-07-31 9:09 ` Lorenzo Bianconi @ 2019-08-05 10:01 ` Stanislaw Gruszka 2019-08-05 11:27 ` Lorenzo Bianconi 0 siblings, 1 reply; 12+ messages in thread From: Stanislaw Gruszka @ 2019-08-05 10:01 UTC (permalink / raw) To: Lorenzo Bianconi; +Cc: linux-wireless, Felix Fietkau, Ryder Lee, Roy Luo On Wed, Jul 31, 2019 at 11:09:27AM +0200, Lorenzo Bianconi wrote: > > On Wed, Jul 31, 2019 at 10:19:58AM +0200, Stanislaw Gruszka wrote: > > > On Tue, Jul 30, 2019 at 04:55:31PM +0200, Lorenzo Bianconi wrote: > > > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > > > index 467b28379870..622251faa415 100644 > > > > > > > --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > > > +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > > > @@ -170,7 +170,7 @@ static int mt76x02_poll_tx(struct napi_struct *napi, int budget) > > > > > > > mt76.tx_napi); > > > > > > > int i; > > > > > > > > > > > > > > - mt76x02_mac_poll_tx_status(dev, false); > > > > > > > + mt76x02_mac_poll_tx_status(dev, true); > > > > > > > > > > > > I am not sure if we really need mt76x02_mac_poll_tx_status() here since we run > > > > > > it in mt76x02_tx_complete_skb() and in mt76x02_tx_tasklet(). Anyway the only > > > > > > difference doing so is we do not run mt76x02_send_tx_status(). > > > > > > > > > > I thought this is the problem, but it was my mistake during testing. > > > > > I tested the above change together with mt76_txq_schedule(dev, txq->ac) > > > > > change and get wrong impression it fixes the issue. But above change > > > > > alone does not help. > > > > > > > > > > I tried to add some locking to avoid parallel execution of mt76x02_poll_tx() > > > > > and mt76x02_tx_tasklet(), but it didn't help either. So far only patch > > > > > originally posted here make the problem gone. > > > > > > > > so, in order to be on the same page, if you comment out mt76x02_mac_poll_tx_status() > > > > in mt76x02_poll_tx() the issue will still occur. The only to 'fix' it is to run > > > > mt76_txq_schedule_all() in mt76x02_poll_tx(), right? > > > > > > Yes. > > > > Err, no, I should read more cerfully. It is partiall revert of > > 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") : > > > > diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c > > index 5397827668b9..fefe0ee52584 100644 > > --- a/drivers/net/wireless/mediatek/mt76/tx.c > > +++ b/drivers/net/wireless/mediatek/mt76/tx.c > > @@ -598,7 +598,7 @@ void mt76_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) > > if (!test_bit(MT76_STATE_RUNNING, &dev->state)) > > return; > > > > - tasklet_schedule(&dev->tx_tasklet); > > + mt76_txq_schedule(dev, txq->ac); > > } > > EXPORT_SYMBOL_GPL(mt76_wake_tx_queue); > > reviewing the code I think: > > - we should not run mt76u_tx_tasklet() from mt76_wake_tx_queue() since we do > not have tx_napi for usb and it will unnecessary go through tx queue checks. > We should probably do in mt76_wake_tx_queue() something like: > > if (is_mmio()) Adding '&& !is_mt7630()' will solve the problem for MT7630E as well ... > tasklet_schedule(&dev->tx_tasklet); > else > mt76_txq_schedule(dev, txq->ac); > > Another solution would be add a status_tasklet that just goes through the tx > queues receiving the usb tx completion and it schedules the tx_tasklet > What do you think? > > - I guess it does not fix the 76x0e issue but we should just schedule tx queues in > mt76x02_tx_tasklet() (like it is done for mt7603 and mt7615) and move status > processing in mt76x02_poll_tx() ... but I think we have bug when do mt76_txq_schedule_all() in tx_tasklet, because we can schedule on queues that are stoped. So reverting 41634aa8d6db and then optimize by removing tx_tasklet for mmio and remove not needed mt76_txq_schedule_all() calls looks more reasoneble to me. Stanislaw ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mt76: fix tx hung regression on MT7630E 2019-08-05 10:01 ` Stanislaw Gruszka @ 2019-08-05 11:27 ` Lorenzo Bianconi 2019-08-05 12:39 ` Stanislaw Gruszka 0 siblings, 1 reply; 12+ messages in thread From: Lorenzo Bianconi @ 2019-08-05 11:27 UTC (permalink / raw) To: Stanislaw Gruszka; +Cc: linux-wireless, Felix Fietkau, Ryder Lee, Roy Luo [-- Attachment #1: Type: text/plain, Size: 5246 bytes --] > On Wed, Jul 31, 2019 at 11:09:27AM +0200, Lorenzo Bianconi wrote: > > > On Wed, Jul 31, 2019 at 10:19:58AM +0200, Stanislaw Gruszka wrote: > > > > On Tue, Jul 30, 2019 at 04:55:31PM +0200, Lorenzo Bianconi wrote: > > > > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > > > > index 467b28379870..622251faa415 100644 > > > > > > > > --- a/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > > > > +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_mmio.c > > > > > > > > @@ -170,7 +170,7 @@ static int mt76x02_poll_tx(struct napi_struct *napi, int budget) > > > > > > > > mt76.tx_napi); > > > > > > > > int i; > > > > > > > > > > > > > > > > - mt76x02_mac_poll_tx_status(dev, false); > > > > > > > > + mt76x02_mac_poll_tx_status(dev, true); > > > > > > > > > > > > > > I am not sure if we really need mt76x02_mac_poll_tx_status() here since we run > > > > > > > it in mt76x02_tx_complete_skb() and in mt76x02_tx_tasklet(). Anyway the only > > > > > > > difference doing so is we do not run mt76x02_send_tx_status(). > > > > > > > > > > > > I thought this is the problem, but it was my mistake during testing. > > > > > > I tested the above change together with mt76_txq_schedule(dev, txq->ac) > > > > > > change and get wrong impression it fixes the issue. But above change > > > > > > alone does not help. > > > > > > > > > > > > I tried to add some locking to avoid parallel execution of mt76x02_poll_tx() > > > > > > and mt76x02_tx_tasklet(), but it didn't help either. So far only patch > > > > > > originally posted here make the problem gone. > > > > > > > > > > so, in order to be on the same page, if you comment out mt76x02_mac_poll_tx_status() > > > > > in mt76x02_poll_tx() the issue will still occur. The only to 'fix' it is to run > > > > > mt76_txq_schedule_all() in mt76x02_poll_tx(), right? > > > > > > > > Yes. > > > > > > Err, no, I should read more cerfully. It is partiall revert of > > > 41634aa8d6db ("mt76: only schedule txqs from the tx tasklet") : > > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c > > > index 5397827668b9..fefe0ee52584 100644 > > > --- a/drivers/net/wireless/mediatek/mt76/tx.c > > > +++ b/drivers/net/wireless/mediatek/mt76/tx.c > > > @@ -598,7 +598,7 @@ void mt76_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) > > > if (!test_bit(MT76_STATE_RUNNING, &dev->state)) > > > return; > > > > > > - tasklet_schedule(&dev->tx_tasklet); > > > + mt76_txq_schedule(dev, txq->ac); > > > } > > > EXPORT_SYMBOL_GPL(mt76_wake_tx_queue); > > > > reviewing the code I think: > > > > - we should not run mt76u_tx_tasklet() from mt76_wake_tx_queue() since we do > > not have tx_napi for usb and it will unnecessary go through tx queue checks. > > We should probably do in mt76_wake_tx_queue() something like: > > > > if (is_mmio()) > > Adding '&& !is_mt7630()' will solve the problem for MT7630E as well ... > > > tasklet_schedule(&dev->tx_tasklet); > > else > > mt76_txq_schedule(dev, txq->ac); > > > > Another solution would be add a status_tasklet that just goes through the tx > > queues receiving the usb tx completion and it schedules the tx_tasklet > > What do you think? > > > > - I guess it does not fix the 76x0e issue but we should just schedule tx queues in > > mt76x02_tx_tasklet() (like it is done for mt7603 and mt7615) and move status > > processing in mt76x02_poll_tx() > > ... but I think we have bug when do mt76_txq_schedule_all() in > tx_tasklet, because we can schedule on queues that are stoped. > So reverting 41634aa8d6db and then optimize by removing tx_tasklet > for mmio and remove not needed mt76_txq_schedule_all() calls looks > more reasoneble to me. schedule a stopped queue seems not harmful at a first glance since we do not copy pending skbs if we have not enough room in the dma ring. Maybe we can be more conservative doing something like: diff --git a/drivers/net/wireless/mediatek/mt76/dma.c b/drivers/net/wireless/mediatek/mt76/dma.c index d8f61e540bfd..c6482155e5e4 100644 --- a/drivers/net/wireless/mediatek/mt76/dma.c +++ b/drivers/net/wireless/mediatek/mt76/dma.c @@ -346,6 +346,11 @@ mt76_dma_tx_queue_skb(struct mt76_dev *dev, enum mt76_txq_id qid, goto unmap; if (q->queued + (tx_info.nbuf + 1) / 2 >= q->ndesc - 1) { + if (!q->stopped) { + ieee80211_stop_queue(dev->hw, + skb_get_queue_mapping(skb)); + q->stopped = true; + } ret = -ENOMEM; goto unmap; } diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c index 5397827668b9..bd2d34c4f326 100644 --- a/drivers/net/wireless/mediatek/mt76/tx.c +++ b/drivers/net/wireless/mediatek/mt76/tx.c @@ -495,6 +495,9 @@ mt76_txq_schedule_list(struct mt76_dev *dev, enum mt76_txq_id qid) while (1) { bool empty = false; + if (hwq->stopped) + break; + if (sq->swq_queued >= 4) break; Does it fix the issue you are facing? Regards, Lorenzo > > Stanislaw [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC] mt76: fix tx hung regression on MT7630E 2019-08-05 11:27 ` Lorenzo Bianconi @ 2019-08-05 12:39 ` Stanislaw Gruszka 2019-08-12 11:50 ` Stanislaw Gruszka 0 siblings, 1 reply; 12+ messages in thread From: Stanislaw Gruszka @ 2019-08-05 12:39 UTC (permalink / raw) To: Lorenzo Bianconi; +Cc: linux-wireless, Felix Fietkau, Ryder Lee, Roy Luo On Mon, Aug 05, 2019 at 01:27:19PM +0200, Lorenzo Bianconi wrote: > > ... but I think we have bug when do mt76_txq_schedule_all() in > > tx_tasklet, because we can schedule on queues that are stoped. > > So reverting 41634aa8d6db and then optimize by removing tx_tasklet > > for mmio and remove not needed mt76_txq_schedule_all() calls looks > > more reasoneble to me. > > schedule a stopped queue seems not harmful at a first glance since we do not > copy pending skbs if we have not enough room in the dma ring. mac80211 stop queues for various other reasons than IEEE80211_QUEUE_STOP_REASON_DRIVER . > Maybe we can be > more conservative doing something like: > > diff --git a/drivers/net/wireless/mediatek/mt76/dma.c b/drivers/net/wireless/mediatek/mt76/dma.c > index d8f61e540bfd..c6482155e5e4 100644 > --- a/drivers/net/wireless/mediatek/mt76/dma.c > +++ b/drivers/net/wireless/mediatek/mt76/dma.c > @@ -346,6 +346,11 @@ mt76_dma_tx_queue_skb(struct mt76_dev *dev, enum mt76_txq_id qid, > goto unmap; > > if (q->queued + (tx_info.nbuf + 1) / 2 >= q->ndesc - 1) { > + if (!q->stopped) { > + ieee80211_stop_queue(dev->hw, > + skb_get_queue_mapping(skb)); > + q->stopped = true; > + } > ret = -ENOMEM; > goto unmap; > } > diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c > index 5397827668b9..bd2d34c4f326 100644 > --- a/drivers/net/wireless/mediatek/mt76/tx.c > +++ b/drivers/net/wireless/mediatek/mt76/tx.c > @@ -495,6 +495,9 @@ mt76_txq_schedule_list(struct mt76_dev *dev, enum mt76_txq_id qid) > while (1) { > bool empty = false; > > + if (hwq->stopped) > + break; > + > if (sq->swq_queued >= 4) > break; > > Does it fix the issue you are facing? I'll not be able to test this patch this week. Will have access to the hardware next week. I checeked before, if 'q->queued + (tx_info.nbuf + 1) / 2 >= q->ndesc - 1' is triggered when MT7630E hangs and it is not. But maybe second part of the patch will help. Stanislaw ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mt76: fix tx hung regression on MT7630E 2019-08-05 12:39 ` Stanislaw Gruszka @ 2019-08-12 11:50 ` Stanislaw Gruszka 0 siblings, 0 replies; 12+ messages in thread From: Stanislaw Gruszka @ 2019-08-12 11:50 UTC (permalink / raw) To: Lorenzo Bianconi; +Cc: linux-wireless, Felix Fietkau, Ryder Lee, Roy Luo On Mon, Aug 05, 2019 at 02:39:16PM +0200, Stanislaw Gruszka wrote: > On Mon, Aug 05, 2019 at 01:27:19PM +0200, Lorenzo Bianconi wrote: > > > ... but I think we have bug when do mt76_txq_schedule_all() in > > > tx_tasklet, because we can schedule on queues that are stoped. > > > So reverting 41634aa8d6db and then optimize by removing tx_tasklet > > > for mmio and remove not needed mt76_txq_schedule_all() calls looks > > > more reasoneble to me. > > > > schedule a stopped queue seems not harmful at a first glance since we do not > > copy pending skbs if we have not enough room in the dma ring. > > mac80211 stop queues for various other reasons than > IEEE80211_QUEUE_STOP_REASON_DRIVER . After looking in more details, we check if queue is stopped in ieee80211_tx_dequeue(). But we do not check that for skb's queued in mtxq->retry_q . > > diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c > > index 5397827668b9..bd2d34c4f326 100644 > > --- a/drivers/net/wireless/mediatek/mt76/tx.c > > +++ b/drivers/net/wireless/mediatek/mt76/tx.c > > @@ -495,6 +495,9 @@ mt76_txq_schedule_list(struct mt76_dev *dev, enum mt76_txq_id qid) > > while (1) { > > bool empty = false; > > > > + if (hwq->stopped) > > + break; > > + > > if (sq->swq_queued >= 4) > > break; > > > > Does it fix the issue you are facing? > > I'll not be able to test this patch this week. Will have access to > the hardware next week. > > I checeked before, if > 'q->queued + (tx_info.nbuf + 1) / 2 >= q->ndesc - 1' is triggered > when MT7630E hangs and it is not. But maybe second part of the patch > will help. Patch did not help. I debugged problem a bit more and issue is related with HW encryption. Using full mac80211 SW encyption by retuning -EOPNOTSUPP mt76x02_set_key() make the problem gone as well. Looks there is some race between setting HW keys and transmitting. Stanislaw ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2019-08-12 11:50 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-07-26 12:10 [RFC] mt76: fix tx hung regression on MT7630E Stanislaw Gruszka 2019-07-29 12:53 ` Stanislaw Gruszka 2019-07-29 14:02 ` Lorenzo Bianconi 2019-07-30 13:54 ` Stanislaw Gruszka 2019-07-30 14:55 ` Lorenzo Bianconi 2019-07-31 8:19 ` Stanislaw Gruszka 2019-07-31 8:51 ` Stanislaw Gruszka 2019-07-31 9:09 ` Lorenzo Bianconi 2019-08-05 10:01 ` Stanislaw Gruszka 2019-08-05 11:27 ` Lorenzo Bianconi 2019-08-05 12:39 ` Stanislaw Gruszka 2019-08-12 11:50 ` Stanislaw Gruszka
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).