* mt76x2e hardware restart
From: Oleksandr Natalenko @ 2019-09-19 16:24 UTC
  To: linux-mediatek
  Cc: Felix Fietkau, Lorenzo Bianconi, Lorenzo Bianconi, Stanislaw Gruszka, Ryder Lee, Roy Luo, Kalle Valo, David S. Miller, Matthias Brugger, linux-wireless, netdev, linux-arm-kernel, linux-kernel

Hi.

Recently, I've got the following card:

01:00.0 Network controller: MEDIATEK Corp. Device 7612
	Subsystem: MEDIATEK Corp. Device 7612
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at 81200000 (64-bit, non-prefetchable) [size=1M]
	Expansion ROM at 81300000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [148] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [158] Latency Tolerance Reporting
	Capabilities: [160] L1 PM Substates
	Kernel driver in use: mt76x2e
	Kernel modules: mt76x2e

I try to use it as an access point with the following configuration:

interface=wlp1s0
driver=nl80211
ssid=someap
channel=36
noscan=1
hw_mode=a
ieee80211n=1
require_ht=1
ieee80211ac=1
require_vht=1
vht_oper_chwidth=1
vht_capab=[SHORT-GI-80][RX-STBC-1][RX-ANTENNA-PATTERN][TX-ANTENNA-PATTERN]
vht_oper_centr_freq_seg0_idx=42
auth_algs=1
wpa=2
wpa_passphrase=somepswd
wpa_key_mgmt=WPA-PSK
rsn_pairwise=CCMP
macaddr_acl=1
accept_mac_file=/etc/hostapd/hostapd.allow
ctrl_interface=/run/hostapd
ctrl_interface_group=0
country_code=CZ
ieee80211d=1
ieee80211h=1
wmm_enabled=1
ht_capab=[GF][HT40+][SHORT-GI-20][SHORT-GI-40][RX-STBC1][DSSS_CCK-40]

The hostapd daemon starts, and the AP broadcasts the beacons:

zář 19 17:50:04 srv hostapd[13251]: Configuration file: /etc/hostapd/ap_5ghz.conf
zář 19 17:50:05 srv hostapd[13251]: wlp1s0: interface state UNINITIALIZED->COUNTRY_UPDATE
zář 19 17:50:05 srv hostapd[13251]: Using interface wlp1s0 with hwaddr xx:xx:xx:xx:xx:xx and ssid "someap"
zář 19 17:50:05 srv hostapd[13251]: wlp1s0: interface state COUNTRY_UPDATE->ENABLED
zář 19 17:50:05 srv hostapd[13251]: wlp1s0: AP-ENABLED
zář 19 17:50:17 srv hostapd[13251]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
zář 19 17:50:17 srv hostapd[13251]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
zář 19 17:50:17 srv hostapd[13251]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 1)
zář 19 17:50:17 srv hostapd[13251]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 1)
zář 19 17:50:17 srv hostapd[13251]: wlp1s0: AP-STA-CONNECTED xx:xx:xx:xx:xx:xx
zář 19 17:50:17 srv hostapd[13251]: wlp1s0: STA xx:xx:xx:xx:xx:xx RADIUS: starting accounting session 07E311195378B570
zář 19 17:50:17 srv hostapd[13251]: wlp1s0: STA xx:xx:xx:xx:xx:xx WPA: pairwise key handshake completed (RSN)
zář 19 17:50:17 srv hostapd[13251]: wlp1s0: STA xx:xx:xx:xx:xx:xx RADIUS: starting accounting session 07E311195378B570
zář 19 17:50:17 srv hostapd[13251]: wlp1s0: STA xx:xx:xx:xx:xx:xx WPA: pairwise key handshake completed (RSN)

The client is able to see it and connect to it, but after a couple of seconds the following happens on the AP:

[ +9,979664] mt76x2e 0000:01:00.0: Firmware Version: 0.0.00
[ +0,000014] mt76x2e 0000:01:00.0: Build: 1
[ +0,000010] mt76x2e 0000:01:00.0: Build Time: 201507311614____
[ +0,018017] mt76x2e 0000:01:00.0: Firmware running!
[ +0,001101] ieee80211 phy4: Hardware restart was requested

and the AP dies.
The client cannot reconnect to it, although hostapd logs show that it tries:

zář 19 17:51:15 srv hostapd[13504]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
zář 19 17:51:15 srv hostapd[13504]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
zář 19 17:51:19 srv hostapd[13504]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
zář 19 17:51:19 srv hostapd[13504]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
zář 19 17:52:54 srv hostapd[13504]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
zář 19 17:52:54 srv hostapd[13504]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
zář 19 17:52:59 srv hostapd[13504]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
zář 19 17:52:59 srv hostapd[13504]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
zář 19 17:56:14 srv hostapd[13504]: wlp1s0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

The AP stays completely unusable until I remove and modprobe the mt76x2e module again. And then everything begins from scratch, and the AP dies within seconds.

I observe this on a fresh v5.3 kernel. I haven't tried anything older.

The only somewhat relevant thread I was able to find is [1], but it's not clear what the resolution is, if any.

Could you please suggest how to deal with this issue?

Thanks.

[1] https://forum.openwrt.org/t/wifi-issues-with-18-06-4-on-mt76/40537

-- 
  Oleksandr Natalenko (post-factum)
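For reference, the channel=36, vht_oper_chwidth=1 (80 MHz) and vht_oper_centr_freq_seg0_idx=42 settings in the configuration above have to agree with each other: with an 80 MHz channel, the segment-0 index is the centre channel of the 80 MHz block that contains the primary channel. The sketch below shows that relationship. It assumes the usual non-overlapping 5 GHz VHT80 blocks and the 5 GHz channel-to-frequency mapping f = 5000 + 5n MHz; the helper names are hypothetical and not taken from hostapd.

```c
#include <assert.h>
#include <stddef.h>

/* First 20 MHz channel of each 5 GHz VHT80 block (assumption: the usual
 * non-overlapping 80 MHz allocation 36-48, 52-64, ..., 149-161). */
static const int vht80_block_start[] = { 36, 52, 100, 116, 132, 149 };

/* Return the vht_oper_centr_freq_seg0_idx value for an 80 MHz channel whose
 * primary 20 MHz channel is `primary`, or -1 if it is in no VHT80 block. */
static int vht80_center_index(int primary)
{
	size_t n = sizeof(vht80_block_start) / sizeof(vht80_block_start[0]);

	for (size_t i = 0; i < n; i++) {
		int start = vht80_block_start[i];

		/* A block spans four 20 MHz channels: start, +4, +8, +12. */
		if (primary >= start && primary <= start + 12 &&
		    (primary - start) % 4 == 0)
			return start + 6; /* centre is 30 MHz above the first channel */
	}
	return -1;
}

/* 5 GHz band: channel n is centred at 5000 + 5 * n MHz. */
static int chan_to_freq_mhz(int chan)
{
	assert(chan > 0 && chan < 200); /* sanity check on the channel number */
	return 5000 + 5 * chan;
}
```

With these helpers, vht80_center_index(36) returns 42 and chan_to_freq_mhz(42) returns 5210, which is consistent with the configuration in the report; primary channels 40, 44 and 48 map to the same segment-0 index.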
* Re: mt76x2e hardware restart
From: Oleksandr Natalenko @ 2019-09-19 21:22 UTC
  To: linux-mediatek
  Cc: Felix Fietkau, Lorenzo Bianconi, Lorenzo Bianconi, Stanislaw Gruszka, Ryder Lee, Roy Luo, Kalle Valo, David S. Miller, Matthias Brugger, linux-wireless, netdev, linux-arm-kernel, linux-kernel

On 19.09.2019 18:24, Oleksandr Natalenko wrote:
> [ +9,979664] mt76x2e 0000:01:00.0: Firmware Version: 0.0.00
> [ +0,000014] mt76x2e 0000:01:00.0: Build: 1
> [ +0,000010] mt76x2e 0000:01:00.0: Build Time: 201507311614____
> [ +0,018017] mt76x2e 0000:01:00.0: Firmware running!
> [ +0,001101] ieee80211 phy4: Hardware restart was requested

IIUC, this happens due to the watchdog. I think the following applies.

The watchdog is started here:

=== mt76x02_util.c
130 void mt76x02_init_device(struct mt76x02_dev *dev)
131 {
...
155 	INIT_DELAYED_WORK(&dev->wdt_work, mt76x02_wdt_work);
===

It checks for TX hang here:

=== mt76x02_mmio.c
557 void mt76x02_wdt_work(struct work_struct *work)
558 {
...
562 	mt76x02_check_tx_hang(dev);
===

Conditions:

=== mt76x02_mmio.c
530 static void mt76x02_check_tx_hang(struct mt76x02_dev *dev)
531 {
532 	if (mt76x02_tx_hang(dev)) {
533 		if (++dev->tx_hang_check >= MT_TX_HANG_TH)
534 			goto restart;
535 	} else {
536 		dev->tx_hang_check = 0;
537 	}
538 
539 	if (dev->mcu_timeout)
540 		goto restart;
541 
542 	return;
543 
544 restart:
545 	mt76x02_watchdog_reset(dev);
===

Actual check:

=== mt76x02_mmio.c
367 static bool mt76x02_tx_hang(struct mt76x02_dev *dev)
368 {
369 	u32 dma_idx, prev_dma_idx;
370 	struct mt76_queue *q;
371 	int i;
372 
373 	for (i = 0; i < 4; i++) {
374 		q = dev->mt76.q_tx[i].q;
375 
376 		if (!q->queued)
377 			continue;
378 
379 		prev_dma_idx = dev->mt76.tx_dma_idx[i];
380 		dma_idx = readl(&q->regs->dma_idx);
381 		dev->mt76.tx_dma_idx[i] = dma_idx;
382 
383 		if (prev_dma_idx == dma_idx)
384 			break;
385 	}
386 
387 	return i < 4;
388 }
===

(I don't quite understand what it does here: why 4? Does each device have 4 queues? Maybe mine does not? I guess this is where the watchdog is triggered, though, because otherwise I'd see an mcu_timeout message like "MCU message %d (seq %d) timed out\n".)

Once it detects a TX hang, the reset is triggered:

=== mt76x02_mmio.c
446 static void mt76x02_watchdog_reset(struct mt76x02_dev *dev)
447 {
...
485 	if (restart)
486 		mt76_mcu_restart(dev);
===

mt76_mcu_restart() is just a define for this series here:

=== mt76.h
555 #define mt76_mcu_restart(dev, ...) (dev)->mt76.mcu_ops->mcu_restart(&((dev)->mt76))
===

Actual OP:

=== mt76x2/pci_mcu.c
188 int mt76x2_mcu_init(struct mt76x02_dev *dev)
189 {
190 	static const struct mt76_mcu_ops mt76x2_mcu_ops = {
191 		.mcu_restart = mt76pci_mcu_restart,
192 		.mcu_send_msg = mt76x02_mcu_msg_send,
193 	};
===

This triggers loading the firmware:

=== mt76x2/pci_mcu.c
168 static int
169 mt76pci_mcu_restart(struct mt76_dev *mdev)
170 {
...
179 	ret = mt76pci_load_firmware(dev);
===

which does the printout I observe:

=== mt76x2/pci_mcu.c
91 static int
92 mt76pci_load_firmware(struct mt76x02_dev *dev)
93 {
...
156 	dev_info(dev->mt76.dev, "Firmware running!\n");
===

Too bad it doesn't show the actual watchdog message, IOW, why the reset happens. I guess I will have to insert some pr_infos here and there.

Does it make sense? Any ideas why this can happen?

More info on the device during boot:

===
[ +0,333233] mt76x2e 0000:01:00.0: enabling device (0000 -> 0002)
[ +0,000571] mt76x2e 0000:01:00.0: ASIC revision: 76120044
[ +0,017806] mt76x2e 0000:01:00.0: ROM patch build: 20141115060606a
===

-- 
  Oleksandr Natalenko (post-factum)
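The detection logic quoted in this message can be exercised outside the kernel. Below is a simplified user-space model of mt76x02_tx_hang() plus mt76x02_check_tx_hang(). It assumes that q_tx[0..3] are the four WMM access-category data queues (one plausible reason for the hard-coded 4), uses a hypothetical value of 10 for MT_TX_HANG_TH, and assumes the reset path clears the counter; everything prefixed model_ is illustrative, not driver code. Note that the quoted loop breaks on the first busy queue whose DMA index has not moved, so i < 4 means "some queue with pending frames made no progress since the previous watchdog run", and a reset is requested only after that holds for MT_TX_HANG_TH consecutive runs, or immediately on an MCU timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_AC_QUEUES 4   /* assumption: q_tx[0..3] are the WMM AC queues */
#define TX_HANG_TH    10  /* hypothetical stand-in for MT_TX_HANG_TH */

/* Simplified per-queue state the driver looks at. */
struct model_txq {
	int queued;           /* frames still owned by the hardware */
	uint32_t hw_dma_idx;  /* what readl(&q->regs->dma_idx) would return */
};

struct model_dev {
	struct model_txq q[NUM_AC_QUEUES];
	uint32_t last_dma_idx[NUM_AC_QUEUES]; /* mirrors tx_dma_idx[] */
	int tx_hang_check;                    /* consecutive "hung" runs */
	bool mcu_timeout;
};

/* Mirrors mt76x02_tx_hang(): true if some busy queue's DMA index did not
 * move since the previous watchdog run. */
static bool model_tx_hang(struct model_dev *dev)
{
	int i;

	for (i = 0; i < NUM_AC_QUEUES; i++) {
		uint32_t prev, cur;

		if (!dev->q[i].queued)
			continue;             /* an idle queue cannot be hung */

		prev = dev->last_dma_idx[i];
		cur = dev->q[i].hw_dma_idx;
		dev->last_dma_idx[i] = cur;

		if (prev == cur)
			break;                /* pending frames but no progress */
	}
	return i < NUM_AC_QUEUES;
}

/* Mirrors mt76x02_check_tx_hang(): returns true when this watchdog run
 * would request a hardware restart. */
static bool model_check_tx_hang(struct model_dev *dev)
{
	/* Invariant: the counter never stays at or above the threshold. */
	assert(dev->tx_hang_check >= 0 && dev->tx_hang_check < TX_HANG_TH);

	if (model_tx_hang(dev)) {
		if (++dev->tx_hang_check >= TX_HANG_TH)
			goto restart;
	} else {
		dev->tx_hang_check = 0;
	}

	if (dev->mcu_timeout)
		goto restart;

	return false;

restart:
	dev->tx_hang_check = 0; /* assumption: the reset path clears the counter */
	return true;
}
```

In this model a queue that is merely idle never counts as hung; only a queue that still has frames queued while its hardware DMA index stays put does.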
* Re: mt76x2e hardware restart
From: Oleksandr Natalenko @ 2019-09-20 6:07 UTC
  To: linux-mediatek
  Cc: Felix Fietkau, Lorenzo Bianconi, Lorenzo Bianconi, Stanislaw Gruszka, Ryder Lee, Roy Luo, Kalle Valo, David S. Miller, Matthias Brugger, linux-wireless, netdev, linux-arm-kernel, linux-kernel

On 19.09.2019 23:22, Oleksandr Natalenko wrote:
> It checks for TX hang here:
> 
> === mt76x02_mmio.c
> 557 void mt76x02_wdt_work(struct work_struct *work)
> 558 {
> ...
> 562 	mt76x02_check_tx_hang(dev);
> ===

I've commented out the watchdog here ^^, and the card is not reset any more, but similarly it stops working shortly after the first client connects. So, indeed, it must be some hang in the HW, and wdt seems to do a correct job.

Is it even debuggable/fixable from the driver?

-- 
  Oleksandr Natalenko (post-factum)
* Re: mt76x2e hardware restart
From: Lorenzo Bianconi @ 2019-10-12 16:50 UTC
  To: Oleksandr Natalenko
  Cc: linux-mediatek, Felix Fietkau, Lorenzo Bianconi, Stanislaw Gruszka, Ryder Lee, Roy Luo, Kalle Valo, David S. Miller, Matthias Brugger, linux-wireless, netdev, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4826 bytes --]

> On 19.09.2019 23:22, Oleksandr Natalenko wrote:
> > It checks for TX hang here:
> > 
> > === mt76x02_mmio.c
> > 557 void mt76x02_wdt_work(struct work_struct *work)
> > 558 {
> > ...
> > 562 	mt76x02_check_tx_hang(dev);
> > ===
> 
> I've commented out the watchdog here ^^, and the card is not reset any
> more, but similarly it stops working shortly after the first client
> connects. So, indeed, it must be some hang in the HW, and wdt seems to do a
> correct job.
> 
> Is it even debuggable/fixable from the driver?

Hi Oleksandr,

sorry for the delay. Felix and I worked on this issue today. Could you please
try if the following patch fixes your issue?

Regards,
Lorenzo

From cf3436c42a297967235a9c9778620c585100529e Mon Sep 17 00:00:00 2001
Message-Id: <cf3436c42a297967235a9c9778620c585100529e.1570897574.git.lorenzo@kernel.org>
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 12 Oct 2019 17:32:57 +0200
Subject: [PATCH] mt76: mt76x2: disable pcie_aspm by default

On some devices (e.g. U7612E-H1) PCIE_ASPM causes continuous MCU hangs and
instability. This patch disables PCIE_ASPM by default.
This patch has been successfully tested on the U7612E-H1 mini-PCIe card.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/wireless/mediatek/mt76/mmio.c     | 48 +++++++++++++++++++
 drivers/net/wireless/mediatek/mt76/mt76.h     |  1 +
 .../net/wireless/mediatek/mt76/mt76x2/pci.c   |  2 +
 3 files changed, 51 insertions(+)

diff --git a/drivers/net/wireless/mediatek/mt76/mmio.c b/drivers/net/wireless/mediatek/mt76/mmio.c
index 1c974df1fe25..8e1dbc1903f3 100644
--- a/drivers/net/wireless/mediatek/mt76/mmio.c
+++ b/drivers/net/wireless/mediatek/mt76/mmio.c
@@ -3,6 +3,9 @@
  * Copyright (C) 2016 Felix Fietkau <nbd@nbd.name>
  */
 
+#include <linux/pci.h>
+#include <linux/pci-aspm.h>
+
 #include "mt76.h"
 #include "trace.h"
 
@@ -78,6 +81,51 @@ void mt76_set_irq_mask(struct mt76_dev *dev, u32 addr,
 }
 EXPORT_SYMBOL_GPL(mt76_set_irq_mask);
 
+void mt76_mmio_disable_aspm(struct pci_dev *pdev)
+{
+	struct pci_dev *parent = pdev->bus->self;
+	u16 aspm_conf, parent_aspm_conf = 0;
+
+	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &aspm_conf);
+	aspm_conf &= PCI_EXP_LNKCTL_ASPMC;
+	if (parent) {
+		pcie_capability_read_word(parent, PCI_EXP_LNKCTL,
+					  &parent_aspm_conf);
+		parent_aspm_conf &= PCI_EXP_LNKCTL_ASPMC;
+	}
+
+	if (!aspm_conf && (!parent || !parent_aspm_conf)) {
+		/* aspm already disabled */
+		return;
+	}
+
+	dev_info(&pdev->dev, "disabling ASPM %s %s\n",
+		 (aspm_conf & PCI_EXP_LNKCTL_ASPM_L0S) ? "L0s" : "",
+		 (aspm_conf & PCI_EXP_LNKCTL_ASPM_L1) ? "L1" : "");
+
+#ifdef CONFIG_PCIEASPM
+	pci_disable_link_state(pdev, aspm_conf);
+
+	/* Double-check ASPM control. If not disabled by the above, the
+	 * BIOS is preventing that from happening (or CONFIG_PCIEASPM is
+	 * not enabled); override by writing PCI config space directly.
+	 */
+	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &aspm_conf);
+	if (!(aspm_conf & PCI_EXP_LNKCTL_ASPMC))
+		return;
+#endif /* CONFIG_PCIEASPM */
+
+	/* Both device and parent should have the same ASPM setting.
+	 * Disable ASPM in downstream component first and then upstream.
+	 */
+	pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL, aspm_conf);
+
+	if (parent)
+		pcie_capability_clear_word(parent, PCI_EXP_LNKCTL,
+					   aspm_conf);
+}
+EXPORT_SYMBOL_GPL(mt76_mmio_disable_aspm);
+
 void mt76_mmio_init(struct mt76_dev *dev, void __iomem *regs)
 {
 	static const struct mt76_bus_ops mt76_mmio_ops = {
diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
index 8bcc7f21e83c..e95a5893f93b 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76.h
@@ -596,6 +596,7 @@ bool __mt76_poll_msec(struct mt76_dev *dev, u32 offset, u32 mask, u32 val,
 #define mt76_poll_msec(dev, ...) __mt76_poll_msec(&((dev)->mt76), __VA_ARGS__)
 
 void mt76_mmio_init(struct mt76_dev *dev, void __iomem *regs);
+void mt76_mmio_disable_aspm(struct pci_dev *pdev);
 
 static inline u16 mt76_chip(struct mt76_dev *dev)
 {
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c b/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
index 6253ec5fbd72..06fb80163c8e 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
@@ -83,6 +83,8 @@ mt76pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	/* RG_SSUSB_CDR_BR_PE1D = 0x3 */
 	mt76_rmw_field(dev, 0x15c58, 0x3 << 6, 0x3);
 
+	mt76_mmio_disable_aspm(pdev);
+
 	return 0;
 
 error:
-- 
2.21.0

> 
> -- 
> Oleksandr Natalenko (post-factum)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
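The patch works on the ASPM Control field of the PCIe Link Control register (PCI_EXP_LNKCTL, bits 1:0, where PCI_EXP_LNKCTL_ASPM_L0S is 0x1 and PCI_EXP_LNKCTL_ASPM_L1 is 0x2). lspci -vv already prints this field on its LnkCtl line; to check it programmatically for the card or its upstream bridge, the same field can be decoded from a config-space dump such as /sys/bus/pci/devices/0000:01:00.0/config (without root, that file usually exposes only the first 64 bytes, which is not enough). A minimal sketch of the decoding, operating on an in-memory 256-byte image; the function names are hypothetical and only the standard capability list is walked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_STATUS          0x06   /* PCI_STATUS */
#define CFG_STATUS_CAP_LIST 0x10   /* PCI_STATUS_CAP_LIST */
#define CFG_CAP_PTR         0x34   /* PCI_CAPABILITY_LIST */
#define CAP_ID_EXP          0x10   /* PCI_CAP_ID_EXP */
#define EXP_LNKCTL          0x10   /* PCI_EXP_LNKCTL, relative to the cap */
#define LNKCTL_ASPM_L0S     0x0001 /* PCI_EXP_LNKCTL_ASPM_L0S */
#define LNKCTL_ASPM_L1      0x0002 /* PCI_EXP_LNKCTL_ASPM_L1 */
#define LNKCTL_ASPMC        0x0003 /* PCI_EXP_LNKCTL_ASPMC */

/* PCI config space is little-endian. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	assert(off + 1 < 256);
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Offset of the PCI Express capability, or 0 if there is none. */
static size_t find_pcie_cap(const uint8_t cfg[256])
{
	size_t pos;
	int ttl = 48; /* guard against malformed (looping) capability lists */

	if (!(cfg_read16(cfg, CFG_STATUS) & CFG_STATUS_CAP_LIST))
		return 0;

	pos = cfg[CFG_CAP_PTR] & ~3u;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == CAP_ID_EXP)
			return pos;
		pos = cfg[pos + 1] & ~3u; /* next-capability pointer */
	}
	return 0;
}

/* ASPM Control bits (L0s and/or L1) currently enabled in Link Control. */
static uint16_t aspm_ctl(const uint8_t cfg[256])
{
	size_t cap = find_pcie_cap(cfg);

	if (!cap || cap + EXP_LNKCTL + 1 >= 256)
		return 0;
	return cfg_read16(cfg, cap + EXP_LNKCTL) & LNKCTL_ASPMC;
}
```

With the capability chain shown in the lspci output of the original report (Power Management at 0x40, MSI at 0x50, Express at 0x70), Link Control would sit at config offset 0x80.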
* Re: [PATCH] mt76: mt76x2: disable pcie_aspm by default
From: kbuild test robot @ 2019-10-13 3:30 UTC
  To: Lorenzo Bianconi
  Cc: kbuild-all, Oleksandr Natalenko, linux-mediatek, Felix Fietkau, Lorenzo Bianconi, Stanislaw Gruszka, Ryder Lee, Roy Luo, Kalle Valo, David S. Miller, Matthias Brugger, linux-wireless, netdev, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1385 bytes --]

Hi Lorenzo,

I love your patch! Yet something to improve:

[auto build test ERROR on wireless-drivers-next/master]
[cannot apply to v5.4-rc2 next-20191011]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Lorenzo-Bianconi/mt76-mt76x2-disable-pcie_aspm-by-default/20191013-093134
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git master
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/net/wireless/mediatek/mt76/mmio.c:7:10: fatal error: linux/pci-aspm.h: No such file or directory
    #include <linux/pci-aspm.h>
             ^~~~~~~~~~~~~~~~~~
   compilation terminated.

vim +7 drivers/net/wireless/mediatek/mt76/mmio.c

   > 7	#include <linux/pci-aspm.h>
     8	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 70190 bytes --]
* Re: mt76x2e hardware restart
From: Oleksandr Natalenko @ 2019-10-15 16:52 UTC
  To: Lorenzo Bianconi
  Cc: linux-mediatek, Felix Fietkau, Lorenzo Bianconi, Stanislaw Gruszka, Ryder Lee, Roy Luo, Kalle Valo, David S. Miller, Matthias Brugger, linux-wireless, netdev, linux-arm-kernel, linux-kernel

Hey.

On 12.10.2019 18:50, Lorenzo Bianconi wrote:
> sorry for the delay. Felix and I worked on this issue today. Could you please
> try if the following patch fixes your issue?

Thanks for the answer and the IRC discussion. As agreed I've applied [1] and [2], and have just swapped the card to try it again. So far, it works fine in 5 GHz band in 802.11ac mode as an AP.

I'll give it more load with my phone over evening, and we can discuss what to do next (if needed) tomorrow again. Or feel free to drop me an email today.

Thanks for your efforts.

[1] https://github.com/LorenzoBianconi/wireless-drivers-next/commit/cf3436c42a297967235a9c9778620c585100529e.patch
[2] https://github.com/LorenzoBianconi/wireless-drivers-next/commit/aad256eb62620f9646d39c1aa69234f50c89eed8.patch

-- 
  Oleksandr Natalenko (post-factum)
* Re: mt76x2e hardware restart
From: Oleksandr Natalenko @ 2019-10-16 16:31 UTC
  To: Lorenzo Bianconi
  Cc: linux-mediatek, Felix Fietkau, Lorenzo Bianconi, Stanislaw Gruszka, Ryder Lee, Roy Luo, Kalle Valo, David S. Miller, Matthias Brugger, linux-wireless, netdev, linux-arm-kernel, linux-kernel

Hello.

On 15.10.2019 18:52, Oleksandr Natalenko wrote:
> Thanks for the answer and the IRC discussion. As agreed I've applied
> [1] and [2], and have just swapped the card to try it again. So far,
> it works fine in 5 GHz band in 802.11ac mode as an AP.
> 
> I'll give it more load with my phone over evening, and we can discuss
> what to do next (if needed) tomorrow again. Or feel free to drop me an
> email today.
> 
> Thanks for your efforts.
> 
> [1]
> https://github.com/LorenzoBianconi/wireless-drivers-next/commit/cf3436c42a297967235a9c9778620c585100529e.patch
> [2]
> https://github.com/LorenzoBianconi/wireless-drivers-next/commit/aad256eb62620f9646d39c1aa69234f50c89eed8.patch

As agreed, here are iperf3 results, AP to STA distance is 2 meters.

Client sends, TCP:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  70.4 MBytes  59.0 Mbits/sec  3800             sender
[  5]   0.00-10.03  sec  70.0 MBytes  58.6 Mbits/sec                  receiver

Client receives, TCP:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.06  sec   196 MBytes   163 Mbits/sec  3081             sender
[  5]   0.00-10.01  sec   191 MBytes   160 Mbits/sec                  receiver

Client sends, UDP, 128 streams:

[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[SUM]   0.00-10.00  sec   160 MBytes   134 Mbits/sec  0.000 ms  0/115894 (0%)  sender
[SUM]   0.00-10.01  sec   160 MBytes   134 Mbits/sec  0.347 ms  0/115892 (0%)  receiver

Client receives, UDP, 128 streams:

[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[SUM]   0.00-10.01  sec   119 MBytes  99.4 Mbits/sec  0.000 ms  0/85888 (0%)  sender
[SUM]   0.00-10.00  sec   119 MBytes  99.5 Mbits/sec  0.877 ms  0/85888 (0%)  receiver

Given the HW is not the most powerful, the key point here is that nothing crashed after doing these tests.

-- 
  Oleksandr Natalenko (post-factum)
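As a sanity check on the iperf3 figures above: the numbers are consistent with Transfer being reported in binary units (1 MByte = 2^20 bytes) and Bitrate in decimal units (1 Mbit = 10^6 bits), so bitrate is roughly transfer x 2^20 x 8 / interval / 10^6. For example, 70.4 MBytes in 10.00 s gives about 59.06 Mbit/s, matching the 59.0 Mbits/sec line, whereas decimal megabytes would give only about 56.3. The helper names below are hypothetical; since iperf3 rounds the Transfer column, expect agreement only to within a few tenths of a Mbit/s.

```c
#include <assert.h>

/* Bitrate in 10^6 bits per second for a transfer given in MiB (2^20 bytes). */
static double mib_to_mbit_per_s(double mbytes, double seconds)
{
	assert(seconds > 0.0);
	return mbytes * 1024.0 * 1024.0 * 8.0 / seconds / 1e6;
}

/* Average payload per datagram, in bytes, for a UDP run. */
static double avg_datagram_bytes(double mbytes, long datagrams)
{
	assert(datagrams > 0);
	return mbytes * 1024.0 * 1024.0 / (double)datagrams;
}
```

Applied to the UDP runs, avg_datagram_bytes(160.0, 115894) comes out at roughly 1448 bytes per datagram, so the zero-loss result covers on the order of a hundred thousand full-sized frames in each direction.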
* Re: mt76x2e hardware restart
From: Lorenzo Bianconi @ 2019-10-16 16:38 UTC
  To: Oleksandr Natalenko
  Cc: linux-mediatek, Felix Fietkau, Lorenzo Bianconi, Stanislaw Gruszka, Ryder Lee, Roy Luo, Kalle Valo, David S. Miller, Matthias Brugger, linux-wireless, netdev, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2362 bytes --]

> Hello.
> 
> On 15.10.2019 18:52, Oleksandr Natalenko wrote:
> > Thanks for the answer and the IRC discussion. As agreed I've applied
> > [1] and [2], and have just swapped the card to try it again. So far,
> > it works fine in 5 GHz band in 802.11ac mode as an AP.
> > 
> > I'll give it more load with my phone over evening, and we can discuss
> > what to do next (if needed) tomorrow again. Or feel free to drop me an
> > email today.
> > 
> > Thanks for your efforts.
> > 
> > [1]
> > https://github.com/LorenzoBianconi/wireless-drivers-next/commit/cf3436c42a297967235a9c9778620c585100529e.patch
> > [2]
> > https://github.com/LorenzoBianconi/wireless-drivers-next/commit/aad256eb62620f9646d39c1aa69234f50c89eed8.patch
> 
> As agreed, here are iperf3 results, AP to STA distance is 2 meters.
> 
> Client sends, TCP:
> 
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  70.4 MBytes  59.0 Mbits/sec  3800             sender
> [  5]   0.00-10.03  sec  70.0 MBytes  58.6 Mbits/sec                  receiver
> 
> Client receives, TCP:
> 
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.06  sec   196 MBytes   163 Mbits/sec  3081             sender
> [  5]   0.00-10.01  sec   191 MBytes   160 Mbits/sec                  receiver
> 
> Client sends, UDP, 128 streams:
> 
> [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
> [SUM]   0.00-10.00  sec   160 MBytes   134 Mbits/sec  0.000 ms  0/115894 (0%)  sender
> [SUM]   0.00-10.01  sec   160 MBytes   134 Mbits/sec  0.347 ms  0/115892 (0%)  receiver
> 
> Client receives, UDP, 128 streams:
> 
> [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
> [SUM]   0.00-10.01  sec   119 MBytes  99.4 Mbits/sec  0.000 ms  0/85888 (0%)  sender
> [SUM]   0.00-10.00  sec   119 MBytes  99.5 Mbits/sec  0.877 ms  0/85888 (0%)  receiver
> 
> Given the HW is not the most powerful, the key point here is that nothing
> crashed after doing these tests.

Hi Oleksandr,

thx a lot for testing these 2 patches. Now we need to understand why the chip
hangs if we enable scatter gather dma transfer on x86 while it is working fine
on multiple mips/arm devices (patch 2/2 just disable it for debugging).

Regards,
Lorenzo

> 
> -- 
> Oleksandr Natalenko (post-factum)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
* Re: mt76x2e hardware restart
From: Lorenzo Bianconi @ 2019-10-23 8:50 UTC
  To: Lorenzo Bianconi
  Cc: Oleksandr Natalenko, linux-mediatek, Felix Fietkau, Lorenzo Bianconi, Stanislaw Gruszka, Ryder Lee, Roy Luo, Kalle Valo, David S. Miller, Matthias Brugger, linux-wireless, netdev, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6385 bytes --]

> > Hello.
> > 
> > On 15.10.2019 18:52, Oleksandr Natalenko wrote:
> > > Thanks for the answer and the IRC discussion. As agreed I've applied
> > > [1] and [2], and have just swapped the card to try it again. So far,
> > > it works fine in 5 GHz band in 802.11ac mode as an AP.
> > > 
> > > I'll give it more load with my phone over evening, and we can discuss
> > > what to do next (if needed) tomorrow again. Or feel free to drop me an
> > > email today.
> > > 
> > > Thanks for your efforts.
> > > 
> > > [1]
> > > https://github.com/LorenzoBianconi/wireless-drivers-next/commit/cf3436c42a297967235a9c9778620c585100529e.patch
> > > [2]
> > > https://github.com/LorenzoBianconi/wireless-drivers-next/commit/aad256eb62620f9646d39c1aa69234f50c89eed8.patch
> > 
> > As agreed, here are iperf3 results, AP to STA distance is 2 meters.
> > 
> > Client sends, TCP:
> > 
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  70.4 MBytes  59.0 Mbits/sec  3800             sender
> > [  5]   0.00-10.03  sec  70.0 MBytes  58.6 Mbits/sec                  receiver
> > 
> > Client receives, TCP:
> > 
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.06  sec   196 MBytes   163 Mbits/sec  3081             sender
> > [  5]   0.00-10.01  sec   191 MBytes   160 Mbits/sec                  receiver
> > 
> > Client sends, UDP, 128 streams:
> > 
> > [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
> > [SUM]   0.00-10.00  sec   160 MBytes   134 Mbits/sec  0.000 ms  0/115894 (0%)  sender
> > [SUM]   0.00-10.01  sec   160 MBytes   134 Mbits/sec  0.347 ms  0/115892 (0%)  receiver
> > 
> > Client receives, UDP, 128 streams:
> > 
> > [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
> > [SUM]   0.00-10.01  sec   119 MBytes  99.4 Mbits/sec  0.000 ms  0/85888 (0%)  sender
> > [SUM]   0.00-10.00  sec   119 MBytes  99.5 Mbits/sec  0.877 ms  0/85888 (0%)  receiver
> > 
> > Given the HW is not the most powerful, the key point here is that nothing
> > crashed after doing these tests.
> 
> Hi Oleksandr,
> 
> thx a lot for testing these 2 patches. Now we need to understand why the chip
> hangs if we enable scatter gather dma transfer on x86 while it is working fine
> on multiple mips/arm devices (patch 2/2 just disable it for debugging).

Hi Oleksandr,

I think I spotted the SG issue on mt76x2e. Could you please:
- keep pcie_aspm patch I sent
- remove the debug patch where I disabled TX Scatter-Gather on mt76x2e
- apply the following patch

Regards,
Lorenzo

mt76: dma: fix buffer unmap with non-linear skbs

mt76 dma layer is supposed to unmap skb data buffers while keeping txwi mapped
on hw dma ring. At the moment mt76 wrongly unmaps txwi or does not unmap data
fragments in even positions for non-linear skbs. This issue may result in hw
hangs with A-MSDU if the system relies on IOMMU or SWIOTLB.
Fix this behaviour by marking first and last queue entries, introducing
MT_QUEUE_ENTRY_FIRST and MT_QUEUE_ENTRY_LAST flags, and properly unmapping
data fragments.

Fixes: 17f1de56df05 ("mt76: add common code shared between multiple chipsets")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/wireless/mediatek/mt76/dma.c  | 33 +++++++++++++----------
 drivers/net/wireless/mediatek/mt76/mt76.h |  3 +++
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt76/dma.c b/drivers/net/wireless/mediatek/mt76/dma.c
index 4da7cffbab29..a3026a0ca8c5 100644
--- a/drivers/net/wireless/mediatek/mt76/dma.c
+++ b/drivers/net/wireless/mediatek/mt76/dma.c
@@ -54,7 +54,7 @@ mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
 	int i, idx = -1;
 
 	if (txwi)
-		q->entry[q->head].txwi = DMA_DUMMY_DATA;
+		q->entry[q->head].flags = MT_QUEUE_ENTRY_FIRST;
 
 	for (i = 0; i < nbufs; i += 2, buf += 2) {
 		u32 buf0 = buf[0].addr, buf1 = 0;
@@ -83,6 +83,7 @@ mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
 		q->queued++;
 	}
 
+	q->entry[idx].flags |= MT_QUEUE_ENTRY_LAST;
 	q->entry[idx].txwi = txwi;
 	q->entry[idx].skb = skb;
 
@@ -93,27 +94,31 @@ static void
 mt76_dma_tx_cleanup_idx(struct mt76_dev *dev, struct mt76_queue *q, int idx,
 			struct mt76_queue_entry *prev_e)
 {
+	__le32 addr, __ctrl = READ_ONCE(q->desc[idx].ctrl);
 	struct mt76_queue_entry *e = &q->entry[idx];
-	__le32 __ctrl = READ_ONCE(q->desc[idx].ctrl);
-	u32 ctrl = le32_to_cpu(__ctrl);
-
-	if (!e->txwi || !e->skb) {
-		__le32 addr = READ_ONCE(q->desc[idx].buf0);
-		u32 len = FIELD_GET(MT_DMA_CTL_SD_LEN0, ctrl);
+	u32 len, ctrl = le32_to_cpu(__ctrl);
 
+	if (e->flags & MT_QUEUE_ENTRY_FIRST) {
+		addr = READ_ONCE(q->desc[idx].buf1);
+		len = FIELD_GET(MT_DMA_CTL_SD_LEN1, ctrl);
 		dma_unmap_single(dev->dev, le32_to_cpu(addr), len,
 				 DMA_TO_DEVICE);
-	}
-
-	if (!(ctrl & MT_DMA_CTL_LAST_SEC0)) {
-		__le32 addr = READ_ONCE(q->desc[idx].buf1);
-		u32 len = FIELD_GET(MT_DMA_CTL_SD_LEN1, ctrl);
-
+	} else {
+		addr = READ_ONCE(q->desc[idx].buf0);
+		len = FIELD_GET(MT_DMA_CTL_SD_LEN0, ctrl);
 		dma_unmap_single(dev->dev, le32_to_cpu(addr), len,
 				 DMA_TO_DEVICE);
+		if (e->txwi &&
+		    ((ctrl & MT_DMA_CTL_LAST_SEC1) ||
+		     !(e->flags & MT_QUEUE_ENTRY_LAST))) {
+			addr = READ_ONCE(q->desc[idx].buf1);
+			len = FIELD_GET(MT_DMA_CTL_SD_LEN1, ctrl);
+			dma_unmap_single(dev->dev, le32_to_cpu(addr), len,
+					 DMA_TO_DEVICE);
+		}
 	}
 
-	if (e->txwi == DMA_DUMMY_DATA)
+	if (!(e->flags & MT_QUEUE_ENTRY_LAST))
 		e->txwi = NULL;
 
 	if (e->skb == DMA_DUMMY_DATA)
diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
index e95a5893f93b..b0ac82b31789 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76.h
@@ -83,6 +83,8 @@ struct mt76_tx_info {
 	u32 info;
 };
 
+#define MT_QUEUE_ENTRY_FIRST	BIT(0)
+#define MT_QUEUE_ENTRY_LAST	BIT(1)
 struct mt76_queue_entry {
 	union {
 		void *buf;
@@ -95,6 +97,7 @@ struct mt76_queue_entry {
 	enum mt76_txq_id qid;
 	bool schedule;
 	bool done;
+	u32 flags;
 };
 
 struct mt76_queue_regs {
-- 
2.21.0

> 
> Regards,
> Lorenzo
> 
> > 
> > -- 
> > Oleksandr Natalenko (post-factum)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
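The commit message above states the invariant the TX cleanup path has to keep: every skb data buffer posted to the ring is unmapped exactly once, while the txwi that occupies buf0 of the first descriptor stays mapped. Below is a simplified user-space model of that bookkeeping. It assumes every frame is posted with a txwi and that buffers are packed two per descriptor (buf0/buf1), as in the quoted mt76_dma_add_buf() loop. Instead of the patch's FIRST/LAST entry flags, this variant records a per-slot "unmap me" flag when the descriptor is filled; all names are hypothetical and this is not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_DESCS 8

/* One ring descriptor with two buffer slots (buf0/buf1), plus the software
 * bookkeeping recorded when it was filled. */
struct model_desc {
	int nslots;     /* 1 or 2 slots actually used */
	bool unmap[2];  /* unmap this slot on TX completion? */
};

struct model_ring {
	struct model_desc d[MODEL_MAX_DESCS];
	int ndesc;
};

/* Post one frame of nbufs buffers: buffer 0 is the txwi (stays mapped),
 * buffers 1..nbufs-1 are the skb head and fragments (unmapped per frame). */
static void model_post(struct model_ring *r, int nbufs)
{
	assert(nbufs >= 1 && (nbufs + 1) / 2 <= MODEL_MAX_DESCS);

	r->ndesc = 0;
	for (int i = 0; i < nbufs; i += 2) {
		struct model_desc *d = &r->d[r->ndesc++];

		d->nslots = (i + 1 < nbufs) ? 2 : 1;
		d->unmap[0] = (i != 0);         /* buf0 of the first desc is the txwi */
		d->unmap[1] = (d->nslots == 2); /* any used buf1 holds skb data */
	}
}

/* Count the dma_unmap_single() calls a cleanup pass would make, and report
 * whether the txwi slot would have been unmapped. */
static int model_cleanup(const struct model_ring *r, bool *txwi_unmapped)
{
	int unmaps = 0;

	*txwi_unmapped = false;
	for (int n = 0; n < r->ndesc; n++) {
		for (int s = 0; s < r->d[n].nslots; s++) {
			if (!r->d[n].unmap[s])
				continue;
			unmaps++;
			if (n == 0 && s == 0)
				*txwi_unmapped = true;
		}
	}
	return unmaps;
}
```

In this model a frame made of a txwi plus k data buffers always yields exactly k unmaps and never touches the txwi, whether k is odd or even; that parity independence is what the commit message says the old cleanup code got wrong for non-linear skbs.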
* Re: mt76x2e hardware restart
From: Oleksandr Natalenko @ 2019-10-23 16:25 UTC
  To: Lorenzo Bianconi
  Cc: Lorenzo Bianconi, linux-mediatek, Felix Fietkau, Lorenzo Bianconi, Stanislaw Gruszka, Ryder Lee, Roy Luo, Kalle Valo, David S. Miller, Matthias Brugger, linux-wireless, netdev, linux-arm-kernel, linux-kernel

Hi.

On 23.10.2019 10:50, Lorenzo Bianconi wrote:
> I think I spotted the SG issue on mt76x2e. Could you please:
> - keep pcie_aspm patch I sent
> - remove the debug patch where I disabled TX Scatter-Gather on mt76x2e
> - apply the following patch

Thanks for the patch. So far so good: I was able to start the AP, connect to it and conduct a couple of simple speed tests. I'll use it more today and will let you know in case something breaks.

-- 
  Oleksandr Natalenko (post-factum)
* Re: mt76x2e hardware restart
  2019-10-23  8:50 ` Lorenzo Bianconi
  2019-10-23 16:25 ` Oleksandr Natalenko
@ 2019-10-24  9:43 ` Daniel Golle
  1 sibling, 0 replies; 11+ messages in thread
From: Daniel Golle @ 2019-10-24  9:43 UTC
To: Lorenzo Bianconi
Cc: Lorenzo Bianconi, Ryder Lee, Stanislaw Gruszka, netdev,
    linux-wireless, Oleksandr Natalenko, linux-kernel, Matthias Brugger,
    linux-mediatek, linux-arm-kernel, Roy Luo, Lorenzo Bianconi,
    David S. Miller, Kalle Valo, Felix Fietkau

Hi Lorenzo,

On Wed, Oct 23, 2019 at 10:50:39AM +0200, Lorenzo Bianconi wrote:
> ...
> I think I spotted the SG issue on mt76x2e. Could you please:
> - keep pcie_aspm patch I sent
> - remove the debug patch where I disabled TX Scatter-Gather on mt76x2e
> - apply the following patch

With those two patches I am, for the first time, able to use the U7612
mPCIe module on my x86 laptop in a more or less stable way. In 10 hours
of uptime so far, I had one serious hiccup:

[35790.926455] mt76x2e 0000:02:00.0: MCU message 31 (seq 11) timed out
[35790.991227] mt76x2e 0000:02:00.0: Firmware Version: 0.0.00
[35790.991231] mt76x2e 0000:02:00.0: Build: 1
[35790.991233] mt76x2e 0000:02:00.0: Build Time: 201507311614____
[35791.016460] mt76x2e 0000:02:00.0: Firmware running!
[35791.017153] ieee80211 phy0: Hardware restart was requested
...(repeating about 10 times, every 20 seconds)

and one less serious one, all related to MCU message 31.
However, unlike before, the hardware actually recovers and works quite
well most of the time.

Thank you!!!

Cheers

Daniel

> 
> Regards,
> Lorenzo
> 
> mt76: dma: fix buffer unmap with non-linear skbs
> 
> mt76 dma layer is supposed to unmap skb data buffers while keeping txwi
> mapped on hw dma ring. At the moment mt76 wrongly unmaps txwi or does not
> unmap data fragments in even positions for non-linear skbs. This issue may
> result in hw hangs with A-MSDU if the system relies on IOMMU or SWIOTLB.
> Fix this behaviour by marking first and last queue entries, introducing
> MT_QUEUE_ENTRY_FIRST and MT_QUEUE_ENTRY_LAST flags, and properly unmapping
> data fragments.
> 
> Fixes: 17f1de56df05 ("mt76: add common code shared between multiple chipsets")
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  drivers/net/wireless/mediatek/mt76/dma.c  | 33 +++++++++++++----------
>  drivers/net/wireless/mediatek/mt76/mt76.h |  3 +++
>  2 files changed, 22 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/wireless/mediatek/mt76/dma.c b/drivers/net/wireless/mediatek/mt76/dma.c
> index 4da7cffbab29..a3026a0ca8c5 100644
> --- a/drivers/net/wireless/mediatek/mt76/dma.c
> +++ b/drivers/net/wireless/mediatek/mt76/dma.c
> @@ -54,7 +54,7 @@ mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
>  	int i, idx = -1;
>  
>  	if (txwi)
> -		q->entry[q->head].txwi = DMA_DUMMY_DATA;
> +		q->entry[q->head].flags = MT_QUEUE_ENTRY_FIRST;
>  
>  	for (i = 0; i < nbufs; i += 2, buf += 2) {
>  		u32 buf0 = buf[0].addr, buf1 = 0;
> @@ -83,6 +83,7 @@ mt76_dma_add_buf(struct mt76_dev *dev, struct mt76_queue *q,
>  		q->queued++;
>  	}
>  
> +	q->entry[idx].flags |= MT_QUEUE_ENTRY_LAST;
>  	q->entry[idx].txwi = txwi;
>  	q->entry[idx].skb = skb;
>  
> @@ -93,27 +94,31 @@ static void
>  mt76_dma_tx_cleanup_idx(struct mt76_dev *dev, struct mt76_queue *q, int idx,
>  			struct mt76_queue_entry *prev_e)
>  {
> +	__le32 addr, __ctrl = READ_ONCE(q->desc[idx].ctrl);
>  	struct mt76_queue_entry *e = &q->entry[idx];
> -	__le32 __ctrl = READ_ONCE(q->desc[idx].ctrl);
> -	u32 ctrl = le32_to_cpu(__ctrl);
> -
> -	if (!e->txwi || !e->skb) {
> -		__le32 addr = READ_ONCE(q->desc[idx].buf0);
> -		u32 len = FIELD_GET(MT_DMA_CTL_SD_LEN0, ctrl);
> +	u32 len, ctrl = le32_to_cpu(__ctrl);
>  
> +	if (e->flags & MT_QUEUE_ENTRY_FIRST) {
> +		addr = READ_ONCE(q->desc[idx].buf1);
> +		len = FIELD_GET(MT_DMA_CTL_SD_LEN1, ctrl);
>  		dma_unmap_single(dev->dev, le32_to_cpu(addr), len,
>  				 DMA_TO_DEVICE);
> -	}
> -
> -	if (!(ctrl & MT_DMA_CTL_LAST_SEC0)) {
> -		__le32 addr = READ_ONCE(q->desc[idx].buf1);
> -		u32 len = FIELD_GET(MT_DMA_CTL_SD_LEN1, ctrl);
> -
> +	} else {
> +		addr = READ_ONCE(q->desc[idx].buf0);
> +		len = FIELD_GET(MT_DMA_CTL_SD_LEN0, ctrl);
>  		dma_unmap_single(dev->dev, le32_to_cpu(addr), len,
>  				 DMA_TO_DEVICE);
> +		if (e->txwi &&
> +		    ((ctrl & MT_DMA_CTL_LAST_SEC1) ||
> +		     !(e->flags & MT_QUEUE_ENTRY_LAST))) {
> +			addr = READ_ONCE(q->desc[idx].buf1);
> +			len = FIELD_GET(MT_DMA_CTL_SD_LEN1, ctrl);
> +			dma_unmap_single(dev->dev, le32_to_cpu(addr), len,
> +					 DMA_TO_DEVICE);
> +		}
>  	}
>  
> -	if (e->txwi == DMA_DUMMY_DATA)
> +	if (!(e->flags & MT_QUEUE_ENTRY_LAST))
>  		e->txwi = NULL;
>  
>  	if (e->skb == DMA_DUMMY_DATA)
> diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
> index e95a5893f93b..b0ac82b31789 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt76.h
> +++ b/drivers/net/wireless/mediatek/mt76/mt76.h
> @@ -83,6 +83,8 @@ struct mt76_tx_info {
>  	u32 info;
>  };
>  
> +#define MT_QUEUE_ENTRY_FIRST	BIT(0)
> +#define MT_QUEUE_ENTRY_LAST	BIT(1)
>  struct mt76_queue_entry {
>  	union {
>  		void *buf;
> @@ -95,6 +97,7 @@ struct mt76_queue_entry {
>  	enum mt76_txq_id qid;
>  	bool schedule;
>  	bool done;
> +	u32 flags;
>  };
>  
>  struct mt76_queue_regs {
> -- 
> 2.21.0
> 
> > 
> > Regards,
> > Lorenzo
> > 
> > > 
> > > -- 
> > >   Oleksandr Natalenko (post-factum)
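The bookkeeping described in the commit message can be modeled in userspace to see which buffers a cleanup pass should release. This is a hypothetical sketch, not driver code: model_desc, model_add_buf(), model_unmap_count() and the ENTRY_FIRST/ENTRY_LAST flags are illustrative stand-ins for the mt76 structures. It assumes, as the patch suggests, that each descriptor carries two buffers (buf0, buf1), that buf[0] is the txwi when one is present and must stay mapped, and that every other populated slot holds skb data that must be unmapped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MT_QUEUE_ENTRY_FIRST / MT_QUEUE_ENTRY_LAST. */
#define ENTRY_FIRST (1u << 0)
#define ENTRY_LAST  (1u << 1)
#define MAX_DESCS   8

/* Simplified ring descriptor: up to two buffers, buf0 and buf1. */
struct model_desc {
	unsigned int flags;
	bool buf1_used; /* false if the frame ends in buf0 */
};

/*
 * Spread nbufs buffers over descriptors two at a time, like the
 * "i += 2" loop in mt76_dma_add_buf(). When has_txwi is true, buf[0]
 * is the txwi, so it lands in buf0 of the first descriptor.
 * Returns the number of descriptors used.
 */
static int model_add_buf(struct model_desc *d, int nbufs, bool has_txwi)
{
	int n = 0;

	assert(nbufs > 0 && (nbufs + 1) / 2 <= MAX_DESCS);
	for (int i = 0; i < nbufs; i += 2, n++) {
		d[n].flags = 0;
		d[n].buf1_used = (i + 1 < nbufs);
	}
	if (has_txwi)
		d[0].flags |= ENTRY_FIRST;
	d[n - 1].flags |= ENTRY_LAST;
	return n;
}

/*
 * Count the buffers a TX cleanup pass must unmap: every populated
 * slot except buf0 of the FIRST descriptor, which holds the txwi and
 * stays mapped.
 */
static int model_unmap_count(const struct model_desc *d, int ndesc)
{
	int count = 0;

	for (int i = 0; i < ndesc; i++) {
		if (!(d[i].flags & ENTRY_FIRST))
			count++; /* buf0 carries skb data */
		if (d[i].buf1_used)
			count++; /* buf1 carries skb data */
	}
	return count;
}
```

For a frame made of a txwi, the skb head and two fragments (nbufs = 4, has_txwi = true), model_add_buf() fills two descriptors, flagging the first ENTRY_FIRST and the second ENTRY_LAST, and model_unmap_count() returns 3: every buffer except the txwi.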
Thread overview: 11+ messages
2019-09-19 16:24 mt76x2e hardware restart Oleksandr Natalenko
2019-09-19 21:22 ` Oleksandr Natalenko
2019-09-20  6:07 ` Oleksandr Natalenko
2019-10-12 16:50 ` Lorenzo Bianconi
2019-10-13  3:30 ` [PATCH] mt76: mt76x2: disable pcie_aspm by default kbuild test robot
2019-10-15 16:52 ` mt76x2e hardware restart Oleksandr Natalenko
2019-10-16 16:31 ` Oleksandr Natalenko
2019-10-16 16:38 ` Lorenzo Bianconi
2019-10-23  8:50 ` Lorenzo Bianconi
2019-10-23 16:25 ` Oleksandr Natalenko
2019-10-24  9:43 ` Daniel Golle