On 02/11/2020 16:54, Toke Høiland-Jørgensen wrote: > Pali Rohár writes: > >> On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote: >>> "™֟☻̭҇ Ѽ ҉ ®" writes: >>> >>>> On 30/10/2020 15:23, Pali Rohár wrote: >>>>> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote: >>>>>> Pali Rohár writes: >>>>>>> My experience with that WLE900VX card, aardvark driver and aspm code: >>>>>>> >>>>>>> Link training in GEN2 mode for this card succeed only once after reset. >>>>>>> Repeated link retraining fails and it fails even when aardvark is >>>>>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have >>>>>>> working link training. >>>>>>> >>>>>>> What I did in aardvark driver: Set mode to GEN2, do link training. If >>>>>>> success read "negotiated link speed" from "Link Control Status Register" >>>>>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then >>>>>>> retrain link again (for WLE900VX now it would be at GEN1). After that >>>>>>> card is stable and all future retraining (e.g. from aspm.c) also passes. >>>>>>> >>>>>>> If I do not change aardvark mode from GEN2 to GEN1 the second link >>>>>>> training fails. And if I change mode to GEN1 after this failed link >>>>>>> training then nothing happen, link training do not success. >>>>>>> >>>>>>> So just speculation now... In current setup initialization of card does >>>>>>> one link training at GEN2. Then aspm.c is called which is doing second >>>>>>> link retraining at GEN2. And if it fails then below patch issue third >>>>>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark >>>>>>> then second link retraining must be at GEN1 (not GEN2) to workaround >>>>>>> this issue. >>>>>>> >>>>>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link >>>>>>> retraining at GEN2 speed? And always force GEN1 speed prior link >>>>>>> training? >>>>>> Sounds like a plan. I poked around in aspm.c and must confess to being a >>>>>> bit lost in the soup of registers ;) >>>>>> >>>>>> So if one of you can cook up a patch, that would be most helpful! >>>>> I modified Bjorn's patch, explicitly set tls to 1 and added debug info >>>>> about cls (current link speed, that what is used by aardvark). It is >>>>> untested, I just tried to compile it. >>>>> >>>>> Can try it? >>>>> >>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c >>>>> index 253c30cc1967..f934c0b52f41 100644 >>>>> --- a/drivers/pci/pcie/aspm.c >>>>> +++ b/drivers/pci/pcie/aspm.c >>>>> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link) >>>>> unsigned long end_jiffies; >>>>> u16 reg16; >>>>> >>>>> + u32 lnkcap2; >>>>> + u16 lnksta, lnkctl2, cls, tls; >>>>> + >>>>> + pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2); >>>>> + pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta); >>>>> + pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2); >>>>> + cls = lnksta & PCI_EXP_LNKSTA_CLS; >>>>> + tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS; >>>>> + >>>>> + pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n", >>>>> + lnkcap2, (lnkcap2 & 0x3F) >> 1, >>>>> + lnksta, cls, >>>>> + lnkctl2, tls); >>>>> + >>>>> + tls = 1; >>>>> + pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2, >>>>> + PCI_EXP_LNKCTL2_TLS, tls); >>>>> + pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2); >>>>> + pci_info(parent, "lnkctl2 %#010x new tls %#03x\n", >>>>> + lnkctl2, tls); >>>>> + >>>>> pcie_capability_read_word(parent, PCI_EXP_LNKCTL, ®16); >>>>> reg16 |= PCI_EXP_LNKCTL_RL; >>>>> pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16); >>>>> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link) >>>>> break; >>>>> msleep(1); >>>>> } while (time_before(jiffies, end_jiffies)); >>>>> + pci_info(parent, "lnksta %#06x new cls %#03x\n", >>>>> + lnksta, (cls & PCI_EXP_LNKSTA_CLS)); >>>>> return !(reg16 & PCI_EXP_LNKSTA_LT); >>>>> } >>>>> >>>> Still exhibiting the BAR update error, run tested with next--20201030 >>> Yup, same for me :( >> So then it is different issue and not similar to aardvark one. >> >> Anyway, was ASPM working on some previous kernel version? Or was it >> always broken on Turris Omnia? > I tried bisecting and couldn't find a commit that worked. And OpenWrt by > default builds with ASPM off, so my best guess is that it was always > broken. > > However, the two other PCI slots *do* work with ASPM on, as long as > they're both occupied when booting. If I only have one card installed > apart from the dodge WLE900, both of them fail... Just to be sure it is not a (particular) mPCIe slot issue on the TO - did you change the device order in the mPCIe slots? On my node: - right slot (next to the CPU) hosts a SSD - centre slot hosts WLE900VX - left slot (over the SIM card slot) hosts the WLE200N2