Hi, On Tue, 29 Sep 2020 22:08:56 +0200 Hans de Goede wrote: > Hi, > > On 9/29/20 9:07 PM, Petr Tesarik wrote: > > Hi Heiner (and now also Hans)! > > > > @Hans: I'm adding you to this conversation, because you're the author > > of commit b1e3454d39f99, which seems to break the r8169 driver on a > > laptop of mine. > > Erm, no, as you bi-sected yourself already commit 9f0b54cd167219 > ("r8169: move switching optional clock on/off to pll power functions") > > Broke your laptop, commit b1e3454d39f99 ("clk: x86: add "ether_clk" alias > for Bay Trail / Cherry Trail") is about 18 months older. Well, I'm no expert on power management, nor clock drivers or network drivers. I'm an end user, so I had to accept Hans' claim that the pointer should be NULL... >[...] > > That's because this kernel is built with CONFIG_PMC_ATOM=y, and looking > > at the platform initialization code, the "ether_clk" alias is created > > unconditionally. Hans already added. > > Petr, unconditionally is not really correct here, just as claiming > above that my commit broke things was not really correct either. I'm sorry if you feel offended. By "unconditionally" I mean that this clock cannot be disabled by the user (e.g. with a kernel command line option). >[...] > So with that all clarified lets try to see if we can figure out > *why* this is actually happening. > > Petr, can you describe your hardware in some more detail, > in the bits quoted when you first Cc-ed me there is not that > much detail. What is the vendor and model of your laptop? Seeing that Heiner has already agreed to revert his patch, I'm not sure this is still relevant, but here I go. This is an ASUS X453MA notebook with a dual Intel(R) Celeron(R) CPU N2840 @ 2.16GHz, so it's totally expected that my system uses the Intel Atom Processor E3800 Series drivers. > Looking closer at commit 9f0b54cd167219 > ("r8169: move switching optional clock on/off to pll power functions") > I notice that the functions which now enable/disable the clock: > rtl_pll_power_up() and rtl_pll_power_down() > > Only run when the interface is up during suspend/resume. > Petr, I guess the laptop is not connected to ethernet when you > hibernate it? The laptop is always connected to an active 1000G Ethernet switch port. However, there seem to be two bugs related to the Realtek driver: one causes a hard hang on driver load (that's the one we're handling here), the other is related to incorrect Rx queue initialization after resume. They may or may not be related. > That means that on resume the clock will not be re-enabled. > > This is a subtle but important change and I believe that > this is what is breaking things. I guess that the PLL which > rtl_pll_power_up() / rtl_pll_power_down() controls is only > used for ethernet-timing. But the external clock controlled > through pt->clk is a replacement for using an external > crystal with the r8169 chip. So with it disabled, the entire > chip does not have a clock and is essentially dead. > It can then e.g. not respond to any pci-e reads/writes done > to it. ...which would nicely explain the hang, indeed. Thanks, Petr T > So I believe that the proper fix for this is to revert > commit 9f0b54cd167219 > ("r8169: move switching optional clock on/off to pll power functions") > > As that caused the whole chip's clock to be left off after > a suspend/resume while the interface is down. > > Also some remarks about this while I'm being a bit grumpy about > all this anyways (sorry): > > 1. 9f0b54cd167219 ("r8169: move switching optional clock on/off > to pll power functions") commit's message does not seem to really > explain why this change was made... > > 2. If a git blame would have been done to find the commit adding > the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional > ether_clk clock") then you could have known that the clk in question > is an external clock for the entire chip, the commit message pretty > clearly states this (although "the entire" part is implied only) : > > "On some boards a platform clock is used as clock for the r8169 chip, > this commit adds support for getting and enabling this clock (assuming > it has an "ether_clk" alias set on it). > > This is related to commit d31fd43c0f9a ("clk: x86: Do not gate clocks > enabled by the firmware") which is a previous attempt to fix this for > some x86 boards, but this causes all Cherry Trail SoC using boards to > not reach there lowest power states when suspending. > > This commit (together with an atom-pmc-clk driver commit adding the > alias) fixes things properly by making the r8169 get the clock and > enable it when it needs it." > > Regards, > > Hans >