All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Clayton <chris2553@googlemail.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: Heiner Kallweit <hkallweit1@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Azat Khuzhin <a3at.mail@gmail.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Realtek linux nic maintainers <nic_swsd@realtek.com>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
Date: Wed, 10 Oct 2018 23:30:55 +0100	[thread overview]
Message-ID: <c1b09d33-f2f0-337e-6e85-0ec035cbf179@googlemail.com> (raw)
In-Reply-To: <9d99060a-db1d-7177-3041-e407b131548e@maciej.szmigiero.name>

OK, right kernel/module used this time. Please see findings below.

On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
> On 09.10.2018 22:36, Heiner Kallweit wrote:
>> On 09.10.2018 16:40, Chris Clayton wrote:
>>> Thanks to Maciej and Heiner for their replies.
>>>
>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>>> Hi again,
>>>>>
>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the
>>>>> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my
>>>>> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed
>>>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite so quickly but the reported time increases from
>>>>> 14-15ms to more than 1000ms.
>>>>
>>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>>> state (before a suspend) and in the broken state (after a resume).
>>>> Maybe there will be some obvious in the difference.
>>>>
>>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>>
>>> Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical.
>>>
>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical.
>>> Heiner specifically suggested looking at the RxConfig. The value of that is 0x0002870e both pre and post suspend.
>>>
>> Hmm, this is very weird, especially taking into account that in your original
>> report you state that removing the call to rtl_init_rxcfg() from rtl_hw_start()
>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>> register values seem to be the same before and after resume. So how can the
>> chip behave differently?
>> So far my best guess is that some chip quirk causes it to accept writes to
>> register RxConfig, but to misinterpret or ignore the written value.
>> So far your report is the only one (affecting RTL8411), but we don't know
>> whether other chip versions are affected too.
> 
> Also, it is interesting that even if one removes a call to
> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
> written to moments later by rtl_set_rx_mode().
> 
> The only chip accesses in the meantime seems to be a write to TxConfig by
> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
> to MAR0 earlier in rtl_set_rx_mode().
> 
> My proposals are:
> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
> in rtl_hw_start().
> Maybe the chip does not like sometimes that RxConfig is written before
> TxConfig.
> 

This change made no difference. Networking still dies if I open a browser or leave ping running long enough.

> 2) Check the original value of RxConfig (after a resume) before
> rtl_init_rxcfg() overwrites it (compile tested only):
> --- r8169.c.ori
> +++ r8169.c
> @@ -5155,6 +5155,9 @@
>  	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>  	RTL_R8(tp, IntrMask);
>  	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
> +
> +	pr_notice("RxConfig before init was %.8x\n",
> +		(unsigned int)RTL_R32(tp, RxConfig));
>  	rtl_init_rxcfg(tp);
>  	rtl_set_tx_config_registers(tp);
>  
> 
> This should be the value that you got when you removed the call to
> rtl_init_rxcfg() for testing.
> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
> writes (under the "default:" label for your NIC model).

This might be more interesting. Through combination of viewing the output from pr_notice() and the output from "ethtool
-d", I can see RxConfig with the following values

	During boot:	0x00028700
	Before suspend:	0x0002870e
	During resume:	0x00024000
	Post resume:	0x0002870e

I then removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt, installed and rebooted. Now I see the
following values:

	During boot:	0x00028700
	Before suspend:	0x0002870e
	During resume:	0x00024000
	Post resume:	0x0002870e

> 
> Hope this helps,
> Maciej
> 

  parent reply	other threads:[~2018-10-10 22:31 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-28 15:54 R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev) Maciej S. Szmigiero
2018-09-28 22:00 ` Chris Clayton
2018-09-28 22:13   ` Heiner Kallweit
2018-09-29  7:25     ` Chris Clayton
2018-09-29  7:38       ` Chris Clayton
2018-10-04  8:41     ` Chris Clayton
2018-10-07 19:36       ` Chris Clayton
2018-10-09 12:32         ` Maciej S. Szmigiero
2018-10-09 14:40           ` Chris Clayton
2018-10-09 20:36             ` Heiner Kallweit
2018-10-10  0:24               ` Maciej S. Szmigiero
2018-10-10  8:09                 ` Chris Clayton
2018-10-10  8:51                   ` Chris Clayton
2018-10-10 22:30                 ` Chris Clayton [this message]
2018-10-10 22:32                   ` Chris Clayton
2018-10-10 22:49                 ` Chris Clayton
2018-10-11  0:12                   ` Maciej S. Szmigiero
2018-10-11  8:24                     ` Chris Clayton
2018-10-11 12:23                       ` Maciej S. Szmigiero
2018-10-11 13:34                         ` Chris Clayton
2018-10-09 21:39             ` Heiner Kallweit
2018-10-09 23:32               ` Chris Clayton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c1b09d33-f2f0-337e-6e85-0ec035cbf179@googlemail.com \
    --to=chris2553@googlemail.com \
    --cc=a3at.mail@gmail.com \
    --cc=davem@davemloft.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=hkallweit1@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=nic_swsd@realtek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.