Memory corruption with r8169 across several device revisions and kernels

All of lore.kernel.org
 help / color / mirror / Atom feed

* Memory corruption with r8169 across several device revisions and kernels
@ 2018-01-20 20:18 Oliver Freyermuth
  2018-01-21 20:48 ` Francois Romieu
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Freyermuth @ 2018-01-20 20:18 UTC (permalink / raw)
  To: netdev

Dear network experts,

please redirect me if this is the wrong place. 

I have reproduced the following issue across three devices with different Realtek card revisions
and different Distros (Debian 9, Ubuntu 17.04, Gentoo with kernels 4.9, 4.11.3, 4.14.12). 

It's safely reproducible with at least:
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)

Memory corruption at physical addresses either in low memory or kernel memory or user space memory occurs when reading from:
/proc/self/net/dev
The physical memory addresses which get corrupted change with each boot of the system,
and also appear to change with each reload of the kernel module (I have only one data point on that). 

To reproduce, execute:
$ while true; do cat /proc/self/net/dev > /dev/null; done
and in parallel, scan memory for corruption, e.g.
$ memtester 15G
Of course, one should try to map all system memory here. 
It usually shows up in the first loop iteration if the "while" loop is executed in parallel. 

Depending on the actual memory being corrupted, it may also become visible via
Corrupted low memory at ffff88000000b000 (b000 phys) = 0016e109
in klog, if the low memory corruption scanning is activated. 

The values found in overwritten memory match those contained in /proc/self/net/dev for the realtek ethernet device. 

Unloading r8169 or disabling the card in bios "fixes" this issue. 

I have already ended up with two corrupted btrfs filesystems due to this issue, and many segfaults in userspace. 

Please include me directly in replies, I may not stay subscribed to the list. 

Cheers,
	Oliver

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-20 20:18 Memory corruption with r8169 across several device revisions and kernels Oliver Freyermuth
@ 2018-01-21 20:48 ` Francois Romieu
  2018-01-21 20:50   ` Oliver Freyermuth
  0 siblings, 1 reply; 15+ messages in thread
From: Francois Romieu @ 2018-01-21 20:48 UTC (permalink / raw)
  To: Oliver Freyermuth; +Cc: netdev

Oliver Freyermuth <o.freyermuth@googlemail.com> :
[...]

Is it an AMD based system ?

-- 
Ueimor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-21 20:48 ` Francois Romieu
@ 2018-01-21 20:50   ` Oliver Freyermuth
  2018-01-22  0:09     ` Francois Romieu
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Freyermuth @ 2018-01-21 20:50 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

Hi, 

Am 21.01.2018 um 21:48 schrieb Francois Romieu:
> Oliver Freyermuth <o.freyermuth@googlemail.com> :
> [...]
> 
> Is it an AMD based system ?
> 

No, all the systems on which I have observed this up to now are Intel-based. 
Two Haswell and one Sandy Bridge system. 

Cheers,
	Oliver

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-21 20:50   ` Oliver Freyermuth
@ 2018-01-22  0:09     ` Francois Romieu
  2018-01-22  0:44       ` Oliver Freyermuth
  2018-01-22 22:55       ` Oliver Freyermuth
  0 siblings, 2 replies; 15+ messages in thread
From: Francois Romieu @ 2018-01-22  0:09 UTC (permalink / raw)
  To: Oliver Freyermuth; +Cc: netdev

Oliver Freyermuth <o.freyermuth@googlemail.com> :
> Am 21.01.2018 um 21:48 schrieb Francois Romieu:
> > Oliver Freyermuth <o.freyermuth@googlemail.com> :
[...]
> > Is it an AMD based system ?
> > 
> 
> No, all the systems on which I have observed this up to now are Intel-based. 
> Two Haswell and one Sandy Bridge system. 

Ok.

You said:

Oliver Freyermuth <o.freyermuth@googlemail.com> :
[...]
> The values found in overwritten memory match those contained in
> /proc/self/net/dev for the realtek ethernet device.

Are you able to retrieve the layout ? That is, does it appear to match:

- r8169 hardware stats DMA buffer ?
  TxOk, RxOk, TxErr, RxErr, ...

- rtnl_link_stats ?
  rx_packets, tx_packets, rx_bytes, tx_bytes, ...

or something else ?

-- 
Ueimor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-22  0:09     ` Francois Romieu
@ 2018-01-22  0:44       ` Oliver Freyermuth
  2018-01-22 22:55       ` Oliver Freyermuth
  1 sibling, 0 replies; 15+ messages in thread
From: Oliver Freyermuth @ 2018-01-22  0:44 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

Am 22.01.2018 um 01:09 schrieb Francois Romieu:
> You said:
> 
> Oliver Freyermuth <o.freyermuth@googlemail.com> :
> [...]
>> The values found in overwritten memory match those contained in
>> /proc/self/net/dev for the realtek ethernet device.
> 
> Are you able to retrieve the layout ? That is, does it appear to match:
> 
> - r8169 hardware stats DMA buffer ?
>   TxOk, RxOk, TxErr, RxErr, ...
> 
> - rtnl_link_stats ?
>   rx_packets, tx_packets, rx_bytes, tx_bytes, ...
> 
> or something else ?

Not cleanly. 
Since I'm no expert in kernel module development, I can only deduce from what I get in mapped memory,
e.g. with memtester. What I found there I found back in /proc/self/net/dev,
I'm not sure anymore whether it was RX or TX bytes / packets (but it was none of the error counters). 
I can try to reproduce to clarify, but it's a somwhat dangerous undertaking. 

Also, from a time when the physical offset was in low memory, I got the following in syslog:
Oct 12 10:05:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065b8ea
Oct 12 10:10:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065be39
Oct 12 10:11:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065be8c
Oct 12 10:12:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065bef8
Oct 12 10:13:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065bfbe
Oct 12 10:18:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065c37a
Oct 12 10:19:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065c3db
Oct 12 10:31:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065cc48
Oct 12 10:35:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065d402
Oct 12 10:47:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065dcbb
Oct 12 10:53:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065e0a3
Oct 12 11:39:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 006602f2
Oct 12 11:44:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 00661ef0

Also, I'm not sure whether the low memory scanner continues after a single corruption was found, potentially it would only see the first corrupted region. 
memtester in userspace stops on the first corruption and then tries another pass. At least I only ever saw one corrupted region with the tools I used. 

The same was true for the corrupted btrfs filesystem: As far as I could tell, there was a single corrupted region, no series of counters, i.e. not a full structure. 

Cheers,
	Oliver

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-22  0:09     ` Francois Romieu
  2018-01-22  0:44       ` Oliver Freyermuth
@ 2018-01-22 22:55       ` Oliver Freyermuth
  2018-01-23 15:28         ` David Miller
  2018-01-23 22:13         ` Francois Romieu
  1 sibling, 2 replies; 15+ messages in thread
From: Oliver Freyermuth @ 2018-01-22 22:55 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

Dear Francois, other r8169 experts, 

Am 22.01.2018 um 01:09 schrieb Francois Romieu:
> Are you able to retrieve the layout ? That is, does it appear to match:
> 
> - r8169 hardware stats DMA buffer ?
>   TxOk, RxOk, TxErr, RxErr, ...
> 
> - rtnl_link_stats ?
>   rx_packets, tx_packets, rx_bytes, tx_bytes, ...
> 
> or something else ?
> 

It took me a while, somehow it seems the bug does not always occur - potentially there's also some race involved. 
Reproducing on a Ubuntu 17.10 system I found the following:

Address in virtual memory || value
0x7f87bb4c6000            || 0x00000217
0x7f87bb4c6008            || 0x000003ab
0x7f87bb4c6018            || 0x00000000
0x7f87bb4c6028            || 0x00000279
0x7f87bb4c6030            || 0x000000e1
0x7f87bb4c6038            || 0x00000051

At almost the same time, I find the following numbers in /proc/self/net/dev for the device:

             decimal || hex
RX bytes:    870820  || 0x000d49a4
   packets:     945  || 0x000003b1
   errs           0  || 
   drop           0  || 
   fifo           0  ||  
   frame          0  || 
   compressed     0  || 
   multicast     83  || 0x00000053
TX bytes:     58505  || 0x0000e489
   packets:     535  || 0x00000217
   errs           0  || 
   drop           0  ||
   fifo           0  || 
   frame          0  || 
   compressed     0  || 
   multicast      0  || 

Since there was a small delay in time (reading from /proc/self/net/dev happened a few seconds later),
these values are by a few packets off from the memory dump. 

So I deduce the layout:
0x7f87bb4c6000   TX Packets
0x7f87bb4c6008   RX Packets
0x7f87bb4c6010    * corruption not seen by memtester for whatever reason *
0x7f87bb4c6018   ???
0x7f87bb4c6020    * corruption not seen by memtester for whatever reason *
0x7f87bb4c6028   ???
0x7f87bb4c6030   ???
0x7f87bb4c6038   RX multicast (?)

So the only thing which is fully clear is that there are TX Packets and after that RX Packets. 

Checking through the driver sources, I find rtnl_link_stats64 can not be the culprit, since it has rx_packets and only after tx_packets. 
However, struct rtl8169_counters looks like:
struct rtl8169_counters {
	__le64	tx_packets;
	__le64	rx_packets;
	__le64	tx_errors;
	__le32	rx_errors;
	__le16	rx_missed;
	__le16	align_errors;
	__le32	tx_one_collision;
	__le32	tx_multi_collision;
	__le64	rx_unicast;
	__le64	rx_broadcast;
	__le32	rx_multicast;
	__le16	tx_aborted;
	__le16	tx_underun;
};
This looks like it could very well match the structure found in memory, so something would be broken related to rtl8169_do_counters, in the DMA transfer. 

Does this help - can I provide more info? I get the feeling this affects many tens of thousands of systems and just has been hidden due to 
network stats being read rarely... 

Cheers,
Oliver

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-22 22:55       ` Oliver Freyermuth
@ 2018-01-23 15:28         ` David Miller
  2018-01-23 15:47           ` Oliver Freyermuth
  2018-01-23 22:13         ` Francois Romieu
  1 sibling, 1 reply; 15+ messages in thread
From: David Miller @ 2018-01-23 15:28 UTC (permalink / raw)
  To: o.freyermuth; +Cc: romieu, netdev

From: Oliver Freyermuth <o.freyermuth@googlemail.com>
Date: Mon, 22 Jan 2018 23:55:58 +0100

> Checking through the driver sources, I find rtnl_link_stats64 can
> not be the culprit, since it has rx_packets and only after
> tx_packets.  However, struct rtl8169_counters looks like:
>
> struct rtl8169_counters {
> 	__le64	tx_packets;
> 	__le64	rx_packets;
> 	__le64	tx_errors;
> 	__le32	rx_errors;
> 	__le16	rx_missed;
> 	__le16	align_errors;
> 	__le32	tx_one_collision;
> 	__le32	tx_multi_collision;
> 	__le64	rx_unicast;
> 	__le64	rx_broadcast;
> 	__le32	rx_multicast;
> 	__le16	tx_aborted;
> 	__le16	tx_underun;
> };
>
> This looks like it could very well match the structure found in
> memory, so something would be broken related to rtl8169_do_counters,
> in the DMA transfer.
> 
> Does this help - can I provide more info? I get the feeling this
> affects many tens of thousands of systems and just has been hidden
> due to network stats being read rarely...

Looking at how these DMA counters are handled, there appears to be a
requirement that the memory buffer is 64-byte aligned.

This is because the low bits in the counter address register are used
for various commands, for example:

	/* ResetCounterCommand */
	CounterReset	= 0x1,

	/* DumpCounterCommand */
	CounterDump	= 0x8,

Looking at the FreeBSD driver, the requirement seems to be 64-bytes of
alignment.  (see RL_DUMP_ALIGN define)

However, nothing is being done in r8169.c to enforce this alignment at
counter allocation time:

	tp->counters = dmam_alloc_coherent (&pdev->dev, sizeof(*tp->counters),
					    &tp->counters_phys_addr,

There is no alignment guaranteed by this allocation interface.  On a
lot of platforms you get PAGE_SIZE aligned buffers, but this is not
a universal thing at all.

Therefore the driver needs to allocate "size + (64 - 1)" bytes and do
the 64-byte alignment of the CPU pointer and the DMA address by hand.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-23 15:28         ` David Miller
@ 2018-01-23 15:47           ` Oliver Freyermuth
  2018-01-24  5:31             ` Andreas Hartmann
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Freyermuth @ 2018-01-23 15:47 UTC (permalink / raw)
  To: David Miller; +Cc: romieu, netdev

Am 23.01.2018 um 16:28 schrieb David Miller:
> Looking at how these DMA counters are handled, there appears to be a
> requirement that the memory buffer is 64-byte aligned.
> 
> [...]
> 
> Therefore the driver needs to allocate "size + (64 - 1)" bytes and do
> the 64-byte alignment of the CPU pointer and the DMA address by hand.

This is also what I wondered about as a non-expert in hardware drivers; 
alignment should surely be enforced here. 

However, for the memory corruption I observed, I used an x86_64 system
(which I believe always has PAGE_SIZE aligned buffers). 
So there should be another bug, unless I am mistaken about x86_64. 

I checked the deprecated r8168 driver by Realtek (I am not sure if this one is also affected by the issue, though)
and found two major differences in DMA handling:
1) It wraps the DMA operations (writing of adresses, waiting for cmd bits to be pulled down) in spin_lock_irqsave / spin_unlock_irqrestore. 
2) It does not reset CounterAddrLow / CounterAddrHigh to 0 / 0 after finishing. 
   That's not really good, but may have hidden this issue with r8168. 

Again, I have not tried to use r8168 yet (especially since it only supports old kernels),
but maybe this helps to trigger some ideas. 

Worst case, this could be a firmware timing bug, i.e. the card writes the counters to system memory
shortly before the cmd bytes are pulled high / shortly after they have been pulled down (then using the partially zeroed
out memory address) - I don't know. Let me know if I can extract any more info from an affected machine,
but I believe these machines should be very abundant. 

HTH and thanks,
Oliver

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-22 22:55       ` Oliver Freyermuth
  2018-01-23 15:28         ` David Miller
@ 2018-01-23 22:13         ` Francois Romieu
  2018-01-24  1:21           ` Oliver Freyermuth
  1 sibling, 1 reply; 15+ messages in thread
From: Francois Romieu @ 2018-01-23 22:13 UTC (permalink / raw)
  To: Oliver Freyermuth; +Cc: netdev

Oliver Freyermuth <o.freyermuth@googlemail.com> :
[...]
> This looks like it could very well match the structure found in memory,
> so something would be broken related to rtl8169_do_counters, in the DMA
> transfer. 
> 
> Does this help - can I provide more info? I get the feeling this affects
> many tens of thousands of systems and just has been hidden due to network
> stats being read rarely... 

It helps. Can you try the snippet below ?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 272c5962e4f7..8531b41e3397 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -2238,16 +2238,12 @@ static bool rtl8169_do_counters(struct net_device *dev, u32 counter_cmd)
 	bool ret;
 
 	RTL_W32(CounterAddrHigh, (u64)paddr >> 32);
+	RTL_R32(CounterAddrHigh);
 	cmd = (u64)paddr & DMA_BIT_MASK(32);
 	RTL_W32(CounterAddrLow, cmd);
 	RTL_W32(CounterAddrLow, cmd | counter_cmd);
 
-	ret = rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
-
-	RTL_W32(CounterAddrLow, 0);
-	RTL_W32(CounterAddrHigh, 0);
-
-	return ret;
+	return rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
 }
 
 static bool rtl8169_reset_counters(struct net_device *dev)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-23 22:13         ` Francois Romieu
@ 2018-01-24  1:21           ` Oliver Freyermuth
  2018-01-24  1:33             ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Freyermuth @ 2018-01-24  1:21 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

Am 23.01.2018 um 23:13 schrieb Francois Romieu:
> 
> It helps. Can you try the snippet below ?

It seems to fix the issue - I could not reproduce memory corruption anymore
neither on an Ubuntu 17.10.1 live system (with patched kernel module)
nor on my Gentoo system (4.14.12 with your patch applied) 
across several reboots and module reloads! 

Sorry for my ignorance, but I'm curious - the R32 is there to avoid reordering of the writes? 

Also, a small issue (for the final patch, in case you did not notice): 
"bool ret" is now unused and will produce a compiler warning. 

Many thanks for the quick help. I don't know the policies, but from user point of view,
this should be a good candidate for backporting to stable kernels, 
since many systems in the wild should be affected by this,
and spurious memory corruption leading to e.g. broken filesystems is rather nasty. 

All the best,
	Oliver

> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 272c5962e4f7..8531b41e3397 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -2238,16 +2238,12 @@ static bool rtl8169_do_counters(struct net_device *dev, u32 counter_cmd)
>  	bool ret;
>  
>  	RTL_W32(CounterAddrHigh, (u64)paddr >> 32);
> +	RTL_R32(CounterAddrHigh);
>  	cmd = (u64)paddr & DMA_BIT_MASK(32);
>  	RTL_W32(CounterAddrLow, cmd);
>  	RTL_W32(CounterAddrLow, cmd | counter_cmd);
>  
> -	ret = rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
> -
> -	RTL_W32(CounterAddrLow, 0);
> -	RTL_W32(CounterAddrHigh, 0);
> -
> -	return ret;
> +	return rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
>  }
>  
>  static bool rtl8169_reset_counters(struct net_device *dev)
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-24  1:21           ` Oliver Freyermuth
@ 2018-01-24  1:33             ` David Miller
  2018-01-26  0:53               ` [PATCH net 1/1] r8169: fix memory corruption on retrieval of hardware statistics Francois Romieu
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2018-01-24  1:33 UTC (permalink / raw)
  To: o.freyermuth; +Cc: romieu, netdev

From: Oliver Freyermuth <o.freyermuth@googlemail.com>
Date: Wed, 24 Jan 2018 02:21:55 +0100

> Am 23.01.2018 um 23:13 schrieb Francois Romieu:
>> 
>> It helps. Can you try the snippet below ?
> 
> It seems to fix the issue - I could not reproduce memory corruption
> anymore neither on an Ubuntu 17.10.1 live system (with patched
> kernel module) nor on my Gentoo system (4.14.12 with your patch
> applied) across several reboots and module reloads!

Thanks for helping test this out.

> Sorry for my ignorance, but I'm curious - the R32 is there to avoid
> reordering of the writes?

The R32 make sure the write(s) beforehand have reached the chip, and
indeed another thing it ensures is write ordering.

> Many thanks for the quick help. I don't know the policies, but from
> user point of view, this should be a good candidate for backporting
> to stable kernels, since many systems in the wild should be affected
> by this, and spurious memory corruption leading to e.g. broken
> filesystems is rather nasty.

Hopefully Francois can post a bonafide patch for me to apply with
a full commit log message etc.

I'm really surprised we were clearing those registers after
programming properly, I wonder what that was all about. :-/

Thanks again for your testing and help.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-23 15:47           ` Oliver Freyermuth
@ 2018-01-24  5:31             ` Andreas Hartmann
  2018-01-24  7:05               ` Andreas Hartmann
  0 siblings, 1 reply; 15+ messages in thread
From: Andreas Hartmann @ 2018-01-24  5:31 UTC (permalink / raw)
  To: Oliver Freyermuth, David Miller; +Cc: romieu, netdev

On 01/23/2018 at 04:47 PM Oliver Freyermuth wrote:
> Am 23.01.2018 um 16:28 schrieb David Miller:
>> Looking at how these DMA counters are handled, there appears to be a
>> requirement that the memory buffer is 64-byte aligned.
>>
>> [...]
>>
>> Therefore the driver needs to allocate "size + (64 - 1)" bytes and do
>> the 64-byte alignment of the CPU pointer and the DMA address by hand.
> 
> This is also what I wondered about as a non-expert in hardware drivers; 
> alignment should surely be enforced here. 
> 
> However, for the memory corruption I observed, I used an x86_64 system
> (which I believe always has PAGE_SIZE aligned buffers). 
> So there should be another bug, unless I am mistaken about x86_64. 
> 
> I checked the deprecated r8168 driver by Realtek (I am not sure if this one is also affected by the issue, though)

I'm using since years this driver because r8169 is broken (it is slow
and it misses packages - which is extremely bad for real time
applications like asterisk, if they appear 50s later ...).

r8168-8.045.08 is an actual version which is provided by realtek on
their homepage and which even compiles fine w/ 4.14.x.


Regards,
Andreas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory corruption with r8169 across several device revisions and kernels
  2018-01-24  5:31             ` Andreas Hartmann
@ 2018-01-24  7:05               ` Andreas Hartmann
  0 siblings, 0 replies; 15+ messages in thread
From: Andreas Hartmann @ 2018-01-24  7:05 UTC (permalink / raw)
  To: Oliver Freyermuth, David Miller; +Cc: romieu, netdev

On 01/24/2018 at 06:31 AM Andreas Hartmann wrote:
> On 01/23/2018 at 04:47 PM Oliver Freyermuth wrote:
>> Am 23.01.2018 um 16:28 schrieb David Miller:
>>> Looking at how these DMA counters are handled, there appears to be a
>>> requirement that the memory buffer is 64-byte aligned.
>>>
>>> [...]
>>>
>>> Therefore the driver needs to allocate "size + (64 - 1)" bytes and do
>>> the 64-byte alignment of the CPU pointer and the DMA address by hand.
>>
>> This is also what I wondered about as a non-expert in hardware drivers; 
>> alignment should surely be enforced here. 
>>
>> However, for the memory corruption I observed, I used an x86_64 system
>> (which I believe always has PAGE_SIZE aligned buffers). 
>> So there should be another bug, unless I am mistaken about x86_64. 
>>
>> I checked the deprecated r8168 driver by Realtek (I am not sure if this one is also affected by the issue, though)
> 
> I'm using since years this driver because r8169 is broken (it is slow
> and it misses packages - which is extremely bad for real time
> applications like asterisk, if they appear 50s later ...).
> 
> r8168-8.045.08 is an actual version which is provided by realtek on
> their homepage and which even compiles fine w/ 4.14.x.

It just *compiles* w/ 4.14 - but doesn't work here at all. It's running
fine here w/ vanilla 4.4. Realtek officially writes about support up to
4.7. at the moment - but code already knows about 4.11.


Regards,
Andreas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net 1/1] r8169: fix memory corruption on retrieval of hardware statistics.
  2018-01-24  1:33             ` David Miller
@ 2018-01-26  0:53               ` Francois Romieu
  2018-01-26  2:34                 ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Francois Romieu @ 2018-01-26  0:53 UTC (permalink / raw)
  To: David Miller; +Cc: o.freyermuth, netdev

Hardware statistics retrieval hurts in tight invocation loops.

Avoid extraneous write and enforce strict ordering of writes targeted to
the tally counters dump area address registers.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Oliver Freyermuth <o.freyermuth@googlemail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 272c5962e4f7..8e91274174f1 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -2235,19 +2235,14 @@ static bool rtl8169_do_counters(struct net_device *dev, u32 counter_cmd)
 	void __iomem *ioaddr = tp->mmio_addr;
 	dma_addr_t paddr = tp->counters_phys_addr;
 	u32 cmd;
-	bool ret;
 
 	RTL_W32(CounterAddrHigh, (u64)paddr >> 32);
+	RTL_R32(CounterAddrHigh);
 	cmd = (u64)paddr & DMA_BIT_MASK(32);
 	RTL_W32(CounterAddrLow, cmd);
 	RTL_W32(CounterAddrLow, cmd | counter_cmd);
 
-	ret = rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
-
-	RTL_W32(CounterAddrLow, 0);
-	RTL_W32(CounterAddrHigh, 0);
-
-	return ret;
+	return rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
 }
 
 static bool rtl8169_reset_counters(struct net_device *dev)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net 1/1] r8169: fix memory corruption on retrieval of hardware statistics.
  2018-01-26  0:53               ` [PATCH net 1/1] r8169: fix memory corruption on retrieval of hardware statistics Francois Romieu
@ 2018-01-26  2:34                 ` David Miller
  0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2018-01-26  2:34 UTC (permalink / raw)
  To: romieu; +Cc: o.freyermuth, netdev

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Fri, 26 Jan 2018 01:53:26 +0100

> Hardware statistics retrieval hurts in tight invocation loops.
> 
> Avoid extraneous write and enforce strict ordering of writes targeted to
> the tally counters dump area address registers.
> 
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
> Tested-by: Oliver Freyermuth <o.freyermuth@googlemail.com>

Applied and queued up for -stable, thanks Francois.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-01-26  2:34 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-20 20:18 Memory corruption with r8169 across several device revisions and kernels Oliver Freyermuth
2018-01-21 20:48 ` Francois Romieu
2018-01-21 20:50   ` Oliver Freyermuth
2018-01-22  0:09     ` Francois Romieu
2018-01-22  0:44       ` Oliver Freyermuth
2018-01-22 22:55       ` Oliver Freyermuth
2018-01-23 15:28         ` David Miller
2018-01-23 15:47           ` Oliver Freyermuth
2018-01-24  5:31             ` Andreas Hartmann
2018-01-24  7:05               ` Andreas Hartmann
2018-01-23 22:13         ` Francois Romieu
2018-01-24  1:21           ` Oliver Freyermuth
2018-01-24  1:33             ` David Miller
2018-01-26  0:53               ` [PATCH net 1/1] r8169: fix memory corruption on retrieval of hardware statistics Francois Romieu
2018-01-26  2:34                 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.