All of lore.kernel.org
 help / color / mirror / Atom feed
* PROBLEM: network data corruption (bisected to e5a4b0bb803b)
@ 2016-07-24  3:35 Alan Curry
  2016-07-24 17:45   ` Christian Lamparter
  0 siblings, 1 reply; 35+ messages in thread
From: Alan Curry @ 2016-07-24  3:35 UTC (permalink / raw)
  To: chunkeey, linux-wireless, netdev, linux-kernel

[1.] One line summary of the problem:
network data corruption (bisected to e5a4b0bb803b)

[2.] Full description of the problem/report:
Note: although my bisect ended at a commit from before 3.19, I have the
same symptom in all newer kernels I've tried, up to 4.6.4.

The commit was:

>commit e5a4b0bb803b39a36478451eae53a880d2663d5b
>Author: Al Viro <viro@zeniv.linux.org.uk>
>Date:   Mon Nov 24 18:17:55 2014 -0500
>
>    switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives

The symptom is that downloaded files (http, ftp, and probably other
protocols) have small corrupted segments (about 1-2 kilobytes long) in
random locations. Only downloads that sustain a high speed for at least a
few seconds are corrupted. Anything small enough to be received in less
than about 5 seconds is not affected.

If I download the same file twice in a row, the corruption is in different
places in each copy.

If I try to do a git clone, it fails a few seconds into the "Receiving
objects" stage with a deflate error.

[3.] Keywords: networking, carl9170

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
Multiple versions are known to be affected, from 3.19 to 4.6.4

[4.2.] Kernel .config file:
For testing I built with make x86_64_defconfig followed by enabling the
carl9170 driver, which adds these lines:
CONFIG_ATH_COMMON=m
CONFIG_ATH_CARDS=m
CONFIG_CARL9170=m
CONFIG_CARL9170_LEDS=y
CONFIG_CARL9170_WPC=y

[5.] Most recent kernel version which did not have the bug:
That would be the predecessor of e5a4b0bb803b39a36478451eae53a880d2663d5b
which is v3.18-rc6-1620-g17836394e578

[6.] no Oops

[7.] A small shell script or example program which triggers the
     problem (if possible)

This command fails reliably for me when running an affected kernel:

git clone git://git.kernel.org/pub/scm/git/git.git

(I'm including all the standard format stuff suggested by REPORTING-BUGS,
but I think you can skip from here to section 8.7 without missing anything
relevant)

[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)

Mostly Debian 8.5 stable packages here.

GNU C			4.9.2
GNU Make		4.0
Binutils		2.25
Util-linux		2.25.2
Mount			2.25.2
Quota-tools		4.01
Linux C Library		2.19
Dynamic linker (ldd)	2.19
Procps			2.0.11
Kbd			1.15.5
Console-tools		1.15.5
Sh-utils		8.23
Modules Loaded          <snipped since I'm not running the affected kernel now>

[8.2.] Processor information (from /proc/cpuinfo):
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Pentium(R) CPU G620 @ 2.60GHz
stepping	: 7
microcode	: 0x14
cpu MHz		: 1599.914
cache size	: 3072 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave lahf_lm epb tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts
bugs		:
bogomips	: 5188.45
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Pentium(R) CPU G620 @ 2.60GHz
stepping	: 7
microcode	: 0x14
cpu MHz		: 2340.304
cache size	: 3072 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave lahf_lm epb tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts
bugs		:
bogomips	: 5189.38
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

[8.3.] Module information (from /proc/modules):

When I tested with the x86_64_defconfig + carl9170 kernel, there were
hardly any modules built, and I reproduced the problem after booting with
init=/bin/sh, so no unnecessary modules were loaded. Currently running a
normal 4.6.4 kernel which is showing the bug, there are many more modules:

ip6t_REJECT 1524 1 - Live 0xffffffffa0393000
nf_reject_ipv6 2744 1 ip6t_REJECT, Live 0xffffffffa038f000
ip6table_filter 1419 1 - Live 0xffffffffa038b000
ip6_tables 13135 1 ip6table_filter, Live 0xffffffffa0383000
nf_nat_tftp 1158 0 - Live 0xffffffffa037f000
nf_conntrack_tftp 4017 1 nf_nat_tftp, Live 0xffffffffa037b000
nf_nat_irc 1574 0 - Live 0xffffffffa0377000
nf_conntrack_irc 3787 1 nf_nat_irc, Live 0xffffffffa0373000
nf_nat_ftp 1836 0 - Live 0xffffffffa036f000
nf_conntrack_ftp 6335 1 nf_nat_ftp, Live 0xffffffffa036a000
ipt_REJECT 1457 2 - Live 0xffffffffa0366000
nf_reject_ipv4 2483 1 ipt_REJECT, Live 0xffffffffa0362000
xt_TCPMSS 3187 1 - Live 0xffffffffa035e000
iptable_mangle 1544 1 - Live 0xffffffffa035a000
nf_log_ipv4 3667 1 - Live 0xffffffffa0356000
nf_log_common 2837 1 nf_log_ipv4, Live 0xffffffffa0352000
xt_LOG 1487 1 - Live 0xffffffffa034e000
xt_tcpudp 2522 5 - Live 0xffffffffa034a000
iptable_filter 1480 1 - Live 0xffffffffa0346000
iptable_nat 1786 0 - Live 0xffffffffa0342000
nf_conntrack_ipv4 10891 1 - Live 0xffffffffa033b000
nf_defrag_ipv4 1475 1 nf_conntrack_ipv4, Live 0xffffffffa0337000
nf_nat_ipv4 4349 1 iptable_nat, Live 0xffffffffa0332000
nf_nat 10019 4 nf_nat_tftp,nf_nat_irc,nf_nat_ftp,nf_nat_ipv4, Live 0xffffffffa032a000
nf_conntrack 59739 9 nf_nat_tftp,nf_conntrack_tftp,nf_nat_irc,nf_conntrack_irc,nf_nat_ftp,nf_conntrack_ftp,nf_conntrack_ipv4,nf_nat_ipv4,nf_nat, Live 0xffffffffa030f000
ip_tables 12753 3 iptable_mangle,iptable_filter,iptable_nat, Live 0xffffffffa0307000
x_tables 14824 10 ip6t_REJECT,ip6table_filter,ip6_tables,ipt_REJECT,xt_TCPMSS,iptable_mangle,xt_LOG,xt_tcpudp,iptable_filter,ip_tables, Live 0xffffffffa02fd000
ide_generic 1377 0 [permanent], Live 0xffffffffa02f9000
ide_core 71087 1 ide_generic, Live 0xffffffffa02da000
snd_hda_codec_hdmi 28560 1 - Live 0xffffffffa02cd000
snd_hda_codec_realtek 50872 1 - Live 0xffffffffa02b7000
snd_hda_codec_generic 44234 1 snd_hda_codec_realtek, Live 0xffffffffa02a5000
snd_hda_intel 16223 1 - Live 0xffffffffa029b000
snd_hda_codec 66159 4 snd_hda_codec_hdmi,snd_hda_codec_realtek,snd_hda_codec_generic,snd_hda_intel, Live 0xffffffffa027c000
snd_hda_core 37571 5 snd_hda_codec_hdmi,snd_hda_codec_realtek,snd_hda_codec_generic,snd_hda_intel,snd_hda_codec, Live 0xffffffffa0267000
snd_pcm_oss 29084 0 - Live 0xffffffffa025a000
snd_pcm 64191 5 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_core,snd_pcm_oss, Live 0xffffffffa023e000
snd_timer 16920 1 snd_pcm, Live 0xffffffffa0234000
snd_mixer_oss 12283 2 snd_pcm_oss, Live 0xffffffffa022d000
snd 48509 8 snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_intel,snd_hda_codec,snd_pcm_oss,snd_pcm,snd_timer,snd_mixer_oss, Live 0xffffffffa0217000
soundcore 4527 2 snd, Live 0xffffffffa0211000
af_packet 26418 6 - Live 0xffffffffa0205000
arc4 2040 2 - Live 0xffffffffa0201000
carl9170 65428 0 - Live 0xffffffffa01ea000
ehci_pci 3783 0 - Live 0xffffffffa01e6000
ath 16832 1 carl9170, Live 0xffffffffa01dd000
led_class 3520 1 carl9170, Live 0xffffffffa01d8000
mac80211 319536 1 carl9170, Live 0xffffffffa0170000
cfg80211 178240 3 carl9170,ath,mac80211, Live 0xffffffffa0132000
rfkill 13258 1 cfg80211, Live 0xffffffffa0129000
xts 3015 2 - Live 0xffffffffa0125000
gf128mul 5879 1 xts, Live 0xffffffffa0120000
dm_crypt 15756 1 - Live 0xffffffffa0118000
dm_mod 75130 16 dm_crypt, Live 0xffffffffa00f9000
raid1 23138 1 - Live 0xffffffffa00ef000
md_mod 88740 2 raid1, Live 0xffffffffa00ce000
coretemp 4358 0 - Live 0xffffffffa00c9000
hwmon 2866 1 coretemp, Live 0xffffffffa00c4000
hid_generic 1321 0 - Live 0xffffffffa00c0000
usbhid 22332 1 - Live 0xffffffffa00b5000
hid 54261 2 hid_generic,usbhid, Live 0xffffffffa00a1000
xhci_pci 4024 0 - Live 0xffffffffa009d000
xhci_hcd 84050 1 xhci_pci, Live 0xffffffffa0082000
uhci_hcd 18422 0 - Live 0xffffffffa0079000
ohci_hcd 16591 0 - Live 0xffffffffa0070000
ehci_hcd 35022 1 ehci_pci, Live 0xffffffffa0062000
usbcore 142047 9 carl9170,ehci_pci,usbhid,xhci_pci,xhci_hcd,uhci_hcd,ohci_hcd,ehci_hcd, Live 0xffffffffa002b000
usb_common 2222 1 usbcore, Live 0xffffffffa0027000
loop 14910 0 - Live 0xffffffffa001e000
fuse 65423 0 - Live 0xffffffffa0005000
rtc 5294 0 - Live 0xffffffffa0000000

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

0000-0000 : PCI Bus 0000:00
  0000-0000 : dma1
  0000-0000 : pic1
  0000-0000 : timer0
  0000-0000 : timer1
  0000-0000 : keyboard
  0000-0000 : PNP0800:00
  0000-0000 : keyboard
  0000-0000 : rtc
  0000-0000 : dma page reg
  0000-0000 : pic2
  0000-0000 : dma2
  0000-0000 : PNP0C04:00
    0000-0000 : fpu
  0000-0000 : ide_generic
  0000-0000 : ide_generic
  0000-0000 : ide_generic
0000-0000 : PCI Bus 0000:00
  0000-0000 : vga+
0000-0000 : PCI Bus 0000:00
  0000-0000 : ide_generic
  0000-0000 : ACPI PM1a_EVT_BLK
  0000-0000 : ACPI PM1a_CNT_BLK
  0000-0000 : ACPI PM_TMR
  0000-0000 : ACPI CPU throttle
  0000-0000 : ACPI GPE0_BLK
  0000-0000 : ACPI PM2_CNT_BLK
  0000-0000 : pnp 00:06
  0000-0000 : pnp 00:05
  0000-0000 : pnp 00:04
  0000-0000 : pnp 00:05
  0000-0000 : pnp 00:01
  0000-0000 : pnp 00:01
  0000-0000 : pnp 00:01
0000-0000 : PCI conf1
0000-0000 : PCI Bus 0000:00
  0000-0000 : pnp 00:05
  0000-0000 : PCI Bus 0000:01
    0000-0000 : 0000:01:00.0
  0000-0000 : 0000:00:02.0
  0000-0000 : 0000:00:1f.3
  0000-0000 : 0000:00:1f.2
    0000-0000 : ahci
  0000-0000 : 0000:00:1f.2
    0000-0000 : ahci
  0000-0000 : 0000:00:1f.2
    0000-0000 : ahci
  0000-0000 : 0000:00:1f.2
    0000-0000 : ahci
  0000-0000 : 0000:00:1f.2
    0000-0000 : ahci

00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:00
  00000000-00000000 : Video ROM
00000000-00000000 : reserved
  00000000-00000000 : System ROM
00000000-00000000 : System RAM
  00000000-00000000 : Kernel code
  00000000-00000000 : Kernel data
  00000000-00000000 : Kernel bss
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : ACPI Non-volatile Storage
00000000-00000000 : ACPI Tables
00000000-00000000 : ACPI Non-volatile Storage
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : ACPI Non-volatile Storage
00000000-00000000 : reserved
00000000-00000000 : ACPI Non-volatile Storage
00000000-00000000 : reserved
00000000-00000000 : RAM buffer
00000000-00000000 : reserved
00000000-00000000 : PCI Bus 0000:00
  00000000-00000000 : reserved
    00000000-00000000 : 0000:00:02.0
    00000000-00000000 : PCI Bus 0000:01
      00000000-00000000 : 0000:01:00.0
      00000000-00000000 : 0000:01:00.0
  00000000-00000000 : PCI MMCONFIG 0000 [bus 00-ff]
    00000000-00000000 : reserved
      00000000-00000000 : pnp 00:00
  00000000-00000000 : reserved
    00000000-00000000 : 0000:00:02.0
    00000000-00000000 : PCI Bus 0000:03
      00000000-00000000 : 0000:03:00.0
        00000000-00000000 : xhci-hcd
    00000000-00000000 : PCI Bus 0000:02
      00000000-00000000 : 0000:02:00.0
        00000000-00000000 : xhci-hcd
    00000000-00000000 : 0000:00:1b.0
      00000000-00000000 : ICH HD audio
    00000000-00000000 : 0000:00:1f.3
    00000000-00000000 : 0000:00:1f.2
      00000000-00000000 : ahci
    00000000-00000000 : 0000:00:1d.0
      00000000-00000000 : ehci_hcd
    00000000-00000000 : 0000:00:1a.0
      00000000-00000000 : ehci_hcd
    00000000-00000000 : 0000:00:16.0
  00000000-00000000 : reserved
  00000000-00000000 : reserved
    00000000-00000000 : IOAPIC 0
    00000000-00000000 : HPET 0
      00000000-00000000 : PNP0103:00
  00000000-00000000 : reserved
    00000000-00000000 : pnp 00:05
  00000000-00000000 : reserved
    00000000-00000000 : pnp 00:00
  00000000-00000000 : reserved
    00000000-00000000 : pnp 00:05
    00000000-00000000 : pnp 00:00
  00000000-00000000 : reserved
    00000000-00000000 : pnp 00:00
  00000000-00000000 : reserved
    00000000-00000000 : pnp 00:00
      00000000-00000000 : Local APIC
  00000000-00000000 : reserved
    00000000-00000000 : pnp 00:05
00000000-00000000 : System RAM
00000000-00000000 : RAM buffer

[8.5.] PCI information ('lspci -vvv' as root)

00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
	Subsystem: Biostar Microtech Int'l Corp Device 3108
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0
	Capabilities: [e0] Vendor Specific Information: Len=0c <?>
	Kernel driver in use: snb_uncore

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Biostar Microtech Int'l Corp Device 110d
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at fe000000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at f000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-

00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
	Subsystem: Biostar Microtech Int'l Corp Device 3108
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at fe608000 (64-bit, non-prefetchable) [size=16]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000

00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05) (prog-if 20 [EHCI])
	Subsystem: Biostar Microtech Int'l Corp Device 3108
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at fe607000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: ehci-pci

00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
	Subsystem: Biostar Microtech Int'l Corp Device 8229
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 33
	Region 0: Memory at fe600000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee0300c  Data: 41a2
	Capabilities: [70] Express (v1) Root Complex Integrated Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
		VC1:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=1 ArbSelect=Fixed TC/VC=22
			Status:	NegoPending- InProgress-
	Capabilities: [130 v1] Root Complex Link
		Desc:	PortNumber=0f ComponentID=00 EltType=Config
		Link0:	Desc:	TargetPort=00 TargetComponent=00 AssocRCRB- LinkType=MemMapped LinkValid+
			Addr:	00000000fed1c000
	Kernel driver in use: snd_hda_intel

00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: fff00000-000fffff
	Prefetchable memory behind bridge: 00000000d0000000-00000000d00fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 10.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee0300c  Data: 41a1
	Capabilities: [90] Subsystem: Biostar Microtech Int'l Corp Device 3108
	Capabilities: [a0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: pcieport

00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b5) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: fe500000-fe5fffff
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #2, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #1, PowerLimit 10.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee0300c  Data: 41b1
	Capabilities: [90] Subsystem: Biostar Microtech Int'l Corp Device 3108
	Capabilities: [a0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: pcieport

00:1c.2 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5) (prog-if 01 [Subtractive decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: fe400000-fe4fffff
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #3, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #2, PowerLimit 10.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [90] Subsystem: Biostar Microtech Int'l Corp Device 3108
	Capabilities: [a0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05) (prog-if 20 [EHCI])
	Subsystem: Biostar Microtech Int'l Corp Device 3108
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 23
	Region 0: Memory at fe606000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: ehci-pci

00:1f.0 ISA bridge: Intel Corporation H61 Express Chipset Family LPC Controller (rev 05)
	Subsystem: Biostar Microtech Int'l Corp Device 3108
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Capabilities: [e0] Vendor Specific Information: Len=0c <?>

00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller (rev 05) (prog-if 01 [AHCI 1.0])
	Subsystem: Biostar Microtech Int'l Corp Device 5207
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin B routed to IRQ 26
	Region 0: I/O ports at f0b0 [size=8]
	Region 1: I/O ports at f0a0 [size=4]
	Region 2: I/O ports at f090 [size=8]
	Region 3: I/O ports at f080 [size=4]
	Region 4: I/O ports at f060 [size=32]
	Region 5: Memory at fe605000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee0300c  Data: 41d1
	Capabilities: [70] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004
	Capabilities: [b0] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: ahci

00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
	Subsystem: Biostar Microtech Int'l Corp Device 3108
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin C routed to IRQ 3
	Region 0: Memory at fe604000 (64-bit, non-prefetchable) [size=256]
	Region 4: I/O ports at f040 [size=32]

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
	Subsystem: Biostar Microtech Int'l Corp Device 230a
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 11
	Region 0: I/O ports at e000 [size=256]
	Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
	Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 01
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800
	Capabilities: [d0] Vital Product Data
		Not readable
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number 3b-03-00-00-68-4c-e0-00

02:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller (prog-if 30 [XHCI])
	Subsystem: Biostar Microtech Int'l Corp Device 6300
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at fe500000 (64-bit, non-prefetchable) [size=32K]
	Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [68] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00002080
	Capabilities: [78] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [80] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <2us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <2us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
	Kernel driver in use: xhci_hcd

03:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller (prog-if 30 [XHCI])
	Subsystem: Biostar Microtech Int'l Corp Device 6300
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 18
	Region 0: Memory at fe400000 (64-bit, non-prefetchable) [size=32K]
	Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [68] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00002080
	Capabilities: [78] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [80] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <2us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <2us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
	Kernel driver in use: xhci_hcd

[8.6.] SCSI information (from /proc/scsi/scsi)

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: TSSTcorp Model: CDDVDW SH-222BB  Rev: SB00
  Type:   CD-ROM                           ANSI  SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: WDC WD10EZEX-00W Rev: 1A01
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi4 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: WDC WD10EZEX-00W Rev: 1A01
  Type:   Direct-Access                    ANSI  SCSI revision: 05

[8.7.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):

lsusb identifies my network device as:

Bus 005 Device 004: ID 0cf3:1002 Atheros Communications, Inc. TP-Link TL-WN821N v2 802.11n [Atheros AR9170]

I have version 1.9.9 of carl9170-1.fw in /lib/firmware

-- 
Alan Curry

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
@ 2016-07-24 17:45   ` Christian Lamparter
  0 siblings, 0 replies; 35+ messages in thread
From: Christian Lamparter @ 2016-07-24 17:45 UTC (permalink / raw)
  To: Alan Curry
  Cc: chunkeey, linux-wireless, netdev, linux-kernel, Al Viro, alexmcwhirter

Hello,

I added Al Viro to the CC (probably not necessary...)

On Sunday, July 24, 2016 3:35:14 AM CEST Alan Curry wrote:
> [1.] One line summary of the problem:
> network data corruption (bisected to e5a4b0bb803b)
> 
> [2.] Full description of the problem/report:
> Note: although my bisect ended at a commit from before 3.19, I have the
> same symptom in all newer kernels I've tried, up to 4.6.4.
> 
> The commit was:
> 
> >commit e5a4b0bb803b39a36478451eae53a880d2663d5b
> >Author: Al Viro <viro@zeniv.linux.org.uk>
> >Date:   Mon Nov 24 18:17:55 2014 -0500
> >
> >    switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives
> 
> The symptom is that downloaded files (http, ftp, and probably other
> protocols) have small corrupted segments (about 1-2 kilobytes long) in
> random locations. Only downloads that sustain a high speed for at least a
> few seconds are corrupted. Anything small enough to be received in less
> than about 5 seconds is not affected.
> 
> If I download the same file twice in a row, the corruption is in different
> places in each copy.
> 
> If I try to do a git clone, it fails a few seconds into the "Receiving
> objects" stage with a deflate error.

Thanks for the detailed bug-report. I looked around the web to see if it
was already reported or not. If found that this issue was reported before:
[0], [1] and [2] by the same person (CC'ed). One difference is that the 
reporter had this issue with rsync on multiple SPARC systems. I ran a
git grep on a 4.7.0-rc7+ (wt-2016-07-21-15-g97bd3b0). But it didn't find
any patches directly referencing the commit. I'm not sure if this issue
has been fixed by now or not. I would greatly appreciate any comment
about this from the "people of netdev" (Al Viro? Alex Mcwhirter?).

As for carl9170: I'm not sure what the driver or firmware can do about
this at this time. You can try to disable the hardware crypto by setting
nohwcrypt via the module option. However, this might not do anything at all.

> [3.] Keywords: networking, carl9170
> 
> [4.] Kernel information
> [4.1.] Kernel version (from /proc/version):
> Multiple versions are known to be affected, from 3.19 to 4.6.4
> 
> [4.2.] Kernel .config file:
> For testing I built with make x86_64_defconfig followed by enabling the
> carl9170 driver, which adds these lines:
> CONFIG_ATH_COMMON=m
> CONFIG_ATH_CARDS=m
> CONFIG_CARL9170=m
> CONFIG_CARL9170_LEDS=y
> CONFIG_CARL9170_WPC=y
> 
> [5.] Most recent kernel version which did not have the bug:
> That would be the predecessor of e5a4b0bb803b39a36478451eae53a880d2663d5b
> which is v3.18-rc6-1620-g17836394e578
> 
> [6.] no Oops
> 
> [7.] A small shell script or example program which triggers the
>      problem (if possible)
> 
> This command fails reliably for me when running an affected kernel:
> 
> git clone git://git.kernel.org/pub/scm/git/git.git
> 
> (I'm including all the standard format stuff suggested by REPORTING-BUGS,
> but I think you can skip from here to section 8.7 without missing anything
> relevant)
Yes, I removed it for the most part. If anyone is interested in the details:
Here's a link to the original post @LKML [3].

> 
> [8.] Environment
> [8.1.] Software (add the output of the ver_linux script here)
> 
> Mostly Debian 8.5 stable packages here.
> 
> [8.3.] Module information (from /proc/modules):
> 
> When I tested with the x86_64_defconfig + carl9170 kernel, there were
> hardly any modules built, and I reproduced the problem after booting with
> init=/bin/sh, so no unnecessary modules were loaded. Currently running a
> normal 4.6.4 kernel which is showing the bug.
> 
> [...]
> [8.7.] Other information that might be relevant to the problem
>        (please look in /proc and include all information that you
>        think to be relevant):
> 
> lsusb identifies my network device as:
> 
> Bus 005 Device 004: ID 0cf3:1002 Atheros Communications, Inc. TP-Link TL-WN821N v2 802.11n [Atheros AR9170]
> 
> I have version 1.9.9 of carl9170-1.fw in /lib/firmware
Just one additional question: Is the TL-WN821N connected to a USB3 port?

Regards,
Christian

[0] <https://lists.debian.org/debian-sparc/2016/06/msg00160.html>
[1] <https://marc.info/?l=gentoo-sparc&m=145766845820114&w=2>
[2] <http://permalink.gmane.org/gmane.linux.ports.sparc/22507>
[3] <https://lkml.org/lkml/2016/7/23/184>




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
@ 2016-07-24 17:45   ` Christian Lamparter
  0 siblings, 0 replies; 35+ messages in thread
From: Christian Lamparter @ 2016-07-24 17:45 UTC (permalink / raw)
  To: Alan Curry
  Cc: chunkeey-gM/Ye1E23mwN+BqQ9rBEUg,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	alexmcwhirter-O8/uFoRGvHWcqzYg7KEe8g

Hello,

I added Al Viro to the CC (probably not necessary...)

On Sunday, July 24, 2016 3:35:14 AM CEST Alan Curry wrote:
> [1.] One line summary of the problem:
> network data corruption (bisected to e5a4b0bb803b)
> 
> [2.] Full description of the problem/report:
> Note: although my bisect ended at a commit from before 3.19, I have the
> same symptom in all newer kernels I've tried, up to 4.6.4.
> 
> The commit was:
> 
> >commit e5a4b0bb803b39a36478451eae53a880d2663d5b
> >Author: Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
> >Date:   Mon Nov 24 18:17:55 2014 -0500
> >
> >    switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives
> 
> The symptom is that downloaded files (http, ftp, and probably other
> protocols) have small corrupted segments (about 1-2 kilobytes long) in
> random locations. Only downloads that sustain a high speed for at least a
> few seconds are corrupted. Anything small enough to be received in less
> than about 5 seconds is not affected.
> 
> If I download the same file twice in a row, the corruption is in different
> places in each copy.
> 
> If I try to do a git clone, it fails a few seconds into the "Receiving
> objects" stage with a deflate error.

Thanks for the detailed bug-report. I looked around the web to see if it
was already reported or not. If found that this issue was reported before:
[0], [1] and [2] by the same person (CC'ed). One difference is that the 
reporter had this issue with rsync on multiple SPARC systems. I ran a
git grep on a 4.7.0-rc7+ (wt-2016-07-21-15-g97bd3b0). But it didn't find
any patches directly referencing the commit. I'm not sure if this issue
has been fixed by now or not. I would greatly appreciate any comment
about this from the "people of netdev" (Al Viro? Alex Mcwhirter?).

As for carl9170: I'm not sure what the driver or firmware can do about
this at this time. You can try to disable the hardware crypto by setting
nohwcrypt via the module option. However, this might not do anything at all.

> [3.] Keywords: networking, carl9170
> 
> [4.] Kernel information
> [4.1.] Kernel version (from /proc/version):
> Multiple versions are known to be affected, from 3.19 to 4.6.4
> 
> [4.2.] Kernel .config file:
> For testing I built with make x86_64_defconfig followed by enabling the
> carl9170 driver, which adds these lines:
> CONFIG_ATH_COMMON=m
> CONFIG_ATH_CARDS=m
> CONFIG_CARL9170=m
> CONFIG_CARL9170_LEDS=y
> CONFIG_CARL9170_WPC=y
> 
> [5.] Most recent kernel version which did not have the bug:
> That would be the predecessor of e5a4b0bb803b39a36478451eae53a880d2663d5b
> which is v3.18-rc6-1620-g17836394e578
> 
> [6.] no Oops
> 
> [7.] A small shell script or example program which triggers the
>      problem (if possible)
> 
> This command fails reliably for me when running an affected kernel:
> 
> git clone git://git.kernel.org/pub/scm/git/git.git
> 
> (I'm including all the standard format stuff suggested by REPORTING-BUGS,
> but I think you can skip from here to section 8.7 without missing anything
> relevant)
Yes, I removed it for the most part. If anyone is interested in the details:
Here's a link to the original post @LKML [3].

> 
> [8.] Environment
> [8.1.] Software (add the output of the ver_linux script here)
> 
> Mostly Debian 8.5 stable packages here.
> 
> [8.3.] Module information (from /proc/modules):
> 
> When I tested with the x86_64_defconfig + carl9170 kernel, there were
> hardly any modules built, and I reproduced the problem after booting with
> init=/bin/sh, so no unnecessary modules were loaded. Currently running a
> normal 4.6.4 kernel which is showing the bug.
> 
> [...]
> [8.7.] Other information that might be relevant to the problem
>        (please look in /proc and include all information that you
>        think to be relevant):
> 
> lsusb identifies my network device as:
> 
> Bus 005 Device 004: ID 0cf3:1002 Atheros Communications, Inc. TP-Link TL-WN821N v2 802.11n [Atheros AR9170]
> 
> I have version 1.9.9 of carl9170-1.fw in /lib/firmware
Just one additional question: Is the TL-WN821N connected to a USB3 port?

Regards,
Christian

[0] <https://lists.debian.org/debian-sparc/2016/06/msg00160.html>
[1] <https://marc.info/?l=gentoo-sparc&m=145766845820114&w=2>
[2] <http://permalink.gmane.org/gmane.linux.ports.sparc/22507>
[3] <https://lkml.org/lkml/2016/7/23/184>



--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-24 17:45   ` Christian Lamparter
@ 2016-07-24 19:02     ` Al Viro
  -1 siblings, 0 replies; 35+ messages in thread
From: Al Viro @ 2016-07-24 19:02 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: Alan Curry, linux-wireless, netdev, linux-kernel, alexmcwhirter

On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:

> > The symptom is that downloaded files (http, ftp, and probably other
> > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > random locations. Only downloads that sustain a high speed for at least a
> > few seconds are corrupted. Anything small enough to be received in less
> > than about 5 seconds is not affected.

Can that sucker be reproduced with netcat?  That would eliminate all issues
with multi-iovec recvmsg(2), narrowing the things down quite bit.

Another thing (and if that works, it's *NOT* a proper fix - it would be
papering over the problem, but at least it would show where to look for
it) - try (on top of mainline) the following delta:

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b7de71f..0ee5995 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -734,7 +734,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 	if (!chunk)
 		return 0;
 
-	if (msg_data_left(msg) < chunk) {
+	if (iov_iter_single_seg_count(&msg->msg_iter) < chunk) {
 		if (__skb_checksum_complete(skb))
 			goto csum_error;
 		if (skb_copy_datagram_msg(skb, hlen, msg, chunk))

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
@ 2016-07-24 19:02     ` Al Viro
  0 siblings, 0 replies; 35+ messages in thread
From: Al Viro @ 2016-07-24 19:02 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: Alan Curry, linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	alexmcwhirter-O8/uFoRGvHWcqzYg7KEe8g

On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:

> > The symptom is that downloaded files (http, ftp, and probably other
> > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > random locations. Only downloads that sustain a high speed for at least a
> > few seconds are corrupted. Anything small enough to be received in less
> > than about 5 seconds is not affected.

Can that sucker be reproduced with netcat?  That would eliminate all issues
with multi-iovec recvmsg(2), narrowing the things down quite bit.

Another thing (and if that works, it's *NOT* a proper fix - it would be
papering over the problem, but at least it would show where to look for
it) - try (on top of mainline) the following delta:

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b7de71f..0ee5995 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -734,7 +734,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 	if (!chunk)
 		return 0;
 
-	if (msg_data_left(msg) < chunk) {
+	if (iov_iter_single_seg_count(&msg->msg_iter) < chunk) {
 		if (__skb_checksum_complete(skb))
 			goto csum_error;
 		if (skb_copy_datagram_msg(skb, hlen, msg, chunk))
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-24 17:45   ` Christian Lamparter
  (?)
  (?)
@ 2016-07-26  4:32   ` Alan Curry
  -1 siblings, 0 replies; 35+ messages in thread
From: Alan Curry @ 2016-07-26  4:32 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: linux-wireless, netdev, linux-kernel, Al Viro, alexmcwhirter

Christian Lamparter wrote:
> 
> As for carl9170: I'm not sure what the driver or firmware can do about
> this at this time. You can try to disable the hardware crypto by setting
> nohwcrypt via the module option. However, this might not do anything at all.

The nohwcrypt parameter didn't make any difference.

> > 
> > lsusb identifies my network device as:
> > 
> > Bus 005 Device 004: ID 0cf3:1002 Atheros Communications, Inc. TP-Link TL-WN821N v2 802.11n [Atheros AR9170]
> > 
> > I have version 1.9.9 of carl9170-1.fw in /lib/firmware
> Just one additional question: Is the TL-WN821N connected to a USB3 port?

It never has been before. I tried it today and it made no difference.

-- 
Alan Curry

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-24 17:45   ` Christian Lamparter
                     ` (2 preceding siblings ...)
  (?)
@ 2016-07-26  4:38   ` alexmcwhirter
  -1 siblings, 0 replies; 35+ messages in thread
From: alexmcwhirter @ 2016-07-26  4:38 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: Alan Curry, linux-wireless, netdev, linux-kernel, Al Viro

> Thanks for the detailed bug-report. I looked around the web to see if 
> it
> was already reported or not. If found that this issue was reported 
> before:
> [0], [1] and [2] by the same person (CC'ed). One difference is that the
> reporter had this issue with rsync on multiple SPARC systems. I ran a
> git grep on a 4.7.0-rc7+ (wt-2016-07-21-15-g97bd3b0). But it didn't 
> find
> any patches directly referencing the commit. I'm not sure if this issue
> has been fixed by now or not. I would greatly appreciate any comment
> about this from the "people of netdev" (Al Viro? Alex Mcwhirter?).

I can confirm the issue i was having with this commit still exists on 
sparc with the latest mainline kernel.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-24 19:02     ` Al Viro
  (?)
@ 2016-07-26  4:57     ` Alan Curry
  2016-07-26 13:59       ` Christian Lamparter
  -1 siblings, 1 reply; 35+ messages in thread
From: Alan Curry @ 2016-07-26  4:57 UTC (permalink / raw)
  To: Al Viro
  Cc: Christian Lamparter, Alan Curry, linux-wireless, netdev,
	linux-kernel, alexmcwhirter

Al Viro wrote:
> On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:
> 
> > > The symptom is that downloaded files (http, ftp, and probably other
> > > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > > random locations. Only downloads that sustain a high speed for at least a
> > > few seconds are corrupted. Anything small enough to be received in less
> > > than about 5 seconds is not affected.
> 
> Can that sucker be reproduced with netcat?  That would eliminate all issues
> with multi-iovec recvmsg(2), narrowing the things down quite bit.

netcat seems to be immune. Comparing strace results, I didn't see any
recvmsg() calls in the other programs that have had the problem, but there
is an interesting difference: netcat calls select() to wait for the socket
to be ready for reading, where my other test programs just call read() and
let it block until ready.

So I wrote a small test program to isolate that difference. It downloads
a file using only read() and write() and a hardcoded HTTP request. It has
a select mode (main loop alternates read() and select() on the TCP socket)
and a noselect mode (main loop just read()s the TCP socket).

The program is included at the bottom of this message.

I ran it several times in both modes and got corruption if and only if the
noselect mode was used.

> 
> Another thing (and if that works, it's *NOT* a proper fix - it would be
> papering over the problem, but at least it would show where to look for
> it) - try (on top of mainline) the following delta:
> 
> diff --git a/net/core/datagram.c b/net/core/datagram.c

Will try that patch soon. Meanwhile, here's my test:

/* Demonstration program "dlbug".
   Usage: dlbug select > outfile
          or
          dlbug noselect > outfile
   outfile will contain the full HTTP response. Edit out the HTTP headers
   and what's left should be a valid gzip if the download worked. */

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <sys/select.h>

int main(int argc, char **argv)
{
  const char *request =
    "GET /debian/dists/stable/main/Contents-amd64.gz HTTP/1.0\r\n"
    "Host: ftp.us.debian.org\r\n"
    "\r\n";
  ssize_t request_len = strlen(request), w, r, copied;
  struct addrinfo hints, *host;
  int sock, err, doselect;
  char buf[10240];

  if(argc!=2 || (!strcmp(argv[1], "select") && !strcmp(argv[1], "noselect"))) {
    fprintf(stderr, "Usage: %s {select|noselect}\n", argv[0]);
    return 1;
  }

  doselect = !strcmp(argv[1], "select");

  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;

  err = getaddrinfo("ftp.us.debian.org", 0, &hints, &host);
  if(err) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
    return 1;
  }

  sock = socket(host->ai_family, host->ai_socktype, host->ai_protocol);
  if(sock < 0) {
    perror("socket");
    return 1;
  }

  ((struct sockaddr_in *)host->ai_addr)->sin_port = htons(80);

  if(connect(sock, host->ai_addr, host->ai_addrlen) < 0) {
    perror("connect");
    return 1;
  }

  while(request_len) {
    w = write(sock, request, request_len);
    if(w < 0) {
      perror("write to socket");
      return 1;
    }
    request += w;
    request_len -= w;
  }

  while((r = read(sock, buf, sizeof buf))) {
    if(r < 0) {
      perror("read from socket");
      return 1;
    }

    copied = 0;
    while(copied < r) {
      w = write(1, buf+copied, r-copied);
      if(w < 0) {
        perror("write to stdout");
        return 1;
      }
      copied += w;
    }

    if(doselect) {
      fd_set rfds;
      FD_ZERO(&rfds);
      FD_SET(sock, &rfds);
      select(sock+1, &rfds, 0, 0, 0);
    }
  }

  return 0;
}

-- 
Alan Curry

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-26  4:57     ` Alan Curry
@ 2016-07-26 13:59       ` Christian Lamparter
  2016-07-26 18:15         ` alexmcwhirter
  2016-07-27  1:14         ` Alan Curry
  0 siblings, 2 replies; 35+ messages in thread
From: Christian Lamparter @ 2016-07-26 13:59 UTC (permalink / raw)
  To: Alan Curry
  Cc: Al Viro, Christian Lamparter, linux-wireless, netdev,
	linux-kernel, alexmcwhirter

On Tuesday, July 26, 2016 4:57:03 AM CEST Alan Curry wrote:
> Al Viro wrote:
> > On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:
> > 
> > > > The symptom is that downloaded files (http, ftp, and probably other
> > > > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > > > random locations. Only downloads that sustain a high speed for at least a
> > > > few seconds are corrupted. Anything small enough to be received in less
> > > > than about 5 seconds is not affected.
> > 
> > Can that sucker be reproduced with netcat?  That would eliminate all issues
> > with multi-iovec recvmsg(2), narrowing the things down quite bit.
> 
> netcat seems to be immune. Comparing strace results, I didn't see any
> recvmsg() calls in the other programs that have had the problem, but there
> is an interesting difference: netcat calls select() to wait for the socket
> to be ready for reading, where my other test programs just call read() and
> let it block until ready.
> 
> So I wrote a small test program to isolate that difference. It downloads
> a file using only read() and write() and a hardcoded HTTP request. It has
> a select mode (main loop alternates read() and select() on the TCP socket)
> and a noselect mode (main loop just read()s the TCP socket).
> 
> The program is included at the bottom of this message.
> 
> I ran it several times in both modes and got corruption if and only if the
> noselect mode was used.
> 
> > 
> > Another thing (and if that works, it's *NOT* a proper fix - it would be
> > papering over the problem, but at least it would show where to look for
> > it) - try (on top of mainline) the following delta:
> > 
> > diff --git a/net/core/datagram.c b/net/core/datagram.c
> 
> Will try that patch soon. Meanwhile, here's my test:
> 
> /* Demonstration program "dlbug".
>    Usage: dlbug select > outfile
>           or
>           dlbug noselect > outfile
>    outfile will contain the full HTTP response. Edit out the HTTP headers
>    and what's left should be a valid gzip if the download worked. */
> [...]
Thanks, I gave the program a try with my WNDA3100 and a WN821N v2 devices.
I did not see any corruptions in any of the tests though. Can you tell me
something about your wireless network too? I would like to know what router
and firmware are you using? Also important: what's your wireless configuration?
(WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)

Probably the quickest and easiest way to get that information is by running
the following commands as root, when you are connected to your wifi network
and post the results:
# iw dev wlan0 link
# iw dev wlan0 scan dump

(You can of course remove your device's MACs, but please do it consistently).

Regards,
Christian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-26 13:59       ` Christian Lamparter
@ 2016-07-26 18:15         ` alexmcwhirter
  2016-07-27  6:39           ` Kalle Valo
  2016-07-27  1:14         ` Alan Curry
  1 sibling, 1 reply; 35+ messages in thread
From: alexmcwhirter @ 2016-07-26 18:15 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: Alan Curry, Al Viro, linux-wireless, netdev, linux-kernel

On 2016-07-26 09:59, Christian Lamparter wrote:
> Thanks, I gave the program a try with my WNDA3100 and a WN821N v2 
> devices.
> I did not see any corruptions in any of the tests though. Can you tell 
> me
> something about your wireless network too? I would like to know what 
> router
> and firmware are you using? Also important: what's your wireless 
> configuration?
> (WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)
> 
> Probably the quickest and easiest way to get that information is by 
> running
> the following commands as root, when you are connected to your wifi 
> network
> and post the results:
> # iw dev wlan0 link
> # iw dev wlan0 scan dump
> 
> (You can of course remove your device's MACs, but please do it 
> consistently).
> 
> Regards,
> Christian

I wonder if this is something that is only affecting certain NIC's? For 
example, it see this issue on sun4u boxes with tigon3 and hme 
interfaces. But on my sun4v boxes that have e1000 interfaces everything 
works fine.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-26 13:59       ` Christian Lamparter
  2016-07-26 18:15         ` alexmcwhirter
@ 2016-07-27  1:14         ` Alan Curry
  1 sibling, 0 replies; 35+ messages in thread
From: Alan Curry @ 2016-07-27  1:14 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: Alan Curry, Al Viro, linux-wireless, netdev, linux-kernel, alexmcwhirter

Christian Lamparter wrote:
> Thanks, I gave the program a try with my WNDA3100 and a WN821N v2 devices.
> I did not see any corruptions in any of the tests though. Can you tell me
> something about your wireless network too? I would like to know what router
> and firmware are you using? Also important: what's your wireless configuration?

The router/access-point is a Comcast-issued Technicolor cable modem, model
TC8305C. The only thing I can find on it that looks like it might identify
a firmware version is this:

    System Software Version

    eMTA & DOCSIS Software Version: 01.E6.01.22.25
    Packet Cable: 2.0

I assume Comcast pushes firmware updates whenever they feel like it.

There is possibly another clue. I get this message from the kernel sometimes:

    ieee80211 phy0: invalid plcp cck rate (0).

I had this message appearing long before the data corruption bug started.
It never correlated with any actual problems, so I turned down the priority
level of the message to get it off the console, and forgot about it. I
was unable to discover what a "plcp" or "cck" is so the message means
nothing to me.

> (WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)
> 
> Probably the quickest and easiest way to get that information is by running
> the following commands as root, when you are connected to your wifi network
> and post the results:
> # iw dev wlan0 link
> # iw dev wlan0 scan dump

Connected to cc:03:fa:bf:e9:ea (on wlan0)
	SSID: HOME-E9EA
	freq: 2462
	RX: 20726719 bytes (106483 packets)
	TX: 5902478 bytes (44707 packets)
	signal: -43 dBm
	tx bitrate: 54.0 MBit/s

	bss flags:	short-slot-time
	dtim period:	1
	beacon int:	100

BSS cc:03:fa:bf:e9:ea(on wlan0) -- associated
	TSF: 236407205748 usec (2d, 17:40:07)
	freq: 2462
	beacon interval: 100 TUs
	capability: ESS Privacy ShortSlotTime (0x0411)
	signal: -33.00 dBm
	last seen: 634452 ms ago
	Information elements from Probe Response frame:
	SSID: HOME-E9EA
	Supported rates: 1.0* 2.0* 5.5* 11.0* 18.0 24.0* 36.0 54.0 
	DS Parameter set: channel 11
	ERP: <no flags>
	ERP D4.0: <no flags>
	RSN:	 * Version: 1
		 * Group cipher: TKIP
		 * Pairwise ciphers: CCMP TKIP
		 * Authentication suites: PSK
		 * Capabilities: 16-PTKSA-RC 1-GTKSA-RC (0x000c)
	Extended supported rates: 6.0* 9.0 12.0* 48.0 
	HT capabilities:
		Capabilities: 0x18bd
			RX LDPC
			HT20
			SM Power Save disabled
			RX Greenfield
			RX HT20 SGI
			TX STBC
			No RX STBC
			Max AMSDU length: 7935 bytes
			DSSS/CCK HT40
		Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
		Minimum RX AMPDU time spacing: 8 usec (0x06)
		HT RX MCS rate indexes supported: 0-23
		HT TX MCS rate indexes are undefined
	HT operation:
		 * primary channel: 11
		 * secondary channel offset: no secondary
		 * STA channel width: 20 MHz
		 * RIFS: 1
		 * HT protection: nonmember
		 * non-GF present: 1
		 * OBSS non-GF present: 1
		 * dual beacon: 0
		 * dual CTS protection: 0
		 * STBC beacon: 0
		 * L-SIG TXOP Prot: 0
		 * PCO active: 0
		 * PCO phase: 0
	WPS:	 * Version: 1.0
		 * Wi-Fi Protected Setup State: 2 (Configured)
		 * Response Type: 3 (AP)
		 * UUID: 6d1b1911-14a9-391c-cdee-89850a5aa1ef
		 * Manufacturer: Technicolor
		 * Model: Technicolor
		 * Model Number: 123456
		 * Serial Number: 0000001
		 * Primary Device Type: 6-0050f204-1
		 * Device name: TechnicolorAP
		 * Config methods: Display
		 * RF Bands: 0x1
		 * Unknown TLV (0x1049, 6 bytes): 00 37 2a 00 01 20
	WPA:	 * Version: 1
		 * Group cipher: TKIP
		 * Pairwise ciphers: CCMP TKIP
		 * Authentication suites: PSK
		 * Capabilities: 16-PTKSA-RC 1-GTKSA-RC (0x000c)
	WMM:	 * Parameter version 1
		 * u-APSD
		 * BE: CW 15-1023, AIFSN 3
		 * BK: CW 15-1023, AIFSN 7
		 * VI: CW 7-15, AIFSN 2, TXOP 3008 usec
		 * VO: CW 3-7, AIFSN 2, TXOP 1504 usec

-- 
Alan Curry

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-26 18:15         ` alexmcwhirter
@ 2016-07-27  6:39           ` Kalle Valo
  0 siblings, 0 replies; 35+ messages in thread
From: Kalle Valo @ 2016-07-27  6:39 UTC (permalink / raw)
  To: alexmcwhirter
  Cc: Christian Lamparter, Alan Curry, Al Viro, linux-wireless, netdev,
	linux-kernel

alexmcwhirter@triadic.us writes:

> On 2016-07-26 09:59, Christian Lamparter wrote:
>> Thanks, I gave the program a try with my WNDA3100 and a WN821N v2
>> devices.
>> I did not see any corruptions in any of the tests though. Can you
>> tell me
>> something about your wireless network too? I would like to know what
>> router
>> and firmware are you using? Also important: what's your wireless
>> configuration?
>> (WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)
>>
>> Probably the quickest and easiest way to get that information is by
>> running
>> the following commands as root, when you are connected to your wifi
>> network
>> and post the results:
>> # iw dev wlan0 link
>> # iw dev wlan0 scan dump
>>
>> (You can of course remove your device's MACs, but please do it
>> consistently).
>>
>> Regards,
>> Christian
>
> I wonder if this is something that is only affecting certain NIC's?
> For example, it see this issue on sun4u boxes with tigon3 and hme
> interfaces. But on my sun4v boxes that have e1000 interfaces
> everything works fine.

Kernel configuration might also make a difference. Trying to find an
Kconfig option which affects this could be useful.

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-24 19:02     ` Al Viro
  (?)
  (?)
@ 2016-07-27 10:32     ` Alan Curry
  2016-07-27 18:04       ` alexmcwhirter
  -1 siblings, 1 reply; 35+ messages in thread
From: Alan Curry @ 2016-07-27 10:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Christian Lamparter, Alan Curry, linux-wireless, netdev,
	linux-kernel, alexmcwhirter

Al Viro wrote:
> 
> Another thing (and if that works, it's *NOT* a proper fix - it would be
> papering over the problem, but at least it would show where to look for
> it) - try (on top of mainline) the following delta:

I tried it on top of v4.6.4 and on top of the very recent v4.7-2509-g59ebc44
from Linus, and still got corruption.

> 
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index b7de71f..0ee5995 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -734,7 +734,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
>  	if (!chunk)
>  		return 0;
>  
> -	if (msg_data_left(msg) < chunk) {
> +	if (iov_iter_single_seg_count(&msg->msg_iter) < chunk) {
>  		if (__skb_checksum_complete(skb))
>  			goto csum_error;
>  		if (skb_copy_datagram_msg(skb, hlen, msg, chunk))
> 

-- 
Alan Curry

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-27 10:32     ` Alan Curry
@ 2016-07-27 18:04       ` alexmcwhirter
  2016-07-27 23:02         ` alexmcwhirter
  0 siblings, 1 reply; 35+ messages in thread
From: alexmcwhirter @ 2016-07-27 18:04 UTC (permalink / raw)
  To: Alan Curry
  Cc: Al Viro, Christian Lamparter, linux-wireless, netdev, linux-kernel

Just to add some more information to this, the corruption seems to 
effect ssh as well.

Using a sun hme interface, occasionally upon an ssh connection it will 
refuse to authenticate a client with either password or cert 
authentication. Using wireshark to capture and decrypt the packets 
between the two machines, the data coming from the server seems good, 
but the data received by the server from the client is essentially 
garbage. Note that the client is sending valid data, but the server is 
corrupting it upon receipt. Closing the connection and starting a new 
one will remedy the login issue, but you do occasionally see corruption 
on the server side sporadically.

So far this only seems to occur on incoming data, outgoing data seems 
fine.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-27 18:04       ` alexmcwhirter
@ 2016-07-27 23:02         ` alexmcwhirter
  2016-07-27 23:45           ` David Miller
  0 siblings, 1 reply; 35+ messages in thread
From: alexmcwhirter @ 2016-07-27 23:02 UTC (permalink / raw)
  To: Alan Curry
  Cc: Al Viro, Christian Lamparter, linux-wireless, netdev, linux-kernel

On 2016-07-27 14:04, alexmcwhirter@triadic.us wrote:
> Just to add some more information to this, the corruption seems to
> effect ssh as well.
> 
> Using a sun hme interface, occasionally upon an ssh connection it will
> refuse to authenticate a client with either password or cert
> authentication. Using wireshark to capture and decrypt the packets
> between the two machines, the data coming from the server seems good,
> but the data received by the server from the client is essentially
> garbage. Note that the client is sending valid data, but the server is
> corrupting it upon receipt. Closing the connection and starting a new
> one will remedy the login issue, but you do occasionally see
> corruption on the server side sporadically.
> 
> So far this only seems to occur on incoming data, outgoing data seems 
> fine.

Also, there is another patch the references this commit on sparc64 at 
least.

https://patchwork.kernel.org/patch/9221895/

I highly expect both my issue and OP's issue to revolve not around 
commit e5a4b0bb803b specifically, but around other code that no longer 
behaves as expected because of it.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-27 23:02         ` alexmcwhirter
@ 2016-07-27 23:45           ` David Miller
  2016-07-28  0:31             ` Al Viro
  0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2016-07-27 23:45 UTC (permalink / raw)
  To: alexmcwhirter
  Cc: rlwinm, viro, chunkeey, linux-wireless, netdev, linux-kernel

From: alexmcwhirter@triadic.us
Date: Wed, 27 Jul 2016 19:02:40 -0400

> On 2016-07-27 14:04, alexmcwhirter@triadic.us wrote:
>> Just to add some more information to this, the corruption seems to
>> effect ssh as well.
>> Using a sun hme interface, occasionally upon an ssh connection it will
>> refuse to authenticate a client with either password or cert
>> authentication. Using wireshark to capture and decrypt the packets
>> between the two machines, the data coming from the server seems good,
>> but the data received by the server from the client is essentially
>> garbage. Note that the client is sending valid data, but the server is
>> corrupting it upon receipt. Closing the connection and starting a new
>> one will remedy the login issue, but you do occasionally see
>> corruption on the server side sporadically.
>> So far this only seems to occur on incoming data, outgoing data seems
>> fine.
> 
> Also, there is another patch the references this commit on sparc64 at
> least.
> 
> https://patchwork.kernel.org/patch/9221895/
> 
> I highly expect both my issue and OP's issue to revolve not around
> commit e5a4b0bb803b specifically, but around other code that no longer
> behaves as expected because of it.

Indeed, and that fault address rounding bug occurs two other times
in arch/sparc/lib/user_fixup.c

The mentioned patchwork patch should fix the bug and I'll get that
into my sparc tree, merged, and queued up for -stable ASAP.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-28  0:31             ` Al Viro
@ 2016-07-28  0:26               ` alexmcwhirter
  2016-07-28  1:22                   ` Al Viro
  0 siblings, 1 reply; 35+ messages in thread
From: alexmcwhirter @ 2016-07-28  0:26 UTC (permalink / raw)
  To: Al Viro
  Cc: David Miller, rlwinm, chunkeey, linux-wireless, netdev,
	linux-kernel, Al Viro

On 2016-07-27 20:31, Al Viro wrote:
> On Wed, Jul 27, 2016 at 04:45:43PM -0700, David Miller wrote:
> 
>> > I highly expect both my issue and OP's issue to revolve not around
>> > commit e5a4b0bb803b specifically, but around other code that no longer
>> > behaves as expected because of it.
>> 
>> Indeed, and that fault address rounding bug occurs two other times
>> in arch/sparc/lib/user_fixup.c
>> 
>> The mentioned patchwork patch should fix the bug and I'll get that
>> into my sparc tree, merged, and queued up for -stable ASAP.
> 
> Plausible for sparc, but I don't see similar __copy_to_user_inatomic()
> bugs in case of x86_64...

I'm going to go ahead and say this is where my issue and the op's issue 
begin to branch apart from one another. He's seeing this on all incoming 
data, whereas i am only seeing it on ssl data and not on sun4v.

At this point i would say data from my issue is only going to cloud this 
issue as they seem to be two completely different issues revolving 
around the same commit. If i come across any relevant data for x86_64 
ill be sure to post it if this isn't resolved by then, but for now i'm 
going to refrain from submitting anything sparc related.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-27 23:45           ` David Miller
@ 2016-07-28  0:31             ` Al Viro
  2016-07-28  0:26               ` alexmcwhirter
  0 siblings, 1 reply; 35+ messages in thread
From: Al Viro @ 2016-07-28  0:31 UTC (permalink / raw)
  To: David Miller
  Cc: alexmcwhirter, rlwinm, chunkeey, linux-wireless, netdev, linux-kernel

On Wed, Jul 27, 2016 at 04:45:43PM -0700, David Miller wrote:

> > I highly expect both my issue and OP's issue to revolve not around
> > commit e5a4b0bb803b specifically, but around other code that no longer
> > behaves as expected because of it.
> 
> Indeed, and that fault address rounding bug occurs two other times
> in arch/sparc/lib/user_fixup.c
> 
> The mentioned patchwork patch should fix the bug and I'll get that
> into my sparc tree, merged, and queued up for -stable ASAP.

Plausible for sparc, but I don't see similar __copy_to_user_inatomic()
bugs in case of x86_64...

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
@ 2016-07-28  1:22                   ` Al Viro
  0 siblings, 0 replies; 35+ messages in thread
From: Al Viro @ 2016-07-28  1:22 UTC (permalink / raw)
  To: alexmcwhirter
  Cc: David Miller, rlwinm, chunkeey, linux-wireless, netdev,
	linux-kernel, Al Viro

On Wed, Jul 27, 2016 at 08:26:48PM -0400, alexmcwhirter@triadic.us wrote:

> I'm going to go ahead and say this is where my issue and the op's issue
> begin to branch apart from one another. He's seeing this on all incoming
> data, whereas i am only seeing it on ssl data and not on sun4v.
> 
> At this point i would say data from my issue is only going to cloud this
> issue as they seem to be two completely different issues revolving around
> the same commit. If i come across any relevant data for x86_64 ill be sure
> to post it if this isn't resolved by then, but for now i'm going to refrain
> from submitting anything sparc related.

Which just might mean that we have *three* issues here -
	(1) buggered __copy_to_user_inatomic() (and friends) on some sparcs
	(2) your ssl-only corruption
	(3) Alan's x86_64 corruption on plain TCP read - no ssl *or* sparc
anywhere, and no multi-segment recvmsg().  Which would strongly argue in
favour of some kind of copy_page_to_iter() breakage triggered when handling
a fragmented skb, as in (1).  Except that I don't see anything similar in
x86_64 uaccess primitives...

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
@ 2016-07-28  1:22                   ` Al Viro
  0 siblings, 0 replies; 35+ messages in thread
From: Al Viro @ 2016-07-28  1:22 UTC (permalink / raw)
  To: alexmcwhirter-O8/uFoRGvHWcqzYg7KEe8g
  Cc: David Miller, rlwinm-WF+c3Tt1nJM,
	chunkeey-gM/Ye1E23mwN+BqQ9rBEUg,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Al Viro

On Wed, Jul 27, 2016 at 08:26:48PM -0400, alexmcwhirter-O8/uFoRGvHWcqzYg7KEe8g@public.gmane.org wrote:

> I'm going to go ahead and say this is where my issue and the op's issue
> begin to branch apart from one another. He's seeing this on all incoming
> data, whereas i am only seeing it on ssl data and not on sun4v.
> 
> At this point i would say data from my issue is only going to cloud this
> issue as they seem to be two completely different issues revolving around
> the same commit. If i come across any relevant data for x86_64 ill be sure
> to post it if this isn't resolved by then, but for now i'm going to refrain
> from submitting anything sparc related.

Which just might mean that we have *three* issues here -
	(1) buggered __copy_to_user_inatomic() (and friends) on some sparcs
	(2) your ssl-only corruption
	(3) Alan's x86_64 corruption on plain TCP read - no ssl *or* sparc
anywhere, and no multi-segment recvmsg().  Which would strongly argue in
favour of some kind of copy_page_to_iter() breakage triggered when handling
a fragmented skb, as in (1).  Except that I don't see anything similar in
x86_64 uaccess primitives...
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-07-28  1:22                   ` Al Viro
  (?)
@ 2016-08-03  3:49                   ` Alan Curry
  2016-08-03 12:43                     ` Christian Lamparter
       [not found]                     ` <20160803054118.GG2356@ZenIV.linux.org.uk>
  -1 siblings, 2 replies; 35+ messages in thread
From: Alan Curry @ 2016-08-03  3:49 UTC (permalink / raw)
  To: Al Viro
  Cc: alexmcwhirter, David Miller, rlwinm, chunkeey, linux-wireless,
	netdev, linux-kernel

Al Viro wrote:
> 
> Which just might mean that we have *three* issues here -
> 	(1) buggered __copy_to_user_inatomic() (and friends) on some sparcs
> 	(2) your ssl-only corruption
> 	(3) Alan's x86_64 corruption on plain TCP read - no ssl *or* sparc
> anywhere, and no multi-segment recvmsg().  Which would strongly argue in
> favour of some kind of copy_page_to_iter() breakage triggered when handling
> a fragmented skb, as in (1).  Except that I don't see anything similar in
> x86_64 uaccess primitives...
> 

I think I've solved (3) at least...

Using the twin weapons of printk and stubbornness, I have built a working
theory of the bug. I haven't traced it all the way through, so my explanation
may be partly wrong. I do have a patch that eliminates the symptom in all my
tests though. Here's what happens:

A corrupted packet somehow arrives in skb_copy_and_csum_datagram_msg().
During downloads at reasonably high speed, about 0.1% of my incoming
packets are bad. Probably because the access point is that suspicious
Comcast thing.

skb_copy_and_csum_datagram_msg() does this:

		if (skb_copy_and_csum_datagram(skb, hlen, &msg->msg_iter,
					       chunk, &csum))
			goto fault;
		if (csum_fold(csum))
			goto csum_error;

skb_copy_and_csum_datagram() copies the bad data, computes the checksum,
and *advances the iterator*. The checksum is bad, so it goes to
csum_error, which returns without indicating success to userspace, but the
bad data is in the userspace buffer, and since the iterator has advanced,
the proper data doesn't get written to the proper place when it arrives in a
retransmission. The same iterator is still used because we're still in the
same syscall (I guess - this is one of the parts I didn't check out).

My ugly patch fixes this in the most obvious way: make a local copy of
msg->msg_iter before the call to skb_copy_and_csum_datagram(), and copy it
back if the checksum is bad, just before "goto csum_error;". (I wonder if the
other failure exits from this function might need to do the same thing.)

You can probably reproduce this problem if you deliberately inject some
bad TCP checksums into a stream. Just make sure the receiving machine is
in a blocking read() on the socket when the bad packet arrives. You may
need to resend the offending packet afterward with the checksum corrected.

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b7de71f..574d4bf 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -730,6 +730,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 {
 	__wsum csum;
 	int chunk = skb->len - hlen;
+	struct iov_iter save_iter;
 
 	if (!chunk)
 		return 0;
@@ -741,11 +742,14 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 			goto fault;
 	} else {
 		csum = csum_partial(skb->data, hlen, skb->csum);
+		memcpy(&save_iter, &msg->msg_iter, sizeof save_iter);
 		if (skb_copy_and_csum_datagram(skb, hlen, &msg->msg_iter,
 					       chunk, &csum))
 			goto fault;
-		if (csum_fold(csum))
+		if (csum_fold(csum)) {
+			memcpy(&msg->msg_iter, &save_iter, sizeof save_iter);
 			goto csum_error;
+		}
 		if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
 			netdev_rx_csum_fault(skb->dev);
 	}

-- 
Alan Curry

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-08-03  3:49                   ` Alan Curry
@ 2016-08-03 12:43                     ` Christian Lamparter
  2016-08-03 23:25                       ` Alan Curry
       [not found]                     ` <20160803054118.GG2356@ZenIV.linux.org.uk>
  1 sibling, 1 reply; 35+ messages in thread
From: Christian Lamparter @ 2016-08-03 12:43 UTC (permalink / raw)
  To: Alan Curry
  Cc: Al Viro, alexmcwhirter, David Miller, chunkeey, linux-wireless,
	netdev, linux-kernel

On Wednesday, August 3, 2016 3:49:26 AM CEST Alan Curry wrote:
> Al Viro wrote:
> > 
> > Which just might mean that we have *three* issues here -
> > 	(1) buggered __copy_to_user_inatomic() (and friends) on some sparcs
> > 	(2) your ssl-only corruption
> > 	(3) Alan's x86_64 corruption on plain TCP read - no ssl *or* sparc
> > anywhere, and no multi-segment recvmsg().  Which would strongly argue in
> > favour of some kind of copy_page_to_iter() breakage triggered when handling
> > a fragmented skb, as in (1).  Except that I don't see anything similar in
> > x86_64 uaccess primitives...
> > 
> 
> I think I've solved (3) at least...
> 
> Using the twin weapons of printk and stubbornness, I have built a working
> theory of the bug. I haven't traced it all the way through, so my explanation
> may be partly wrong. I do have a patch that eliminates the symptom in all my
> tests though. Here's what happens:
> 
> A corrupted packet somehow arrives in skb_copy_and_csum_datagram_msg().
> During downloads at reasonably high speed, about 0.1% of my incoming
> packets are bad. Probably because the access point is that suspicious
> Comcast thing.

Thanks for being very persistent with this. I think I'm able to reproduce
this now (on any hardware... like r8169 ethernet) as long as the following 
"traffic policy" is enacted on the HTTP - Server: 

# tc qdisc add dev eth0 root netem corrupt 0.1%

(This needs the "Network Emulation" Sched CONFIG_NET_SCH_NETEM [0].)

With your tool (changed to point to my apache local server). I'm seeing 
corruptions in the "noselect" case. Running it in "select" mode however
and the resulting files have no corruptions.

About AR9170 corruption issues: I know of one report that the AR9170's
Encryption Engine can cause corruptions [1]. In this case outgoing
data was corrupted which lead to deauths/disassocs since the AP was
basically sending out multicast deauths/disassocs with bad addresses.
However, "nohwcrypt" should have made a difference there since the
software decryption would discard the faulty package due the message
integrety checks.

Another source for corruptions could be the USB-PHY (FUSB200) in the
AR9170 [2]. I know it's causing problems for the ath9k_htc. However
not everyone is affected.

One thing I noticed in your previous post is that you "might" not have
draft-802.11n enabled. Do you see any "disabling HT/VHT due to WEP/TKIP use."
in your dmesg logs? If so, check if you can force your AP to use WPA2
with CCMP/AES only.

Regards,
Christian

[0] <http://www.spinics.net/lists/linux-wireless/msg60104.html>
[1] <https://wiki.linuxfoundation.org/networking/netem>
[2] <https://github.com/qca/open-ath9k-htc-firmware/wiki/usb-related-issues>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2016-08-03 12:43                     ` Christian Lamparter
@ 2016-08-03 23:25                       ` Alan Curry
  0 siblings, 0 replies; 35+ messages in thread
From: Alan Curry @ 2016-08-03 23:25 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: Alan Curry, Al Viro, alexmcwhirter, David Miller, linux-wireless,
	netdev, linux-kernel

Christian Lamparter wrote:
> 
> One thing I noticed in your previous post is that you "might" not have
> draft-802.11n enabled. Do you see any "disabling HT/VHT due to WEP/TKIP use."
> in your dmesg logs? If so, check if you can force your AP to use WPA2
> with CCMP/AES only.
> 

Yes, I've had that message. The reason wan't on the AP though. My
wpa_supplicant.conf only had TKIP enabled, because that's what was in the
sample configuration file I started with. Adding CCMP there worked, and in
that mode I'm no longer getting any corrupted packets.

If I'd paid attention to the encryption options when setting up this network
originally, I would have had CCMP the whole time, with no corrupted packets,
and never would have found the iov iterator bug...

-- 
Alan Curry

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
       [not found]                           ` <20170210081126.GA14157@ZenIV.linux.org.uk>
@ 2017-02-10 21:45                             ` Al Viro
  2017-02-11 19:37                               ` Christian Lamparter
  0 siblings, 1 reply; 35+ messages in thread
From: Al Viro @ 2017-02-10 21:45 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, Alan Curry, alexmcwhirter, David Miller,
	Christian Lamparter

[repost with netdev added - hadn't realized it wasn't in Cc]

On Tue, Aug 09, 2016 at 03:58:36PM +0100, Al Viro wrote:

> Actually returning to the original behaviour would be "restore ->msg_iter
> if we tried skb_copy_and_csum_datagram() and failed for any reason".  Which
> would be bloody inconsistent wrt EFAULT, since the other branch (chunk
> large enough to cover the entire recvmsg()) will copy as much as it can
> and (in old kernel) drain iovec or (on the current one) leave iov_iter
> advance unreverted.

To resurrect the old thread: the problem is still there.  Namely, csum
mismatch on packet should leave the iterator as it had been.  That much
is clear; the question is what should be done on EFAULT halfway through.

Semantics of both csum and non-csum skb_copy_datagram_msg() variants in
EFAULT case is an interesting question.  None of that family report
partial copy; it's full or -EFAULT.  So for the sake of basic sanity
it would be better to leave iterator in the original state when that
kind of thing happens.  On the other hand, quite a few callers don't
care about the state of iterator after that and I wonder if the overhead
would be sensitive.  OTTH, the overhead in question is "save 5 words into
local variable and don't use it in the normal case" - in the code that
copies an skb worth of data.

AFAICS, the following gives consistent (and minimally surprising) semantics,
as well as fixing the outright bug with iov_iter left advanced in case of csum
errors.  Comments?

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index c27011bbe30c..14ae17e77603 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -848,7 +848,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
 		vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
 		total += VLAN_HLEN;
 
-		ret = skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
+		ret = __skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
 		if (ret || !iov_iter_count(iter))
 			goto done;
 
@@ -857,7 +857,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
 			goto done;
 	}
 
-	ret = skb_copy_datagram_iter(skb, vlan_offset, iter,
+	ret = __skb_copy_datagram_iter(skb, vlan_offset, iter,
 				     skb->len - vlan_offset);
 
 done:
@@ -899,11 +899,14 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q,
 		finish_wait(sk_sleep(&q->sk), &wait);
 
 	if (skb) {
+		struct iov_iter saved = *to;
 		ret = macvtap_put_user(q, skb, to);
-		if (unlikely(ret < 0))
+		if (unlikely(ret < 0)) {
+			*to = saved;
 			kfree_skb(skb);
-		else
+		} else {
 			consume_skb(skb);
+		}
 	}
 	return ret;
 }
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index a411b43a69eb..0d8badc3c4e9 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -480,7 +480,7 @@ static ssize_t ppp_read(struct file *file, char __user *buf,
 	iov.iov_base = buf;
 	iov.iov_len = count;
 	iov_iter_init(&to, READ, &iov, 1, count);
-	if (skb_copy_datagram_iter(skb, 0, &to, skb->len))
+	if (__skb_copy_datagram_iter(skb, 0, &to, skb->len))
 		goto outf;
 	ret = skb->len;
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 30863e378925..2003b8c9970e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1430,7 +1430,7 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 
 		vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
 
-		ret = skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
+		ret = __skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
 		if (ret || !iov_iter_count(iter))
 			goto done;
 
@@ -1439,7 +1439,8 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 			goto done;
 	}
 
-	skb_copy_datagram_iter(skb, vlan_offset, iter, skb->len - vlan_offset);
+	/* XXX: no error check? */
+	__skb_copy_datagram_iter(skb, vlan_offset, iter, skb->len - vlan_offset);
 
 done:
 	/* caller is in process context, */
@@ -1501,6 +1502,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 {
 	struct sk_buff *skb;
 	ssize_t ret;
+	struct iov_iter saved;
 	int err;
 
 	tun_debug(KERN_INFO, tun, "tun_do_read\n");
@@ -1513,11 +1515,14 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 	if (!skb)
 		return err;
 
+	saved = *to;
 	ret = tun_put_user(tun, tfile, skb, to);
-	if (unlikely(ret < 0))
+	if (unlikely(ret < 0)) {
+		*to = saved;
 		kfree_skb(skb);
-	else
+	} else {
 		consume_skb(skb);
+	}
 
 	return ret;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f1adddc1c5ac..ee8d962373af 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3060,8 +3060,17 @@ struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
 				  int *err);
 unsigned int datagram_poll(struct file *file, struct socket *sock,
 			   struct poll_table_struct *wait);
-int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
+int __skb_copy_datagram_iter(const struct sk_buff *from, int offset,
 			   struct iov_iter *to, int size);
+static inline int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
+					struct iov_iter *to, int size)
+{
+	struct iov_iter saved = *to;
+	int res = __skb_copy_datagram_iter(from, offset, to, size);
+	if (unlikely(res))
+		*to = saved;
+	return res;
+}
 static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
 					struct msghdr *msg, int size)
 {
diff --git a/net/core/datagram.c b/net/core/datagram.c
index ea633342ab0d..33ff2046dda1 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -394,7 +394,7 @@ EXPORT_SYMBOL(skb_kill_datagram);
  *	@to: iovec iterator to copy to
  *	@len: amount of data to copy from buffer to iovec
  */
-int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
+int __skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
 			   struct iov_iter *to, int len)
 {
 	int start = skb_headlen(skb);
@@ -445,7 +445,7 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
 		if ((copy = end - offset) > 0) {
 			if (copy > len)
 				copy = len;
-			if (skb_copy_datagram_iter(frag_iter, offset - start,
+			if (__skb_copy_datagram_iter(frag_iter, offset - start,
 						   to, copy))
 				goto fault;
 			if ((len -= copy) == 0)
@@ -471,7 +471,7 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
 
 	return 0;
 }
-EXPORT_SYMBOL(skb_copy_datagram_iter);
+EXPORT_SYMBOL(__skb_copy_datagram_iter);
 
 /**
  *	skb_copy_datagram_from_iter - Copy a datagram from an iov_iter.
@@ -750,14 +750,16 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 {
 	__wsum csum;
 	int chunk = skb->len - hlen;
+	struct iov_iter saved;
 
 	if (!chunk)
 		return 0;
 
+	saved = msg->msg_iter;
 	if (msg_data_left(msg) < chunk) {
 		if (__skb_checksum_complete(skb))
 			goto csum_error;
-		if (skb_copy_datagram_msg(skb, hlen, msg, chunk))
+		if (__skb_copy_datagram_iter(skb, hlen, &msg->msg_iter, chunk))
 			goto fault;
 	} else {
 		csum = csum_partial(skb->data, hlen, skb->csum);
@@ -771,8 +773,10 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 	}
 	return 0;
 csum_error:
+	msg->msg_iter = saved;
 	return -EINVAL;
 fault:
+	msg->msg_iter = saved;
 	return -EFAULT;
 }
 EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2017-02-10 21:45                             ` Al Viro
@ 2017-02-11 19:37                               ` Christian Lamparter
  2017-02-12  5:42                                 ` Al Viro
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Lamparter @ 2017-02-11 19:37 UTC (permalink / raw)
  To: Al Viro; +Cc: netdev, Eric Dumazet, Alan Curry, alexmcwhirter, David Miller

On Friday, February 10, 2017 9:45:13 PM CET Al Viro wrote:
> On Tue, Aug 09, 2016 at 03:58:36PM +0100, Al Viro wrote:
> 
> > Actually returning to the original behaviour would be "restore ->msg_iter
> > if we tried skb_copy_and_csum_datagram() and failed for any reason".  Which
> > would be bloody inconsistent wrt EFAULT, since the other branch (chunk
> > large enough to cover the entire recvmsg()) will copy as much as it can
> > and (in old kernel) drain iovec or (on the current one) leave iov_iter
> > advance unreverted.
> 
> To resurrect the old thread: the problem is still there.  Namely, csum
> mismatch on packet should leave the iterator as it had been.  That much
> is clear; the question is what should be done on EFAULT halfway through.
Thanks for being very persistent with this. The original problem report 
was just about the data corruption issue. I think everyone involved agrees
that restoring the iterator for cases where the checksum check failed
is definitely the right action. (And of course: It is high time that the
data corruption issue gets fixed).

However, it's as you have said earlier about -EFAULT:
"[...] That's why I hadn't simply ACKed the proposed patch; it very much smells
like we have something bogus with EFAULT handling in the whole area."

Because from the explanations that: (*) "-EFAULT can happen at any point, with
zero warning before you get actual page fault when copying the data and
have handle_mm_fault() return VM_FAULT_ERROR. "

I think if you follow through with this argument. You have the problem of:
How to handle EFAULT from skb_copy_datagram_* (and all it's "wrappers")?

Because on one hand, the iovec could be partially bad. I remember that 
the application could do the following shenanigans during recvmsg: 
 - mprotect() could've flipped page read-only and back to read-write.
 - Or truncate() could've shortened the mmapped file,
 - etc.

In this case the error should be propagated back to the userspace.

But OTOH, it could just be a temporary failure (*) and restoring the
iovec and trying again is needed.

Is this a correct/complete assessment of the problem at hand? Or did
I make a mistake / wrong assumption in there?

> Semantics of both csum and non-csum skb_copy_datagram_msg() variants in
> EFAULT case is an interesting question.  None of that family report
                                                              support?
> partial copy; it's full or -EFAULT.  So for the sake of basic sanity
> it would be better to leave iterator in the original state when that
> kind of thing happens.  On the other hand, quite a few callers don't
> care about the state of iterator after that and I wonder if the overhead
> would be sensitive.  OTTH, the overhead in question is "save 5 words into
> local variable and don't use it in the normal case" - in the code that
> copies an skb worth of data.
> 
> AFAICS, the following gives consistent (and minimally surprising) semantics,
> as well as fixing the outright bug with iov_iter left advanced in case of csum
> errors.  Comments?

I'm looking at:
<http://lxr.free-electrons.com/source/net/ipv4/tcp_input.c#L4668>
<http://lxr.free-electrons.com/source/net/ipv4/tcp_input.c#L5232>
<http://lxr.free-electrons.com/source/net/ipv4/tcp_input.c#L5465>

>From what I can see, the tcp functions tcp_data_queue(),
tcp_copy_to_iovec() and tcp_rcv_established() would need to be
extended to handle EFAULT. Because if the iovec is restored
and the application did something bad (mprotect(), truncate(),
 ...), this code would sort of loop?

If this is the case: How many retries do we want, before we can
say it is a permament failure (and abort)?

Regards,
Christian

[...]
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index f1adddc1c5ac..ee8d962373af 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -3060,8 +3060,17 @@ struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
>  				  int *err);
>  unsigned int datagram_poll(struct file *file, struct socket *sock,
>  			   struct poll_table_struct *wait);
> -int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
> +int __skb_copy_datagram_iter(const struct sk_buff *from, int offset,
>  			   struct iov_iter *to, int size);
> +static inline int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
> +					struct iov_iter *to, int size)
> +{
> +	struct iov_iter saved = *to;
> +	int res = __skb_copy_datagram_iter(from, offset, to, size);
> +	if (unlikely(res))
> +		*to = saved;
> +	return res;
> +}
>  static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
>  					struct msghdr *msg, int size)
>  {
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index ea633342ab0d..33ff2046dda1 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -394,7 +394,7 @@ EXPORT_SYMBOL(skb_kill_datagram);
>   *	@to: iovec iterator to copy to
>   *	@len: amount of data to copy from buffer to iovec
>   */
> -int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
> +int __skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
>  			   struct iov_iter *to, int len)
>  {
>  	int start = skb_headlen(skb);
> @@ -445,7 +445,7 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
>  		if ((copy = end - offset) > 0) {
>  			if (copy > len)
>  				copy = len;
> -			if (skb_copy_datagram_iter(frag_iter, offset - start,
> +			if (__skb_copy_datagram_iter(frag_iter, offset - start,
>  						   to, copy))
>  				goto fault;
>  			if ((len -= copy) == 0)
> @@ -471,7 +471,7 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
>  
>  	return 0;
>  }
> -EXPORT_SYMBOL(skb_copy_datagram_iter);
> +EXPORT_SYMBOL(__skb_copy_datagram_iter);
>  
>  /**
>   *	skb_copy_datagram_from_iter - Copy a datagram from an iov_iter.
> @@ -750,14 +750,16 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
>  {
>  	__wsum csum;
>  	int chunk = skb->len - hlen;
> +	struct iov_iter saved;
>  
>  	if (!chunk)
>  		return 0;
>  
> +	saved = msg->msg_iter;
>  	if (msg_data_left(msg) < chunk) {
>  		if (__skb_checksum_complete(skb))
>  			goto csum_error;
> -		if (skb_copy_datagram_msg(skb, hlen, msg, chunk))
> +		if (__skb_copy_datagram_iter(skb, hlen, &msg->msg_iter, chunk))
>  			goto fault;
>  	} else {
>  		csum = csum_partial(skb->data, hlen, skb->csum);
> @@ -771,8 +773,10 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
>  	}
>  	return 0;
>  csum_error:
> +	msg->msg_iter = saved;
>  	return -EINVAL;
>  fault:
> +	msg->msg_iter = saved;
>  	return -EFAULT;
>  }
>  EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);

You mentioned "overhead" a few times. I tried to measure it with
iperf3 over loopback on a i7-4770. I lowered the MTU to 1500 in 
the hope to see see any difference and It still was very minute: 
188.33 GiB (without) vs 187.89 GiB (patched) for 100 seconds TCP
over IPv4. Of course, I would like to hear more results. Are 
there any special settings that could be interesting and worthwile?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2017-02-11 19:37                               ` Christian Lamparter
@ 2017-02-12  5:42                                 ` Al Viro
  2017-02-13 21:56                                   ` Christian Lamparter
  0 siblings, 1 reply; 35+ messages in thread
From: Al Viro @ 2017-02-12  5:42 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: netdev, Eric Dumazet, Alan Curry, alexmcwhirter, David Miller

On Sat, Feb 11, 2017 at 08:37:06PM +0100, Christian Lamparter wrote:

> I think if you follow through with this argument. You have the problem of:
> How to handle EFAULT from skb_copy_datagram_* (and all it's "wrappers")?
> 
> Because on one hand, the iovec could be partially bad. I remember that 
> the application could do the following shenanigans during recvmsg: 
>  - mprotect() could've flipped page read-only and back to read-write.
>  - Or truncate() could've shortened the mmapped file,
>  - etc.
> 
> In this case the error should be propagated back to the userspace.
> 
> But OTOH, it could just be a temporary failure (*) and restoring the
> iovec and trying again is needed.

No.  You can't _rely_ upon -EFAULT being repeated, but it's not something
you would expect to retry your way out of.

The sane semantics is
	* fail read/recvmsg (with EFAULT) if it's a datagram socket
	* fail if it's a stream socket and nothing has been read by
that point
	* a short read if something has been already read.

> Is this a correct/complete assessment of the problem at hand? Or did
> I make a mistake / wrong assumption in there?

> I'm looking at:
> <http://lxr.free-electrons.com/source/net/ipv4/tcp_input.c#L4668>
> <http://lxr.free-electrons.com/source/net/ipv4/tcp_input.c#L5232>
> <http://lxr.free-electrons.com/source/net/ipv4/tcp_input.c#L5465>
> 
> >From what I can see, the tcp functions tcp_data_queue(),
> tcp_copy_to_iovec() and tcp_rcv_established() would need to be
> extended to handle EFAULT. Because if the iovec is restored
> and the application did something bad (mprotect(), truncate(),
>  ...), this code would sort of loop?

tcp_v4_do_rcv() has every right to copy nothing whatsoever - it's a fastpath
and when e.g. it's called in context of another thread or when skb isn't the
next fragment expected it won't bother with tcp_copy_to_iovec() at all.
Failure to copy anything in there is just fine, as long as you don't end
up buggering tp->ucopy state (in particular, tp->ucopy.msg->msg_iter).

> If this is the case: How many retries do we want, before we can
> say it is a permament failure (and abort)?

We don't want any.  What happens is that this path won't copy anything and
when that skb gets to
                        err = skb_copy_datagram_msg(skb, offset, msg, used);
                        if (err) {
                                /* Exception. Bailout! */
                                if (!copied)
                                        copied = -EFAULT;
                                break;
                        }
in tcp_recvmsg() we'll get our short read.

Again, the trouble is not with tcp_v4_do_rcv() failing to copy something -
it's failing to copy and ending up with iov_iter advanced that might be
a problem.  E.g. tp->ucopy.len getting out of sync with tp->ucopy.msg->msg_iter,
etc.

Short read on fault is fine.  So's full copy if somebody had been flipping
memprotect() and slow path ends up catching the moment when the buffer is
writable.  Both outcomes are fine.  Having the same memprotect() flipping
leave ->msg_iter more than one would expect by tp->ucopy.len and everything
back with copy_to_user working again, OTOH, might confuse tcp_input.c.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
  2017-02-12  5:42                                 ` Al Viro
@ 2017-02-13 21:56                                   ` Christian Lamparter
  2017-02-14  1:33                                     ` [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al. (was Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)) Al Viro
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Lamparter @ 2017-02-13 21:56 UTC (permalink / raw)
  To: Al Viro; +Cc: netdev, Eric Dumazet, Alan Curry, alexmcwhirter, David Miller

On Sunday, February 12, 2017 5:42:18 AM CET Al Viro wrote:
> On Sat, Feb 11, 2017 at 08:37:06PM +0100, Christian Lamparter wrote:
> 
> > I think if you follow through with this argument. You have the problem of:
> > How to handle EFAULT from skb_copy_datagram_* (and all it's "wrappers")?
> > 
> > Because on one hand, the iovec could be partially bad. I remember that 
> > the application could do the following shenanigans during recvmsg: 
> >  - mprotect() could've flipped page read-only and back to read-write.
> >  - Or truncate() could've shortened the mmapped file,
> >  - etc.
> > 
> > In this case the error should be propagated back to the userspace.
> > 
> > But OTOH, it could just be a temporary failure (*) and restoring the
> > iovec and trying again is needed.
> 
> No.  You can't _rely_ upon -EFAULT being repeated, but it's not something
> you would expect to retry your way out of.
> 
> The sane semantics is
> 	* fail read/recvmsg (with EFAULT) if it's a datagram socket
> 	* fail if it's a stream socket and nothing has been read by
> that point
> 	* a short read if something has been already read.
> 
> > Is this a correct/complete assessment of the problem at hand? Or did
> > I make a mistake / wrong assumption in there?
> 
> > I'm looking at:
> > <http://lxr.free-electrons.com/source/net/ipv4/tcp_input.c#L4668>
> > <http://lxr.free-electrons.com/source/net/ipv4/tcp_input.c#L5232>
> > <http://lxr.free-electrons.com/source/net/ipv4/tcp_input.c#L5465>
> > 
> > >From what I can see, the tcp functions tcp_data_queue(),
> > tcp_copy_to_iovec() and tcp_rcv_established() would need to be
> > extended to handle EFAULT. Because if the iovec is restored
> > and the application did something bad (mprotect(), truncate(),
> >  ...), this code would sort of loop?
> 
> tcp_v4_do_rcv() has every right to copy nothing whatsoever - it's a fastpath
> and when e.g. it's called in context of another thread or when skb isn't the
> next fragment expected it won't bother with tcp_copy_to_iovec() at all.
> Failure to copy anything in there is just fine, as long as you don't end
> up buggering tp->ucopy state (in particular, tp->ucopy.msg->msg_iter).
> 
> > If this is the case: How many retries do we want, before we can
> > say it is a permament failure (and abort)?
> 
> We don't want any.  What happens is that this path won't copy anything and
> when that skb gets to
>                         err = skb_copy_datagram_msg(skb, offset, msg, used);
>                         if (err) {
>                                 /* Exception. Bailout! */
>                                 if (!copied)
>                                         copied = -EFAULT;
>                                 break;
>                         }
> in tcp_recvmsg() we'll get our short read.
> 
> Again, the trouble is not with tcp_v4_do_rcv() failing to copy something -
> it's failing to copy and ending up with iov_iter advanced that might be
> a problem.  E.g. tp->ucopy.len getting out of sync with tp->ucopy.msg->msg_iter,
> etc.
> 
> Short read on fault is fine.  So's full copy if somebody had been flipping
> memprotect() and slow path ends up catching the moment when the buffer is
> writable.  Both outcomes are fine.  Having the same memprotect() flipping
> leave ->msg_iter more than one would expect by tp->ucopy.len and everything
> back with copy_to_user working again, OTOH, might confuse tcp_input.c.
> 
Ok, thank you for sticking around. As for the patch: I've tested it with
the dlbug program from <https://lkml.org/lkml/2016/7/26/25> (modified to
pull from a local server) and the netem corruption policy as described
in <https://lkml.org/lkml/2016/8/3/181>. 
It works as expected. I did not get a single corruption with the patch
applied. Without the patch: every try had corruptions in random places. 

Tested-by: Christian Lamparter <chunkeey@googlemail.com>

Regards,
Christian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al. (was Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b))
  2017-02-13 21:56                                   ` Christian Lamparter
@ 2017-02-14  1:33                                     ` Al Viro
  2017-02-17 15:54                                       ` [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al David Miller
  0 siblings, 1 reply; 35+ messages in thread
From: Al Viro @ 2017-02-14  1:33 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, Alan Curry, alexmcwhirter, David Miller,
	Christian Lamparter

On Mon, Feb 13, 2017 at 10:56:46PM +0100, Christian Lamparter wrote:

> Ok, thank you for sticking around. As for the patch: I've tested it with
> the dlbug program from <https://lkml.org/lkml/2016/7/26/25> (modified to
> pull from a local server) and the netem corruption policy as described
> in <https://lkml.org/lkml/2016/8/3/181>. 
> It works as expected. I did not get a single corruption with the patch
> applied. Without the patch: every try had corruptions in random places. 
> 
> Tested-by: Christian Lamparter <chunkeey@googlemail.com>

OK...  Remaining interesting question is whether it adds a noticable
overhead.  Could somebody try it on assorted benchmarks and see if
it slows the things down?  The patch in question follows:

Saner error handling in skb_copy_datagram_iter() et.al.

skb_copy_datagram_{iter,msg}() and skb_copy_and_csum_datagram_msg()
return 0 on success and -ve on error; there is no way to report
a partially successful copy.  We definitely should not leave iterator
advanced in case of -EINVAL (checksum failure); we will be asking for
retransmit and we'll attempt to copy the retransmitted packet when
it arrives.  We want it copied into the same place, not after the
(corrupt) copy we'd got the first time.  That got broken a while ago
when skb_copy_and_csum_datagram_msg() had been switched to use of
iov_iter primitives, resulting in rsync failures on noisy links, etc.
since commit e5a4b0bb803b in v3.19 ("switch memcpy_to_msg() and
skb_copy{,_and_csum}_datagram_msg() to primitives").

The other error possible there (-EFAULT) also should get the same
treatment - we do not tell the caller how much got copied, so we
shouldn't advance the iterator.  Unlike the -EINVAL case, such failures
normally terminate recvmsg(), so it's less damaging, but it's still asking
for trouble.  E.g. TCP receive fastpath treats failure to copy skb as
"just leave it to slowpath".  Unfortunately, EFAULT _may_ transient if
attacker is e.g. playing with mprotect() from another thread, and we end
up with tp->ucopy.len out of sync with tp->ucopy.msg->msg_iter without
being guaranteed that all subsequent attempts to copy anything will fail
anyway.

For consistency sake, let's make all these primitives restore the
state of iterator in all error cases.  Old skb_copy_datagram_iter()
renamed to __skb_copy_datagram_iter(), while skb_copy_datagram_iter(),
skb_copy_datagram_msg()/skb_csum_and_copy_datagram_msg() take care to
restore the original state of iterator.

Tested-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index c27011bbe30c..14ae17e77603 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -848,7 +848,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
 		vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
 		total += VLAN_HLEN;
 
-		ret = skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
+		ret = __skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
 		if (ret || !iov_iter_count(iter))
 			goto done;
 
@@ -857,7 +857,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
 			goto done;
 	}
 
-	ret = skb_copy_datagram_iter(skb, vlan_offset, iter,
+	ret = __skb_copy_datagram_iter(skb, vlan_offset, iter,
 				     skb->len - vlan_offset);
 
 done:
@@ -899,11 +899,14 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q,
 		finish_wait(sk_sleep(&q->sk), &wait);
 
 	if (skb) {
+		struct iov_iter saved = *to;
 		ret = macvtap_put_user(q, skb, to);
-		if (unlikely(ret < 0))
+		if (unlikely(ret < 0)) {
+			*to = saved;
 			kfree_skb(skb);
-		else
+		} else {
 			consume_skb(skb);
+		}
 	}
 	return ret;
 }
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index a411b43a69eb..0d8badc3c4e9 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -480,7 +480,7 @@ static ssize_t ppp_read(struct file *file, char __user *buf,
 	iov.iov_base = buf;
 	iov.iov_len = count;
 	iov_iter_init(&to, READ, &iov, 1, count);
-	if (skb_copy_datagram_iter(skb, 0, &to, skb->len))
+	if (__skb_copy_datagram_iter(skb, 0, &to, skb->len))
 		goto outf;
 	ret = skb->len;
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 30863e378925..2003b8c9970e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1430,7 +1430,7 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 
 		vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
 
-		ret = skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
+		ret = __skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
 		if (ret || !iov_iter_count(iter))
 			goto done;
 
@@ -1439,7 +1439,8 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 			goto done;
 	}
 
-	skb_copy_datagram_iter(skb, vlan_offset, iter, skb->len - vlan_offset);
+	/* XXX: no error check? */
+	__skb_copy_datagram_iter(skb, vlan_offset, iter, skb->len - vlan_offset);
 
 done:
 	/* caller is in process context, */
@@ -1501,6 +1502,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 {
 	struct sk_buff *skb;
 	ssize_t ret;
+	struct iov_iter saved;
 	int err;
 
 	tun_debug(KERN_INFO, tun, "tun_do_read\n");
@@ -1513,11 +1515,14 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 	if (!skb)
 		return err;
 
+	saved = *to;
 	ret = tun_put_user(tun, tfile, skb, to);
-	if (unlikely(ret < 0))
+	if (unlikely(ret < 0)) {
+		*to = saved;
 		kfree_skb(skb);
-	else
+	} else {
 		consume_skb(skb);
+	}
 
 	return ret;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f1adddc1c5ac..ee8d962373af 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3060,8 +3060,17 @@ struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
 				  int *err);
 unsigned int datagram_poll(struct file *file, struct socket *sock,
 			   struct poll_table_struct *wait);
-int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
+int __skb_copy_datagram_iter(const struct sk_buff *from, int offset,
 			   struct iov_iter *to, int size);
+static inline int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
+					struct iov_iter *to, int size)
+{
+	struct iov_iter saved = *to;
+	int res = __skb_copy_datagram_iter(from, offset, to, size);
+	if (unlikely(res))
+		*to = saved;
+	return res;
+}
 static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
 					struct msghdr *msg, int size)
 {
diff --git a/net/core/datagram.c b/net/core/datagram.c
index ea633342ab0d..33ff2046dda1 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -394,7 +394,7 @@ EXPORT_SYMBOL(skb_kill_datagram);
  *	@to: iovec iterator to copy to
  *	@len: amount of data to copy from buffer to iovec
  */
-int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
+int __skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
 			   struct iov_iter *to, int len)
 {
 	int start = skb_headlen(skb);
@@ -445,7 +445,7 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
 		if ((copy = end - offset) > 0) {
 			if (copy > len)
 				copy = len;
-			if (skb_copy_datagram_iter(frag_iter, offset - start,
+			if (__skb_copy_datagram_iter(frag_iter, offset - start,
 						   to, copy))
 				goto fault;
 			if ((len -= copy) == 0)
@@ -471,7 +471,7 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
 
 	return 0;
 }
-EXPORT_SYMBOL(skb_copy_datagram_iter);
+EXPORT_SYMBOL(__skb_copy_datagram_iter);
 
 /**
  *	skb_copy_datagram_from_iter - Copy a datagram from an iov_iter.
@@ -750,14 +750,16 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 {
 	__wsum csum;
 	int chunk = skb->len - hlen;
+	struct iov_iter saved;
 
 	if (!chunk)
 		return 0;
 
+	saved = msg->msg_iter;
 	if (msg_data_left(msg) < chunk) {
 		if (__skb_checksum_complete(skb))
 			goto csum_error;
-		if (skb_copy_datagram_msg(skb, hlen, msg, chunk))
+		if (__skb_copy_datagram_iter(skb, hlen, &msg->msg_iter, chunk))
 			goto fault;
 	} else {
 		csum = csum_partial(skb->data, hlen, skb->csum);
@@ -771,8 +773,10 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 	}
 	return 0;
 csum_error:
+	msg->msg_iter = saved;
 	return -EINVAL;
 fault:
+	msg->msg_iter = saved;
 	return -EFAULT;
 }
 EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al.
  2017-02-14  1:33                                     ` [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al. (was Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)) Al Viro
@ 2017-02-17 15:54                                       ` David Miller
  2017-02-17 17:03                                         ` Al Viro
  0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2017-02-17 15:54 UTC (permalink / raw)
  To: viro; +Cc: netdev, eric.dumazet, rlwinm, alexmcwhirter, chunkeey

From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Tue, 14 Feb 2017 01:33:06 +0000

> OK...  Remaining interesting question is whether it adds a noticable
> overhead.  Could somebody try it on assorted benchmarks and see if
> it slows the things down?  The patch in question follows:

That's about a 40 byte copy onto the stack for each invocation of this
thing.  You can benchmark all you want, but it's clear that this is
non-trivial amount of work and will take some operations from fitting
in the cache to not doing so for sure.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al.
  2017-02-17 15:54                                       ` [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al David Miller
@ 2017-02-17 17:03                                         ` Al Viro
  2017-02-18  0:02                                           ` Al Viro
  0 siblings, 1 reply; 35+ messages in thread
From: Al Viro @ 2017-02-17 17:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet, rlwinm, alexmcwhirter, chunkeey

On Fri, Feb 17, 2017 at 10:54:20AM -0500, David Miller wrote:
> From: Al Viro <viro@ZenIV.linux.org.uk>
> Date: Tue, 14 Feb 2017 01:33:06 +0000
> 
> > OK...  Remaining interesting question is whether it adds a noticable
> > overhead.  Could somebody try it on assorted benchmarks and see if
> > it slows the things down?  The patch in question follows:
> 
> That's about a 40 byte copy onto the stack for each invocation of this
> thing.  You can benchmark all you want, but it's clear that this is
> non-trivial amount of work and will take some operations from fitting
> in the cache to not doing so for sure.

In principle, that could be reduced a bit (32 bytes - ->type is never
changed, so we don't need to restore it), but that's not much of improvement...

Hell knows; at the very least, we need to restore the original position in
case of csum failure.  EFAULT is a separate story (and I'm still not sure
if tcp_input is handling it right), but csum mismatch will be retried and
we want the payload of retransmitted packet to go into the same place, not
past what we'd managed to copy.  AFAICS, original logics used to be
	* skb_copy_and_csum_...() never copied more than one iovec worth of
data.
	* iovec had been drained only on success; during the copy we kept
pointer + remaining length in local variables
	* if copying would have to go into skb fragments, we would just
calculate the checksum separately and then go for plain copy.

Not crossing from iovec to iovec reduced the amount of state to carry/revert
and AFAICS recursive call into skb fragments on the checksum-as-we-copy
path had been dead code all along - we went for separate checksum path in
cases when it would be reached.

Is my reading of pre-e5a4b0bb803b logics correct?  Was the recursive call
of skb_copy_and_csum_datagram() ever reached?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al.
  2017-02-17 17:03                                         ` Al Viro
@ 2017-02-18  0:02                                           ` Al Viro
  2017-02-18  2:24                                             ` Al Viro
                                                               ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Al Viro @ 2017-02-18  0:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet, rlwinm, alexmcwhirter, chunkeey

On Fri, Feb 17, 2017 at 05:03:15PM +0000, Al Viro wrote:
> On Fri, Feb 17, 2017 at 10:54:20AM -0500, David Miller wrote:
> > From: Al Viro <viro@ZenIV.linux.org.uk>
> > Date: Tue, 14 Feb 2017 01:33:06 +0000
> > 
> > > OK...  Remaining interesting question is whether it adds a noticable
> > > overhead.  Could somebody try it on assorted benchmarks and see if
> > > it slows the things down?  The patch in question follows:
> > 
> > That's about a 40 byte copy onto the stack for each invocation of this
> > thing.  You can benchmark all you want, but it's clear that this is
> > non-trivial amount of work and will take some operations from fitting
> > in the cache to not doing so for sure.
> 
> In principle, that could be reduced a bit (32 bytes - ->type is never
> changed, so we don't need to restore it), but that's not much of improvement...

Actually, I've a better solution.  Namely, analogue of iov_iter_advance()
for going backwards.  The restriction is that you should never unroll
further than where you've initially started *or* have the iovec, etc.
array modified under you.  For iovec/kvec/bio_vec it's trivial, for pipe -
a bit more convoluted, but still doable.  Then net/core/datagram.c stuff
could simply use iov_iter_unroll() in case of error - all we need to keep
track of is how much had we copied and that's easy to do.

The patch below is completely untested, but if it works it should avoid
buggering the fast paths at all, still giving the same semantics re
reverting ->msg_iter both on EINVAL and EFAULT.  Comments?

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 804e34c6f981..237965348bc2 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -39,7 +39,10 @@ struct iov_iter {
 	};
 	union {
 		unsigned long nr_segs;
-		int idx;
+		struct {
+			int idx;
+			int start_idx;
+		};
 	};
 };
 
@@ -81,6 +84,7 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to);
 size_t iov_iter_copy_from_user_atomic(struct page *page,
 		struct iov_iter *i, unsigned long offset, size_t bytes);
 void iov_iter_advance(struct iov_iter *i, size_t bytes);
+void iov_iter_unroll(struct iov_iter *i, size_t bytes);
 int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
 size_t iov_iter_single_seg_count(const struct iov_iter *i);
 size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e68604ae3ced..afdaca37f5c9 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -786,6 +786,68 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
 }
 EXPORT_SYMBOL(iov_iter_advance);
 
+void iov_iter_unroll(struct iov_iter *i, size_t unroll)
+{
+	if (!unroll)
+		return;
+	i->count += unroll;
+	if (unlikely(i->type & ITER_PIPE)) {
+		struct pipe_inode_info *pipe = i->pipe;
+		int idx = i->idx;
+		size_t off = i->iov_offset;
+		while (1) {
+			size_t n = off - pipe->bufs[idx].offset;
+			if (unroll < n) {
+				off -= (n - unroll);
+				break;
+			}
+			unroll -= n;
+			if (!unroll && idx == i->start_idx) {
+				off = 0;
+				break;
+			}
+			if (!idx--)
+				idx = pipe->buffers - 1;
+			off = pipe->bufs[idx].offset + pipe->bufs[idx].len;
+		}
+		i->iov_offset = off;
+		i->idx = idx;
+		pipe_truncate(i);
+		return;
+	}
+	if (unroll <= i->iov_offset) {
+		i->iov_offset -= unroll;
+		return;
+	}
+	unroll -= i->iov_offset;
+	if (i->type & ITER_BVEC) {
+		const struct bio_vec *bvec = i->bvec;
+		while (1) {
+			size_t n = (--bvec)->bv_len;
+			i->nr_segs++;
+			if (unroll <= n) {
+				i->bvec = bvec;
+				i->iov_offset = n - unroll;
+				return;
+			}
+			unroll -= n;
+		}
+	} else { /* same logics for iovec and kvec */
+		const struct iovec *iov = i->iov;
+		while (1) {
+			size_t n = (--iov)->iov_len;
+			i->nr_segs++;
+			if (unroll <= n) {
+				i->iov = iov;
+				i->iov_offset = n - unroll;
+				return;
+			}
+			unroll -= n;
+		}
+	}
+}
+EXPORT_SYMBOL(iov_iter_unroll);
+
 /*
  * Return the count of just the current iov_iter segment.
  */
@@ -839,6 +901,7 @@ void iov_iter_pipe(struct iov_iter *i, int direction,
 	i->idx = (pipe->curbuf + pipe->nrbufs) & (pipe->buffers - 1);
 	i->iov_offset = 0;
 	i->count = count;
+	i->start_idx = i->idx;
 }
 EXPORT_SYMBOL(iov_iter_pipe);
 
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 662bea587165..63c7353a6ba2 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -394,7 +394,7 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
 			   struct iov_iter *to, int len)
 {
 	int start = skb_headlen(skb);
-	int i, copy = start - offset;
+	int i, copy = start - offset, start_off = offset, n;
 	struct sk_buff *frag_iter;
 
 	trace_skb_copy_datagram_iovec(skb, len);
@@ -403,11 +403,12 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
 	if (copy > 0) {
 		if (copy > len)
 			copy = len;
-		if (copy_to_iter(skb->data + offset, copy, to) != copy)
+		n = copy_to_iter(skb->data + offset, copy, to);
+		offset += n;
+		if (n != copy)
 			goto short_copy;
 		if ((len -= copy) == 0)
 			return 0;
-		offset += copy;
 	}
 
 	/* Copy paged appendix. Hmm... why does this look so complicated? */
@@ -421,13 +422,14 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
 		if ((copy = end - offset) > 0) {
 			if (copy > len)
 				copy = len;
-			if (copy_page_to_iter(skb_frag_page(frag),
+			n = copy_page_to_iter(skb_frag_page(frag),
 					      frag->page_offset + offset -
-					      start, copy, to) != copy)
+					      start, copy, to);
+			offset += n;
+			if (n != copy)
 				goto short_copy;
 			if (!(len -= copy))
 				return 0;
-			offset += copy;
 		}
 		start = end;
 	}
@@ -459,6 +461,7 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
 	 */
 
 fault:
+	iov_iter_unroll(to, offset - start_off);
 	return -EFAULT;
 
 short_copy:
@@ -609,7 +612,7 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
 				      __wsum *csump)
 {
 	int start = skb_headlen(skb);
-	int i, copy = start - offset;
+	int i, copy = start - offset, start_off = offset;
 	struct sk_buff *frag_iter;
 	int pos = 0;
 	int n;
@@ -619,11 +622,11 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
 		if (copy > len)
 			copy = len;
 		n = csum_and_copy_to_iter(skb->data + offset, copy, csump, to);
+		offset += n;
 		if (n != copy)
 			goto fault;
 		if ((len -= copy) == 0)
 			return 0;
-		offset += copy;
 		pos = copy;
 	}
 
@@ -645,12 +648,12 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
 						  offset - start, copy,
 						  &csum2, to);
 			kunmap(page);
+			offset += n;
 			if (n != copy)
 				goto fault;
 			*csump = csum_block_add(*csump, csum2, pos);
 			if (!(len -= copy))
 				return 0;
-			offset += copy;
 			pos += copy;
 		}
 		start = end;
@@ -683,6 +686,7 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
 		return 0;
 
 fault:
+	iov_iter_unroll(to, offset - start_off);
 	return -EFAULT;
 }
 
@@ -767,6 +771,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 	}
 	return 0;
 csum_error:
+	iov_iter_unroll(&msg->msg_iter, chunk);
 	return -EINVAL;
 fault:
 	return -EFAULT;

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al.
  2017-02-18  0:02                                           ` Al Viro
@ 2017-02-18  2:24                                             ` Al Viro
  2017-02-19 19:19                                             ` Christian Lamparter
                                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Al Viro @ 2017-02-18  2:24 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet, rlwinm, alexmcwhirter, chunkeey

On Sat, Feb 18, 2017 at 12:02:14AM +0000, Al Viro wrote:
> On Fri, Feb 17, 2017 at 05:03:15PM +0000, Al Viro wrote:
> > On Fri, Feb 17, 2017 at 10:54:20AM -0500, David Miller wrote:
> > > From: Al Viro <viro@ZenIV.linux.org.uk>
> > > Date: Tue, 14 Feb 2017 01:33:06 +0000
> > > 
> > > > OK...  Remaining interesting question is whether it adds a noticable
> > > > overhead.  Could somebody try it on assorted benchmarks and see if
> > > > it slows the things down?  The patch in question follows:
> > > 
> > > That's about a 40 byte copy onto the stack for each invocation of this
> > > thing.  You can benchmark all you want, but it's clear that this is
> > > non-trivial amount of work and will take some operations from fitting
> > > in the cache to not doing so for sure.
> > 
> > In principle, that could be reduced a bit (32 bytes - ->type is never
> > changed, so we don't need to restore it), but that's not much of improvement...
> 
> Actually, I've a better solution.  Namely, analogue of iov_iter_advance()
> for going backwards.  The restriction is that you should never unroll
> further than where you've initially started *or* have the iovec, etc.
> array modified under you.  For iovec/kvec/bio_vec it's trivial, for pipe -
> a bit more convoluted, but still doable.  Then net/core/datagram.c stuff
> could simply use iov_iter_unroll() in case of error - all we need to keep
> track of is how much had we copied and that's easy to do.
> 
> The patch below is completely untested, but if it works it should avoid
> buggering the fast paths at all, still giving the same semantics re
> reverting ->msg_iter both on EINVAL and EFAULT.  Comments?

Incidentally, at least 4 other places in net/* can stop messing with
copying iov_iter just in case we'll need to revert and I'm fairly sure
there's more of the same in the tree.  FWIW, net/rds and net/tipc parts
(on top of the previous patch, also completely untested) follow:

diff --git a/net/rds/recv.c b/net/rds/recv.c
index 9d0666e5fe35..ed407156b3b4 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -564,7 +564,6 @@ int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		goto out;
 
 	while (1) {
-		struct iov_iter save;
 		/* If there are pending notifications, do those - and nothing else */
 		if (!list_empty(&rs->rs_notify_queue)) {
 			ret = rds_notify_queue_get(rs, msg);
@@ -600,7 +599,6 @@ int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		rdsdebug("copying inc %p from %pI4:%u to user\n", inc,
 			 &inc->i_conn->c_faddr,
 			 ntohs(inc->i_hdr.h_sport));
-		save = msg->msg_iter;
 		ret = inc->i_conn->c_trans->inc_copy_to_user(inc, &msg->msg_iter);
 		if (ret < 0)
 			break;
@@ -614,7 +612,7 @@ int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 			rds_inc_put(inc);
 			inc = NULL;
 			rds_stats_inc(s_recv_deliver_raced);
-			msg->msg_iter = save;
+			iov_iter_unroll(&msg->msg_iter, ret);
 			continue;
 		}
 
diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index a22be502f1bd..5bfd5d641ba2 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -316,6 +316,7 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m,
 		skb = tipc_buf_acquire(pktsz);
 		if (!skb) {
 			rc = -ENOMEM;
+			iov_iter_unroll(&m->msg_iter, dsz - drem);
 			goto error;
 		}
 		skb_orphan(skb);
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 333c5dae0072..1fa220a2fa00 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -688,7 +688,6 @@ static int tipc_sendmcast(struct  socket *sock, struct tipc_name_seq *seq,
 	struct net *net = sock_net(sk);
 	struct tipc_msg *mhdr = &tsk->phdr;
 	struct sk_buff_head pktchain;
-	struct iov_iter save = msg->msg_iter;
 	uint mtu;
 	int rc;
 
@@ -725,7 +724,7 @@ static int tipc_sendmcast(struct  socket *sock, struct tipc_name_seq *seq,
 		}
 		__skb_queue_purge(&pktchain);
 		if (rc == -EMSGSIZE) {
-			msg->msg_iter = save;
+			iov_iter_unroll(&msg->msg_iter, dsz);
 			goto new_mtu;
 		}
 		break;
@@ -891,7 +890,7 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dsz)
 	bool is_connectionless = tipc_sk_type_connectionless(sk);
 	struct sk_buff *skb;
 	struct tipc_name_seq *seq;
-	struct iov_iter save;
+	int size;
 	u32 mtu;
 	long timeo;
 	int rc;
@@ -950,10 +949,9 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dsz)
 	}
 
 	skb_queue_head_init(&pktchain);
-	save = m->msg_iter;
 new_mtu:
 	mtu = tipc_node_get_mtu(net, dnode, tsk->portid);
-	rc = tipc_msg_build(mhdr, m, 0, dsz, mtu, &pktchain);
+	size = rc = tipc_msg_build(mhdr, m, 0, dsz, mtu, &pktchain);
 	if (rc < 0)
 		return rc;
 
@@ -974,7 +972,7 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dsz)
 		}
 		__skb_queue_purge(&pktchain);
 		if (rc == -EMSGSIZE) {
-			m->msg_iter = save;
+			iov_iter_unroll(&m->msg_iter, size);
 			goto new_mtu;
 		}
 		break;
@@ -1049,7 +1047,7 @@ static int __tipc_send_stream(struct socket *sock, struct msghdr *m, size_t dsz)
 	long timeo;
 	u32 dnode;
 	uint mtu, send, sent = 0;
-	struct iov_iter save;
+	int size;
 	int hlen = MIN_H_SIZE;
 
 	/* Handle implied connection establishment */
@@ -1078,10 +1076,9 @@ static int __tipc_send_stream(struct socket *sock, struct msghdr *m, size_t dsz)
 	skb_queue_head_init(&pktchain);
 
 next:
-	save = m->msg_iter;
 	mtu = tsk->max_pkt;
 	send = min_t(uint, dsz - sent, TIPC_MAX_USER_MSG_SIZE);
-	rc = tipc_msg_build(mhdr, m, sent, send, mtu, &pktchain);
+	size = rc = tipc_msg_build(mhdr, m, sent, send, mtu, &pktchain);
 	if (unlikely(rc < 0))
 		return rc;
 
@@ -1099,7 +1096,7 @@ static int __tipc_send_stream(struct socket *sock, struct msghdr *m, size_t dsz)
 				__skb_queue_purge(&pktchain);
 				tsk->max_pkt = tipc_node_get_mtu(net, dnode,
 								 portid);
-				m->msg_iter = save;
+				iov_iter_unroll(&m->msg_iter, size);
 				goto next;
 			}
 			if (rc != -ELINKCONG)

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al.
  2017-02-18  0:02                                           ` Al Viro
  2017-02-18  2:24                                             ` Al Viro
@ 2017-02-19 19:19                                             ` Christian Lamparter
  2017-02-20 15:14                                             ` David Miller
  2017-02-21 13:25                                             ` David Laight
  3 siblings, 0 replies; 35+ messages in thread
From: Christian Lamparter @ 2017-02-19 19:19 UTC (permalink / raw)
  To: Al Viro; +Cc: David Miller, netdev, eric.dumazet, rlwinm, alexmcwhirter

On Saturday, February 18, 2017 12:02:14 AM CET Al Viro wrote:
> On Fri, Feb 17, 2017 at 05:03:15PM +0000, Al Viro wrote:
> > On Fri, Feb 17, 2017 at 10:54:20AM -0500, David Miller wrote:
> > > From: Al Viro <viro@ZenIV.linux.org.uk>
> > > Date: Tue, 14 Feb 2017 01:33:06 +0000
> > > 
> > > > OK...  Remaining interesting question is whether it adds a noticable
> > > > overhead.  Could somebody try it on assorted benchmarks and see if
> > > > it slows the things down?  The patch in question follows:
> > > 
> > > That's about a 40 byte copy onto the stack for each invocation of this
> > > thing.  You can benchmark all you want, but it's clear that this is
> > > non-trivial amount of work and will take some operations from fitting
> > > in the cache to not doing so for sure.
> > 
> > In principle, that could be reduced a bit (32 bytes - ->type is never
> > changed, so we don't need to restore it), but that's not much of improvement...
> 
> Actually, I've a better solution.  Namely, analogue of iov_iter_advance()
> for going backwards.  The restriction is that you should never unroll
> further than where you've initially started *or* have the iovec, etc.
> array modified under you.  For iovec/kvec/bio_vec it's trivial, for pipe -
> a bit more convoluted, but still doable.  Then net/core/datagram.c stuff
> could simply use iov_iter_unroll() in case of error - all we need to keep
> track of is how much had we copied and that's easy to do.
> 
> The patch below is completely untested, but if it works it should avoid
> buggering the fast paths at all, still giving the same semantics re
> reverting ->msg_iter both on EINVAL and EFAULT.  Comments?
I've tested it. It also fixes the corruption issue I can reproduce
with my setup.:
# tc qdisc add dev eth0 root netem corrupt 0.1%
(and the dlbug code)

So, for what's worth:
Tested-by: Christian Lamparter <chunkeey@googlemail.com>

> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 804e34c6f981..237965348bc2 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -39,7 +39,10 @@ struct iov_iter {
>  	};
>  	union {
>  		unsigned long nr_segs;
> -		int idx;
> +		struct {
> +			int idx;
> +			int start_idx;
> +		};
>  	};
>  };
>  
> @@ -81,6 +84,7 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to);
>  size_t iov_iter_copy_from_user_atomic(struct page *page,
>  		struct iov_iter *i, unsigned long offset, size_t bytes);
>  void iov_iter_advance(struct iov_iter *i, size_t bytes);
> +void iov_iter_unroll(struct iov_iter *i, size_t bytes);
>  int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
>  size_t iov_iter_single_seg_count(const struct iov_iter *i);
>  size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index e68604ae3ced..afdaca37f5c9 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -786,6 +786,68 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
>  }
>  EXPORT_SYMBOL(iov_iter_advance);
>  
> +void iov_iter_unroll(struct iov_iter *i, size_t unroll)
> +{
> +	if (!unroll)
> +		return;
> +	i->count += unroll;
> +	if (unlikely(i->type & ITER_PIPE)) {
> +		struct pipe_inode_info *pipe = i->pipe;
> +		int idx = i->idx;
> +		size_t off = i->iov_offset;
> +		while (1) {
> +			size_t n = off - pipe->bufs[idx].offset;
> +			if (unroll < n) {
> +				off -= (n - unroll);
> +				break;
> +			}
> +			unroll -= n;
> +			if (!unroll && idx == i->start_idx) {
> +				off = 0;
> +				break;
> +			}
> +			if (!idx--)
> +				idx = pipe->buffers - 1;
> +			off = pipe->bufs[idx].offset + pipe->bufs[idx].len;
> +		}
> +		i->iov_offset = off;
> +		i->idx = idx;
> +		pipe_truncate(i);
> +		return;
> +	}
> +	if (unroll <= i->iov_offset) {
> +		i->iov_offset -= unroll;
> +		return;
> +	}
> +	unroll -= i->iov_offset;
> +	if (i->type & ITER_BVEC) {
> +		const struct bio_vec *bvec = i->bvec;
> +		while (1) {
> +			size_t n = (--bvec)->bv_len;
> +			i->nr_segs++;
> +			if (unroll <= n) {
> +				i->bvec = bvec;
> +				i->iov_offset = n - unroll;
> +				return;
> +			}
> +			unroll -= n;
> +		}
> +	} else { /* same logics for iovec and kvec */
> +		const struct iovec *iov = i->iov;
> +		while (1) {
> +			size_t n = (--iov)->iov_len;
> +			i->nr_segs++;
> +			if (unroll <= n) {
> +				i->iov = iov;
> +				i->iov_offset = n - unroll;
> +				return;
> +			}
> +			unroll -= n;
> +		}
> +	}
> +}
> +EXPORT_SYMBOL(iov_iter_unroll);
> +
>  /*
>   * Return the count of just the current iov_iter segment.
>   */
> @@ -839,6 +901,7 @@ void iov_iter_pipe(struct iov_iter *i, int direction,
>  	i->idx = (pipe->curbuf + pipe->nrbufs) & (pipe->buffers - 1);
>  	i->iov_offset = 0;
>  	i->count = count;
> +	i->start_idx = i->idx;
>  }
>  EXPORT_SYMBOL(iov_iter_pipe);
>  
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 662bea587165..63c7353a6ba2 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -394,7 +394,7 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
>  			   struct iov_iter *to, int len)
>  {
>  	int start = skb_headlen(skb);
> -	int i, copy = start - offset;
> +	int i, copy = start - offset, start_off = offset, n;
>  	struct sk_buff *frag_iter;
>  
>  	trace_skb_copy_datagram_iovec(skb, len);
> @@ -403,11 +403,12 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
>  	if (copy > 0) {
>  		if (copy > len)
>  			copy = len;
> -		if (copy_to_iter(skb->data + offset, copy, to) != copy)
> +		n = copy_to_iter(skb->data + offset, copy, to);
> +		offset += n;
> +		if (n != copy)
>  			goto short_copy;
>  		if ((len -= copy) == 0)
>  			return 0;
> -		offset += copy;
>  	}
>  
>  	/* Copy paged appendix. Hmm... why does this look so complicated? */
> @@ -421,13 +422,14 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
>  		if ((copy = end - offset) > 0) {
>  			if (copy > len)
>  				copy = len;
> -			if (copy_page_to_iter(skb_frag_page(frag),
> +			n = copy_page_to_iter(skb_frag_page(frag),
>  					      frag->page_offset + offset -
> -					      start, copy, to) != copy)
> +					      start, copy, to);
> +			offset += n;
> +			if (n != copy)
>  				goto short_copy;
>  			if (!(len -= copy))
>  				return 0;
> -			offset += copy;
>  		}
>  		start = end;
>  	}
> @@ -459,6 +461,7 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
>  	 */
>  
>  fault:
> +	iov_iter_unroll(to, offset - start_off);
>  	return -EFAULT;
>  
>  short_copy:
> @@ -609,7 +612,7 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
>  				      __wsum *csump)
>  {
>  	int start = skb_headlen(skb);
> -	int i, copy = start - offset;
> +	int i, copy = start - offset, start_off = offset;
>  	struct sk_buff *frag_iter;
>  	int pos = 0;
>  	int n;
> @@ -619,11 +622,11 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
>  		if (copy > len)
>  			copy = len;
>  		n = csum_and_copy_to_iter(skb->data + offset, copy, csump, to);
> +		offset += n;
>  		if (n != copy)
>  			goto fault;
>  		if ((len -= copy) == 0)
>  			return 0;
> -		offset += copy;
>  		pos = copy;
>  	}
>  
> @@ -645,12 +648,12 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
>  						  offset - start, copy,
>  						  &csum2, to);
>  			kunmap(page);
> +			offset += n;
>  			if (n != copy)
>  				goto fault;
>  			*csump = csum_block_add(*csump, csum2, pos);
>  			if (!(len -= copy))
>  				return 0;
> -			offset += copy;
>  			pos += copy;
>  		}
>  		start = end;
> @@ -683,6 +686,7 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
>  		return 0;
>  
>  fault:
> +	iov_iter_unroll(to, offset - start_off);
>  	return -EFAULT;
>  }
>  
> @@ -767,6 +771,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
>  	}
>  	return 0;
>  csum_error:
> +	iov_iter_unroll(&msg->msg_iter, chunk);
>  	return -EINVAL;
>  fault:
>  	return -EFAULT;
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al.
  2017-02-18  0:02                                           ` Al Viro
  2017-02-18  2:24                                             ` Al Viro
  2017-02-19 19:19                                             ` Christian Lamparter
@ 2017-02-20 15:14                                             ` David Miller
  2017-02-21 13:25                                             ` David Laight
  3 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2017-02-20 15:14 UTC (permalink / raw)
  To: viro; +Cc: netdev, eric.dumazet, rlwinm, alexmcwhirter, chunkeey

From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Sat, 18 Feb 2017 00:02:14 +0000

> On Fri, Feb 17, 2017 at 05:03:15PM +0000, Al Viro wrote:
>> On Fri, Feb 17, 2017 at 10:54:20AM -0500, David Miller wrote:
>> > From: Al Viro <viro@ZenIV.linux.org.uk>
>> > Date: Tue, 14 Feb 2017 01:33:06 +0000
>> > 
>> > > OK...  Remaining interesting question is whether it adds a noticable
>> > > overhead.  Could somebody try it on assorted benchmarks and see if
>> > > it slows the things down?  The patch in question follows:
>> > 
>> > That's about a 40 byte copy onto the stack for each invocation of this
>> > thing.  You can benchmark all you want, but it's clear that this is
>> > non-trivial amount of work and will take some operations from fitting
>> > in the cache to not doing so for sure.
>> 
>> In principle, that could be reduced a bit (32 bytes - ->type is never
>> changed, so we don't need to restore it), but that's not much of improvement...
> 
> Actually, I've a better solution.  Namely, analogue of iov_iter_advance()
> for going backwards.  The restriction is that you should never unroll
> further than where you've initially started *or* have the iovec, etc.
> array modified under you.  For iovec/kvec/bio_vec it's trivial, for pipe -
> a bit more convoluted, but still doable.  Then net/core/datagram.c stuff
> could simply use iov_iter_unroll() in case of error - all we need to keep
> track of is how much had we copied and that's easy to do.
> 
> The patch below is completely untested, but if it works it should avoid
> buggering the fast paths at all, still giving the same semantics re
> reverting ->msg_iter both on EINVAL and EFAULT.  Comments?

This looks a lot better to me.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al.
  2017-02-18  0:02                                           ` Al Viro
                                                               ` (2 preceding siblings ...)
  2017-02-20 15:14                                             ` David Miller
@ 2017-02-21 13:25                                             ` David Laight
  3 siblings, 0 replies; 35+ messages in thread
From: David Laight @ 2017-02-21 13:25 UTC (permalink / raw)
  To: 'Al Viro', David Miller
  Cc: netdev, eric.dumazet, rlwinm, alexmcwhirter, chunkeey

From: Al Viro
> Sent: 18 February 2017 00:02
...
> Actually, I've a better solution.  Namely, analogue of iov_iter_advance()
> for going backwards.  The restriction is that you should never unroll
> further than where you've initially started *or* have the iovec, etc.
> array modified under you
...
> +void iov_iter_unroll(struct iov_iter *i, size_t unroll)
...

I'm sure there is a better name, maybe iov_iter_backup() ?

	David

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2017-02-21 13:25 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-24  3:35 PROBLEM: network data corruption (bisected to e5a4b0bb803b) Alan Curry
2016-07-24 17:45 ` Christian Lamparter
2016-07-24 17:45   ` Christian Lamparter
2016-07-24 19:02   ` Al Viro
2016-07-24 19:02     ` Al Viro
2016-07-26  4:57     ` Alan Curry
2016-07-26 13:59       ` Christian Lamparter
2016-07-26 18:15         ` alexmcwhirter
2016-07-27  6:39           ` Kalle Valo
2016-07-27  1:14         ` Alan Curry
2016-07-27 10:32     ` Alan Curry
2016-07-27 18:04       ` alexmcwhirter
2016-07-27 23:02         ` alexmcwhirter
2016-07-27 23:45           ` David Miller
2016-07-28  0:31             ` Al Viro
2016-07-28  0:26               ` alexmcwhirter
2016-07-28  1:22                 ` Al Viro
2016-07-28  1:22                   ` Al Viro
2016-08-03  3:49                   ` Alan Curry
2016-08-03 12:43                     ` Christian Lamparter
2016-08-03 23:25                       ` Alan Curry
     [not found]                     ` <20160803054118.GG2356@ZenIV.linux.org.uk>
     [not found]                       ` <2363167.YiBS7sFNO2@debian64>
     [not found]                         ` <20160809145836.GQ2356@ZenIV.linux.org.uk>
     [not found]                           ` <20170210081126.GA14157@ZenIV.linux.org.uk>
2017-02-10 21:45                             ` Al Viro
2017-02-11 19:37                               ` Christian Lamparter
2017-02-12  5:42                                 ` Al Viro
2017-02-13 21:56                                   ` Christian Lamparter
2017-02-14  1:33                                     ` [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al. (was Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)) Al Viro
2017-02-17 15:54                                       ` [PATCH][CFT] Saner error handling in skb_copy_datagram_iter() et.al David Miller
2017-02-17 17:03                                         ` Al Viro
2017-02-18  0:02                                           ` Al Viro
2017-02-18  2:24                                             ` Al Viro
2017-02-19 19:19                                             ` Christian Lamparter
2017-02-20 15:14                                             ` David Miller
2017-02-21 13:25                                             ` David Laight
2016-07-26  4:32   ` PROBLEM: network data corruption (bisected to e5a4b0bb803b) Alan Curry
2016-07-26  4:38   ` alexmcwhirter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.