[1.] One line summary of the problem:

Crash when timestamping outgoing PTP packets under heavy network load (ppc, gianfar)

[2.] Full description of the problem/report:

I have a custom embedded platform running a QorIQ P2020 PPC processor and Freescale eTSEC for gigabit Ethernet. When I have heavy network load (full gigabit Ethernet usage), and send PTP Event packets with gianfar’s hardware timestamping enabled, memory/stack corruption seems to occur, leading to many different symptoms, including DMA API Debug warnings, CPU stalls, and oopses/panics due to null pointer dereference or invalid page access. The crash can also happen under lower load, but below 100MBps it can take hours for the issue to occur.

[3.] Keywords (i.e., modules, networking, kernel):

networking, ptp, ieee 1588, hardware timestamping, gianfar, ppc, p2020

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):

# uname -a
Linux version 4.4.235 (jjurack@oh-val-eng-12) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0) ) #13 SMP Thu Sep 10 08:28:37 EDT 2020

[4.2.] Kernel .config file:

linux.config is attached

[5.] Most recent kernel version which did not have the bug:

The latest version of this system to not have this crash was using kernel 3.2 (Linux morlun 3.2.0 #1 SMP Fri Jun 10 12:43:28 PDT 2016 ppc GNU/Linux). However, many other parts of our system have changed since then and I have so far not been able to create modified versions of that build for 1:1 comparison testing. I have gone back as far as 4.4.129 on the 4.4 branch; that is where I first discovered the issue.

[6.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/oops-tracing.txt)

Console logs from a few different crashes are attached (console{1..5}.log)

[7.] A small shell script or example program which triggers the
     problem (if possible)

I have attached minimal programs that I used to generate network load and send timestamped PTP packets:

* netdump.c generates network traffic.
* dump.sh runs 16 instances of netdump.
* test.py connects to the netdump instances from another system to consume the traffic.
* hwts.c sends timestamped PTP packets.

To reproduce I usually do these steps:

 * run dump.sh
 * start test.py on another system
 * run `hwts 32` to send 32 ptp packets
 * a crash usually happens within 10-20 seconds

[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)

$ scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux oh-val-eng-12 5.8.5-arch1-1 #1 SMP PREEMPT Thu, 27 Aug 2020 18:53:02 +0000 x86_64 GNU/Linux

GNU C                   10.2.0
GNU Make                4.3
Binutils                2.35
Util-linux              2.36
Mount                   2.36
Module-init-tools       27
E2fsprogs               1.45.6
Jfsutils                1.1.15
Reiserfsprogs           3.6.27
Xfsprogs                5.7.0
Pcmciautils             018
PPP                     2.4.7
Linux C Library         2.32
Dynamic linker (ldd)    2.32
Linux C++ Library       6.0.28
Net-tools               2.10
Kbd                     2.3.0
Console-tools           2.3.0
Sh-utils                8.32
Udev                    246
Wireless-tools          30
Modules Loaded          acpi_cpufreq agpgart at24 auth_rpcgss bluetooth cdrom cec cfg80211 coretemp crc16 crc32c_generic crc32c_intel crypto_user dcdbas dell_smbios dell_wmi dell_wmi_descriptor drm drm_kms_helper e1000e ecc ecdh_generic ehci_hcd ehci_pci evdev ext4 fb_sys_fops fuse gpio_ich grace hid hid_generic hid_plantronics i2c_algo_bit i2c_i801 i2c_smbus i7core_edac input_leds intel_cstate intel_pmc_bxt intel_powerclamp intel_uncore ip_tables irqbypass
iTCO_vendor_support iTCO_wdt jbd2 joydev kvm kvm_intel ledtrig_audio lockd loop lpc_ich mac_hid mbcache mc mousedev nfnetlink nfnetlink_log nfnetlink_queue nfs_acl nfsd parport parport_pc pcspkr ppdev radeon rc_core rfkill sg snd snd_hda_codec snd_hda_codec_generic snd_hda_codec_realtek snd_hda_core snd_hda_intel snd_hwdep snd_intel_dspcfg snd_pcm snd_rawmidi snd_seq_device snd_timer snd_usb_audio snd_usbmidi_lib soundcore sparse_keymap sr_mod sunrpc syscopyarea sysfillrect sysimgblt ttm uas usbhid usb_storage vboxdrv vboxnetadp vboxnetflt wmi wmi_bmof x_tables

[8.2.] Processor information (from /proc/cpuinfo):

# cat /proc/cpuinfo
processor       : 0
cpu             : e500v2
clock           : 1200.000000MHz
revision        : 5.1 (pvr 8021 1051)
bogomips        : 150.00

processor       : 1
cpu             : e500v2
clock           : 1200.000000MHz
revision        : 5.1 (pvr 8021 1051)
bogomips        : 150.00

total bogomips  : 300.00
timebase        : 75000000
platform        : P2020 RDB
model           : EMX-2500
Memory          : 512 MB

[8.3.] Module information (from /proc/modules):

# cat /proc/modules

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

# cat /proc/ioports
00000000-0000ffff : /pcie@ff708000
  00000000-00000fff : Legacy IO
00020000-0002ffff : /pcie@ff709000
  00020000-0002ffff : PCI Bus 0001:12
00040000-0004ffff : /pcie@ff70a000
  00040000-0004ffff : PCI Bus 0002:17

# cat /proc/iomem
00000000-1fffffff : System RAM
80000000-bfffffff : /pcie@ff70a000
  80000000-bfffffff : PCI Bus 0002:17
    a0000000-bfffffff : /pcie@ff709000
      a0000000-bfffffff : PCI Bus 0001:12
e0000000-ffffffff : /pcie@ff708000
  e0000000-ffffffff : PCI Bus 0000:01
    ff704500-ff704507 : serial
    ff707000-ff707fff : /soc@ff700000/spi@7000
    ff722000-ff722fff : /soc@ff700000/usb@22000
      ff722000-ff722fff : /soc@ff700000/usb@22000
        ff722000-ff722fff : /soc@ff700000/usb@22000

[8.5.] PCI information ('lspci -vvv' as root)

lspci.log is attached

[8.6.] SCSI information (from /proc/scsi/scsi)

# cat /proc/scsi/scsi

[8.7.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):

I’ve attached our system’s device tree source file (emx-2500.dts).

[X.] Other notes, patches, fixes, workarounds:

[X.1.] I have tried commenting gianfar.c:965 (priv->hwts_tx_en = 1;). This causes the crash to disappear.

[X.2.] Just calling the SIOCSHWTSTAMP ioctl to turn on hardware timestamping while under network load causes a DMA API Debug warning, but does not seem to destabilize the system otherwise. I’m not sure if this is the same issue or separate. Examples of this are included in each console log, as well as below:

fsl-gianfar ff724000.ethernet: DMA-API: device driver frees DMA memory with wrong function [device address=0x000000001ed90000] [size=232 bytes] [mapped as page] [unmapped as single]
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:1116
Modules linked in:
CPU: 1 PID: 1589 Comm: hwts Tainted: G        W       4.4.235 #13
task: d9e41900 ti: db960000 task.ti: db960000
NIP: c030f5c0 LR: c030f5c0 CTR: c0367c44
REGS: db961c20 TRAP: 0700   Tainted: G        W        (4.4.235)
MSR: 00021000   CR: 28002822  XER: 20000000

GPR00: c030f5c0 db961cd0 d9e41900 000000b5 dffd12f0 dffd2e0c 1f85f000 db960000
GPR08: 00000007 c0772d4c 1f85f000 00000297 42002884 100192ac 00000000 00000000
GPR16: 00000000 d9da443c 00000002 d9da4420 00000000 df980900 00000020 d9da4000
GPR24: 00029000 c07a8314 c07fe728 c07b0000 c07d9ec0 db961d28 c0808e20 d9ddbd20
NIP [c030f5c0] check_unmap+0x948/0xa90
LR [c030f5c0] check_unmap+0x948/0xa90
Call Trace:
[db961cd0] [c030f5c0] check_unmap+0x948/0xa90 (unreliable)
[db961d20] [c030f7a4] debug_dma_unmap_page+0x9c/0xb0
[db961da0] [c03eeb70] free_skb_resources+0xf4/0x3e4
[db961df0] [c03f354c] reset_gfar+0x68/0x9c
[db961e00] [c03f378c] gfar_ioctl+0x20c/0x210
[db961e30] [c04a2d14] dev_ifsioc+0x308/0x31c
[db961e60] [c04a2f94] dev_ioctl+0x1c0/0x624
[db961ec0] [c014b4d0] do_vfs_ioctl+0x38c/0x6b4
[db961f20] [c014b844] SyS_ioctl+0x4c/0x80
[db961f40] [c0011004] ret_from_syscall+0x0/0x3c
--- interrupt: c01 at 0xff40194
    LR = 0xffed0a8
Instruction dump:
554a103a 7c69402e 7cc9502e 811d001c 813d0020 815d0024 90610008 3c60c06b
90c1000c 3863b7a8 4cc63182 482c99b9 <0fe00000> 4bfffa60 3c80c06b 3884b0f8
---[ end trace 2398b56cb968a2e0 ]---
Mapped at:
 [] gfar_start_xmit+0x888/0x9f0
 [] dev_hard_start_xmit+0x27c/0x47c
 [] sch_direct_xmit+0xe4/0x278
 [] __qdisc_run+0x94/0x1dc
 [] __dev_queue_xmit+0x384/0x70c

---
James Jurack
Systems Engineer
VTI Instruments / Ametek Programmable Power