* Latest net-next from GIT panic
       [not found] <4745525f-18e4-7f69-fe21-8e507e407b33@itcare.pl>
@ 2017-09-19 22:35 ` Paweł Staszewski
  2017-09-19 23:45   ` Paweł Staszewski
  0 siblings, 1 reply; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-19 22:35 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Just tried the latest net-next from git and found a kernel panic.

Link to bugzilla below.

https://bugzilla.kernel.org/attachment.cgi?id=258499

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic
  2017-09-19 22:35 ` Latest net-next from GIT panic Paweł Staszewski
@ 2017-09-19 23:45   ` Paweł Staszewski
  2017-09-20 0:01     ` Paweł Staszewski
  0 siblings, 1 reply; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-19 23:45 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Added a few more screenshots from kernels 4.14-rc1 (net-next) and
4.14-rc1 (linux-next):

https://bugzilla.kernel.org/show_bug.cgi?id=197005

On 2017-09-20 at 00:35, Paweł Staszewski wrote:
> Just tried the latest net-next from git and found a kernel panic.
>
> Link to bugzilla below.
>
> https://bugzilla.kernel.org/attachment.cgi?id=258499

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic
  2017-09-19 23:45   ` Paweł Staszewski
@ 2017-09-20 0:01     ` Paweł Staszewski
  2017-09-20 0:06       ` Paweł Staszewski
  0 siblings, 1 reply; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-20 0:01 UTC (permalink / raw)
  To: Paweł Staszewski, Linux Kernel Network Developers

Some information about the environment:
The server is acting as an IP router with BGP.
There are 6 BGP sessions, each with a full BGP table (~600k prefixes).

It looks like the panic appears after the BGP sessions are connected,
not because of traffic: when the panic occurred there was almost no
traffic.

Also, when I run this server without turning on BGP and push traffic
through it with pktgen, there is no panic.

It panics just after it learns the routes.

On 2017-09-20 at 01:45, Paweł Staszewski wrote:
> Added a few more screenshots from kernels 4.14-rc1 (net-next) and
> 4.14-rc1 (linux-next):
>
> https://bugzilla.kernel.org/show_bug.cgi?id=197005
>
> On 2017-09-20 at 00:35, Paweł Staszewski wrote:
>> Just tried the latest net-next from git and found a kernel panic.
>>
>> Link to bugzilla below.
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=258499

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic
  2017-09-20 0:01     ` Paweł Staszewski
@ 2017-09-20 0:06       ` Paweł Staszewski
  2017-09-20 0:26         ` Paweł Staszewski
  2017-09-20 3:24         ` Eric Dumazet
  0 siblings, 2 replies; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-20 0:06 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Just checked kernel 4.13.2 and it has the same problem.

Just after all 6 BGP sessions start and the kernel starts to learn
routes, it panics.

https://bugzilla.kernel.org/attachment.cgi?id=258509

On 2017-09-20 at 02:01, Paweł Staszewski wrote:
> Some information about the environment:
> The server is acting as an IP router with BGP.
> There are 6 BGP sessions, each with a full BGP table (~600k prefixes).
>
> It looks like the panic appears after the BGP sessions are connected,
> not because of traffic: when the panic occurred there was almost no
> traffic.
>
> Also, when I run this server without turning on BGP and push traffic
> through it with pktgen, there is no panic.
>
> It panics just after it learns the routes.
>
> On 2017-09-20 at 01:45, Paweł Staszewski wrote:
>> Added a few more screenshots from kernels 4.14-rc1 (net-next) and
>> 4.14-rc1 (linux-next):
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=197005
>>
>> On 2017-09-20 at 00:35, Paweł Staszewski wrote:
>>> Just tried the latest net-next from git and found a kernel panic.
>>>
>>> Link to bugzilla below.
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=258499

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic
  2017-09-20 0:06       ` Paweł Staszewski
@ 2017-09-20 0:26         ` Paweł Staszewski
  2017-09-20 3:24         ` Eric Dumazet
  1 sibling, 0 replies; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-20 0:26 UTC (permalink / raw)
  To: Linux Kernel Network Developers

The latest working kernel with the same configuration and kernel config
is 4.12.13.

There is no panic after the routes from all 6 BGP sessions are learned.

ip r | wc -l
653112

On 2017-09-20 at 02:06, Paweł Staszewski wrote:
> Just checked kernel 4.13.2 and it has the same problem.
>
> Just after all 6 BGP sessions start and the kernel starts to learn
> routes, it panics.
>
> https://bugzilla.kernel.org/attachment.cgi?id=258509
>
> On 2017-09-20 at 02:01, Paweł Staszewski wrote:
>> Some information about the environment:
>> The server is acting as an IP router with BGP.
>> There are 6 BGP sessions, each with a full BGP table (~600k prefixes).
>>
>> It looks like the panic appears after the BGP sessions are connected,
>> not because of traffic: when the panic occurred there was almost no
>> traffic.
>>
>> Also, when I run this server without turning on BGP and push traffic
>> through it with pktgen, there is no panic.
>>
>> It panics just after it learns the routes.
>>
>> On 2017-09-20 at 01:45, Paweł Staszewski wrote:
>>> Added a few more screenshots from kernels 4.14-rc1 (net-next) and
>>> 4.14-rc1 (linux-next):
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=197005
>>>
>>> On 2017-09-20 at 00:35, Paweł Staszewski wrote:
>>>> Just tried the latest net-next from git and found a kernel panic.
>>>>
>>>> Link to bugzilla below.
>>>>
>>>> https://bugzilla.kernel.org/attachment.cgi?id=258499

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic
  2017-09-20 0:06       ` Paweł Staszewski
  2017-09-20 0:26         ` Paweł Staszewski
@ 2017-09-20 3:24         ` Eric Dumazet
  2017-09-20 7:58           ` Paweł Staszewski
  1 sibling, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2017-09-20 3:24 UTC (permalink / raw)
  To: Paweł Staszewski; +Cc: Linux Kernel Network Developers

On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
> Just checked kernel 4.13.2 and it has the same problem.
>
> Just after all 6 BGP sessions start and the kernel starts to learn
> routes, it panics.
>
> https://bugzilla.kernel.org/attachment.cgi?id=258509

Unfortunately we do not have enough information from these traces.

Can you get a full stack trace?

Alternatively, can you bisect?

Thanks.

^ permalink raw reply [flat|nested] 52+ messages in thread
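For readers unfamiliar with the bisection requested above, here is a minimal sketch of how `git bisect` narrows a regression down to one commit. It runs against a throwaway repository, not the kernel tree: the ten-commit history, the marker file `BUG` and the test command are all hypothetical stand-ins. On the real router each step would presumably mean building and booting the checked-out kernel and watching whether it panics once the BGP sessions come up.

```shell
#!/bin/sh
# Sketch only: a throwaway repo with a hypothetical "bug" introduced in commit 7.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Bisect Demo"

# Build a linear history of ten commits; commit 7 adds the marker file BUG.
i=1
while [ "$i" -le 10 ]; do
    echo "$i" > version.txt
    if [ "$i" -eq 7 ]; then
        touch BUG
    fi
    git add -A
    git commit -q -m "commit $i"
    i=$((i + 1))
done

bad=$(git rev-parse HEAD)                   # newest commit, known bad
good=$(git rev-list --max-parents=0 HEAD)   # root commit, known good

# Start bisecting between the known-bad and known-good commits.
git bisect start "$bad" "$good" >/dev/null

# Let git drive the search: exit status 0 means "good", 1 means "bad".
git bisect run sh -c '! test -e BUG' >/dev/null

# When the search ends, refs/bisect/bad names the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $first_bad"

git bisect reset >/dev/null 2>&1
```

When a test cannot be automated, as with a kernel that has to be booted by hand, the same search is done by running `git bisect good` or `git bisect bad` manually after each test instead of `git bisect run`.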
* Re: Latest net-next from GIT panic
  2017-09-20 3:24         ` Eric Dumazet
@ 2017-09-20 7:58           ` Paweł Staszewski
  2017-09-20 8:44             ` Paweł Staszewski
  0 siblings, 1 reply; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-20 7:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers

Hi

Will try bisecting tonight.

On 2017-09-20 at 05:24, Eric Dumazet wrote:
> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
>> Just checked kernel 4.13.2 and it has the same problem.
>>
>> Just after all 6 BGP sessions start and the kernel starts to learn
>> routes, it panics.
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=258509
>
> Unfortunately we do not have enough information from these traces.
>
> Can you get a full stack trace?
>
> Alternatively, can you bisect?
>
> Thanks.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic
  2017-09-20 7:58           ` Paweł Staszewski
@ 2017-09-20 8:44             ` Paweł Staszewski
  2017-09-20 9:45               ` Paweł Staszewski
  0 siblings, 1 reply; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-20 8:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers

Trying to make a video from IPMI :)

with these results:

https://bugzilla.kernel.org/attachment.cgi?id=258521

Caught two more lines where it starts (panic from 4.13.2).

Now I will try to do some bisection.

On 2017-09-20 at 09:58, Paweł Staszewski wrote:
> Hi
>
> Will try bisecting tonight.
>
> On 2017-09-20 at 05:24, Eric Dumazet wrote:
>> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
>>> Just checked kernel 4.13.2 and it has the same problem.
>>>
>>> Just after all 6 BGP sessions start and the kernel starts to learn
>>> routes, it panics.
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=258509
>>
>> Unfortunately we do not have enough information from these traces.
>>
>> Can you get a full stack trace?
>>
>> Alternatively, can you bisect?
>>
>> Thanks.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic
  2017-09-20 8:44             ` Paweł Staszewski
@ 2017-09-20 9:45               ` Paweł Staszewski
  2017-09-20 10:21                 ` Paweł Staszewski
  0 siblings, 1 reply; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-20 9:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers

Ok looks like ending bisection

The latest bisected kernel with no kernel panic is 4.12.0+ (from next);
it shows only this warning:

[ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 timed out
[ 309.030034] ------------[ cut here ]------------
[ 309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139
[ 309.030041] Modules linked in: bonding ipmi_si x86_pkg_temp_thermal
[ 309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 4.12.0+ #5
[ 309.030046] task: ffff88086d98a000 task.stack: ffffc90003378000
[ 309.030048] RIP: 0010:dev_watchdog+0xcf/0x139
[ 309.030049] RSP: 0018:ffff88087fbc3ea8 EFLAGS: 00010246
[ 309.030050] RAX: 000000000000003d RBX: ffff88046b680000 RCX: 0000000000000000
[ 309.030050] RDX: ffff88087fbd2f01 RSI: 0000000000000000 RDI: ffff88087fbcda08
[ 309.030051] RBP: ffff88087fbc3eb8 R08: 0000000000000000 R09: ffff88087ff80a04
[ 309.030051] R10: 0000000000000000 R11: ffff88086d98a001 R12: 0000000000000000
[ 309.030052] R13: ffff88087fbc3ef8 R14: ffff88086d98a000 R15: ffffffff81c06008
[ 309.030053] FS: 0000000000000000(0000) GS:ffff88087fbc0000(0000) knlGS:0000000000000000
[ 309.030054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 309.030054] CR2: 00007fba600f6098 CR3: 000000086b955000 CR4: 00000000001406e0
[ 309.030055] Call Trace:
[ 309.030057] <IRQ>
[ 309.030059] ? netif_tx_lock+0x79/0x79
[ 309.030062] call_timer_fn.isra.24+0x17/0x77
[ 309.030063] run_timer_softirq+0x118/0x161
[ 309.030065] ? netif_tx_lock+0x79/0x79
[ 309.030066] ? ktime_get+0x2b/0x42
[ 309.030070] ? lapic_next_deadline+0x21/0x27
[ 309.030073] ? clockevents_program_event+0xa8/0xc5
[ 309.030076] __do_softirq+0xa8/0x19d
[ 309.030078] irq_exit+0x5d/0x6b
[ 309.030079] smp_apic_timer_interrupt+0x2a/0x36
[ 309.030082] apic_timer_interrupt+0x89/0x90
[ 309.030085] RIP: 0010:mwait_idle+0x4e/0x6a
[ 309.030086] RSP: 0018:ffffc9000337be98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 309.030087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 309.030087] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88086d98a000
[ 309.030088] RBP: ffffc9000337be98 R08: ffff88046f8279a0 R09: ffff88046f827040
[ 309.030089] R10: ffff88086d98a000 R11: ffff88086d98a000 R12: 0000000000000000
[ 309.030089] R13: ffff88086d98a000 R14: ffff88086d98a000 R15: ffff88086d98a000
[ 309.030090] </IRQ>
[ 309.030094] arch_cpu_idle+0xa/0xc
[ 309.030095] default_idle_call+0x19/0x1b
[ 309.030102] do_idle+0xbc/0x196
[ 309.030104] cpu_startup_entry+0x1d/0x20
[ 309.030105] start_secondary+0xd8/0xdc
[ 309.030108] secondary_startup_64+0x9f/0x9f
[ 309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 01 e8 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 81 31 c0 e8 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 df ff 50 78 48 8b 05 a0 bc 6a
[ 309.030128] ---[ end trace 9102cb25703ae2d9 ]---

I just marked it as good, because the problem above is different, and I
am continuing:

git bisect good
Bisecting: 1787 revisions left to test after this (roughly 11 steps)

On 2017-09-20 at 10:44, Paweł Staszewski wrote:
> Trying to make a video from IPMI :)
>
> with these results:
>
> https://bugzilla.kernel.org/attachment.cgi?id=258521
>
> Caught two more lines where it starts (panic from 4.13.2).
>
> Now I will try to do some bisection.
>
> On 2017-09-20 at 09:58, Paweł Staszewski wrote:
>> Hi
>>
>> Will try bisecting tonight.
>>
>> On 2017-09-20 at 05:24, Eric Dumazet wrote:
>>> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
>>>> Just checked kernel 4.13.2 and it has the same problem.
>>>>
>>>> Just after all 6 BGP sessions start and the kernel starts to learn
>>>> routes, it panics.
>>>>
>>>> https://bugzilla.kernel.org/attachment.cgi?id=258509
>>>
>>> Unfortunately we do not have enough information from these traces.
>>>
>>> Can you get a full stack trace?
>>>
>>> Alternatively, can you bisect?
>>>
>>> Thanks.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic
  2017-09-20 9:45               ` Paweł Staszewski
@ 2017-09-20 10:21                 ` Paweł Staszewski
  2017-09-20 10:22                   ` Paweł Staszewski
  0 siblings, 1 reply; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-20 10:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers

OK, the kernel crashed with a different panic that I didn't catch while
I was bisecting, and now my bisection is broken :)

git bisect good
Bisecting: 1787 revisions left to test after this (roughly 11 steps)
error: Your local changes to the following files would be overwritten by checkout:
Documentation/00-INDEX
Documentation/ABI/stable/sysfs-class-udc
Documentation/ABI/testing/configfs-usb-gadget-uac1
Documentation/ABI/testing/ima_policy
Documentation/ABI/testing/sysfs-bus-iio
Documentation/ABI/testing/sysfs-bus-iio-meas-spec
Documentation/ABI/testing/sysfs-bus-iio-timer-stm32
Documentation/ABI/testing/sysfs-class-net
Documentation/ABI/testing/sysfs-class-power-twl4030
Documentation/ABI/testing/sysfs-class-typec
Documentation/DMA-API.txt
Documentation/IRQ-domain.txt
Documentation/Makefile
Documentation/PCI/MSI-HOWTO.txt
Documentation/RCU/00-INDEX
Documentation/RCU/Design/Requirements/Requirements.html
Documentation/RCU/checklist.txt
Documentation/admin-guide/README.rst
Documentation/admin-guide/devices.txt
Documentation/admin-guide/index.rst
Documentation/admin-guide/kernel-parameters.txt
Documentation/admin-guide/pm/cpufreq.rst
Documentation/admin-guide/pm/intel_pstate.rst
Documentation/admin-guide/ras.rst
Documentation/arm/Atmel/README
Documentation/block/biodoc.txt
Documentation/conf.py
Documentation/core-api/assoc_array.rst
Documentation/core-api/atomic_ops.rst
Documentation/core-api/index.rst
Documentation/crypto/asymmetric-keys.txt
Documentation/dev-tools/index.rst
Documentation/dev-tools/sparse.rst
Documentation/devicetree/bindings/arm/amlogic.txt
Documentation/devicetree/bindings/arm/atmel-at91.txt
Documentation/devicetree/bindings/arm/ccn.txt
Documentation/devicetree/bindings/arm/cpus.txt Documentation/devicetree/bindings/arm/gemini.txt Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt Documentation/devicetree/bindings/arm/keystone/keystone.txt Documentation/devicetree/bindings/arm/mediatek.txt Documentation/devicetree/bindings/arm/rockchip.txt Documentation/devicetree/bindings/arm/shmobile.txt Documentation/devicetree/bindings/arm/tegra.txt Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt Documentation/devicetree/bindings/gpio/gpio_atmel.txt Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt Documentation/devicetree/bindings/leds/common.txt Documentation/devicetree/bindings/mfd/hi6421.txt Documentation/devicetree/bindings/mfd/tps65910.txt Documentation/devicetree/bindings/mmc/fsl-esdhc.txt Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt Documentation/devicetree/bindings/mtd/atmel-nand.txt Documentation/devicetree/bindings/net/dsa/b53.txt Documentation/devicetree/bindings/net/ethernet.txt Documentation/devicetree/bindings/net/macb.txt Documentation/devicetree/bindings/net/marvell-orion-mdio.txt Documentation/devicetree/bindings/net/ti,wilink-st.txt Documentation/devicetree/bindings/net/wireless/ti,wlcore.txt Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt 
Documentation/devicetree/bindings/opp/opp.txt Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt Documentation/devicetree/bindings/phy/brcm-sata-phy.txt Documentation/devicetree/bindings/phy/meson8b-usb2-phy.txt Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt Documentation/devicetree/bindings/power/rockchip-io-domain.txt Documentation/devicetree/bindings/power/supply/bq27xxx.txt Documentation/devicetree/bindings/property-units.txt Documentation/devicetree/bindings/regulator/regulator.txt Documentation/devicetree/bindings/serial/8 error: The following untracked working tree files would be overwritten by checkout: Documentation/ABI/testing/sysfs-class-net-phydev Documentation/DocBook/.gitignore Documentation/DocBook/Makefile Documentation/DocBook/filesystems.tmpl Documentation/DocBook/kernel-hacking.tmpl Documentation/DocBook/kernel-locking.tmpl Documentation/DocBook/kgdb.tmpl Documentation/DocBook/libata.tmpl Documentation/DocBook/librs.tmpl Documentation/DocBook/lsm.tmpl Documentation/DocBook/mtdnand.tmpl Documentation/DocBook/networking.tmpl Documentation/DocBook/rapidio.tmpl Documentation/DocBook/s390-drivers.tmpl Documentation/DocBook/scsi.tmpl Documentation/DocBook/sh.tmpl Documentation/DocBook/stylesheet.xsl Documentation/DocBook/w1.tmpl Documentation/DocBook/z8530book.tmpl Documentation/Makefile.sphinx Documentation/RCU/trace.txt Documentation/devicetree/bindings/i2c/i2c-mt6577.txt Documentation/devicetree/bindings/misc/allwinner,syscon.txt Documentation/devicetree/bindings/net/cortina.txt Documentation/devicetree/bindings/net/dsa/ksz.txt Documentation/devicetree/bindings/net/dwmac-sun8i.txt Documentation/devicetree/bindings/net/qca,qca7000.txt Documentation/devicetree/bindings/power/max8903-charger.txt Documentation/devicetree/bindings/power_supply/maxim,max14656.txt Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt Documentation/doc-guide/docbook.rst 
Documentation/networking/tls.txt Documentation/prctl/no_new_privs.txt Documentation/prctl/seccomp_filter.txt Documentation/security/00-INDEX Documentation/security/IMA-templates.txt Documentation/security/LSM.txt Documentation/security/LoadPin.txt Documentation/security/SELinux.txt Documentation/security/Smack.txt Documentation/security/Yama.txt Documentation/security/apparmor.txt Documentation/security/conf.py Documentation/security/credentials.txt Documentation/security/keys-ecryptfs.txt Documentation/security/keys-request-key.txt Documentation/security/keys-trusted-encrypted.txt Documentation/security/keys.txt Documentation/security/self-protection.txt Documentation/security/tomoyo.txt Documentation/sphinx/convert_template.sed Documentation/sphinx/post_convert.sed Documentation/sphinx/tmplcvt Documentation/usb/typec.rst Documentation/usb/usb3-debug-port.rst arch/arm/boot/dts/rk1108-evb.dts arch/arm/boot/dts/rk1108.dtsi arch/arm/boot/dts/tegra20-whistler.dts arch/arm/mach-omap2/opp.c arch/arm/mach-omap2/pmu.c arch/ia64/include/asm/siginfo.h arch/m32r/include/uapi/asm/siginfo.h arch/microblaze/include/asm/bitops.h arch/microblaze/include/asm/bug.h arch/microblaze/include/asm/bugs.h arch/microblaze/include/asm/div64.h arch/microblaze/include/asm/emergency-restart.h arch/microblaze/include/asm/fb.h arch/microblaze/include/asm/hardirq.h arch/microblaze/include/asm/irq_regs.h arch/microblaze/include/asm/kdebug.h arch/microblaze/include/asm/kmap_types.h arch/microblaze/include/asm/linkage.h arch/microblaze/include/asm/local.h arch/microblaze/include/asm/local64.h arch/microblaze/include/asm/parport.h arch/microblaze/include/asm/percpu.h arch/microblaze/include/asm/serial.h arch/microblaze/include/asm/shmparam.h arch/microblaze/include/asm/topology.h arch/microblaze/include/asm/ucontext.h arch/microblaze/include/asm/vga.h arch/microblaze/include/asm/xor.h arch/microblaze/include/uapi/asm/bitsperlong.h arch/microblaze/include/uapi/asm/errno.h 
arch/microblaze/include/uapi/asm/fcntl.h arch/microblaze/include/uapi/asm/ioctl.h arch/microblaze/include/uapi/asm/ioctls.h arch/microblaze/include/uapi/asm/ipcbuf.h arch/microblaze/include/uapi/asm/kvm_para.h arch/microblaze/include/uapi/asm/mman.h arch/microblaze/include/uapi/asm/msgbuf.h arch/microblaze/include/uapi/asm/param.h arch/microblaze/include/uapi/asm/poll.h arch/microblaze/include/uapi/asm/resource.h arch/microblaze/include/uapi/asm/sembuf.h arch/microblaze/include/uapi/asm/shmbuf.h arch/microblaze/include/uapi/asm/siginfo.h arch/microblaze/include/uapi/asm/signal.h arch/microblaze/includ Aborting W dniu 2017-09-20 o 11:45, Paweł Staszewski pisze: > Ok looks like ending bisection > > > Latest bisected kernel when there is no kernel panic 4.12.0+ (from > next) - but only this warning: > > [ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 > timed out > [ 309.030034] ------------[ cut here ]------------ > [ 309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139 > [ 309.030041] Modules linked in: bonding ipmi_si x86_pkg_temp_thermal > [ 309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 4.12.0+ #5 > [ 309.030046] task: ffff88086d98a000 task.stack: ffffc90003378000 > [ 309.030048] RIP: 0010:dev_watchdog+0xcf/0x139 > [ 309.030049] RSP: 0018:ffff88087fbc3ea8 EFLAGS: 00010246 > [ 309.030050] RAX: 000000000000003d RBX: ffff88046b680000 RCX: > 0000000000000000 > [ 309.030050] RDX: ffff88087fbd2f01 RSI: 0000000000000000 RDI: > ffff88087fbcda08 > [ 309.030051] RBP: ffff88087fbc3eb8 R08: 0000000000000000 R09: > ffff88087ff80a04 > [ 309.030051] R10: 0000000000000000 R11: ffff88086d98a001 R12: > 0000000000000000 > [ 309.030052] R13: ffff88087fbc3ef8 R14: ffff88086d98a000 R15: > ffffffff81c06008 > [ 309.030053] FS: 0000000000000000(0000) GS:ffff88087fbc0000(0000) > knlGS:0000000000000000 > [ 309.030054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 309.030054] CR2: 00007fba600f6098 CR3: 000000086b955000 CR4: > 00000000001406e0 > [ 
309.030055] Call Trace: > [ 309.030057] <IRQ> > [ 309.030059] ? netif_tx_lock+0x79/0x79 > [ 309.030062] call_timer_fn.isra.24+0x17/0x77 > [ 309.030063] run_timer_softirq+0x118/0x161 > [ 309.030065] ? netif_tx_lock+0x79/0x79 > [ 309.030066] ? ktime_get+0x2b/0x42 > [ 309.030070] ? lapic_next_deadline+0x21/0x27 > [ 309.030073] ? clockevents_program_event+0xa8/0xc5 > [ 309.030076] __do_softirq+0xa8/0x19d > [ 309.030078] irq_exit+0x5d/0x6b > [ 309.030079] smp_apic_timer_interrupt+0x2a/0x36 > [ 309.030082] apic_timer_interrupt+0x89/0x90 > [ 309.030085] RIP: 0010:mwait_idle+0x4e/0x6a > [ 309.030086] RSP: 0018:ffffc9000337be98 EFLAGS: 00000246 ORIG_RAX: > ffffffffffffff10 > [ 309.030087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: > 0000000000000000 > [ 309.030087] RDX: 0000000000000000 RSI: 0000000000000000 RDI: > ffff88086d98a000 > [ 309.030088] RBP: ffffc9000337be98 R08: ffff88046f8279a0 R09: > ffff88046f827040 > [ 309.030089] R10: ffff88086d98a000 R11: ffff88086d98a000 R12: > 0000000000000000 > [ 309.030089] R13: ffff88086d98a000 R14: ffff88086d98a000 R15: > ffff88086d98a000 > [ 309.030090] </IRQ> > [ 309.030094] arch_cpu_idle+0xa/0xc > [ 309.030095] default_idle_call+0x19/0x1b > [ 309.030102] do_idle+0xbc/0x196 > [ 309.030104] cpu_startup_entry+0x1d/0x20 > [ 309.030105] start_secondary+0xd8/0xdc > [ 309.030108] secondary_startup_64+0x9f/0x9f > [ 309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 01 e8 > 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 81 31 c0 e8 > 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 df ff 50 78 48 8b 05 a0 > bc 6a > [ 309.030128] ---[ end trace 9102cb25703ae2d9 ]--- > > > I just marked it as good - cause this problem above is differend - and > im going to: > > git bisect good > Bisecting: 1787 revisions left to test after this (roughly 11 steps) > > > > > W dniu 2017-09-20 o 10:44, Paweł Staszewski pisze: >> Trying to make video from ipmi :) >> >> with that results: >> >> 
https://bugzilla.kernel.org/attachment.cgi?id=258521 >> >> catched two more lines where it starts - panic from 4.13.2. >> >> >> Now will try tro do some bisection >> >> >> >> W dniu 2017-09-20 o 09:58, Paweł Staszewski pisze: >>> Hi >>> >>> >>> Will try bisecting tonight >>> >>> >>> >>> W dniu 2017-09-20 o 05:24, Eric Dumazet pisze: >>>> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote: >>>>> Just checked kernel 4.13.2 and same problem >>>>> >>>>> Just after start all 6 bgp sessions - and kernel starts to learn >>>>> routes >>>>> it panic. >>>>> >>>>> https://bugzilla.kernel.org/attachment.cgi?id=258509 >>>>> >>>> >>>> Unfortunately we have not enough information from these traces. >>>> >>>> Can you get a full stack trace ? >>>> >>>> Alternatively, can you bisect ? >>>> >>>> Thanks. >>>> >>>> >>>> >>> >>> >> >> > > ^ permalink raw reply [flat|nested] 52+ messages in thread
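The checkout error in the message above is what git reports when the working tree has local changes that the next bisection step would overwrite. One possible way to recover without losing the marks already made is sketched below, again on a throwaway repository: save the bisect log, clean the tree, and replay the log. The eight-commit history and the "stray local edit" are hypothetical; nothing here is taken from the reporter's actual tree. Note that `git reset --hard` discards local modifications (`git stash` would keep them), and untracked files like those listed in the error would additionally have to be moved aside or removed, for example with `git clean`.

```shell
#!/bin/sh
# Sketch only: reproduce a "would be overwritten by checkout" stall during
# a bisection in a throwaway repo, then recover with bisect log/replay.
set -eu

repo=$(mktemp -d)
log=$(mktemp)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Bisect Demo"

# Linear history of eight commits, each one changing version.txt.
i=1
while [ "$i" -le 8 ]; do
    echo "$i" > version.txt
    git add version.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

bad=$(git rev-parse HEAD)
good=$(git rev-list --max-parents=0 HEAD)
git bisect start "$bad" "$good" >/dev/null
marked=$(git log -1 --format=%s HEAD)   # the commit about to be marked good

# Hypothetical stray modification left in the working tree.
echo "stray local edit" > version.txt

# The "good" mark is written to the bisect log, but git then refuses to
# check out the next candidate because version.txt would be overwritten.
git bisect good >/dev/null 2>&1 || echo "checkout refused, stuck at: $marked"

# Recovery: save the log, drop the local change, replay the recorded marks.
git bisect log > "$log"
git reset -q --hard
git bisect replay "$log" >/dev/null 2>&1

resumed=$(git log -1 --format=%s HEAD)
dirty=$(git status --porcelain)
echo "resumed at: $resumed"

git bisect reset >/dev/null 2>&1
```

After the replay the bisection continues from the next candidate commit, with the earlier good and bad marks intact.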
* Re: Latest net-next from GIT panic
  2017-09-20 10:21                 ` Paweł Staszewski
@ 2017-09-20 10:22                   ` Paweł Staszewski
  2017-09-20 11:02                     ` Paweł Staszewski
  0 siblings, 1 reply; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-20 10:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers

So far bisected and marked:

git bisect start
# bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
# good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
# good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31

On 2017-09-20 at 12:21, Paweł Staszewski wrote:
> OK, the kernel crashed with a different panic that I didn't catch while
> I was bisecting, and now my bisection is broken :)
>
> git bisect good
> Bisecting: 1787 revisions left to test after this (roughly 11 steps)
> error: Your local changes to the following files would be overwritten
> by checkout:
> Documentation/00-INDEX
> Documentation/ABI/stable/sysfs-class-udc
> Documentation/ABI/testing/configfs-usb-gadget-uac1
>
Documentation/ABI/testing/ima_policy > Documentation/ABI/testing/sysfs-bus-iio > Documentation/ABI/testing/sysfs-bus-iio-meas-spec > Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 > Documentation/ABI/testing/sysfs-class-net > Documentation/ABI/testing/sysfs-class-power-twl4030 > Documentation/ABI/testing/sysfs-class-typec > Documentation/DMA-API.txt > Documentation/IRQ-domain.txt > Documentation/Makefile > Documentation/PCI/MSI-HOWTO.txt > Documentation/RCU/00-INDEX > Documentation/RCU/Design/Requirements/Requirements.html > Documentation/RCU/checklist.txt > Documentation/admin-guide/README.rst > Documentation/admin-guide/devices.txt > Documentation/admin-guide/index.rst > Documentation/admin-guide/kernel-parameters.txt > Documentation/admin-guide/pm/cpufreq.rst > Documentation/admin-guide/pm/intel_pstate.rst > Documentation/admin-guide/ras.rst > Documentation/arm/Atmel/README > Documentation/block/biodoc.txt > Documentation/conf.py > Documentation/core-api/assoc_array.rst > Documentation/core-api/atomic_ops.rst > Documentation/core-api/index.rst > Documentation/crypto/asymmetric-keys.txt > Documentation/dev-tools/index.rst > Documentation/dev-tools/sparse.rst > Documentation/devicetree/bindings/arm/amlogic.txt > Documentation/devicetree/bindings/arm/atmel-at91.txt > Documentation/devicetree/bindings/arm/ccn.txt > Documentation/devicetree/bindings/arm/cpus.txt > Documentation/devicetree/bindings/arm/gemini.txt > Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt > Documentation/devicetree/bindings/arm/keystone/keystone.txt > Documentation/devicetree/bindings/arm/mediatek.txt > Documentation/devicetree/bindings/arm/rockchip.txt > Documentation/devicetree/bindings/arm/shmobile.txt > Documentation/devicetree/bindings/arm/tegra.txt > Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt > Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt > Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt > 
Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt > Documentation/devicetree/bindings/gpio/gpio_atmel.txt > Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt > Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt > Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt > Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt > Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt > > Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt > > Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt > > Documentation/devicetree/bindings/leds/common.txt > Documentation/devicetree/bindings/mfd/hi6421.txt > Documentation/devicetree/bindings/mfd/tps65910.txt > Documentation/devicetree/bindings/mmc/fsl-esdhc.txt > Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt > Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt > Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt > Documentation/devicetree/bindings/mtd/atmel-nand.txt > Documentation/devicetree/bindings/net/dsa/b53.txt > Documentation/devicetree/bindings/net/ethernet.txt > Documentation/devicetree/bindings/net/macb.txt > Documentation/devicetree/bindings/net/marvell-orion-mdio.txt > Documentation/devicetree/bindings/net/ti,wilink-st.txt > Documentation/devicetree/bindings/net/wireless/ti,wlcore.txt > Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt > Documentation/devicetree/bindings/opp/opp.txt > Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt > Documentation/devicetree/bindings/phy/brcm-sata-phy.txt > Documentation/devicetree/bindings/phy/meson8b-usb2-phy.txt > Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt > Documentation/devicetree/bindings/power/rockchip-io-domain.txt > Documentation/devicetree/bindings/power/supply/bq27xxx.txt > Documentation/devicetree/bindings/property-units.txt > Documentation/devicetree/bindings/regulator/regulator.txt > 
Documentation/devicetree/bindings/serial/8 > error: The following untracked working tree files would be overwritten > by checkout: > Documentation/ABI/testing/sysfs-class-net-phydev > Documentation/DocBook/.gitignore > Documentation/DocBook/Makefile > Documentation/DocBook/filesystems.tmpl > Documentation/DocBook/kernel-hacking.tmpl > Documentation/DocBook/kernel-locking.tmpl > Documentation/DocBook/kgdb.tmpl > Documentation/DocBook/libata.tmpl > Documentation/DocBook/librs.tmpl > Documentation/DocBook/lsm.tmpl > Documentation/DocBook/mtdnand.tmpl > Documentation/DocBook/networking.tmpl > Documentation/DocBook/rapidio.tmpl > Documentation/DocBook/s390-drivers.tmpl > Documentation/DocBook/scsi.tmpl > Documentation/DocBook/sh.tmpl > Documentation/DocBook/stylesheet.xsl > Documentation/DocBook/w1.tmpl > Documentation/DocBook/z8530book.tmpl > Documentation/Makefile.sphinx > Documentation/RCU/trace.txt > Documentation/devicetree/bindings/i2c/i2c-mt6577.txt > Documentation/devicetree/bindings/misc/allwinner,syscon.txt > Documentation/devicetree/bindings/net/cortina.txt > Documentation/devicetree/bindings/net/dsa/ksz.txt > Documentation/devicetree/bindings/net/dwmac-sun8i.txt > Documentation/devicetree/bindings/net/qca,qca7000.txt > Documentation/devicetree/bindings/power/max8903-charger.txt > Documentation/devicetree/bindings/power_supply/maxim,max14656.txt > Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt > Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt > Documentation/doc-guide/docbook.rst > Documentation/networking/tls.txt > Documentation/prctl/no_new_privs.txt > Documentation/prctl/seccomp_filter.txt > Documentation/security/00-INDEX > Documentation/security/IMA-templates.txt > Documentation/security/LSM.txt > Documentation/security/LoadPin.txt > Documentation/security/SELinux.txt > Documentation/security/Smack.txt > Documentation/security/Yama.txt > Documentation/security/apparmor.txt > Documentation/security/conf.py > 
Documentation/security/credentials.txt > Documentation/security/keys-ecryptfs.txt > Documentation/security/keys-request-key.txt > Documentation/security/keys-trusted-encrypted.txt > Documentation/security/keys.txt > Documentation/security/self-protection.txt > Documentation/security/tomoyo.txt > Documentation/sphinx/convert_template.sed > Documentation/sphinx/post_convert.sed > Documentation/sphinx/tmplcvt > Documentation/usb/typec.rst > Documentation/usb/usb3-debug-port.rst > arch/arm/boot/dts/rk1108-evb.dts > arch/arm/boot/dts/rk1108.dtsi > arch/arm/boot/dts/tegra20-whistler.dts > arch/arm/mach-omap2/opp.c > arch/arm/mach-omap2/pmu.c > arch/ia64/include/asm/siginfo.h > arch/m32r/include/uapi/asm/siginfo.h > arch/microblaze/include/asm/bitops.h > arch/microblaze/include/asm/bug.h > arch/microblaze/include/asm/bugs.h > arch/microblaze/include/asm/div64.h > arch/microblaze/include/asm/emergency-restart.h > arch/microblaze/include/asm/fb.h > arch/microblaze/include/asm/hardirq.h > arch/microblaze/include/asm/irq_regs.h > arch/microblaze/include/asm/kdebug.h > arch/microblaze/include/asm/kmap_types.h > arch/microblaze/include/asm/linkage.h > arch/microblaze/include/asm/local.h > arch/microblaze/include/asm/local64.h > arch/microblaze/include/asm/parport.h > arch/microblaze/include/asm/percpu.h > arch/microblaze/include/asm/serial.h > arch/microblaze/include/asm/shmparam.h > arch/microblaze/include/asm/topology.h > arch/microblaze/include/asm/ucontext.h > arch/microblaze/include/asm/vga.h > arch/microblaze/include/asm/xor.h > arch/microblaze/include/uapi/asm/bitsperlong.h > arch/microblaze/include/uapi/asm/errno.h > arch/microblaze/include/uapi/asm/fcntl.h > arch/microblaze/include/uapi/asm/ioctl.h > arch/microblaze/include/uapi/asm/ioctls.h > arch/microblaze/include/uapi/asm/ipcbuf.h > arch/microblaze/include/uapi/asm/kvm_para.h > arch/microblaze/include/uapi/asm/mman.h > arch/microblaze/include/uapi/asm/msgbuf.h > arch/microblaze/include/uapi/asm/param.h > 
arch/microblaze/include/uapi/asm/poll.h > arch/microblaze/include/uapi/asm/resource.h > arch/microblaze/include/uapi/asm/sembuf.h > arch/microblaze/include/uapi/asm/shmbuf.h > arch/microblaze/include/uapi/asm/siginfo.h > arch/microblaze/include/uapi/asm/signal.h > arch/microblaze/includ > Aborting > > > > W dniu 2017-09-20 o 11:45, Paweł Staszewski pisze: >> Ok looks like ending bisection >> >> >> Latest bisected kernel when there is no kernel panic 4.12.0+ (from >> next) - but only this warning: >> >> [ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 >> timed out >> [ 309.030034] ------------[ cut here ]------------ >> [ 309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139 >> [ 309.030041] Modules linked in: bonding ipmi_si x86_pkg_temp_thermal >> [ 309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 4.12.0+ #5 >> [ 309.030046] task: ffff88086d98a000 task.stack: ffffc90003378000 >> [ 309.030048] RIP: 0010:dev_watchdog+0xcf/0x139 >> [ 309.030049] RSP: 0018:ffff88087fbc3ea8 EFLAGS: 00010246 >> [ 309.030050] RAX: 000000000000003d RBX: ffff88046b680000 RCX: >> 0000000000000000 >> [ 309.030050] RDX: ffff88087fbd2f01 RSI: 0000000000000000 RDI: >> ffff88087fbcda08 >> [ 309.030051] RBP: ffff88087fbc3eb8 R08: 0000000000000000 R09: >> ffff88087ff80a04 >> [ 309.030051] R10: 0000000000000000 R11: ffff88086d98a001 R12: >> 0000000000000000 >> [ 309.030052] R13: ffff88087fbc3ef8 R14: ffff88086d98a000 R15: >> ffffffff81c06008 >> [ 309.030053] FS: 0000000000000000(0000) GS:ffff88087fbc0000(0000) >> knlGS:0000000000000000 >> [ 309.030054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 309.030054] CR2: 00007fba600f6098 CR3: 000000086b955000 CR4: >> 00000000001406e0 >> [ 309.030055] Call Trace: >> [ 309.030057] <IRQ> >> [ 309.030059] ? netif_tx_lock+0x79/0x79 >> [ 309.030062] call_timer_fn.isra.24+0x17/0x77 >> [ 309.030063] run_timer_softirq+0x118/0x161 >> [ 309.030065] ? netif_tx_lock+0x79/0x79 >> [ 309.030066] ? 
ktime_get+0x2b/0x42 >> [ 309.030070] ? lapic_next_deadline+0x21/0x27 >> [ 309.030073] ? clockevents_program_event+0xa8/0xc5 >> [ 309.030076] __do_softirq+0xa8/0x19d >> [ 309.030078] irq_exit+0x5d/0x6b >> [ 309.030079] smp_apic_timer_interrupt+0x2a/0x36 >> [ 309.030082] apic_timer_interrupt+0x89/0x90 >> [ 309.030085] RIP: 0010:mwait_idle+0x4e/0x6a >> [ 309.030086] RSP: 0018:ffffc9000337be98 EFLAGS: 00000246 ORIG_RAX: >> ffffffffffffff10 >> [ 309.030087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: >> 0000000000000000 >> [ 309.030087] RDX: 0000000000000000 RSI: 0000000000000000 RDI: >> ffff88086d98a000 >> [ 309.030088] RBP: ffffc9000337be98 R08: ffff88046f8279a0 R09: >> ffff88046f827040 >> [ 309.030089] R10: ffff88086d98a000 R11: ffff88086d98a000 R12: >> 0000000000000000 >> [ 309.030089] R13: ffff88086d98a000 R14: ffff88086d98a000 R15: >> ffff88086d98a000 >> [ 309.030090] </IRQ> >> [ 309.030094] arch_cpu_idle+0xa/0xc >> [ 309.030095] default_idle_call+0x19/0x1b >> [ 309.030102] do_idle+0xbc/0x196 >> [ 309.030104] cpu_startup_entry+0x1d/0x20 >> [ 309.030105] start_secondary+0xd8/0xdc >> [ 309.030108] secondary_startup_64+0x9f/0x9f >> [ 309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 01 e8 >> 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 81 31 c0 e8 >> 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 df ff 50 78 48 8b 05 >> a0 bc 6a >> [ 309.030128] ---[ end trace 9102cb25703ae2d9 ]--- >> >> >> I just marked it as good - cause this problem above is differend - >> and im going to: >> >> git bisect good >> Bisecting: 1787 revisions left to test after this (roughly 11 steps) >> >> >> >> >> W dniu 2017-09-20 o 10:44, Paweł Staszewski pisze: >>> Trying to make video from ipmi :) >>> >>> with that results: >>> >>> https://bugzilla.kernel.org/attachment.cgi?id=258521 >>> >>> catched two more lines where it starts - panic from 4.13.2. 
>>> >>> >>> Now will try tro do some bisection >>> >>> >>> >>> W dniu 2017-09-20 o 09:58, Paweł Staszewski pisze: >>>> Hi >>>> >>>> >>>> Will try bisecting tonight >>>> >>>> >>>> >>>> W dniu 2017-09-20 o 05:24, Eric Dumazet pisze: >>>>> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote: >>>>>> Just checked kernel 4.13.2 and same problem >>>>>> >>>>>> Just after start all 6 bgp sessions - and kernel starts to learn >>>>>> routes >>>>>> it panic. >>>>>> >>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258509 >>>>>> >>>>> >>>>> Unfortunately we have not enough information from these traces. >>>>> >>>>> Can you get a full stack trace ? >>>>> >>>>> Alternatively, can you bisect ? >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> > > ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic
  2017-09-20 10:22 ` Paweł Staszewski
@ 2017-09-20 11:02 ` Paweł Staszewski
  2017-09-20 12:23 ` Paweł Staszewski
  0 siblings, 1 reply; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-20 11:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers

OK, resumed. So far:

Panic:

# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using stack larger than 1024.
git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f

No panic:

# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 'udp-reduce-cache-pressure'
git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20

On 2017-09-20 at 12:22, Paweł Staszewski wrote:
> So far bisected and marked:
>
> git bisect start
> # bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
> git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
> # good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
> git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
> # good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
> git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
> 'pinctrl-v4.13-1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next'
> of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next'
> of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next'
> of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>
> W dniu 2017-09-20 o 12:21, Paweł
Staszewski pisze: >> Ok kernel crashed with different panic that i didnt catch when i was >> doing bisect and now my bisection is broken :) >> >> git bisect good >> Bisecting: 1787 revisions left to test after this (roughly 11 steps) >> error: Your local changes to the following files would be overwritten >> by checkout: >> Documentation/00-INDEX >> Documentation/ABI/stable/sysfs-class-udc >> Documentation/ABI/testing/configfs-usb-gadget-uac1 >> Documentation/ABI/testing/ima_policy >> Documentation/ABI/testing/sysfs-bus-iio >> Documentation/ABI/testing/sysfs-bus-iio-meas-spec >> Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 >> Documentation/ABI/testing/sysfs-class-net >> Documentation/ABI/testing/sysfs-class-power-twl4030 >> Documentation/ABI/testing/sysfs-class-typec >> Documentation/DMA-API.txt >> Documentation/IRQ-domain.txt >> Documentation/Makefile >> Documentation/PCI/MSI-HOWTO.txt >> Documentation/RCU/00-INDEX >> Documentation/RCU/Design/Requirements/Requirements.html >> Documentation/RCU/checklist.txt >> Documentation/admin-guide/README.rst >> Documentation/admin-guide/devices.txt >> Documentation/admin-guide/index.rst >> Documentation/admin-guide/kernel-parameters.txt >> Documentation/admin-guide/pm/cpufreq.rst >> Documentation/admin-guide/pm/intel_pstate.rst >> Documentation/admin-guide/ras.rst >> Documentation/arm/Atmel/README >> Documentation/block/biodoc.txt >> Documentation/conf.py >> Documentation/core-api/assoc_array.rst >> Documentation/core-api/atomic_ops.rst >> Documentation/core-api/index.rst >> Documentation/crypto/asymmetric-keys.txt >> Documentation/dev-tools/index.rst >> Documentation/dev-tools/sparse.rst >> Documentation/devicetree/bindings/arm/amlogic.txt >> Documentation/devicetree/bindings/arm/atmel-at91.txt >> Documentation/devicetree/bindings/arm/ccn.txt >> Documentation/devicetree/bindings/arm/cpus.txt >> Documentation/devicetree/bindings/arm/gemini.txt >> Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt >> 
Documentation/devicetree/bindings/arm/keystone/keystone.txt >> Documentation/devicetree/bindings/arm/mediatek.txt >> Documentation/devicetree/bindings/arm/rockchip.txt >> Documentation/devicetree/bindings/arm/shmobile.txt >> Documentation/devicetree/bindings/arm/tegra.txt >> Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt >> Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt >> Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt >> Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt >> Documentation/devicetree/bindings/gpio/gpio_atmel.txt >> Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt >> Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt >> Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt >> Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt >> Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt >> >> Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt >> >> Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt >> >> Documentation/devicetree/bindings/leds/common.txt >> Documentation/devicetree/bindings/mfd/hi6421.txt >> Documentation/devicetree/bindings/mfd/tps65910.txt >> Documentation/devicetree/bindings/mmc/fsl-esdhc.txt >> Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt >> Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt >> Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt >> Documentation/devicetree/bindings/mtd/atmel-nand.txt >> Documentation/devicetree/bindings/net/dsa/b53.txt >> Documentation/devicetree/bindings/net/ethernet.txt >> Documentation/devicetree/bindings/net/macb.txt >> Documentation/devicetree/bindings/net/marvell-orion-mdio.txt >> Documentation/devicetree/bindings/net/ti,wilink-st.txt >> Documentation/devicetree/bindings/net/wireless/ti,wlcore.txt >> Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt >> Documentation/devicetree/bindings/opp/opp.txt >> 
Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt >> Documentation/devicetree/bindings/phy/brcm-sata-phy.txt >> Documentation/devicetree/bindings/phy/meson8b-usb2-phy.txt >> Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt >> Documentation/devicetree/bindings/power/rockchip-io-domain.txt >> Documentation/devicetree/bindings/power/supply/bq27xxx.txt >> Documentation/devicetree/bindings/property-units.txt >> Documentation/devicetree/bindings/regulator/regulator.txt >> Documentation/devicetree/bindings/serial/8 >> error: The following untracked working tree files would be >> overwritten by checkout: >> Documentation/ABI/testing/sysfs-class-net-phydev >> Documentation/DocBook/.gitignore >> Documentation/DocBook/Makefile >> Documentation/DocBook/filesystems.tmpl >> Documentation/DocBook/kernel-hacking.tmpl >> Documentation/DocBook/kernel-locking.tmpl >> Documentation/DocBook/kgdb.tmpl >> Documentation/DocBook/libata.tmpl >> Documentation/DocBook/librs.tmpl >> Documentation/DocBook/lsm.tmpl >> Documentation/DocBook/mtdnand.tmpl >> Documentation/DocBook/networking.tmpl >> Documentation/DocBook/rapidio.tmpl >> Documentation/DocBook/s390-drivers.tmpl >> Documentation/DocBook/scsi.tmpl >> Documentation/DocBook/sh.tmpl >> Documentation/DocBook/stylesheet.xsl >> Documentation/DocBook/w1.tmpl >> Documentation/DocBook/z8530book.tmpl >> Documentation/Makefile.sphinx >> Documentation/RCU/trace.txt >> Documentation/devicetree/bindings/i2c/i2c-mt6577.txt >> Documentation/devicetree/bindings/misc/allwinner,syscon.txt >> Documentation/devicetree/bindings/net/cortina.txt >> Documentation/devicetree/bindings/net/dsa/ksz.txt >> Documentation/devicetree/bindings/net/dwmac-sun8i.txt >> Documentation/devicetree/bindings/net/qca,qca7000.txt >> Documentation/devicetree/bindings/power/max8903-charger.txt >> Documentation/devicetree/bindings/power_supply/maxim,max14656.txt >> Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt >> 
Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt >> Documentation/doc-guide/docbook.rst >> Documentation/networking/tls.txt >> Documentation/prctl/no_new_privs.txt >> Documentation/prctl/seccomp_filter.txt >> Documentation/security/00-INDEX >> Documentation/security/IMA-templates.txt >> Documentation/security/LSM.txt >> Documentation/security/LoadPin.txt >> Documentation/security/SELinux.txt >> Documentation/security/Smack.txt >> Documentation/security/Yama.txt >> Documentation/security/apparmor.txt >> Documentation/security/conf.py >> Documentation/security/credentials.txt >> Documentation/security/keys-ecryptfs.txt >> Documentation/security/keys-request-key.txt >> Documentation/security/keys-trusted-encrypted.txt >> Documentation/security/keys.txt >> Documentation/security/self-protection.txt >> Documentation/security/tomoyo.txt >> Documentation/sphinx/convert_template.sed >> Documentation/sphinx/post_convert.sed >> Documentation/sphinx/tmplcvt >> Documentation/usb/typec.rst >> Documentation/usb/usb3-debug-port.rst >> arch/arm/boot/dts/rk1108-evb.dts >> arch/arm/boot/dts/rk1108.dtsi >> arch/arm/boot/dts/tegra20-whistler.dts >> arch/arm/mach-omap2/opp.c >> arch/arm/mach-omap2/pmu.c >> arch/ia64/include/asm/siginfo.h >> arch/m32r/include/uapi/asm/siginfo.h >> arch/microblaze/include/asm/bitops.h >> arch/microblaze/include/asm/bug.h >> arch/microblaze/include/asm/bugs.h >> arch/microblaze/include/asm/div64.h >> arch/microblaze/include/asm/emergency-restart.h >> arch/microblaze/include/asm/fb.h >> arch/microblaze/include/asm/hardirq.h >> arch/microblaze/include/asm/irq_regs.h >> arch/microblaze/include/asm/kdebug.h >> arch/microblaze/include/asm/kmap_types.h >> arch/microblaze/include/asm/linkage.h >> arch/microblaze/include/asm/local.h >> arch/microblaze/include/asm/local64.h >> arch/microblaze/include/asm/parport.h >> arch/microblaze/include/asm/percpu.h >> arch/microblaze/include/asm/serial.h >> arch/microblaze/include/asm/shmparam.h >> 
arch/microblaze/include/asm/topology.h >> arch/microblaze/include/asm/ucontext.h >> arch/microblaze/include/asm/vga.h >> arch/microblaze/include/asm/xor.h >> arch/microblaze/include/uapi/asm/bitsperlong.h >> arch/microblaze/include/uapi/asm/errno.h >> arch/microblaze/include/uapi/asm/fcntl.h >> arch/microblaze/include/uapi/asm/ioctl.h >> arch/microblaze/include/uapi/asm/ioctls.h >> arch/microblaze/include/uapi/asm/ipcbuf.h >> arch/microblaze/include/uapi/asm/kvm_para.h >> arch/microblaze/include/uapi/asm/mman.h >> arch/microblaze/include/uapi/asm/msgbuf.h >> arch/microblaze/include/uapi/asm/param.h >> arch/microblaze/include/uapi/asm/poll.h >> arch/microblaze/include/uapi/asm/resource.h >> arch/microblaze/include/uapi/asm/sembuf.h >> arch/microblaze/include/uapi/asm/shmbuf.h >> arch/microblaze/include/uapi/asm/siginfo.h >> arch/microblaze/include/uapi/asm/signal.h >> arch/microblaze/includ >> Aborting >> >> >> >> W dniu 2017-09-20 o 11:45, Paweł Staszewski pisze: >>> Ok looks like ending bisection >>> >>> >>> Latest bisected kernel when there is no kernel panic 4.12.0+ (from >>> next) - but only this warning: >>> >>> [ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 >>> timed out >>> [ 309.030034] ------------[ cut here ]------------ >>> [ 309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139 >>> [ 309.030041] Modules linked in: bonding ipmi_si x86_pkg_temp_thermal >>> [ 309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 4.12.0+ #5 >>> [ 309.030046] task: ffff88086d98a000 task.stack: ffffc90003378000 >>> [ 309.030048] RIP: 0010:dev_watchdog+0xcf/0x139 >>> [ 309.030049] RSP: 0018:ffff88087fbc3ea8 EFLAGS: 00010246 >>> [ 309.030050] RAX: 000000000000003d RBX: ffff88046b680000 RCX: >>> 0000000000000000 >>> [ 309.030050] RDX: ffff88087fbd2f01 RSI: 0000000000000000 RDI: >>> ffff88087fbcda08 >>> [ 309.030051] RBP: ffff88087fbc3eb8 R08: 0000000000000000 R09: >>> ffff88087ff80a04 >>> [ 309.030051] R10: 0000000000000000 R11: ffff88086d98a001 
R12: >>> 0000000000000000 >>> [ 309.030052] R13: ffff88087fbc3ef8 R14: ffff88086d98a000 R15: >>> ffffffff81c06008 >>> [ 309.030053] FS: 0000000000000000(0000) GS:ffff88087fbc0000(0000) >>> knlGS:0000000000000000 >>> [ 309.030054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [ 309.030054] CR2: 00007fba600f6098 CR3: 000000086b955000 CR4: >>> 00000000001406e0 >>> [ 309.030055] Call Trace: >>> [ 309.030057] <IRQ> >>> [ 309.030059] ? netif_tx_lock+0x79/0x79 >>> [ 309.030062] call_timer_fn.isra.24+0x17/0x77 >>> [ 309.030063] run_timer_softirq+0x118/0x161 >>> [ 309.030065] ? netif_tx_lock+0x79/0x79 >>> [ 309.030066] ? ktime_get+0x2b/0x42 >>> [ 309.030070] ? lapic_next_deadline+0x21/0x27 >>> [ 309.030073] ? clockevents_program_event+0xa8/0xc5 >>> [ 309.030076] __do_softirq+0xa8/0x19d >>> [ 309.030078] irq_exit+0x5d/0x6b >>> [ 309.030079] smp_apic_timer_interrupt+0x2a/0x36 >>> [ 309.030082] apic_timer_interrupt+0x89/0x90 >>> [ 309.030085] RIP: 0010:mwait_idle+0x4e/0x6a >>> [ 309.030086] RSP: 0018:ffffc9000337be98 EFLAGS: 00000246 ORIG_RAX: >>> ffffffffffffff10 >>> [ 309.030087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: >>> 0000000000000000 >>> [ 309.030087] RDX: 0000000000000000 RSI: 0000000000000000 RDI: >>> ffff88086d98a000 >>> [ 309.030088] RBP: ffffc9000337be98 R08: ffff88046f8279a0 R09: >>> ffff88046f827040 >>> [ 309.030089] R10: ffff88086d98a000 R11: ffff88086d98a000 R12: >>> 0000000000000000 >>> [ 309.030089] R13: ffff88086d98a000 R14: ffff88086d98a000 R15: >>> ffff88086d98a000 >>> [ 309.030090] </IRQ> >>> [ 309.030094] arch_cpu_idle+0xa/0xc >>> [ 309.030095] default_idle_call+0x19/0x1b >>> [ 309.030102] do_idle+0xbc/0x196 >>> [ 309.030104] cpu_startup_entry+0x1d/0x20 >>> [ 309.030105] start_secondary+0xd8/0xdc >>> [ 309.030108] secondary_startup_64+0x9f/0x9f >>> [ 309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 01 e8 >>> 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 81 31 c0 e8 >>> 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 
df ff 50 78 48 8b 05
>>> a0 bc 6a
>>> [ 309.030128] ---[ end trace 9102cb25703ae2d9 ]---
>>>
>>> I just marked it as good, because the problem above is different,
>>> and I'm going to:
>>>
>>> git bisect good
>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps)
>>>
>>> On 2017-09-20 at 10:44, Paweł Staszewski wrote:
>>>> Trying to make video from ipmi :)
>>>>
>>>> with that results:
>>>>
>>>> https://bugzilla.kernel.org/attachment.cgi?id=258521
>>>>
>>>> caught two more lines where it starts - panic from 4.13.2.
>>>>
>>>> Now will try to do some bisection
>>>>
>>>> On 2017-09-20 at 09:58, Paweł Staszewski wrote:
>>>>> Hi
>>>>>
>>>>> Will try bisecting tonight
>>>>>
>>>>> On 2017-09-20 at 05:24, Eric Dumazet wrote:
>>>>>> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
>>>>>>> Just checked kernel 4.13.2 and same problem
>>>>>>>
>>>>>>> Just after start all 6 bgp sessions - and kernel starts to learn
>>>>>>> routes
>>>>>>> it panic.
>>>>>>>
>>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258509
>>>>>>>
>>>>>>
>>>>>> Unfortunately we have not enough information from these traces.
>>>>>>
>>>>>> Can you get a full stack trace ?
>>>>>>
>>>>>> Alternatively, can you bisect ?
>>>>>>
>>>>>> Thanks.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic
  2017-09-20 11:02 ` Paweł Staszewski
@ 2017-09-20 12:23 ` Paweł Staszewski
  2017-09-20 12:49 ` Paweł Staszewski
  0 siblings, 1 reply; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-20 12:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers

Almost there.

Bisecting: 6 revisions left to test after this (roughly 3 steps)
[ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly

On 2017-09-20 at 13:02, Paweł Staszewski wrote:
> OK, resumed. So far:
>
> Panic:
>
> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid
> using stack larger than 1024.
> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>
> No panic:
>
> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
> 'udp-reduce-cache-pressure'
> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>
> On 2017-09-20 at 12:22, Paweł Staszewski wrote:
>> So far bisected and marked:
>>
>> git bisect start
>> # bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
>> git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
>> # good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
>> git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
>> # good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
>> git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
>> 'pinctrl-v4.13-1' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch
>> 'next' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch
>> 'next' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
>> git bisect good
e24dd9ee5399747b71c1d982a484fc7601795f31 >> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >> 'next' of >> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >> >> >> >> W dniu 2017-09-20 o 12:21, Paweł Staszewski pisze: >>> Ok kernel crashed with different panic that i didnt catch when i was >>> doing bisect and now my bisection is broken :) >>> >>> git bisect good >>> Bisecting: 1787 revisions left to test after this (roughly 11 steps) >>> error: Your local changes to the following files would be >>> overwritten by checkout: >>> Documentation/00-INDEX >>> Documentation/ABI/stable/sysfs-class-udc >>> Documentation/ABI/testing/configfs-usb-gadget-uac1 >>> Documentation/ABI/testing/ima_policy >>> Documentation/ABI/testing/sysfs-bus-iio >>> Documentation/ABI/testing/sysfs-bus-iio-meas-spec >>> Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 >>> Documentation/ABI/testing/sysfs-class-net >>> Documentation/ABI/testing/sysfs-class-power-twl4030 >>> Documentation/ABI/testing/sysfs-class-typec >>> Documentation/DMA-API.txt >>> Documentation/IRQ-domain.txt >>> Documentation/Makefile >>> Documentation/PCI/MSI-HOWTO.txt >>> Documentation/RCU/00-INDEX >>> Documentation/RCU/Design/Requirements/Requirements.html >>> Documentation/RCU/checklist.txt >>> Documentation/admin-guide/README.rst >>> Documentation/admin-guide/devices.txt >>> Documentation/admin-guide/index.rst >>> Documentation/admin-guide/kernel-parameters.txt >>> Documentation/admin-guide/pm/cpufreq.rst >>> Documentation/admin-guide/pm/intel_pstate.rst >>> Documentation/admin-guide/ras.rst >>> Documentation/arm/Atmel/README >>> Documentation/block/biodoc.txt >>> Documentation/conf.py >>> Documentation/core-api/assoc_array.rst >>> Documentation/core-api/atomic_ops.rst >>> Documentation/core-api/index.rst >>> Documentation/crypto/asymmetric-keys.txt >>> Documentation/dev-tools/index.rst >>> 
Documentation/dev-tools/sparse.rst >>> Documentation/devicetree/bindings/arm/amlogic.txt >>> Documentation/devicetree/bindings/arm/atmel-at91.txt >>> Documentation/devicetree/bindings/arm/ccn.txt >>> Documentation/devicetree/bindings/arm/cpus.txt >>> Documentation/devicetree/bindings/arm/gemini.txt >>> Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt >>> Documentation/devicetree/bindings/arm/keystone/keystone.txt >>> Documentation/devicetree/bindings/arm/mediatek.txt >>> Documentation/devicetree/bindings/arm/rockchip.txt >>> Documentation/devicetree/bindings/arm/shmobile.txt >>> Documentation/devicetree/bindings/arm/tegra.txt >>> Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt >>> Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt >>> Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt >>> Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt >>> Documentation/devicetree/bindings/gpio/gpio_atmel.txt >>> Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt >>> Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt >>> Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt >>> Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt >>> Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt >>> >>> Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt >>> >>> Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt >>> >>> Documentation/devicetree/bindings/leds/common.txt >>> Documentation/devicetree/bindings/mfd/hi6421.txt >>> Documentation/devicetree/bindings/mfd/tps65910.txt >>> Documentation/devicetree/bindings/mmc/fsl-esdhc.txt >>> Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt >>> Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt >>> Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt >>> Documentation/devicetree/bindings/mtd/atmel-nand.txt >>> Documentation/devicetree/bindings/net/dsa/b53.txt >>> 
Documentation/devicetree/bindings/net/ethernet.txt >>> Documentation/devicetree/bindings/net/macb.txt >>> Documentation/devicetree/bindings/net/marvell-orion-mdio.txt >>> Documentation/devicetree/bindings/net/ti,wilink-st.txt >>> Documentation/devicetree/bindings/net/wireless/ti,wlcore.txt >>> Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt >>> Documentation/devicetree/bindings/opp/opp.txt >>> Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt >>> Documentation/devicetree/bindings/phy/brcm-sata-phy.txt >>> Documentation/devicetree/bindings/phy/meson8b-usb2-phy.txt >>> Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt >>> Documentation/devicetree/bindings/power/rockchip-io-domain.txt >>> Documentation/devicetree/bindings/power/supply/bq27xxx.txt >>> Documentation/devicetree/bindings/property-units.txt >>> Documentation/devicetree/bindings/regulator/regulator.txt >>> Documentation/devicetree/bindings/serial/8 >>> error: The following untracked working tree files would be >>> overwritten by checkout: >>> Documentation/ABI/testing/sysfs-class-net-phydev >>> Documentation/DocBook/.gitignore >>> Documentation/DocBook/Makefile >>> Documentation/DocBook/filesystems.tmpl >>> Documentation/DocBook/kernel-hacking.tmpl >>> Documentation/DocBook/kernel-locking.tmpl >>> Documentation/DocBook/kgdb.tmpl >>> Documentation/DocBook/libata.tmpl >>> Documentation/DocBook/librs.tmpl >>> Documentation/DocBook/lsm.tmpl >>> Documentation/DocBook/mtdnand.tmpl >>> Documentation/DocBook/networking.tmpl >>> Documentation/DocBook/rapidio.tmpl >>> Documentation/DocBook/s390-drivers.tmpl >>> Documentation/DocBook/scsi.tmpl >>> Documentation/DocBook/sh.tmpl >>> Documentation/DocBook/stylesheet.xsl >>> Documentation/DocBook/w1.tmpl >>> Documentation/DocBook/z8530book.tmpl >>> Documentation/Makefile.sphinx >>> Documentation/RCU/trace.txt >>> Documentation/devicetree/bindings/i2c/i2c-mt6577.txt >>> Documentation/devicetree/bindings/misc/allwinner,syscon.txt >>> 
Documentation/devicetree/bindings/net/cortina.txt >>> Documentation/devicetree/bindings/net/dsa/ksz.txt >>> Documentation/devicetree/bindings/net/dwmac-sun8i.txt >>> Documentation/devicetree/bindings/net/qca,qca7000.txt >>> Documentation/devicetree/bindings/power/max8903-charger.txt >>> Documentation/devicetree/bindings/power_supply/maxim,max14656.txt >>> Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt >>> Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt >>> Documentation/doc-guide/docbook.rst >>> Documentation/networking/tls.txt >>> Documentation/prctl/no_new_privs.txt >>> Documentation/prctl/seccomp_filter.txt >>> Documentation/security/00-INDEX >>> Documentation/security/IMA-templates.txt >>> Documentation/security/LSM.txt >>> Documentation/security/LoadPin.txt >>> Documentation/security/SELinux.txt >>> Documentation/security/Smack.txt >>> Documentation/security/Yama.txt >>> Documentation/security/apparmor.txt >>> Documentation/security/conf.py >>> Documentation/security/credentials.txt >>> Documentation/security/keys-ecryptfs.txt >>> Documentation/security/keys-request-key.txt >>> Documentation/security/keys-trusted-encrypted.txt >>> Documentation/security/keys.txt >>> Documentation/security/self-protection.txt >>> Documentation/security/tomoyo.txt >>> Documentation/sphinx/convert_template.sed >>> Documentation/sphinx/post_convert.sed >>> Documentation/sphinx/tmplcvt >>> Documentation/usb/typec.rst >>> Documentation/usb/usb3-debug-port.rst >>> arch/arm/boot/dts/rk1108-evb.dts >>> arch/arm/boot/dts/rk1108.dtsi >>> arch/arm/boot/dts/tegra20-whistler.dts >>> arch/arm/mach-omap2/opp.c >>> arch/arm/mach-omap2/pmu.c >>> arch/ia64/include/asm/siginfo.h >>> arch/m32r/include/uapi/asm/siginfo.h >>> arch/microblaze/include/asm/bitops.h >>> arch/microblaze/include/asm/bug.h >>> arch/microblaze/include/asm/bugs.h >>> arch/microblaze/include/asm/div64.h >>> arch/microblaze/include/asm/emergency-restart.h >>> arch/microblaze/include/asm/fb.h >>> 
arch/microblaze/include/asm/hardirq.h >>> arch/microblaze/include/asm/irq_regs.h >>> arch/microblaze/include/asm/kdebug.h >>> arch/microblaze/include/asm/kmap_types.h >>> arch/microblaze/include/asm/linkage.h >>> arch/microblaze/include/asm/local.h >>> arch/microblaze/include/asm/local64.h >>> arch/microblaze/include/asm/parport.h >>> arch/microblaze/include/asm/percpu.h >>> arch/microblaze/include/asm/serial.h >>> arch/microblaze/include/asm/shmparam.h >>> arch/microblaze/include/asm/topology.h >>> arch/microblaze/include/asm/ucontext.h >>> arch/microblaze/include/asm/vga.h >>> arch/microblaze/include/asm/xor.h >>> arch/microblaze/include/uapi/asm/bitsperlong.h >>> arch/microblaze/include/uapi/asm/errno.h >>> arch/microblaze/include/uapi/asm/fcntl.h >>> arch/microblaze/include/uapi/asm/ioctl.h >>> arch/microblaze/include/uapi/asm/ioctls.h >>> arch/microblaze/include/uapi/asm/ipcbuf.h >>> arch/microblaze/include/uapi/asm/kvm_para.h >>> arch/microblaze/include/uapi/asm/mman.h >>> arch/microblaze/include/uapi/asm/msgbuf.h >>> arch/microblaze/include/uapi/asm/param.h >>> arch/microblaze/include/uapi/asm/poll.h >>> arch/microblaze/include/uapi/asm/resource.h >>> arch/microblaze/include/uapi/asm/sembuf.h >>> arch/microblaze/include/uapi/asm/shmbuf.h >>> arch/microblaze/include/uapi/asm/siginfo.h >>> arch/microblaze/include/uapi/asm/signal.h >>> arch/microblaze/includ >>> Aborting >>> >>> >>> >>> W dniu 2017-09-20 o 11:45, Paweł Staszewski pisze: >>>> Ok looks like ending bisection >>>> >>>> >>>> Latest bisected kernel when there is no kernel panic 4.12.0+ (from >>>> next) - but only this warning: >>>> >>>> [ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 >>>> timed out >>>> [ 309.030034] ------------[ cut here ]------------ >>>> [ 309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139 >>>> [ 309.030041] Modules linked in: bonding ipmi_si x86_pkg_temp_thermal >>>> [ 309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 4.12.0+ #5 >>>> [ 
309.030046] task: ffff88086d98a000 task.stack: ffffc90003378000 >>>> [ 309.030048] RIP: 0010:dev_watchdog+0xcf/0x139 >>>> [ 309.030049] RSP: 0018:ffff88087fbc3ea8 EFLAGS: 00010246 >>>> [ 309.030050] RAX: 000000000000003d RBX: ffff88046b680000 RCX: >>>> 0000000000000000 >>>> [ 309.030050] RDX: ffff88087fbd2f01 RSI: 0000000000000000 RDI: >>>> ffff88087fbcda08 >>>> [ 309.030051] RBP: ffff88087fbc3eb8 R08: 0000000000000000 R09: >>>> ffff88087ff80a04 >>>> [ 309.030051] R10: 0000000000000000 R11: ffff88086d98a001 R12: >>>> 0000000000000000 >>>> [ 309.030052] R13: ffff88087fbc3ef8 R14: ffff88086d98a000 R15: >>>> ffffffff81c06008 >>>> [ 309.030053] FS: 0000000000000000(0000) >>>> GS:ffff88087fbc0000(0000) knlGS:0000000000000000 >>>> [ 309.030054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> [ 309.030054] CR2: 00007fba600f6098 CR3: 000000086b955000 CR4: >>>> 00000000001406e0 >>>> [ 309.030055] Call Trace: >>>> [ 309.030057] <IRQ> >>>> [ 309.030059] ? netif_tx_lock+0x79/0x79 >>>> [ 309.030062] call_timer_fn.isra.24+0x17/0x77 >>>> [ 309.030063] run_timer_softirq+0x118/0x161 >>>> [ 309.030065] ? netif_tx_lock+0x79/0x79 >>>> [ 309.030066] ? ktime_get+0x2b/0x42 >>>> [ 309.030070] ? lapic_next_deadline+0x21/0x27 >>>> [ 309.030073] ? 
clockevents_program_event+0xa8/0xc5 >>>> [ 309.030076] __do_softirq+0xa8/0x19d >>>> [ 309.030078] irq_exit+0x5d/0x6b >>>> [ 309.030079] smp_apic_timer_interrupt+0x2a/0x36 >>>> [ 309.030082] apic_timer_interrupt+0x89/0x90 >>>> [ 309.030085] RIP: 0010:mwait_idle+0x4e/0x6a >>>> [ 309.030086] RSP: 0018:ffffc9000337be98 EFLAGS: 00000246 >>>> ORIG_RAX: ffffffffffffff10 >>>> [ 309.030087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: >>>> 0000000000000000 >>>> [ 309.030087] RDX: 0000000000000000 RSI: 0000000000000000 RDI: >>>> ffff88086d98a000 >>>> [ 309.030088] RBP: ffffc9000337be98 R08: ffff88046f8279a0 R09: >>>> ffff88046f827040 >>>> [ 309.030089] R10: ffff88086d98a000 R11: ffff88086d98a000 R12: >>>> 0000000000000000 >>>> [ 309.030089] R13: ffff88086d98a000 R14: ffff88086d98a000 R15: >>>> ffff88086d98a000 >>>> [ 309.030090] </IRQ> >>>> [ 309.030094] arch_cpu_idle+0xa/0xc >>>> [ 309.030095] default_idle_call+0x19/0x1b >>>> [ 309.030102] do_idle+0xbc/0x196 >>>> [ 309.030104] cpu_startup_entry+0x1d/0x20 >>>> [ 309.030105] start_secondary+0xd8/0xdc >>>> [ 309.030108] secondary_startup_64+0x9f/0x9f >>>> [ 309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 01 >>>> e8 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 81 31 >>>> c0 e8 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 df ff 50 78 48 >>>> 8b 05 a0 bc 6a >>>> [ 309.030128] ---[ end trace 9102cb25703ae2d9 ]--- >>>> >>>> >>>> I just marked it as good - cause this problem above is differend - >>>> and im going to: >>>> >>>> git bisect good >>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps) >>>> >>>> >>>> >>>> >>>> W dniu 2017-09-20 o 10:44, Paweł Staszewski pisze: >>>>> Trying to make video from ipmi :) >>>>> >>>>> with that results: >>>>> >>>>> https://bugzilla.kernel.org/attachment.cgi?id=258521 >>>>> >>>>> catched two more lines where it starts - panic from 4.13.2. 
>>>>> >>>>> >>>>> Now will try tro do some bisection >>>>> >>>>> >>>>> >>>>> W dniu 2017-09-20 o 09:58, Paweł Staszewski pisze: >>>>>> Hi >>>>>> >>>>>> >>>>>> Will try bisecting tonight >>>>>> >>>>>> >>>>>> >>>>>> W dniu 2017-09-20 o 05:24, Eric Dumazet pisze: >>>>>>> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote: >>>>>>>> Just checked kernel 4.13.2 and same problem >>>>>>>> >>>>>>>> Just after start all 6 bgp sessions - and kernel starts to >>>>>>>> learn routes >>>>>>>> it panic. >>>>>>>> >>>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258509 >>>>>>>> >>>>>>> >>>>>>> Unfortunately we have not enough information from these traces. >>>>>>> >>>>>>> Can you get a full stack trace ? >>>>>>> >>>>>>> Alternatively, can you bisect ? >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> > > ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic
From: Paweł Staszewski @ 2017-09-20 12:49 UTC
To: Eric Dumazet; +Cc: Linux Kernel Network Developers

And the last one:

git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for insertion into fib6 tree

With this one I get the same kernel panic as always.

git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()

On 2017-09-20 at 14:23, Paweł Staszewski wrote:
> Almost there
>
> Bisecting: 6 revisions left to test after this (roughly 3 steps)
> [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly
>
> On 2017-09-20 at 13:02, Paweł Staszewski wrote:
>> Ok resumed and so far:
>>
>> Panic:
>>
>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using stack larger than 1024.
>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f >> >> No panic: >> >> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch >> 'udp-reduce-cache-pressure' >> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20 >> >> >> W dniu 2017-09-20 o 12:22, Paweł Staszewski pisze: >>> Soo far bisected and marked: >>> >>> git bisect start >>> # bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2 >>> git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355 >>> # good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13 >>> git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269 >>> # good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12 >>> git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c >>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag >>> 'pinctrl-v4.13-1' of >>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl >>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075 >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>> 'next' of >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>> 'next' of >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>> 'next' of >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>> >>> >>> >>> W dniu 2017-09-20 o 12:21, Paweł Staszewski pisze: >>>> Ok kernel crashed with different panic that i didnt catch when i >>>> was doing bisect and now my bisection is broken :) >>>> >>>> git bisect good >>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps) >>>> error: Your local changes to the following files would be >>>> overwritten by checkout: >>>> 
Documentation/00-INDEX >>>> Documentation/ABI/stable/sysfs-class-udc >>>> Documentation/ABI/testing/configfs-usb-gadget-uac1 >>>> Documentation/ABI/testing/ima_policy >>>> Documentation/ABI/testing/sysfs-bus-iio >>>> Documentation/ABI/testing/sysfs-bus-iio-meas-spec >>>> Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 >>>> Documentation/ABI/testing/sysfs-class-net >>>> Documentation/ABI/testing/sysfs-class-power-twl4030 >>>> Documentation/ABI/testing/sysfs-class-typec >>>> Documentation/DMA-API.txt >>>> Documentation/IRQ-domain.txt >>>> Documentation/Makefile >>>> Documentation/PCI/MSI-HOWTO.txt >>>> Documentation/RCU/00-INDEX >>>> Documentation/RCU/Design/Requirements/Requirements.html >>>> Documentation/RCU/checklist.txt >>>> Documentation/admin-guide/README.rst >>>> Documentation/admin-guide/devices.txt >>>> Documentation/admin-guide/index.rst >>>> Documentation/admin-guide/kernel-parameters.txt >>>> Documentation/admin-guide/pm/cpufreq.rst >>>> Documentation/admin-guide/pm/intel_pstate.rst >>>> Documentation/admin-guide/ras.rst >>>> Documentation/arm/Atmel/README >>>> Documentation/block/biodoc.txt >>>> Documentation/conf.py >>>> Documentation/core-api/assoc_array.rst >>>> Documentation/core-api/atomic_ops.rst >>>> Documentation/core-api/index.rst >>>> Documentation/crypto/asymmetric-keys.txt >>>> Documentation/dev-tools/index.rst >>>> Documentation/dev-tools/sparse.rst >>>> Documentation/devicetree/bindings/arm/amlogic.txt >>>> Documentation/devicetree/bindings/arm/atmel-at91.txt >>>> Documentation/devicetree/bindings/arm/ccn.txt >>>> Documentation/devicetree/bindings/arm/cpus.txt >>>> Documentation/devicetree/bindings/arm/gemini.txt >>>> Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt >>>> Documentation/devicetree/bindings/arm/keystone/keystone.txt >>>> Documentation/devicetree/bindings/arm/mediatek.txt >>>> Documentation/devicetree/bindings/arm/rockchip.txt >>>> Documentation/devicetree/bindings/arm/shmobile.txt >>>> 
Documentation/devicetree/bindings/arm/tegra.txt >>>> Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt >>>> Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt >>>> Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt >>>> Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt >>>> Documentation/devicetree/bindings/gpio/gpio_atmel.txt >>>> Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt >>>> Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt >>>> Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt >>>> Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt >>>> Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt >>>> >>>> Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt >>>> >>>> Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt >>>> >>>> Documentation/devicetree/bindings/leds/common.txt >>>> Documentation/devicetree/bindings/mfd/hi6421.txt >>>> Documentation/devicetree/bindings/mfd/tps65910.txt >>>> Documentation/devicetree/bindings/mmc/fsl-esdhc.txt >>>> Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt >>>> Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt >>>> Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt >>>> Documentation/devicetree/bindings/mtd/atmel-nand.txt >>>> Documentation/devicetree/bindings/net/dsa/b53.txt >>>> Documentation/devicetree/bindings/net/ethernet.txt >>>> Documentation/devicetree/bindings/net/macb.txt >>>> Documentation/devicetree/bindings/net/marvell-orion-mdio.txt >>>> Documentation/devicetree/bindings/net/ti,wilink-st.txt >>>> Documentation/devicetree/bindings/net/wireless/ti,wlcore.txt >>>> Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt >>>> Documentation/devicetree/bindings/opp/opp.txt >>>> Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt >>>> Documentation/devicetree/bindings/phy/brcm-sata-phy.txt >>>> 
Documentation/devicetree/bindings/phy/meson8b-usb2-phy.txt >>>> Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt >>>> Documentation/devicetree/bindings/power/rockchip-io-domain.txt >>>> Documentation/devicetree/bindings/power/supply/bq27xxx.txt >>>> Documentation/devicetree/bindings/property-units.txt >>>> Documentation/devicetree/bindings/regulator/regulator.txt >>>> Documentation/devicetree/bindings/serial/8 >>>> error: The following untracked working tree files would be >>>> overwritten by checkout: >>>> Documentation/ABI/testing/sysfs-class-net-phydev >>>> Documentation/DocBook/.gitignore >>>> Documentation/DocBook/Makefile >>>> Documentation/DocBook/filesystems.tmpl >>>> Documentation/DocBook/kernel-hacking.tmpl >>>> Documentation/DocBook/kernel-locking.tmpl >>>> Documentation/DocBook/kgdb.tmpl >>>> Documentation/DocBook/libata.tmpl >>>> Documentation/DocBook/librs.tmpl >>>> Documentation/DocBook/lsm.tmpl >>>> Documentation/DocBook/mtdnand.tmpl >>>> Documentation/DocBook/networking.tmpl >>>> Documentation/DocBook/rapidio.tmpl >>>> Documentation/DocBook/s390-drivers.tmpl >>>> Documentation/DocBook/scsi.tmpl >>>> Documentation/DocBook/sh.tmpl >>>> Documentation/DocBook/stylesheet.xsl >>>> Documentation/DocBook/w1.tmpl >>>> Documentation/DocBook/z8530book.tmpl >>>> Documentation/Makefile.sphinx >>>> Documentation/RCU/trace.txt >>>> Documentation/devicetree/bindings/i2c/i2c-mt6577.txt >>>> Documentation/devicetree/bindings/misc/allwinner,syscon.txt >>>> Documentation/devicetree/bindings/net/cortina.txt >>>> Documentation/devicetree/bindings/net/dsa/ksz.txt >>>> Documentation/devicetree/bindings/net/dwmac-sun8i.txt >>>> Documentation/devicetree/bindings/net/qca,qca7000.txt >>>> Documentation/devicetree/bindings/power/max8903-charger.txt >>>> Documentation/devicetree/bindings/power_supply/maxim,max14656.txt >>>> Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt >>>> Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt >>>> 
Documentation/doc-guide/docbook.rst >>>> Documentation/networking/tls.txt >>>> Documentation/prctl/no_new_privs.txt >>>> Documentation/prctl/seccomp_filter.txt >>>> Documentation/security/00-INDEX >>>> Documentation/security/IMA-templates.txt >>>> Documentation/security/LSM.txt >>>> Documentation/security/LoadPin.txt >>>> Documentation/security/SELinux.txt >>>> Documentation/security/Smack.txt >>>> Documentation/security/Yama.txt >>>> Documentation/security/apparmor.txt >>>> Documentation/security/conf.py >>>> Documentation/security/credentials.txt >>>> Documentation/security/keys-ecryptfs.txt >>>> Documentation/security/keys-request-key.txt >>>> Documentation/security/keys-trusted-encrypted.txt >>>> Documentation/security/keys.txt >>>> Documentation/security/self-protection.txt >>>> Documentation/security/tomoyo.txt >>>> Documentation/sphinx/convert_template.sed >>>> Documentation/sphinx/post_convert.sed >>>> Documentation/sphinx/tmplcvt >>>> Documentation/usb/typec.rst >>>> Documentation/usb/usb3-debug-port.rst >>>> arch/arm/boot/dts/rk1108-evb.dts >>>> arch/arm/boot/dts/rk1108.dtsi >>>> arch/arm/boot/dts/tegra20-whistler.dts >>>> arch/arm/mach-omap2/opp.c >>>> arch/arm/mach-omap2/pmu.c >>>> arch/ia64/include/asm/siginfo.h >>>> arch/m32r/include/uapi/asm/siginfo.h >>>> arch/microblaze/include/asm/bitops.h >>>> arch/microblaze/include/asm/bug.h >>>> arch/microblaze/include/asm/bugs.h >>>> arch/microblaze/include/asm/div64.h >>>> arch/microblaze/include/asm/emergency-restart.h >>>> arch/microblaze/include/asm/fb.h >>>> arch/microblaze/include/asm/hardirq.h >>>> arch/microblaze/include/asm/irq_regs.h >>>> arch/microblaze/include/asm/kdebug.h >>>> arch/microblaze/include/asm/kmap_types.h >>>> arch/microblaze/include/asm/linkage.h >>>> arch/microblaze/include/asm/local.h >>>> arch/microblaze/include/asm/local64.h >>>> arch/microblaze/include/asm/parport.h >>>> arch/microblaze/include/asm/percpu.h >>>> arch/microblaze/include/asm/serial.h >>>> 
arch/microblaze/include/asm/shmparam.h >>>> arch/microblaze/include/asm/topology.h >>>> arch/microblaze/include/asm/ucontext.h >>>> arch/microblaze/include/asm/vga.h >>>> arch/microblaze/include/asm/xor.h >>>> arch/microblaze/include/uapi/asm/bitsperlong.h >>>> arch/microblaze/include/uapi/asm/errno.h >>>> arch/microblaze/include/uapi/asm/fcntl.h >>>> arch/microblaze/include/uapi/asm/ioctl.h >>>> arch/microblaze/include/uapi/asm/ioctls.h >>>> arch/microblaze/include/uapi/asm/ipcbuf.h >>>> arch/microblaze/include/uapi/asm/kvm_para.h >>>> arch/microblaze/include/uapi/asm/mman.h >>>> arch/microblaze/include/uapi/asm/msgbuf.h >>>> arch/microblaze/include/uapi/asm/param.h >>>> arch/microblaze/include/uapi/asm/poll.h >>>> arch/microblaze/include/uapi/asm/resource.h >>>> arch/microblaze/include/uapi/asm/sembuf.h >>>> arch/microblaze/include/uapi/asm/shmbuf.h >>>> arch/microblaze/include/uapi/asm/siginfo.h >>>> arch/microblaze/include/uapi/asm/signal.h >>>> arch/microblaze/includ >>>> Aborting >>>> >>>> >>>> >>>> W dniu 2017-09-20 o 11:45, Paweł Staszewski pisze: >>>>> Ok looks like ending bisection >>>>> >>>>> >>>>> Latest bisected kernel when there is no kernel panic 4.12.0+ (from >>>>> next) - but only this warning: >>>>> >>>>> [ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 >>>>> timed out >>>>> [ 309.030034] ------------[ cut here ]------------ >>>>> [ 309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139 >>>>> [ 309.030041] Modules linked in: bonding ipmi_si >>>>> x86_pkg_temp_thermal >>>>> [ 309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 4.12.0+ #5 >>>>> [ 309.030046] task: ffff88086d98a000 task.stack: ffffc90003378000 >>>>> [ 309.030048] RIP: 0010:dev_watchdog+0xcf/0x139 >>>>> [ 309.030049] RSP: 0018:ffff88087fbc3ea8 EFLAGS: 00010246 >>>>> [ 309.030050] RAX: 000000000000003d RBX: ffff88046b680000 RCX: >>>>> 0000000000000000 >>>>> [ 309.030050] RDX: ffff88087fbd2f01 RSI: 0000000000000000 RDI: >>>>> ffff88087fbcda08 >>>>> [ 
309.030051] RBP: ffff88087fbc3eb8 R08: 0000000000000000 R09: >>>>> ffff88087ff80a04 >>>>> [ 309.030051] R10: 0000000000000000 R11: ffff88086d98a001 R12: >>>>> 0000000000000000 >>>>> [ 309.030052] R13: ffff88087fbc3ef8 R14: ffff88086d98a000 R15: >>>>> ffffffff81c06008 >>>>> [ 309.030053] FS: 0000000000000000(0000) >>>>> GS:ffff88087fbc0000(0000) knlGS:0000000000000000 >>>>> [ 309.030054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> [ 309.030054] CR2: 00007fba600f6098 CR3: 000000086b955000 CR4: >>>>> 00000000001406e0 >>>>> [ 309.030055] Call Trace: >>>>> [ 309.030057] <IRQ> >>>>> [ 309.030059] ? netif_tx_lock+0x79/0x79 >>>>> [ 309.030062] call_timer_fn.isra.24+0x17/0x77 >>>>> [ 309.030063] run_timer_softirq+0x118/0x161 >>>>> [ 309.030065] ? netif_tx_lock+0x79/0x79 >>>>> [ 309.030066] ? ktime_get+0x2b/0x42 >>>>> [ 309.030070] ? lapic_next_deadline+0x21/0x27 >>>>> [ 309.030073] ? clockevents_program_event+0xa8/0xc5 >>>>> [ 309.030076] __do_softirq+0xa8/0x19d >>>>> [ 309.030078] irq_exit+0x5d/0x6b >>>>> [ 309.030079] smp_apic_timer_interrupt+0x2a/0x36 >>>>> [ 309.030082] apic_timer_interrupt+0x89/0x90 >>>>> [ 309.030085] RIP: 0010:mwait_idle+0x4e/0x6a >>>>> [ 309.030086] RSP: 0018:ffffc9000337be98 EFLAGS: 00000246 >>>>> ORIG_RAX: ffffffffffffff10 >>>>> [ 309.030087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: >>>>> 0000000000000000 >>>>> [ 309.030087] RDX: 0000000000000000 RSI: 0000000000000000 RDI: >>>>> ffff88086d98a000 >>>>> [ 309.030088] RBP: ffffc9000337be98 R08: ffff88046f8279a0 R09: >>>>> ffff88046f827040 >>>>> [ 309.030089] R10: ffff88086d98a000 R11: ffff88086d98a000 R12: >>>>> 0000000000000000 >>>>> [ 309.030089] R13: ffff88086d98a000 R14: ffff88086d98a000 R15: >>>>> ffff88086d98a000 >>>>> [ 309.030090] </IRQ> >>>>> [ 309.030094] arch_cpu_idle+0xa/0xc >>>>> [ 309.030095] default_idle_call+0x19/0x1b >>>>> [ 309.030102] do_idle+0xbc/0x196 >>>>> [ 309.030104] cpu_startup_entry+0x1d/0x20 >>>>> [ 309.030105] start_secondary+0xd8/0xdc >>>>> [ 
309.030108] secondary_startup_64+0x9f/0x9f
>>>>> [ 309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 01 e8 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 81 31 c0 e8 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 df ff 50 78 48 8b 05 a0 bc 6a
>>>>> [ 309.030128] ---[ end trace 9102cb25703ae2d9 ]---
>>>>>
>>>>> I just marked it as good, because the problem above is a different one, and I am going to:
>>>>>
>>>>> git bisect good
>>>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps)
>>>>>
>>>>> On 2017-09-20 at 10:44, Paweł Staszewski wrote:
>>>>>> Trying to make a video from IPMI :)
>>>>>>
>>>>>> with these results:
>>>>>>
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258521
>>>>>>
>>>>>> Caught two more lines where it starts - panic from 4.13.2.
>>>>>>
>>>>>> Now will try to do some bisection
>>>>>>
>>>>>> On 2017-09-20 at 09:58, Paweł Staszewski wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> Will try bisecting tonight
>>>>>>>
>>>>>>> On 2017-09-20 at 05:24, Eric Dumazet wrote:
>>>>>>>> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
>>>>>>>>> Just checked kernel 4.13.2 and same problem
>>>>>>>>>
>>>>>>>>> Just after starting all 6 bgp sessions, when the kernel starts to learn routes,
>>>>>>>>> it panics.
>>>>>>>>>
>>>>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258509
>>>>>>>>
>>>>>>>> Unfortunately we have not enough information from these traces.
>>>>>>>>
>>>>>>>> Can you get a full stack trace ?
>>>>>>>>
>>>>>>>> Alternatively, can you bisect ?
>>>>>>>>
>>>>>>>> Thanks.
* Re: Latest net-next from GIT panic
From: Paweł Staszewski @ 2017-09-20 13:05 UTC
To: Eric Dumazet; +Cc: Linux Kernel Network Developers

Hmm, but after:

b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
Author: Wei Wang <weiwan@google.com>
Date:   Sat Jun 17 10:42:32 2017 -0700

    ipv4: mark DST_NOGC and remove the operation of dst_free()

    With the previous preparation patches, we are ready to get rid of the
    dst gc operation in ipv4 code and release dst based on refcnt only.
    So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
    to dst_free().
    At this point, all dst created in ipv4 code do not use the dst gc
    anymore and will be destroyed at the point when refcnt drops to 0.

    Signed-off-by: Wei Wang <weiwan@google.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M	net

Still panic, so I will go back three steps and redo the bisection to get to a kernel without the panic.

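Going back a few bisect steps does not require starting over: the marks recorded so far can be saved with `git bisect log`, the doubtful ones removed, and the rest re-applied with `git bisect replay`. The following is only a minimal sketch of that workflow on a throwaway repository. The repository, the `version` file and the regression arriving with "commit 7" are hypothetical demo data, not anything from the kernel tree.

```shell
#!/bin/sh
# Sketch: undo one wrong "git bisect bad" mark by editing and replaying the bisect log.
# Hypothetical demo data: ten commits, the "regression" (version >= 7) arrives with "commit 7".
set -eu

work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q .
git config user.name "Bisect Demo"
git config user.email "bisect-demo@example.invalid"

i=1
while [ "$i" -le 10 ]; do
    echo "$i" > version
    git add version
    git commit -q -m "commit $i"
    i=$((i + 1))
done

bad=$(git rev-parse HEAD)      # commit 10, known bad
good=$(git rev-parse HEAD~9)   # commit 1, known good
git bisect start "$bad" "$good" >/dev/null

# Simulate the mistake: the commit git checked out is marked bad without a proper test.
wrong=$(git rev-parse HEAD)
git bisect bad >/dev/null

# Save the log, drop the line that holds the wrong mark, and replay the remainder.
git bisect log > "$work/bisect.log"
grep -v "^git bisect bad $wrong" "$work/bisect.log" > "$work/bisect.fixed"
git bisect replay "$work/bisect.fixed" >/dev/null

# Finish with a scripted test: exit status 0 means good, 1 means bad.
git bisect run sh -c 'test "$(cat version)" -lt 7' >/dev/null

first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"
git bisect reset >/dev/null 2>&1
```

On a real tree only the `git bisect log`, edit, `git bisect replay` part carries over directly. `git bisect run` needs a test that can be scripted, which may not be practical when the failure is a panic that shows up only after the BGP sessions come up.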
W dniu 2017-09-20 o 14:49, Paweł Staszewski pisze: > And the last one > > git bisect good > Bisecting: 1 revision left to test after this (roughly 1 step) > [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt > for insertion into fib6 tree > > With this have kernel panic same as always > > git bisect bad > Bisecting: 0 revisions left to test after this (roughly 0 steps) > [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and > remove the operation of dst_free() > > > > W dniu 2017-09-20 o 14:23, Paweł Staszewski pisze: >> Almost there >> >> Bisecting: 6 revisions left to test after this (roughly 3 steps) >> [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() >> properly >> >> >> >> W dniu 2017-09-20 o 13:02, Paweł Staszewski pisze: >>> Ok resumed and soo far: >>> >>> Panic: >>> >>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid >>> using stack larger than 1024. >>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f >>> >>> No panic: >>> >>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch >>> 'udp-reduce-cache-pressure' >>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20 >>> >>> >>> W dniu 2017-09-20 o 12:22, Paweł Staszewski pisze: >>>> Soo far bisected and marked: >>>> >>>> git bisect start >>>> # bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2 >>>> git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355 >>>> # good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13 >>>> git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269 >>>> # good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12 >>>> git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c >>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag >>>> 'pinctrl-v4.13-1' of >>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl >>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075 >>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>>> 
'next' of >>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>>> 'next' of >>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>>> 'next' of >>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>>> >>>> >>>> >>>> W dniu 2017-09-20 o 12:21, Paweł Staszewski pisze: >>>>> Ok kernel crashed with different panic that i didnt catch when i >>>>> was doing bisect and now my bisection is broken :) >>>>> >>>>> git bisect good >>>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps) >>>>> error: Your local changes to the following files would be >>>>> overwritten by checkout: >>>>> Documentation/00-INDEX >>>>> Documentation/ABI/stable/sysfs-class-udc >>>>> Documentation/ABI/testing/configfs-usb-gadget-uac1 >>>>> Documentation/ABI/testing/ima_policy >>>>> Documentation/ABI/testing/sysfs-bus-iio >>>>> Documentation/ABI/testing/sysfs-bus-iio-meas-spec >>>>> Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 >>>>> Documentation/ABI/testing/sysfs-class-net >>>>> Documentation/ABI/testing/sysfs-class-power-twl4030 >>>>> Documentation/ABI/testing/sysfs-class-typec >>>>> Documentation/DMA-API.txt >>>>> Documentation/IRQ-domain.txt >>>>> Documentation/Makefile >>>>> Documentation/PCI/MSI-HOWTO.txt >>>>> Documentation/RCU/00-INDEX >>>>> Documentation/RCU/Design/Requirements/Requirements.html >>>>> Documentation/RCU/checklist.txt >>>>> Documentation/admin-guide/README.rst >>>>> Documentation/admin-guide/devices.txt >>>>> Documentation/admin-guide/index.rst >>>>> Documentation/admin-guide/kernel-parameters.txt >>>>> Documentation/admin-guide/pm/cpufreq.rst >>>>> 
Documentation/admin-guide/pm/intel_pstate.rst >>>>> Documentation/admin-guide/ras.rst >>>>> Documentation/arm/Atmel/README >>>>> Documentation/block/biodoc.txt >>>>> Documentation/conf.py >>>>> Documentation/core-api/assoc_array.rst >>>>> Documentation/core-api/atomic_ops.rst >>>>> Documentation/core-api/index.rst >>>>> Documentation/crypto/asymmetric-keys.txt >>>>> Documentation/dev-tools/index.rst >>>>> Documentation/dev-tools/sparse.rst >>>>> Documentation/devicetree/bindings/arm/amlogic.txt >>>>> Documentation/devicetree/bindings/arm/atmel-at91.txt >>>>> Documentation/devicetree/bindings/arm/ccn.txt >>>>> Documentation/devicetree/bindings/arm/cpus.txt >>>>> Documentation/devicetree/bindings/arm/gemini.txt >>>>> Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt >>>>> Documentation/devicetree/bindings/arm/keystone/keystone.txt >>>>> Documentation/devicetree/bindings/arm/mediatek.txt >>>>> Documentation/devicetree/bindings/arm/rockchip.txt >>>>> Documentation/devicetree/bindings/arm/shmobile.txt >>>>> Documentation/devicetree/bindings/arm/tegra.txt >>>>> Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt >>>>> Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt >>>>> Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt >>>>> Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt >>>>> Documentation/devicetree/bindings/gpio/gpio_atmel.txt >>>>> Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt >>>>> Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt >>>>> Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt >>>>> Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt >>>>> Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt >>>>> >>>>> Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt >>>>> >>>>> Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt >>>>> >>>>> Documentation/devicetree/bindings/leds/common.txt >>>>> 
Documentation/devicetree/bindings/mfd/hi6421.txt >>>>> Documentation/devicetree/bindings/mfd/tps65910.txt >>>>> Documentation/devicetree/bindings/mmc/fsl-esdhc.txt >>>>> Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt >>>>> Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt >>>>> Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt >>>>> Documentation/devicetree/bindings/mtd/atmel-nand.txt >>>>> Documentation/devicetree/bindings/net/dsa/b53.txt >>>>> Documentation/devicetree/bindings/net/ethernet.txt >>>>> Documentation/devicetree/bindings/net/macb.txt >>>>> Documentation/devicetree/bindings/net/marvell-orion-mdio.txt >>>>> Documentation/devicetree/bindings/net/ti,wilink-st.txt >>>>> Documentation/devicetree/bindings/net/wireless/ti,wlcore.txt >>>>> Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt >>>>> Documentation/devicetree/bindings/opp/opp.txt >>>>> Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt >>>>> Documentation/devicetree/bindings/phy/brcm-sata-phy.txt >>>>> Documentation/devicetree/bindings/phy/meson8b-usb2-phy.txt >>>>> Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt >>>>> Documentation/devicetree/bindings/power/rockchip-io-domain.txt >>>>> Documentation/devicetree/bindings/power/supply/bq27xxx.txt >>>>> Documentation/devicetree/bindings/property-units.txt >>>>> Documentation/devicetree/bindings/regulator/regulator.txt >>>>> Documentation/devicetree/bindings/serial/8 >>>>> error: The following untracked working tree files would be >>>>> overwritten by checkout: >>>>> Documentation/ABI/testing/sysfs-class-net-phydev >>>>> Documentation/DocBook/.gitignore >>>>> Documentation/DocBook/Makefile >>>>> Documentation/DocBook/filesystems.tmpl >>>>> Documentation/DocBook/kernel-hacking.tmpl >>>>> Documentation/DocBook/kernel-locking.tmpl >>>>> Documentation/DocBook/kgdb.tmpl >>>>> Documentation/DocBook/libata.tmpl >>>>> Documentation/DocBook/librs.tmpl >>>>> Documentation/DocBook/lsm.tmpl >>>>> 
Documentation/DocBook/mtdnand.tmpl >>>>> Documentation/DocBook/networking.tmpl >>>>> Documentation/DocBook/rapidio.tmpl >>>>> Documentation/DocBook/s390-drivers.tmpl >>>>> Documentation/DocBook/scsi.tmpl >>>>> Documentation/DocBook/sh.tmpl >>>>> Documentation/DocBook/stylesheet.xsl >>>>> Documentation/DocBook/w1.tmpl >>>>> Documentation/DocBook/z8530book.tmpl >>>>> Documentation/Makefile.sphinx >>>>> Documentation/RCU/trace.txt >>>>> Documentation/devicetree/bindings/i2c/i2c-mt6577.txt >>>>> Documentation/devicetree/bindings/misc/allwinner,syscon.txt >>>>> Documentation/devicetree/bindings/net/cortina.txt >>>>> Documentation/devicetree/bindings/net/dsa/ksz.txt >>>>> Documentation/devicetree/bindings/net/dwmac-sun8i.txt >>>>> Documentation/devicetree/bindings/net/qca,qca7000.txt >>>>> Documentation/devicetree/bindings/power/max8903-charger.txt >>>>> Documentation/devicetree/bindings/power_supply/maxim,max14656.txt >>>>> Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt >>>>> Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt >>>>> Documentation/doc-guide/docbook.rst >>>>> Documentation/networking/tls.txt >>>>> Documentation/prctl/no_new_privs.txt >>>>> Documentation/prctl/seccomp_filter.txt >>>>> Documentation/security/00-INDEX >>>>> Documentation/security/IMA-templates.txt >>>>> Documentation/security/LSM.txt >>>>> Documentation/security/LoadPin.txt >>>>> Documentation/security/SELinux.txt >>>>> Documentation/security/Smack.txt >>>>> Documentation/security/Yama.txt >>>>> Documentation/security/apparmor.txt >>>>> Documentation/security/conf.py >>>>> Documentation/security/credentials.txt >>>>> Documentation/security/keys-ecryptfs.txt >>>>> Documentation/security/keys-request-key.txt >>>>> Documentation/security/keys-trusted-encrypted.txt >>>>> Documentation/security/keys.txt >>>>> Documentation/security/self-protection.txt >>>>> Documentation/security/tomoyo.txt >>>>> Documentation/sphinx/convert_template.sed >>>>> 
Documentation/sphinx/post_convert.sed >>>>> Documentation/sphinx/tmplcvt >>>>> Documentation/usb/typec.rst >>>>> Documentation/usb/usb3-debug-port.rst >>>>> arch/arm/boot/dts/rk1108-evb.dts >>>>> arch/arm/boot/dts/rk1108.dtsi >>>>> arch/arm/boot/dts/tegra20-whistler.dts >>>>> arch/arm/mach-omap2/opp.c >>>>> arch/arm/mach-omap2/pmu.c >>>>> arch/ia64/include/asm/siginfo.h >>>>> arch/m32r/include/uapi/asm/siginfo.h >>>>> arch/microblaze/include/asm/bitops.h >>>>> arch/microblaze/include/asm/bug.h >>>>> arch/microblaze/include/asm/bugs.h >>>>> arch/microblaze/include/asm/div64.h >>>>> arch/microblaze/include/asm/emergency-restart.h >>>>> arch/microblaze/include/asm/fb.h >>>>> arch/microblaze/include/asm/hardirq.h >>>>> arch/microblaze/include/asm/irq_regs.h >>>>> arch/microblaze/include/asm/kdebug.h >>>>> arch/microblaze/include/asm/kmap_types.h >>>>> arch/microblaze/include/asm/linkage.h >>>>> arch/microblaze/include/asm/local.h >>>>> arch/microblaze/include/asm/local64.h >>>>> arch/microblaze/include/asm/parport.h >>>>> arch/microblaze/include/asm/percpu.h >>>>> arch/microblaze/include/asm/serial.h >>>>> arch/microblaze/include/asm/shmparam.h >>>>> arch/microblaze/include/asm/topology.h >>>>> arch/microblaze/include/asm/ucontext.h >>>>> arch/microblaze/include/asm/vga.h >>>>> arch/microblaze/include/asm/xor.h >>>>> arch/microblaze/include/uapi/asm/bitsperlong.h >>>>> arch/microblaze/include/uapi/asm/errno.h >>>>> arch/microblaze/include/uapi/asm/fcntl.h >>>>> arch/microblaze/include/uapi/asm/ioctl.h >>>>> arch/microblaze/include/uapi/asm/ioctls.h >>>>> arch/microblaze/include/uapi/asm/ipcbuf.h >>>>> arch/microblaze/include/uapi/asm/kvm_para.h >>>>> arch/microblaze/include/uapi/asm/mman.h >>>>> arch/microblaze/include/uapi/asm/msgbuf.h >>>>> arch/microblaze/include/uapi/asm/param.h >>>>> arch/microblaze/include/uapi/asm/poll.h >>>>> arch/microblaze/include/uapi/asm/resource.h >>>>> arch/microblaze/include/uapi/asm/sembuf.h >>>>> 
arch/microblaze/include/uapi/asm/shmbuf.h >>>>> arch/microblaze/include/uapi/asm/siginfo.h >>>>> arch/microblaze/include/uapi/asm/signal.h >>>>> arch/microblaze/includ >>>>> Aborting >>>>> >>>>> >>>>> >>>>> W dniu 2017-09-20 o 11:45, Paweł Staszewski pisze: >>>>>> Ok looks like ending bisection >>>>>> >>>>>> >>>>>> Latest bisected kernel when there is no kernel panic 4.12.0+ >>>>>> (from next) - but only this warning: >>>>>> >>>>>> [ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue >>>>>> 0 timed out >>>>>> [ 309.030034] ------------[ cut here ]------------ >>>>>> [ 309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139 >>>>>> [ 309.030041] Modules linked in: bonding ipmi_si >>>>>> x86_pkg_temp_thermal >>>>>> [ 309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted >>>>>> 4.12.0+ #5 >>>>>> [ 309.030046] task: ffff88086d98a000 task.stack: ffffc90003378000 >>>>>> [ 309.030048] RIP: 0010:dev_watchdog+0xcf/0x139 >>>>>> [ 309.030049] RSP: 0018:ffff88087fbc3ea8 EFLAGS: 00010246 >>>>>> [ 309.030050] RAX: 000000000000003d RBX: ffff88046b680000 RCX: >>>>>> 0000000000000000 >>>>>> [ 309.030050] RDX: ffff88087fbd2f01 RSI: 0000000000000000 RDI: >>>>>> ffff88087fbcda08 >>>>>> [ 309.030051] RBP: ffff88087fbc3eb8 R08: 0000000000000000 R09: >>>>>> ffff88087ff80a04 >>>>>> [ 309.030051] R10: 0000000000000000 R11: ffff88086d98a001 R12: >>>>>> 0000000000000000 >>>>>> [ 309.030052] R13: ffff88087fbc3ef8 R14: ffff88086d98a000 R15: >>>>>> ffffffff81c06008 >>>>>> [ 309.030053] FS: 0000000000000000(0000) >>>>>> GS:ffff88087fbc0000(0000) knlGS:0000000000000000 >>>>>> [ 309.030054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> [ 309.030054] CR2: 00007fba600f6098 CR3: 000000086b955000 CR4: >>>>>> 00000000001406e0 >>>>>> [ 309.030055] Call Trace: >>>>>> [ 309.030057] <IRQ> >>>>>> [ 309.030059] ? netif_tx_lock+0x79/0x79 >>>>>> [ 309.030062] call_timer_fn.isra.24+0x17/0x77 >>>>>> [ 309.030063] run_timer_softirq+0x118/0x161 >>>>>> [ 309.030065] ? 
netif_tx_lock+0x79/0x79 >>>>>> [ 309.030066] ? ktime_get+0x2b/0x42 >>>>>> [ 309.030070] ? lapic_next_deadline+0x21/0x27 >>>>>> [ 309.030073] ? clockevents_program_event+0xa8/0xc5 >>>>>> [ 309.030076] __do_softirq+0xa8/0x19d >>>>>> [ 309.030078] irq_exit+0x5d/0x6b >>>>>> [ 309.030079] smp_apic_timer_interrupt+0x2a/0x36 >>>>>> [ 309.030082] apic_timer_interrupt+0x89/0x90 >>>>>> [ 309.030085] RIP: 0010:mwait_idle+0x4e/0x6a >>>>>> [ 309.030086] RSP: 0018:ffffc9000337be98 EFLAGS: 00000246 >>>>>> ORIG_RAX: ffffffffffffff10 >>>>>> [ 309.030087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: >>>>>> 0000000000000000 >>>>>> [ 309.030087] RDX: 0000000000000000 RSI: 0000000000000000 RDI: >>>>>> ffff88086d98a000 >>>>>> [ 309.030088] RBP: ffffc9000337be98 R08: ffff88046f8279a0 R09: >>>>>> ffff88046f827040 >>>>>> [ 309.030089] R10: ffff88086d98a000 R11: ffff88086d98a000 R12: >>>>>> 0000000000000000 >>>>>> [ 309.030089] R13: ffff88086d98a000 R14: ffff88086d98a000 R15: >>>>>> ffff88086d98a000 >>>>>> [ 309.030090] </IRQ> >>>>>> [ 309.030094] arch_cpu_idle+0xa/0xc >>>>>> [ 309.030095] default_idle_call+0x19/0x1b >>>>>> [ 309.030102] do_idle+0xbc/0x196 >>>>>> [ 309.030104] cpu_startup_entry+0x1d/0x20 >>>>>> [ 309.030105] start_secondary+0xd8/0xdc >>>>>> [ 309.030108] secondary_startup_64+0x9f/0x9f >>>>>> [ 309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 01 >>>>>> e8 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 81 31 >>>>>> c0 e8 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 df ff 50 78 >>>>>> 48 8b 05 a0 bc 6a >>>>>> [ 309.030128] ---[ end trace 9102cb25703ae2d9 ]--- >>>>>> >>>>>> >>>>>> I just marked it as good - cause this problem above is differend >>>>>> - and im going to: >>>>>> >>>>>> git bisect good >>>>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps) >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> W dniu 2017-09-20 o 10:44, Paweł Staszewski pisze: >>>>>>> Trying to make video from ipmi :) >>>>>>> >>>>>>> with that results: >>>>>>> >>>>>>> 
https://bugzilla.kernel.org/attachment.cgi?id=258521
>>>>>>>
>>>>>>> Caught two more lines where it starts - panic from 4.13.2.
>>>>>>>
>>>>>>> Now I will try to do some bisection
>>>>>>>
>>>>>>> On 2017-09-20 at 09:58, Paweł Staszewski wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Will try bisecting tonight
>>>>>>>>
>>>>>>>> On 2017-09-20 at 05:24, Eric Dumazet wrote:
>>>>>>>>> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
>>>>>>>>>> Just checked kernel 4.13.2 and same problem
>>>>>>>>>>
>>>>>>>>>> Just after all 6 BGP sessions start and the kernel begins to
>>>>>>>>>> learn routes, it panics.
>>>>>>>>>>
>>>>>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258509
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Unfortunately we do not have enough information from these traces.
>>>>>>>>>
>>>>>>>>> Can you get a full stack trace?
>>>>>>>>>
>>>>>>>>> Alternatively, can you bisect?
>>>>>>>>>
>>>>>>>>> Thanks.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 13:05 ` Paweł Staszewski @ 2017-09-20 13:09 ` Paweł Staszewski 0 siblings, 0 replies; 52+ messages in thread From: Paweł Staszewski @ 2017-09-20 13:09 UTC (permalink / raw) To: Eric Dumazet; +Cc: Linux Kernel Network Developers

So far, the bisect path was:

git bisect start
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using stack larger than 1024.
git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 'udp-reduce-cache-pressure'
git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
# bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch 's390-net-updates-part-2'
git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
# good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch 'bpf-ctx-narrow'
git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
# good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove cp_outgoing
git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
# bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix
git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
# good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new function dst_dev_put()
git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
# bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove DST_NOCACHE flag
git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
# bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly
git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
No PANIC
# good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly
git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
PANIC
# bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for insertion into fib6 tree
git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
PANIC
# bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
# first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()

On 2017-09-20 at 15:05, Paweł Staszewski wrote:
> hmm
>
> But after
>
> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
> Author: Wei Wang <weiwan@google.com>
> Date: Sat Jun 17 10:42:32 2017 -0700
>
> ipv4: mark DST_NOGC and remove the operation of dst_free()
>
> With the previous preparation patches, we are ready to get rid of the
> dst gc operation in ipv4 code and release dst based on refcnt only.
> So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
> to dst_free().
> At this point, all dst created in ipv4 code do not use the dst gc
> anymore and will be destroyed at the point when refcnt drops to 0.
>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net
>
> Still panic - so I will go back 3 steps and try to bisect again
> to a point without the panic.
>
> On 2017-09-20 at 14:49, Paweł Staszewski wrote:
>> And the last one
>>
>> git bisect good
>> Bisecting: 1 revision left to test after this (roughly 1 step)
>> [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt
>> for insertion into fib6 tree
>>
>> With this one I get the same kernel panic as always
>>
>> git bisect bad
>> Bisecting: 0 revisions left to test after this (roughly 0 steps)
>> [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and
>> remove the operation of dst_free()
>>
>> On 2017-09-20 at 14:23, Paweł Staszewski wrote:
>>> Almost there
>>>
>>> Bisecting: 6 revisions left to test after this (roughly 3 steps)
>>> [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call
>>> dst_hold_safe() properly
>>>
>>> On 2017-09-20 at 13:02, Paweł Staszewski wrote:
>>>> OK, resumed, and so far:
>>>>
>>>> Panic:
>>>>
>>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid
>>>> using stack larger than 1024.
>>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
>>>>
>>>> No panic:
>>>>
>>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
>>>> 'udp-reduce-cache-pressure'
>>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
>>>>
>>>> On 2017-09-20 at 12:22, Paweł Staszewski wrote:
>>>>> So far bisected and marked:
>>>>>
>>>>> git bisect start
>>>>> # bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
>>>>> git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
>>>>> # good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
>>>>> git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
>>>>> # good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
>>>>> git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
>>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
>>>>> 'pinctrl-v4.13-1' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
>>>>> # good:
[e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>>>> 'next' of >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>>>> 'next' of >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>>>> 'next' of >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>>>> >>>>> >>>>> >>>>> W dniu 2017-09-20 o 12:21, Paweł Staszewski pisze: >>>>>> Ok kernel crashed with different panic that i didnt catch when i >>>>>> was doing bisect and now my bisection is broken :) >>>>>> >>>>>> git bisect good >>>>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps) >>>>>> error: Your local changes to the following files would be >>>>>> overwritten by checkout: >>>>>> Documentation/00-INDEX >>>>>> Documentation/ABI/stable/sysfs-class-udc >>>>>> Documentation/ABI/testing/configfs-usb-gadget-uac1 >>>>>> Documentation/ABI/testing/ima_policy >>>>>> Documentation/ABI/testing/sysfs-bus-iio >>>>>> Documentation/ABI/testing/sysfs-bus-iio-meas-spec >>>>>> Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 >>>>>> Documentation/ABI/testing/sysfs-class-net >>>>>> Documentation/ABI/testing/sysfs-class-power-twl4030 >>>>>> Documentation/ABI/testing/sysfs-class-typec >>>>>> Documentation/DMA-API.txt >>>>>> Documentation/IRQ-domain.txt >>>>>> Documentation/Makefile >>>>>> Documentation/PCI/MSI-HOWTO.txt >>>>>> Documentation/RCU/00-INDEX >>>>>> Documentation/RCU/Design/Requirements/Requirements.html >>>>>> Documentation/RCU/checklist.txt >>>>>> Documentation/admin-guide/README.rst >>>>>> Documentation/admin-guide/devices.txt >>>>>> Documentation/admin-guide/index.rst >>>>>> 
Documentation/admin-guide/kernel-parameters.txt >>>>>> Documentation/admin-guide/pm/cpufreq.rst >>>>>> Documentation/admin-guide/pm/intel_pstate.rst >>>>>> Documentation/admin-guide/ras.rst >>>>>> Documentation/arm/Atmel/README >>>>>> Documentation/block/biodoc.txt >>>>>> Documentation/conf.py >>>>>> Documentation/core-api/assoc_array.rst >>>>>> Documentation/core-api/atomic_ops.rst >>>>>> Documentation/core-api/index.rst >>>>>> Documentation/crypto/asymmetric-keys.txt >>>>>> Documentation/dev-tools/index.rst >>>>>> Documentation/dev-tools/sparse.rst >>>>>> Documentation/devicetree/bindings/arm/amlogic.txt >>>>>> Documentation/devicetree/bindings/arm/atmel-at91.txt >>>>>> Documentation/devicetree/bindings/arm/ccn.txt >>>>>> Documentation/devicetree/bindings/arm/cpus.txt >>>>>> Documentation/devicetree/bindings/arm/gemini.txt >>>>>> Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt >>>>>> Documentation/devicetree/bindings/arm/keystone/keystone.txt >>>>>> Documentation/devicetree/bindings/arm/mediatek.txt >>>>>> Documentation/devicetree/bindings/arm/rockchip.txt >>>>>> Documentation/devicetree/bindings/arm/shmobile.txt >>>>>> Documentation/devicetree/bindings/arm/tegra.txt >>>>>> Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt >>>>>> Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt >>>>>> Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt >>>>>> Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt >>>>>> Documentation/devicetree/bindings/gpio/gpio_atmel.txt >>>>>> Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt >>>>>> Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt >>>>>> Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt >>>>>> Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt >>>>>> Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt >>>>>> >>>>>> Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt >>>>>> >>>>>> 
Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt >>>>>> >>>>>> Documentation/devicetree/bindings/leds/common.txt >>>>>> Documentation/devicetree/bindings/mfd/hi6421.txt >>>>>> Documentation/devicetree/bindings/mfd/tps65910.txt >>>>>> Documentation/devicetree/bindings/mmc/fsl-esdhc.txt >>>>>> Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt >>>>>> Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt >>>>>> Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt >>>>>> Documentation/devicetree/bindings/mtd/atmel-nand.txt >>>>>> Documentation/devicetree/bindings/net/dsa/b53.txt >>>>>> Documentation/devicetree/bindings/net/ethernet.txt >>>>>> Documentation/devicetree/bindings/net/macb.txt >>>>>> Documentation/devicetree/bindings/net/marvell-orion-mdio.txt >>>>>> Documentation/devicetree/bindings/net/ti,wilink-st.txt >>>>>> Documentation/devicetree/bindings/net/wireless/ti,wlcore.txt >>>>>> Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt >>>>>> Documentation/devicetree/bindings/opp/opp.txt >>>>>> Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt >>>>>> Documentation/devicetree/bindings/phy/brcm-sata-phy.txt >>>>>> Documentation/devicetree/bindings/phy/meson8b-usb2-phy.txt >>>>>> Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt >>>>>> Documentation/devicetree/bindings/power/rockchip-io-domain.txt >>>>>> Documentation/devicetree/bindings/power/supply/bq27xxx.txt >>>>>> Documentation/devicetree/bindings/property-units.txt >>>>>> Documentation/devicetree/bindings/regulator/regulator.txt >>>>>> Documentation/devicetree/bindings/serial/8 >>>>>> error: The following untracked working tree files would be >>>>>> overwritten by checkout: >>>>>> Documentation/ABI/testing/sysfs-class-net-phydev >>>>>> Documentation/DocBook/.gitignore >>>>>> Documentation/DocBook/Makefile >>>>>> Documentation/DocBook/filesystems.tmpl >>>>>> Documentation/DocBook/kernel-hacking.tmpl >>>>>> Documentation/DocBook/kernel-locking.tmpl 
>>>>>> Documentation/DocBook/kgdb.tmpl >>>>>> Documentation/DocBook/libata.tmpl >>>>>> Documentation/DocBook/librs.tmpl >>>>>> Documentation/DocBook/lsm.tmpl >>>>>> Documentation/DocBook/mtdnand.tmpl >>>>>> Documentation/DocBook/networking.tmpl >>>>>> Documentation/DocBook/rapidio.tmpl >>>>>> Documentation/DocBook/s390-drivers.tmpl >>>>>> Documentation/DocBook/scsi.tmpl >>>>>> Documentation/DocBook/sh.tmpl >>>>>> Documentation/DocBook/stylesheet.xsl >>>>>> Documentation/DocBook/w1.tmpl >>>>>> Documentation/DocBook/z8530book.tmpl >>>>>> Documentation/Makefile.sphinx >>>>>> Documentation/RCU/trace.txt >>>>>> Documentation/devicetree/bindings/i2c/i2c-mt6577.txt >>>>>> Documentation/devicetree/bindings/misc/allwinner,syscon.txt >>>>>> Documentation/devicetree/bindings/net/cortina.txt >>>>>> Documentation/devicetree/bindings/net/dsa/ksz.txt >>>>>> Documentation/devicetree/bindings/net/dwmac-sun8i.txt >>>>>> Documentation/devicetree/bindings/net/qca,qca7000.txt >>>>>> Documentation/devicetree/bindings/power/max8903-charger.txt >>>>>> Documentation/devicetree/bindings/power_supply/maxim,max14656.txt >>>>>> Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt >>>>>> Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt >>>>>> Documentation/doc-guide/docbook.rst >>>>>> Documentation/networking/tls.txt >>>>>> Documentation/prctl/no_new_privs.txt >>>>>> Documentation/prctl/seccomp_filter.txt >>>>>> Documentation/security/00-INDEX >>>>>> Documentation/security/IMA-templates.txt >>>>>> Documentation/security/LSM.txt >>>>>> Documentation/security/LoadPin.txt >>>>>> Documentation/security/SELinux.txt >>>>>> Documentation/security/Smack.txt >>>>>> Documentation/security/Yama.txt >>>>>> Documentation/security/apparmor.txt >>>>>> Documentation/security/conf.py >>>>>> Documentation/security/credentials.txt >>>>>> Documentation/security/keys-ecryptfs.txt >>>>>> Documentation/security/keys-request-key.txt >>>>>> Documentation/security/keys-trusted-encrypted.txt >>>>>> 
Documentation/security/keys.txt >>>>>> Documentation/security/self-protection.txt >>>>>> Documentation/security/tomoyo.txt >>>>>> Documentation/sphinx/convert_template.sed >>>>>> Documentation/sphinx/post_convert.sed >>>>>> Documentation/sphinx/tmplcvt >>>>>> Documentation/usb/typec.rst >>>>>> Documentation/usb/usb3-debug-port.rst >>>>>> arch/arm/boot/dts/rk1108-evb.dts >>>>>> arch/arm/boot/dts/rk1108.dtsi >>>>>> arch/arm/boot/dts/tegra20-whistler.dts >>>>>> arch/arm/mach-omap2/opp.c >>>>>> arch/arm/mach-omap2/pmu.c >>>>>> arch/ia64/include/asm/siginfo.h >>>>>> arch/m32r/include/uapi/asm/siginfo.h >>>>>> arch/microblaze/include/asm/bitops.h >>>>>> arch/microblaze/include/asm/bug.h >>>>>> arch/microblaze/include/asm/bugs.h >>>>>> arch/microblaze/include/asm/div64.h >>>>>> arch/microblaze/include/asm/emergency-restart.h >>>>>> arch/microblaze/include/asm/fb.h >>>>>> arch/microblaze/include/asm/hardirq.h >>>>>> arch/microblaze/include/asm/irq_regs.h >>>>>> arch/microblaze/include/asm/kdebug.h >>>>>> arch/microblaze/include/asm/kmap_types.h >>>>>> arch/microblaze/include/asm/linkage.h >>>>>> arch/microblaze/include/asm/local.h >>>>>> arch/microblaze/include/asm/local64.h >>>>>> arch/microblaze/include/asm/parport.h >>>>>> arch/microblaze/include/asm/percpu.h >>>>>> arch/microblaze/include/asm/serial.h >>>>>> arch/microblaze/include/asm/shmparam.h >>>>>> arch/microblaze/include/asm/topology.h >>>>>> arch/microblaze/include/asm/ucontext.h >>>>>> arch/microblaze/include/asm/vga.h >>>>>> arch/microblaze/include/asm/xor.h >>>>>> arch/microblaze/include/uapi/asm/bitsperlong.h >>>>>> arch/microblaze/include/uapi/asm/errno.h >>>>>> arch/microblaze/include/uapi/asm/fcntl.h >>>>>> arch/microblaze/include/uapi/asm/ioctl.h >>>>>> arch/microblaze/include/uapi/asm/ioctls.h >>>>>> arch/microblaze/include/uapi/asm/ipcbuf.h >>>>>> arch/microblaze/include/uapi/asm/kvm_para.h >>>>>> arch/microblaze/include/uapi/asm/mman.h >>>>>> arch/microblaze/include/uapi/asm/msgbuf.h >>>>>> 
arch/microblaze/include/uapi/asm/param.h >>>>>> arch/microblaze/include/uapi/asm/poll.h >>>>>> arch/microblaze/include/uapi/asm/resource.h >>>>>> arch/microblaze/include/uapi/asm/sembuf.h >>>>>> arch/microblaze/include/uapi/asm/shmbuf.h >>>>>> arch/microblaze/include/uapi/asm/siginfo.h >>>>>> arch/microblaze/include/uapi/asm/signal.h >>>>>> arch/microblaze/includ >>>>>> Aborting >>>>>> >>>>>> >>>>>> >>>>>> W dniu 2017-09-20 o 11:45, Paweł Staszewski pisze: >>>>>>> Ok looks like ending bisection >>>>>>> >>>>>>> >>>>>>> Latest bisected kernel when there is no kernel panic 4.12.0+ >>>>>>> (from next) - but only this warning: >>>>>>> >>>>>>> [ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue >>>>>>> 0 timed out >>>>>>> [ 309.030034] ------------[ cut here ]------------ >>>>>>> [ 309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139 >>>>>>> [ 309.030041] Modules linked in: bonding ipmi_si >>>>>>> x86_pkg_temp_thermal >>>>>>> [ 309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted >>>>>>> 4.12.0+ #5 >>>>>>> [ 309.030046] task: ffff88086d98a000 task.stack: ffffc90003378000 >>>>>>> [ 309.030048] RIP: 0010:dev_watchdog+0xcf/0x139 >>>>>>> [ 309.030049] RSP: 0018:ffff88087fbc3ea8 EFLAGS: 00010246 >>>>>>> [ 309.030050] RAX: 000000000000003d RBX: ffff88046b680000 RCX: >>>>>>> 0000000000000000 >>>>>>> [ 309.030050] RDX: ffff88087fbd2f01 RSI: 0000000000000000 RDI: >>>>>>> ffff88087fbcda08 >>>>>>> [ 309.030051] RBP: ffff88087fbc3eb8 R08: 0000000000000000 R09: >>>>>>> ffff88087ff80a04 >>>>>>> [ 309.030051] R10: 0000000000000000 R11: ffff88086d98a001 R12: >>>>>>> 0000000000000000 >>>>>>> [ 309.030052] R13: ffff88087fbc3ef8 R14: ffff88086d98a000 R15: >>>>>>> ffffffff81c06008 >>>>>>> [ 309.030053] FS: 0000000000000000(0000) >>>>>>> GS:ffff88087fbc0000(0000) knlGS:0000000000000000 >>>>>>> [ 309.030054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>> [ 309.030054] CR2: 00007fba600f6098 CR3: 000000086b955000 CR4: >>>>>>> 00000000001406e0 >>>>>>> [ 
309.030055] Call Trace: >>>>>>> [ 309.030057] <IRQ> >>>>>>> [ 309.030059] ? netif_tx_lock+0x79/0x79 >>>>>>> [ 309.030062] call_timer_fn.isra.24+0x17/0x77 >>>>>>> [ 309.030063] run_timer_softirq+0x118/0x161 >>>>>>> [ 309.030065] ? netif_tx_lock+0x79/0x79 >>>>>>> [ 309.030066] ? ktime_get+0x2b/0x42 >>>>>>> [ 309.030070] ? lapic_next_deadline+0x21/0x27 >>>>>>> [ 309.030073] ? clockevents_program_event+0xa8/0xc5 >>>>>>> [ 309.030076] __do_softirq+0xa8/0x19d >>>>>>> [ 309.030078] irq_exit+0x5d/0x6b >>>>>>> [ 309.030079] smp_apic_timer_interrupt+0x2a/0x36 >>>>>>> [ 309.030082] apic_timer_interrupt+0x89/0x90 >>>>>>> [ 309.030085] RIP: 0010:mwait_idle+0x4e/0x6a >>>>>>> [ 309.030086] RSP: 0018:ffffc9000337be98 EFLAGS: 00000246 >>>>>>> ORIG_RAX: ffffffffffffff10 >>>>>>> [ 309.030087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: >>>>>>> 0000000000000000 >>>>>>> [ 309.030087] RDX: 0000000000000000 RSI: 0000000000000000 RDI: >>>>>>> ffff88086d98a000 >>>>>>> [ 309.030088] RBP: ffffc9000337be98 R08: ffff88046f8279a0 R09: >>>>>>> ffff88046f827040 >>>>>>> [ 309.030089] R10: ffff88086d98a000 R11: ffff88086d98a000 R12: >>>>>>> 0000000000000000 >>>>>>> [ 309.030089] R13: ffff88086d98a000 R14: ffff88086d98a000 R15: >>>>>>> ffff88086d98a000 >>>>>>> [ 309.030090] </IRQ> >>>>>>> [ 309.030094] arch_cpu_idle+0xa/0xc >>>>>>> [ 309.030095] default_idle_call+0x19/0x1b >>>>>>> [ 309.030102] do_idle+0xbc/0x196 >>>>>>> [ 309.030104] cpu_startup_entry+0x1d/0x20 >>>>>>> [ 309.030105] start_secondary+0xd8/0xdc >>>>>>> [ 309.030108] secondary_startup_64+0x9f/0x9f >>>>>>> [ 309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 >>>>>>> 01 e8 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 >>>>>>> 81 31 c0 e8 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 df ff >>>>>>> 50 78 48 8b 05 a0 bc 6a >>>>>>> [ 309.030128] ---[ end trace 9102cb25703ae2d9 ]--- >>>>>>> >>>>>>> >>>>>>> I just marked it as good - cause this problem above is differend >>>>>>> - and im going to: >>>>>>> >>>>>>> git 
bisect good
>>>>>>> Bisecting: 1787 revisions left to test after this (roughly 11
>>>>>>> steps)
>>>>>>>
>>>>>>> On 2017-09-20 at 10:44, Paweł Staszewski wrote:
>>>>>>>> Trying to make a video from IPMI :)
>>>>>>>>
>>>>>>>> with these results:
>>>>>>>>
>>>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258521
>>>>>>>>
>>>>>>>> Caught two more lines where it starts - panic from 4.13.2.
>>>>>>>>
>>>>>>>> Now I will try to do some bisection
>>>>>>>>
>>>>>>>> On 2017-09-20 at 09:58, Paweł Staszewski wrote:
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> Will try bisecting tonight
>>>>>>>>>
>>>>>>>>> On 2017-09-20 at 05:24, Eric Dumazet wrote:
>>>>>>>>>> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
>>>>>>>>>>> Just checked kernel 4.13.2 and same problem
>>>>>>>>>>>
>>>>>>>>>>> Just after all 6 BGP sessions start and the kernel begins to
>>>>>>>>>>> learn routes, it panics.
>>>>>>>>>>>
>>>>>>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258509
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Unfortunately we do not have enough information from these traces.
>>>>>>>>>>
>>>>>>>>>> Can you get a full stack trace?
>>>>>>>>>>
>>>>>>>>>> Alternatively, can you bisect?
>>>>>>>>>>
>>>>>>>>>> Thanks.

^ permalink raw reply [flat|nested] 52+ messages in thread
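Editorial aside on the bisect path reported in the message above: a long log like that is easier to check when it is reduced to its good/bad marks and the final verdict. The following is only a minimal sketch, assuming the log uses the "# good: [sha] subject" comment lines that `git bisect log` prints; the function name `summarize_bisect_log` and the `SAMPLE_LOG` string are hypothetical, and the sample reuses only the last entries quoted in this thread (without the PANIC / No PANIC notes).

```python
import re

# Matches the comment lines of a bisect log, e.g.
#   "# bad: [<sha>] <subject>"  or  "# first bad commit: [<sha>] <subject>".
# The accompanying "git bisect good/bad <sha>" command lines are ignored.
MARK_RE = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{7,40})\] (.*)$")


def summarize_bisect_log(log_text):
    """Return (marks, first_bad) parsed from bisect-log text.

    marks is a list of (kind, sha, subject) tuples in log order;
    first_bad is a (sha, subject) tuple, or None if the log has no verdict yet.
    """
    marks = []
    first_bad = None
    for line in log_text.splitlines():
        match = MARK_RE.match(line.strip())
        if match is None:
            continue  # command lines, blank lines, free-form notes
        kind, sha, subject = match.groups()
        if kind == "first bad commit":
            first_bad = (sha, subject)
        else:
            marks.append((kind, sha, subject))
    return marks, first_bad


# Hypothetical sample: the tail of the bisect path from this thread.
SAMPLE_LOG = """\
# good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new function dst_dev_put()
git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
# bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove DST_NOCACHE flag
git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
# bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly
git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
# good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly
git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
# bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for insertion into fib6 tree
git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
# bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
# first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
"""

if __name__ == "__main__":
    marks, first_bad = summarize_bisect_log(SAMPLE_LOG)
    for kind, sha, subject in marks:
        print(f"{kind:4} {sha[:12]} {subject}")
    print("first bad:", first_bad)
```

A log saved with `git bisect log > file` can also be fed back with `git bisect replay file`, which may help when a bisection has to be restarted after the work tree got into a bad state.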
* Re: Latest net-next from GIT panic 2017-09-20 12:49 ` Paweł Staszewski 2017-09-20 13:05 ` Paweł Staszewski @ 2017-09-20 13:11 ` Eric Dumazet 2017-09-20 13:16 ` Paweł Staszewski 2017-09-20 17:50 ` Cong Wang 1 sibling, 2 replies; 52+ messages in thread From: Eric Dumazet @ 2017-09-20 13:11 UTC (permalink / raw) To: Paweł Staszewski, Wei Wang; +Cc: Linux Kernel Network Developers, edumazet

Sorry for top-posting, but this is to give context to Wei, since Pawel
reported his bisection in top-posting style.

Wei, can you take a look at Pawel's report?

The crash happens in dst_destroy() at the following:

    if (dst->dev)
        dev_put(dst->dev);    <<CRASH>>

dst->dev is not NULL, but netdev->pcpu_refcnt is NULL

    65 ff 08    decl %gs:(%rax)    // CRASH since rax = NULL

Pawel, could you please share your netdevices and routing setup?

Thanks!

On Wed, 2017-09-20 at 14:49 +0200, Paweł Staszewski wrote:
> And the last one
>
> git bisect good
> Bisecting: 1 revision left to test after this (roughly 1 step)
> [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for
> insertion into fib6 tree
>
> With this one I get the same kernel panic as always
>
> git bisect bad
> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and
> remove the operation of dst_free()
>
> On 2017-09-20 at 14:23, Paweł Staszewski wrote:
> > Almost there
> >
> > Bisecting: 6 revisions left to test after this (roughly 3 steps)
> > [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe()
> > properly
> >
> > On 2017-09-20 at 13:02, Paweł Staszewski wrote:
> >> OK, resumed, and so far:
> >>
> >> Panic:
> >>
> >> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid
> >> using stack larger than 1024.
> >> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f > >> > >> No panic: > >> > >> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch > >> 'udp-reduce-cache-pressure' > >> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20 > >> > >> > >> W dniu 2017-09-20 o 12:22, Paweł Staszewski pisze: > >>> Soo far bisected and marked: > >>> > >>> git bisect start > >>> # bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2 > >>> git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355 > >>> # good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13 > >>> git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269 > >>> # good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12 > >>> git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c > >>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag > >>> 'pinctrl-v4.13-1' of > >>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl > >>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075 > >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch > >>> 'next' of > >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security > >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 > >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch > >>> 'next' of > >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security > >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 > >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch > >>> 'next' of > >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security > >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 > >>> > >>> > >>> > >>> W dniu 2017-09-20 o 12:21, Paweł Staszewski pisze: > >>>> Ok kernel crashed with different panic that i didnt catch when i > >>>> was doing bisect and now my bisection is broken :) > >>>> > >>>> git bisect good > >>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps) > >>>> 
error: Your local changes to the following files would be > >>>> overwritten by checkout: > >>>> Documentation/00-INDEX > >>>> Documentation/ABI/stable/sysfs-class-udc > >>>> Documentation/ABI/testing/configfs-usb-gadget-uac1 > >>>> Documentation/ABI/testing/ima_policy > >>>> Documentation/ABI/testing/sysfs-bus-iio > >>>> Documentation/ABI/testing/sysfs-bus-iio-meas-spec > >>>> Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 > >>>> Documentation/ABI/testing/sysfs-class-net > >>>> Documentation/ABI/testing/sysfs-class-power-twl4030 > >>>> Documentation/ABI/testing/sysfs-class-typec > >>>> Documentation/DMA-API.txt > >>>> Documentation/IRQ-domain.txt > >>>> Documentation/Makefile > >>>> Documentation/PCI/MSI-HOWTO.txt > >>>> Documentation/RCU/00-INDEX > >>>> Documentation/RCU/Design/Requirements/Requirements.html > >>>> Documentation/RCU/checklist.txt > >>>> Documentation/admin-guide/README.rst > >>>> Documentation/admin-guide/devices.txt > >>>> Documentation/admin-guide/index.rst > >>>> Documentation/admin-guide/kernel-parameters.txt > >>>> Documentation/admin-guide/pm/cpufreq.rst > >>>> Documentation/admin-guide/pm/intel_pstate.rst > >>>> Documentation/admin-guide/ras.rst > >>>> Documentation/arm/Atmel/README > >>>> Documentation/block/biodoc.txt > >>>> Documentation/conf.py > >>>> Documentation/core-api/assoc_array.rst > >>>> Documentation/core-api/atomic_ops.rst > >>>> Documentation/core-api/index.rst > >>>> Documentation/crypto/asymmetric-keys.txt > >>>> Documentation/dev-tools/index.rst > >>>> Documentation/dev-tools/sparse.rst > >>>> Documentation/devicetree/bindings/arm/amlogic.txt > >>>> Documentation/devicetree/bindings/arm/atmel-at91.txt > >>>> Documentation/devicetree/bindings/arm/ccn.txt > >>>> Documentation/devicetree/bindings/arm/cpus.txt > >>>> Documentation/devicetree/bindings/arm/gemini.txt > >>>> Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt > >>>> Documentation/devicetree/bindings/arm/keystone/keystone.txt > >>>> 
Documentation/devicetree/bindings/arm/mediatek.txt > >>>> Documentation/devicetree/bindings/arm/rockchip.txt > >>>> Documentation/devicetree/bindings/arm/shmobile.txt > >>>> Documentation/devicetree/bindings/arm/tegra.txt > >>>> Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt > >>>> Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt > >>>> Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt > >>>> Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt > >>>> Documentation/devicetree/bindings/gpio/gpio_atmel.txt > >>>> Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt > >>>> Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt > >>>> Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt > >>>> Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt > >>>> Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt > >>>> > >>>> Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt > >>>> > >>>> Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt > >>>> > >>>> Documentation/devicetree/bindings/leds/common.txt > >>>> Documentation/devicetree/bindings/mfd/hi6421.txt > >>>> Documentation/devicetree/bindings/mfd/tps65910.txt > >>>> Documentation/devicetree/bindings/mmc/fsl-esdhc.txt > >>>> Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt > >>>> Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt > >>>> Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt > >>>> Documentation/devicetree/bindings/mtd/atmel-nand.txt > >>>> Documentation/devicetree/bindings/net/dsa/b53.txt > >>>> Documentation/devicetree/bindings/net/ethernet.txt > >>>> Documentation/devicetree/bindings/net/macb.txt > >>>> Documentation/devicetree/bindings/net/marvell-orion-mdio.txt > >>>> Documentation/devicetree/bindings/net/ti,wilink-st.txt > >>>> Documentation/devicetree/bindings/net/wireless/ti,wlcore.txt > >>>> 
Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt > >>>> Documentation/devicetree/bindings/opp/opp.txt > >>>> Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt > >>>> Documentation/devicetree/bindings/phy/brcm-sata-phy.txt > >>>> Documentation/devicetree/bindings/phy/meson8b-usb2-phy.txt > >>>> Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt > >>>> Documentation/devicetree/bindings/power/rockchip-io-domain.txt > >>>> Documentation/devicetree/bindings/power/supply/bq27xxx.txt > >>>> Documentation/devicetree/bindings/property-units.txt > >>>> Documentation/devicetree/bindings/regulator/regulator.txt > >>>> Documentation/devicetree/bindings/serial/8 > >>>> error: The following untracked working tree files would be > >>>> overwritten by checkout: > >>>> Documentation/ABI/testing/sysfs-class-net-phydev > >>>> Documentation/DocBook/.gitignore > >>>> Documentation/DocBook/Makefile > >>>> Documentation/DocBook/filesystems.tmpl > >>>> Documentation/DocBook/kernel-hacking.tmpl > >>>> Documentation/DocBook/kernel-locking.tmpl > >>>> Documentation/DocBook/kgdb.tmpl > >>>> Documentation/DocBook/libata.tmpl > >>>> Documentation/DocBook/librs.tmpl > >>>> Documentation/DocBook/lsm.tmpl > >>>> Documentation/DocBook/mtdnand.tmpl > >>>> Documentation/DocBook/networking.tmpl > >>>> Documentation/DocBook/rapidio.tmpl > >>>> Documentation/DocBook/s390-drivers.tmpl > >>>> Documentation/DocBook/scsi.tmpl > >>>> Documentation/DocBook/sh.tmpl > >>>> Documentation/DocBook/stylesheet.xsl > >>>> Documentation/DocBook/w1.tmpl > >>>> Documentation/DocBook/z8530book.tmpl > >>>> Documentation/Makefile.sphinx > >>>> Documentation/RCU/trace.txt > >>>> Documentation/devicetree/bindings/i2c/i2c-mt6577.txt > >>>> Documentation/devicetree/bindings/misc/allwinner,syscon.txt > >>>> Documentation/devicetree/bindings/net/cortina.txt > >>>> Documentation/devicetree/bindings/net/dsa/ksz.txt > >>>> Documentation/devicetree/bindings/net/dwmac-sun8i.txt > >>>> 
Documentation/devicetree/bindings/net/qca,qca7000.txt > >>>> Documentation/devicetree/bindings/power/max8903-charger.txt > >>>> Documentation/devicetree/bindings/power_supply/maxim,max14656.txt > >>>> Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt > >>>> Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt > >>>> Documentation/doc-guide/docbook.rst > >>>> Documentation/networking/tls.txt > >>>> Documentation/prctl/no_new_privs.txt > >>>> Documentation/prctl/seccomp_filter.txt > >>>> Documentation/security/00-INDEX > >>>> Documentation/security/IMA-templates.txt > >>>> Documentation/security/LSM.txt > >>>> Documentation/security/LoadPin.txt > >>>> Documentation/security/SELinux.txt > >>>> Documentation/security/Smack.txt > >>>> Documentation/security/Yama.txt > >>>> Documentation/security/apparmor.txt > >>>> Documentation/security/conf.py > >>>> Documentation/security/credentials.txt > >>>> Documentation/security/keys-ecryptfs.txt > >>>> Documentation/security/keys-request-key.txt > >>>> Documentation/security/keys-trusted-encrypted.txt > >>>> Documentation/security/keys.txt > >>>> Documentation/security/self-protection.txt > >>>> Documentation/security/tomoyo.txt > >>>> Documentation/sphinx/convert_template.sed > >>>> Documentation/sphinx/post_convert.sed > >>>> Documentation/sphinx/tmplcvt > >>>> Documentation/usb/typec.rst > >>>> Documentation/usb/usb3-debug-port.rst > >>>> arch/arm/boot/dts/rk1108-evb.dts > >>>> arch/arm/boot/dts/rk1108.dtsi > >>>> arch/arm/boot/dts/tegra20-whistler.dts > >>>> arch/arm/mach-omap2/opp.c > >>>> arch/arm/mach-omap2/pmu.c > >>>> arch/ia64/include/asm/siginfo.h > >>>> arch/m32r/include/uapi/asm/siginfo.h > >>>> arch/microblaze/include/asm/bitops.h > >>>> arch/microblaze/include/asm/bug.h > >>>> arch/microblaze/include/asm/bugs.h > >>>> arch/microblaze/include/asm/div64.h > >>>> arch/microblaze/include/asm/emergency-restart.h > >>>> arch/microblaze/include/asm/fb.h > >>>> arch/microblaze/include/asm/hardirq.h > >>>> 
arch/microblaze/include/asm/irq_regs.h > >>>> arch/microblaze/include/asm/kdebug.h > >>>> arch/microblaze/include/asm/kmap_types.h > >>>> arch/microblaze/include/asm/linkage.h > >>>> arch/microblaze/include/asm/local.h > >>>> arch/microblaze/include/asm/local64.h > >>>> arch/microblaze/include/asm/parport.h > >>>> arch/microblaze/include/asm/percpu.h > >>>> arch/microblaze/include/asm/serial.h > >>>> arch/microblaze/include/asm/shmparam.h > >>>> arch/microblaze/include/asm/topology.h > >>>> arch/microblaze/include/asm/ucontext.h > >>>> arch/microblaze/include/asm/vga.h > >>>> arch/microblaze/include/asm/xor.h > >>>> arch/microblaze/include/uapi/asm/bitsperlong.h > >>>> arch/microblaze/include/uapi/asm/errno.h > >>>> arch/microblaze/include/uapi/asm/fcntl.h > >>>> arch/microblaze/include/uapi/asm/ioctl.h > >>>> arch/microblaze/include/uapi/asm/ioctls.h > >>>> arch/microblaze/include/uapi/asm/ipcbuf.h > >>>> arch/microblaze/include/uapi/asm/kvm_para.h > >>>> arch/microblaze/include/uapi/asm/mman.h > >>>> arch/microblaze/include/uapi/asm/msgbuf.h > >>>> arch/microblaze/include/uapi/asm/param.h > >>>> arch/microblaze/include/uapi/asm/poll.h > >>>> arch/microblaze/include/uapi/asm/resource.h > >>>> arch/microblaze/include/uapi/asm/sembuf.h > >>>> arch/microblaze/include/uapi/asm/shmbuf.h > >>>> arch/microblaze/include/uapi/asm/siginfo.h > >>>> arch/microblaze/include/uapi/asm/signal.h > >>>> arch/microblaze/includ > >>>> Aborting > >>>> > >>>> > >>>> > >>>> W dniu 2017-09-20 o 11:45, Paweł Staszewski pisze: > >>>>> Ok looks like ending bisection > >>>>> > >>>>> > >>>>> Latest bisected kernel when there is no kernel panic 4.12.0+ (from > >>>>> next) - but only this warning: > >>>>> > >>>>> [ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 > >>>>> timed out > >>>>> [ 309.030034] ------------[ cut here ]------------ > >>>>> [ 309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139 > >>>>> [ 309.030041] Modules linked in: bonding ipmi_si > >>>>> 
x86_pkg_temp_thermal > >>>>> [ 309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 4.12.0+ #5 > >>>>> [ 309.030046] task: ffff88086d98a000 task.stack: ffffc90003378000 > >>>>> [ 309.030048] RIP: 0010:dev_watchdog+0xcf/0x139 > >>>>> [ 309.030049] RSP: 0018:ffff88087fbc3ea8 EFLAGS: 00010246 > >>>>> [ 309.030050] RAX: 000000000000003d RBX: ffff88046b680000 RCX: > >>>>> 0000000000000000 > >>>>> [ 309.030050] RDX: ffff88087fbd2f01 RSI: 0000000000000000 RDI: > >>>>> ffff88087fbcda08 > >>>>> [ 309.030051] RBP: ffff88087fbc3eb8 R08: 0000000000000000 R09: > >>>>> ffff88087ff80a04 > >>>>> [ 309.030051] R10: 0000000000000000 R11: ffff88086d98a001 R12: > >>>>> 0000000000000000 > >>>>> [ 309.030052] R13: ffff88087fbc3ef8 R14: ffff88086d98a000 R15: > >>>>> ffffffff81c06008 > >>>>> [ 309.030053] FS: 0000000000000000(0000) > >>>>> GS:ffff88087fbc0000(0000) knlGS:0000000000000000 > >>>>> [ 309.030054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>> [ 309.030054] CR2: 00007fba600f6098 CR3: 000000086b955000 CR4: > >>>>> 00000000001406e0 > >>>>> [ 309.030055] Call Trace: > >>>>> [ 309.030057] <IRQ> > >>>>> [ 309.030059] ? netif_tx_lock+0x79/0x79 > >>>>> [ 309.030062] call_timer_fn.isra.24+0x17/0x77 > >>>>> [ 309.030063] run_timer_softirq+0x118/0x161 > >>>>> [ 309.030065] ? netif_tx_lock+0x79/0x79 > >>>>> [ 309.030066] ? ktime_get+0x2b/0x42 > >>>>> [ 309.030070] ? lapic_next_deadline+0x21/0x27 > >>>>> [ 309.030073] ? 
clockevents_program_event+0xa8/0xc5 > >>>>> [ 309.030076] __do_softirq+0xa8/0x19d > >>>>> [ 309.030078] irq_exit+0x5d/0x6b > >>>>> [ 309.030079] smp_apic_timer_interrupt+0x2a/0x36 > >>>>> [ 309.030082] apic_timer_interrupt+0x89/0x90 > >>>>> [ 309.030085] RIP: 0010:mwait_idle+0x4e/0x6a > >>>>> [ 309.030086] RSP: 0018:ffffc9000337be98 EFLAGS: 00000246 > >>>>> ORIG_RAX: ffffffffffffff10 > >>>>> [ 309.030087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: > >>>>> 0000000000000000 > >>>>> [ 309.030087] RDX: 0000000000000000 RSI: 0000000000000000 RDI: > >>>>> ffff88086d98a000 > >>>>> [ 309.030088] RBP: ffffc9000337be98 R08: ffff88046f8279a0 R09: > >>>>> ffff88046f827040 > >>>>> [ 309.030089] R10: ffff88086d98a000 R11: ffff88086d98a000 R12: > >>>>> 0000000000000000 > >>>>> [ 309.030089] R13: ffff88086d98a000 R14: ffff88086d98a000 R15: > >>>>> ffff88086d98a000 > >>>>> [ 309.030090] </IRQ> > >>>>> [ 309.030094] arch_cpu_idle+0xa/0xc > >>>>> [ 309.030095] default_idle_call+0x19/0x1b > >>>>> [ 309.030102] do_idle+0xbc/0x196 > >>>>> [ 309.030104] cpu_startup_entry+0x1d/0x20 > >>>>> [ 309.030105] start_secondary+0xd8/0xdc > >>>>> [ 309.030108] secondary_startup_64+0x9f/0x9f > >>>>> [ 309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 01 > >>>>> e8 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 81 31 > >>>>> c0 e8 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 df ff 50 78 > >>>>> 48 8b 05 a0 bc 6a > >>>>> [ 309.030128] ---[ end trace 9102cb25703ae2d9 ]--- > >>>>> > >>>>> > >>>>> I just marked it as good - cause this problem above is differend - > >>>>> and im going to: > >>>>> > >>>>> git bisect good > >>>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps) > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> W dniu 2017-09-20 o 10:44, Paweł Staszewski pisze: > >>>>>> Trying to make video from ipmi :) > >>>>>> > >>>>>> with that results: > >>>>>> > >>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258521 > >>>>>> > >>>>>> catched two more lines 
where it starts - panic from 4.13.2.
> >>>>>>
> >>>>>> Now I will try to do some bisection
> >>>>>>
> >>>>>> On 2017-09-20 at 09:58, Paweł Staszewski wrote:
> >>>>>>> Hi
> >>>>>>>
> >>>>>>> Will try bisecting tonight
> >>>>>>>
> >>>>>>> On 2017-09-20 at 05:24, Eric Dumazet wrote:
> >>>>>>>> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
> >>>>>>>>> Just checked kernel 4.13.2 and same problem
> >>>>>>>>>
> >>>>>>>>> Just after starting all 6 BGP sessions, when the kernel starts
> >>>>>>>>> to learn routes, it panics.
> >>>>>>>>>
> >>>>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258509
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Unfortunately we do not have enough information from these traces.
> >>>>>>>>
> >>>>>>>> Can you get a full stack trace?
> >>>>>>>>
> >>>>>>>> Alternatively, can you bisect?
> >>>>>>>>
> >>>>>>>> Thanks.

^ permalink raw reply [flat|nested] 52+ messages in thread
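Editorial sketch of the failure mode Eric describes in the message above: dst_destroy() calls dev_put() on a device pointer that is not NULL, but the device's per-CPU reference counter pointer (pcpu_refcnt) is NULL, so the decrement faults. The user-space C model below is only an illustration under stated assumptions. All *_model types and functions, MODEL_NR_CPUS, and the teardown step that sets pcpu_refcnt to NULL are hypothetical stand-ins, not kernel code, and the thread does not establish how pcpu_refcnt became NULL. Unlike the kernel, the model checks for NULL and returns -1 at the point where the real code would oops.

```c
#include <stdio.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4    /* hypothetical CPU count for this model */

/* Hypothetical user-space stand-in for struct net_device. */
struct net_device_model {
    int *pcpu_refcnt;      /* one counter per CPU; NULL after teardown */
};

/* Hypothetical stand-in for struct dst_entry. */
struct dst_entry_model {
    struct net_device_model *dev;
};

static int model_cpu = 0;  /* pretend we always run on CPU 0 */

/* Allocate the per-CPU counters; returns 0 on success, -1 on failure. */
int netdev_model_init(struct net_device_model *dev)
{
    dev->pcpu_refcnt = calloc(MODEL_NR_CPUS, sizeof(int));
    return dev->pcpu_refcnt ? 0 : -1;
}

/* Assumed teardown: free the counters and clear the pointer. */
void netdev_model_teardown(struct net_device_model *dev)
{
    free(dev->pcpu_refcnt);
    dev->pcpu_refcnt = NULL;
}

/* Take a reference on the current (model) CPU. */
void dev_hold_model(struct net_device_model *dev)
{
    dev->pcpu_refcnt[model_cpu]++;
}

/*
 * Drop a reference. The crashing instruction in the report,
 * "decl %gs:(%rax)" with rax = NULL, corresponds to the decrement
 * below being executed without any NULL check. The model reports
 * the condition instead of faulting.
 */
int dev_put_model(struct net_device_model *dev)
{
    if (dev->pcpu_refcnt == NULL) {
        fprintf(stderr, "dev_put on device with NULL pcpu_refcnt\n");
        return -1;
    }
    dev->pcpu_refcnt[model_cpu]--;
    return 0;
}

/* Mirrors the two lines quoted from dst_destroy(). */
int dst_destroy_model(struct dst_entry_model *dst)
{
    if (dst->dev)
        return dev_put_model(dst->dev);
    return 0;
}
```

Calling dst_destroy_model() on a dst whose device still has its counters returns 0. Calling it again after netdev_model_teardown() returns -1 even though dst->dev is still non-NULL, which is the combination seen in the crash: a valid-looking device pointer whose pcpu_refcnt is NULL.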
* Re: Latest net-next from GIT panic
  2017-09-20 13:11 ` Eric Dumazet
@ 2017-09-20 13:16 ` Paweł Staszewski
  2017-09-20 13:34 ` Eric Dumazet
  2017-09-20 17:50 ` Cong Wang
  1 sibling, 1 reply; 52+ messages in thread
From: Paweł Staszewski @ 2017-09-20 13:16 UTC (permalink / raw)
To: Eric Dumazet, Wei Wang; +Cc: Linux Kernel Network Developers, edumazet

Yes, sorry for top-posting as well.

Configuration:

Ethernet devices:

lspci | grep Etherne
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
04:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
07:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
81:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
81:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
83:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
83:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp2s0f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 8192
    link/ether 00:25:90:e4:97:9a brd ff:ff:ff:ff:ff:ff
3: enp2s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 8192
    link/ether 00:25:90:e4:97:9b brd ff:ff:ff:ff:ff:ff
4: enp4s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
5: enp4s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
6: enp7s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
7: enp7s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
8: enp129s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
9: enp129s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
10: enp131s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
11: enp131s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
12: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
13: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
14: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
15: vlan4091@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
16: vlan4032@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
17: vlan514@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
18: vlan87@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
19: vlan518@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
20: vlan646@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
21: vlan370@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
22: vlan3212@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
23: vlan746@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff

There are bonds:

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp4s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 0c:c4:7a:bc:b8:69
Slave queue ID: 0

Slave Interface: enp7s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:90:e3:dd:9d
Slave queue ID: 0

Slave Interface: enp129s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:90:e3:da:e1
Slave queue ID: 0

Slave Interface: enp131s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:bc:b1:fd
Slave queue ID: 0

cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp4s0f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 0c:c4:7a:bc:b8:68
Slave queue ID: 0

Slave Interface: enp7s0f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:25:90:e3:dd:9c
Slave queue ID: 0

Slave Interface: enp129s0f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:25:90:e3:da:e0
Slave queue ID: 0

Slave Interface: enp131s0f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 0c:c4:7a:bc:b1:fc
Slave queue ID: 0

About routing: FRR is installed with bgp/zebra support.

6x BGP sessions, each with a full BGP table (~600k prefixes).

There are also some client BGP sessions, to which the prefixes from the upstreams are advertised.

About 20 L3 IPv4 nexthops.

On 2017-09-20 at 15:11, Eric Dumazet wrote:
> Sorry for top-posting, but this is to give context to Wei, since Pawel
> top-posted his bisection report.
>
> Wei, can you take a look at Pawel's report?
>
> The crash happens in dst_destroy() at the following:
>
>     if (dst->dev)
>         dev_put(dst->dev);    <<CRASH>>
>
> dst->dev is not NULL, but netdev->pcpu_refcnt is NULL:
>
>     65 ff 08    decl %gs:(%rax)    // CRASH since rax = NULL
>
> Pawel, could you please share your netdevices and routing setup?
>
> Thanks!
> > On Wed, 2017-09-20 at 14:49 +0200, Paweł Staszewski wrote: >> And the last one >> >> git bisect good >> Bisecting: 1 revision left to test after this (roughly 1 step) >> [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for >> insertion into fib6 tree >> >> With this have kernel panic same as always >> >> git bisect bad >> Bisecting: 0 revisions left to test after this (roughly 0 steps) >> [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and >> remove the operation of dst_free() >> >> >> >> W dniu 2017-09-20 o 14:23, Paweł Staszewski pisze: >>> Almost there >>> >>> Bisecting: 6 revisions left to test after this (roughly 3 steps) >>> [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() >>> properly >>> >>> >>> >>> W dniu 2017-09-20 o 13:02, Paweł Staszewski pisze: >>>> Ok resumed and soo far: >>>> >>>> Panic: >>>> >>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid >>>> using stack larger than 1024. >>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f >>>> >>>> No panic: >>>> >>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch >>>> 'udp-reduce-cache-pressure' >>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20 >>>> >>>> >>>> W dniu 2017-09-20 o 12:22, Paweł Staszewski pisze: >>>>> Soo far bisected and marked: >>>>> >>>>> git bisect start >>>>> # bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2 >>>>> git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355 >>>>> # good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13 >>>>> git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269 >>>>> # good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12 >>>>> git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c >>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag >>>>> 'pinctrl-v4.13-1' of >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl >>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075 >>>>> # 
good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>>>> 'next' of >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>>>> 'next' of >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch >>>>> 'next' of >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>>>> >>>>> >>>>> >>>>> W dniu 2017-09-20 o 12:21, Paweł Staszewski pisze: >>>>>> Ok kernel crashed with different panic that i didnt catch when i >>>>>> was doing bisect and now my bisection is broken :) >>>>>> >>>>>> git bisect good >>>>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps) >>>>>> error: Your local changes to the following files would be >>>>>> overwritten by checkout: >>>>>> Documentation/00-INDEX >>>>>> Documentation/ABI/stable/sysfs-class-udc >>>>>> Documentation/ABI/testing/configfs-usb-gadget-uac1 >>>>>> Documentation/ABI/testing/ima_policy >>>>>> Documentation/ABI/testing/sysfs-bus-iio >>>>>> Documentation/ABI/testing/sysfs-bus-iio-meas-spec >>>>>> Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 >>>>>> Documentation/ABI/testing/sysfs-class-net >>>>>> Documentation/ABI/testing/sysfs-class-power-twl4030 >>>>>> Documentation/ABI/testing/sysfs-class-typec >>>>>> Documentation/DMA-API.txt >>>>>> Documentation/IRQ-domain.txt >>>>>> Documentation/Makefile >>>>>> Documentation/PCI/MSI-HOWTO.txt >>>>>> Documentation/RCU/00-INDEX >>>>>> Documentation/RCU/Design/Requirements/Requirements.html >>>>>> Documentation/RCU/checklist.txt >>>>>> Documentation/admin-guide/README.rst >>>>>> Documentation/admin-guide/devices.txt >>>>>> Documentation/admin-guide/index.rst >>>>>> 
Documentation/admin-guide/kernel-parameters.txt >>>>>> Documentation/admin-guide/pm/cpufreq.rst >>>>>> Documentation/admin-guide/pm/intel_pstate.rst >>>>>> Documentation/admin-guide/ras.rst >>>>>> Documentation/arm/Atmel/README >>>>>> Documentation/block/biodoc.txt >>>>>> Documentation/conf.py >>>>>> Documentation/core-api/assoc_array.rst >>>>>> Documentation/core-api/atomic_ops.rst >>>>>> Documentation/core-api/index.rst >>>>>> Documentation/crypto/asymmetric-keys.txt >>>>>> Documentation/dev-tools/index.rst >>>>>> Documentation/dev-tools/sparse.rst >>>>>> Documentation/devicetree/bindings/arm/amlogic.txt >>>>>> Documentation/devicetree/bindings/arm/atmel-at91.txt >>>>>> Documentation/devicetree/bindings/arm/ccn.txt >>>>>> Documentation/devicetree/bindings/arm/cpus.txt >>>>>> Documentation/devicetree/bindings/arm/gemini.txt >>>>>> Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt >>>>>> Documentation/devicetree/bindings/arm/keystone/keystone.txt >>>>>> Documentation/devicetree/bindings/arm/mediatek.txt >>>>>> Documentation/devicetree/bindings/arm/rockchip.txt >>>>>> Documentation/devicetree/bindings/arm/shmobile.txt >>>>>> Documentation/devicetree/bindings/arm/tegra.txt >>>>>> Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt >>>>>> Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt >>>>>> Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt >>>>>> Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt >>>>>> Documentation/devicetree/bindings/gpio/gpio_atmel.txt >>>>>> Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt >>>>>> Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt >>>>>> Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt >>>>>> Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt >>>>>> Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt >>>>>> >>>>>> Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt >>>>>> >>>>>> 
Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt >>>>>> >>>>>> Documentation/devicetree/bindings/leds/common.txt >>>>>> Documentation/devicetree/bindings/mfd/hi6421.txt >>>>>> Documentation/devicetree/bindings/mfd/tps65910.txt >>>>>> Documentation/devicetree/bindings/mmc/fsl-esdhc.txt >>>>>> Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt >>>>>> Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt >>>>>> Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt >>>>>> Documentation/devicetree/bindings/mtd/atmel-nand.txt >>>>>> Documentation/devicetree/bindings/net/dsa/b53.txt >>>>>> Documentation/devicetree/bindings/net/ethernet.txt >>>>>> Documentation/devicetree/bindings/net/macb.txt >>>>>> Documentation/devicetree/bindings/net/marvell-orion-mdio.txt >>>>>> Documentation/devicetree/bindings/net/ti,wilink-st.txt >>>>>> Documentation/devicetree/bindings/net/wireless/ti,wlcore.txt >>>>>> Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt >>>>>> Documentation/devicetree/bindings/opp/opp.txt >>>>>> Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt >>>>>> Documentation/devicetree/bindings/phy/brcm-sata-phy.txt >>>>>> Documentation/devicetree/bindings/phy/meson8b-usb2-phy.txt >>>>>> Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt >>>>>> Documentation/devicetree/bindings/power/rockchip-io-domain.txt >>>>>> Documentation/devicetree/bindings/power/supply/bq27xxx.txt >>>>>> Documentation/devicetree/bindings/property-units.txt >>>>>> Documentation/devicetree/bindings/regulator/regulator.txt >>>>>> Documentation/devicetree/bindings/serial/8 >>>>>> error: The following untracked working tree files would be >>>>>> overwritten by checkout: >>>>>> Documentation/ABI/testing/sysfs-class-net-phydev >>>>>> Documentation/DocBook/.gitignore >>>>>> Documentation/DocBook/Makefile >>>>>> Documentation/DocBook/filesystems.tmpl >>>>>> Documentation/DocBook/kernel-hacking.tmpl >>>>>> Documentation/DocBook/kernel-locking.tmpl 
>>>>>> Documentation/DocBook/kgdb.tmpl >>>>>> Documentation/DocBook/libata.tmpl >>>>>> Documentation/DocBook/librs.tmpl >>>>>> Documentation/DocBook/lsm.tmpl >>>>>> Documentation/DocBook/mtdnand.tmpl >>>>>> Documentation/DocBook/networking.tmpl >>>>>> Documentation/DocBook/rapidio.tmpl >>>>>> Documentation/DocBook/s390-drivers.tmpl >>>>>> Documentation/DocBook/scsi.tmpl >>>>>> Documentation/DocBook/sh.tmpl >>>>>> Documentation/DocBook/stylesheet.xsl >>>>>> Documentation/DocBook/w1.tmpl >>>>>> Documentation/DocBook/z8530book.tmpl >>>>>> Documentation/Makefile.sphinx >>>>>> Documentation/RCU/trace.txt >>>>>> Documentation/devicetree/bindings/i2c/i2c-mt6577.txt >>>>>> Documentation/devicetree/bindings/misc/allwinner,syscon.txt >>>>>> Documentation/devicetree/bindings/net/cortina.txt >>>>>> Documentation/devicetree/bindings/net/dsa/ksz.txt >>>>>> Documentation/devicetree/bindings/net/dwmac-sun8i.txt >>>>>> Documentation/devicetree/bindings/net/qca,qca7000.txt >>>>>> Documentation/devicetree/bindings/power/max8903-charger.txt >>>>>> Documentation/devicetree/bindings/power_supply/maxim,max14656.txt >>>>>> Documentation/devicetree/bindings/ptp/brcm,ptp-dte.txt >>>>>> Documentation/devicetree/bindings/timer/moxa,moxart-timer.txt >>>>>> Documentation/doc-guide/docbook.rst >>>>>> Documentation/networking/tls.txt >>>>>> Documentation/prctl/no_new_privs.txt >>>>>> Documentation/prctl/seccomp_filter.txt >>>>>> Documentation/security/00-INDEX >>>>>> Documentation/security/IMA-templates.txt >>>>>> Documentation/security/LSM.txt >>>>>> Documentation/security/LoadPin.txt >>>>>> Documentation/security/SELinux.txt >>>>>> Documentation/security/Smack.txt >>>>>> Documentation/security/Yama.txt >>>>>> Documentation/security/apparmor.txt >>>>>> Documentation/security/conf.py >>>>>> Documentation/security/credentials.txt >>>>>> Documentation/security/keys-ecryptfs.txt >>>>>> Documentation/security/keys-request-key.txt >>>>>> Documentation/security/keys-trusted-encrypted.txt >>>>>> 
Documentation/security/keys.txt >>>>>> Documentation/security/self-protection.txt >>>>>> Documentation/security/tomoyo.txt >>>>>> Documentation/sphinx/convert_template.sed >>>>>> Documentation/sphinx/post_convert.sed >>>>>> Documentation/sphinx/tmplcvt >>>>>> Documentation/usb/typec.rst >>>>>> Documentation/usb/usb3-debug-port.rst >>>>>> arch/arm/boot/dts/rk1108-evb.dts >>>>>> arch/arm/boot/dts/rk1108.dtsi >>>>>> arch/arm/boot/dts/tegra20-whistler.dts >>>>>> arch/arm/mach-omap2/opp.c >>>>>> arch/arm/mach-omap2/pmu.c >>>>>> arch/ia64/include/asm/siginfo.h >>>>>> arch/m32r/include/uapi/asm/siginfo.h >>>>>> arch/microblaze/include/asm/bitops.h >>>>>> arch/microblaze/include/asm/bug.h >>>>>> arch/microblaze/include/asm/bugs.h >>>>>> arch/microblaze/include/asm/div64.h >>>>>> arch/microblaze/include/asm/emergency-restart.h >>>>>> arch/microblaze/include/asm/fb.h >>>>>> arch/microblaze/include/asm/hardirq.h >>>>>> arch/microblaze/include/asm/irq_regs.h >>>>>> arch/microblaze/include/asm/kdebug.h >>>>>> arch/microblaze/include/asm/kmap_types.h >>>>>> arch/microblaze/include/asm/linkage.h >>>>>> arch/microblaze/include/asm/local.h >>>>>> arch/microblaze/include/asm/local64.h >>>>>> arch/microblaze/include/asm/parport.h >>>>>> arch/microblaze/include/asm/percpu.h >>>>>> arch/microblaze/include/asm/serial.h >>>>>> arch/microblaze/include/asm/shmparam.h >>>>>> arch/microblaze/include/asm/topology.h >>>>>> arch/microblaze/include/asm/ucontext.h >>>>>> arch/microblaze/include/asm/vga.h >>>>>> arch/microblaze/include/asm/xor.h >>>>>> arch/microblaze/include/uapi/asm/bitsperlong.h >>>>>> arch/microblaze/include/uapi/asm/errno.h >>>>>> arch/microblaze/include/uapi/asm/fcntl.h >>>>>> arch/microblaze/include/uapi/asm/ioctl.h >>>>>> arch/microblaze/include/uapi/asm/ioctls.h >>>>>> arch/microblaze/include/uapi/asm/ipcbuf.h >>>>>> arch/microblaze/include/uapi/asm/kvm_para.h >>>>>> arch/microblaze/include/uapi/asm/mman.h >>>>>> arch/microblaze/include/uapi/asm/msgbuf.h >>>>>> 
arch/microblaze/include/uapi/asm/param.h >>>>>> arch/microblaze/include/uapi/asm/poll.h >>>>>> arch/microblaze/include/uapi/asm/resource.h >>>>>> arch/microblaze/include/uapi/asm/sembuf.h >>>>>> arch/microblaze/include/uapi/asm/shmbuf.h >>>>>> arch/microblaze/include/uapi/asm/siginfo.h >>>>>> arch/microblaze/include/uapi/asm/signal.h >>>>>> arch/microblaze/includ >>>>>> Aborting >>>>>> >>>>>> >>>>>> >>>>>> W dniu 2017-09-20 o 11:45, Paweł Staszewski pisze: >>>>>>> Ok looks like ending bisection >>>>>>> >>>>>>> >>>>>>> Latest bisected kernel when there is no kernel panic 4.12.0+ (from >>>>>>> next) - but only this warning: >>>>>>> >>>>>>> [ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 >>>>>>> timed out >>>>>>> [ 309.030034] ------------[ cut here ]------------ >>>>>>> [ 309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139 >>>>>>> [ 309.030041] Modules linked in: bonding ipmi_si >>>>>>> x86_pkg_temp_thermal >>>>>>> [ 309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 4.12.0+ #5 >>>>>>> [ 309.030046] task: ffff88086d98a000 task.stack: ffffc90003378000 >>>>>>> [ 309.030048] RIP: 0010:dev_watchdog+0xcf/0x139 >>>>>>> [ 309.030049] RSP: 0018:ffff88087fbc3ea8 EFLAGS: 00010246 >>>>>>> [ 309.030050] RAX: 000000000000003d RBX: ffff88046b680000 RCX: >>>>>>> 0000000000000000 >>>>>>> [ 309.030050] RDX: ffff88087fbd2f01 RSI: 0000000000000000 RDI: >>>>>>> ffff88087fbcda08 >>>>>>> [ 309.030051] RBP: ffff88087fbc3eb8 R08: 0000000000000000 R09: >>>>>>> ffff88087ff80a04 >>>>>>> [ 309.030051] R10: 0000000000000000 R11: ffff88086d98a001 R12: >>>>>>> 0000000000000000 >>>>>>> [ 309.030052] R13: ffff88087fbc3ef8 R14: ffff88086d98a000 R15: >>>>>>> ffffffff81c06008 >>>>>>> [ 309.030053] FS: 0000000000000000(0000) >>>>>>> GS:ffff88087fbc0000(0000) knlGS:0000000000000000 >>>>>>> [ 309.030054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>> [ 309.030054] CR2: 00007fba600f6098 CR3: 000000086b955000 CR4: >>>>>>> 00000000001406e0 >>>>>>> [ 309.030055] 
Call Trace: >>>>>>> [ 309.030057] <IRQ> >>>>>>> [ 309.030059] ? netif_tx_lock+0x79/0x79 >>>>>>> [ 309.030062] call_timer_fn.isra.24+0x17/0x77 >>>>>>> [ 309.030063] run_timer_softirq+0x118/0x161 >>>>>>> [ 309.030065] ? netif_tx_lock+0x79/0x79 >>>>>>> [ 309.030066] ? ktime_get+0x2b/0x42 >>>>>>> [ 309.030070] ? lapic_next_deadline+0x21/0x27 >>>>>>> [ 309.030073] ? clockevents_program_event+0xa8/0xc5 >>>>>>> [ 309.030076] __do_softirq+0xa8/0x19d >>>>>>> [ 309.030078] irq_exit+0x5d/0x6b >>>>>>> [ 309.030079] smp_apic_timer_interrupt+0x2a/0x36 >>>>>>> [ 309.030082] apic_timer_interrupt+0x89/0x90 >>>>>>> [ 309.030085] RIP: 0010:mwait_idle+0x4e/0x6a >>>>>>> [ 309.030086] RSP: 0018:ffffc9000337be98 EFLAGS: 00000246 >>>>>>> ORIG_RAX: ffffffffffffff10 >>>>>>> [ 309.030087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: >>>>>>> 0000000000000000 >>>>>>> [ 309.030087] RDX: 0000000000000000 RSI: 0000000000000000 RDI: >>>>>>> ffff88086d98a000 >>>>>>> [ 309.030088] RBP: ffffc9000337be98 R08: ffff88046f8279a0 R09: >>>>>>> ffff88046f827040 >>>>>>> [ 309.030089] R10: ffff88086d98a000 R11: ffff88086d98a000 R12: >>>>>>> 0000000000000000 >>>>>>> [ 309.030089] R13: ffff88086d98a000 R14: ffff88086d98a000 R15: >>>>>>> ffff88086d98a000 >>>>>>> [ 309.030090] </IRQ> >>>>>>> [ 309.030094] arch_cpu_idle+0xa/0xc >>>>>>> [ 309.030095] default_idle_call+0x19/0x1b >>>>>>> [ 309.030102] do_idle+0xbc/0x196 >>>>>>> [ 309.030104] cpu_startup_entry+0x1d/0x20 >>>>>>> [ 309.030105] start_secondary+0xd8/0xdc >>>>>>> [ 309.030108] secondary_startup_64+0x9f/0x9f >>>>>>> [ 309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 01 >>>>>>> e8 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 81 31 >>>>>>> c0 e8 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 df ff 50 78 >>>>>>> 48 8b 05 a0 bc 6a >>>>>>> [ 309.030128] ---[ end trace 9102cb25703ae2d9 ]--- >>>>>>> >>>>>>> >>>>>>> I just marked it as good - cause this problem above is differend - >>>>>>> and im going to: >>>>>>> >>>>>>> git bisect good 
>>>>>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps) >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> W dniu 2017-09-20 o 10:44, Paweł Staszewski pisze: >>>>>>>> Trying to make video from ipmi :) >>>>>>>> >>>>>>>> with that results: >>>>>>>> >>>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258521 >>>>>>>> >>>>>>>> catched two more lines where it starts - panic from 4.13.2. >>>>>>>> >>>>>>>> >>>>>>>> Now will try tro do some bisection >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> W dniu 2017-09-20 o 09:58, Paweł Staszewski pisze: >>>>>>>>> Hi >>>>>>>>> >>>>>>>>> >>>>>>>>> Will try bisecting tonight >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> W dniu 2017-09-20 o 05:24, Eric Dumazet pisze: >>>>>>>>>> On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote: >>>>>>>>>>> Just checked kernel 4.13.2 and same problem >>>>>>>>>>> >>>>>>>>>>> Just after start all 6 bgp sessions - and kernel starts to >>>>>>>>>>> learn routes >>>>>>>>>>> it panic. >>>>>>>>>>> >>>>>>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=258509 >>>>>>>>>>> >>>>>>>>>> Unfortunately we have not enough information from these traces. >>>>>>>>>> >>>>>>>>>> Can you get a full stack trace ? >>>>>>>>>> >>>>>>>>>> Alternatively, can you bisect ? >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> > > ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 13:16 ` Paweł Staszewski @ 2017-09-20 13:34 ` Eric Dumazet 2017-09-20 13:37 ` Eric Dumazet 2017-09-20 13:39 ` Paweł Staszewski 0 siblings, 2 replies; 52+ messages in thread From: Eric Dumazet @ 2017-09-20 13:34 UTC (permalink / raw) To: Paweł Staszewski; +Cc: Wei Wang, Linux Kernel Network Developers, edumazet

Could you try this debug patch ?

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
  */
 static inline void dev_put(struct net_device *dev)
 {
-	this_cpu_dec(*dev->pcpu_refcnt);
+	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+	if (!pref) {
+		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+		       dev, dev->name, dev->reg_state, dev->dismantle);
+		BUG();
+	}
+	this_cpu_dec(*pref);
 }
 
 /**

^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 13:34 ` Eric Dumazet @ 2017-09-20 13:37 ` Eric Dumazet 2017-09-20 13:39 ` Paweł Staszewski 1 sibling, 0 replies; 52+ messages in thread From: Eric Dumazet @ 2017-09-20 13:37 UTC (permalink / raw) To: Paweł Staszewski; +Cc: Wei Wang, Linux Kernel Network Developers, edumazet

On Wed, 2017-09-20 at 06:34 -0700, Eric Dumazet wrote:
> Could you try this debug patch ?
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
>   */
>  static inline void dev_put(struct net_device *dev)
>  {
> -	this_cpu_dec(*dev->pcpu_refcnt);
> +	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
> +
> +	if (!pref) {
> +		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
> +		       dev, dev->name, dev->reg_state, dev->dismantle);
> +		BUG();
> +	}
> +	this_cpu_dec(*pref);
>  }
>
>  /**
>

And since the console will be filled by the stack trace, maybe use an infinite loop instead of BUG() ?

	for (;;)
		cpu_relax();

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 13:34 ` Eric Dumazet 2017-09-20 13:37 ` Eric Dumazet @ 2017-09-20 13:39 ` Paweł Staszewski 2017-09-20 13:44 ` Eric Dumazet 1 sibling, 1 reply; 52+ messages in thread From: Paweł Staszewski @ 2017-09-20 13:39 UTC (permalink / raw) To: Eric Dumazet; +Cc: Wei Wang, Linux Kernel Network Developers, edumazet

On 2017-09-20 at 15:34, Eric Dumazet wrote:
> Could you try this debug patch ?
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
>   */
>  static inline void dev_put(struct net_device *dev)
>  {
> -	this_cpu_dec(*dev->pcpu_refcnt);
> +	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
> +
> +	if (!pref) {
> +		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
> +		       dev, dev->name, dev->reg_state, dev->dismantle);
> +		BUG();
> +	}
> +	this_cpu_dec(*pref);
>  }
>
>  /**
>

Which kernel version do you want me to apply this patch to?
I have just run git bisect reset, so I am currently on mainline stable.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 13:39 ` Paweł Staszewski @ 2017-09-20 13:44 ` Eric Dumazet 2017-09-20 14:03 ` Paweł Staszewski 0 siblings, 1 reply; 52+ messages in thread From: Eric Dumazet @ 2017-09-20 13:44 UTC (permalink / raw) To: Paweł Staszewski; +Cc: Wei Wang, Linux Kernel Network Developers, edumazet

On Wed, 2017-09-20 at 15:39 +0200, Paweł Staszewski wrote:
>
> On 2017-09-20 at 15:34, Eric Dumazet wrote:
> > Could you try this debug patch ?
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
> >   */
> >  static inline void dev_put(struct net_device *dev)
> >  {
> > -	this_cpu_dec(*dev->pcpu_refcnt);
> > +	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
> > +
> > +	if (!pref) {
> > +		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
> > +		       dev, dev->name, dev->reg_state, dev->dismantle);
> > +		BUG();
> > +	}
> > +	this_cpu_dec(*pref);
> >  }
> >
> >  /**
> >
>
> Which kernel version do you want me to apply this patch to?
> I have just run git bisect reset, so I am currently on mainline stable.
>

Simply use the latest net-next as mentioned in the thread title, thanks.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 13:44 ` Eric Dumazet @ 2017-09-20 14:03 ` Paweł Staszewski 2017-09-20 14:40 ` Eric Dumazet 0 siblings, 1 reply; 52+ messages in thread From: Paweł Staszewski @ 2017-09-20 14:03 UTC (permalink / raw) To: Eric Dumazet; +Cc: Wei Wang, Linux Kernel Network Developers, edumazet

Not much more after adding this patch:

https://bugzilla.kernel.org/attachment.cgi?id=258529

On 2017-09-20 at 15:44, Eric Dumazet wrote:
> On Wed, 2017-09-20 at 15:39 +0200, Paweł Staszewski wrote:
>> On 2017-09-20 at 15:34, Eric Dumazet wrote:
>>> Could you try this debug patch ?
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
>>>   */
>>>  static inline void dev_put(struct net_device *dev)
>>>  {
>>> -	this_cpu_dec(*dev->pcpu_refcnt);
>>> +	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
>>> +
>>> +	if (!pref) {
>>> +		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
>>> +		       dev, dev->name, dev->reg_state, dev->dismantle);
>>> +		BUG();
>>> +	}
>>> +	this_cpu_dec(*pref);
>>>  }
>>>
>>>  /**
>>>
>> Which kernel version do you want me to apply this patch to?
>> I have just run git bisect reset, so I am currently on mainline stable.
>>
> Simply use the latest net-next as mentioned in the thread title, thanks.
>

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 14:03 ` Paweł Staszewski @ 2017-09-20 14:40 ` Eric Dumazet 2017-09-20 15:05 ` Paweł Staszewski 0 siblings, 1 reply; 52+ messages in thread From: Eric Dumazet @ 2017-09-20 14:40 UTC (permalink / raw) To: Paweł Staszewski; +Cc: Wei Wang, Linux Kernel Network Developers, edumazet

On Wed, 2017-09-20 at 16:03 +0200, Paweł Staszewski wrote:
> Not much more after adding this patch
>
> https://bugzilla.kernel.org/attachment.cgi?id=258529
>

This is why I suggested replacing the BUG() in another mail.

So:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,15 @@ void netdev_run_todo(void);
  */
 static inline void dev_put(struct net_device *dev)
 {
-	this_cpu_dec(*dev->pcpu_refcnt);
+	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+	if (!pref) {
+		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+		       dev, dev->name, dev->reg_state, dev->dismantle);
+		for (;;)
+			cpu_relax();
+	}
+	this_cpu_dec(*pref);
 }
 
 /**

^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 14:40 ` Eric Dumazet @ 2017-09-20 15:05 ` Paweł Staszewski 2017-09-20 17:46 ` Wei Wang 0 siblings, 1 reply; 52+ messages in thread From: Paweł Staszewski @ 2017-09-20 15:05 UTC (permalink / raw) To: Eric Dumazet; +Cc: Wei Wang, Linux Kernel Network Developers, edumazet

On 2017-09-20 at 16:40, Eric Dumazet wrote:
> On Wed, 2017-09-20 at 16:03 +0200, Paweł Staszewski wrote:
>> Not much more after adding this patch
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=258529
>>
> This is why I suggested replacing the BUG() in another mail.
>
> So:
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3331,7 +3331,15 @@ void netdev_run_todo(void);
>   */
>  static inline void dev_put(struct net_device *dev)
>  {
> -	this_cpu_dec(*dev->pcpu_refcnt);
> +	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
> +
> +	if (!pref) {
> +		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
> +		       dev, dev->name, dev->reg_state, dev->dismantle);
> +		for (;;)
> +			cpu_relax();
> +	}
> +	this_cpu_dec(*pref);
>  }
>
>  /**
>

Full panic:

https://bugzilla.kernel.org/attachment.cgi?id=258531

I will change the patch and apply it later today, because right now I
can't use the backup router as a test lab. These are Internet rush hours,
and if something happens it would be bad for the second router to be
running a buggy kernel :)

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 15:05 ` Paweł Staszewski @ 2017-09-20 17:46 ` Wei Wang 2017-09-20 17:58 ` Paweł Staszewski 0 siblings, 1 reply; 52+ messages in thread From: Wei Wang @ 2017-09-20 17:46 UTC (permalink / raw) To: Paweł Staszewski Cc: Eric Dumazet, Linux Kernel Network Developers, Eric Dumazet >> This is why I suggested to replace the BUG() in another mail >> >> So : >> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >> index >> f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 >> 100644 >> --- a/include/linux/netdevice.h >> +++ b/include/linux/netdevice.h >> @@ -3331,7 +3331,15 @@ void netdev_run_todo(void); >> */ >> static inline void dev_put(struct net_device *dev) >> { >> - this_cpu_dec(*dev->pcpu_refcnt); >> + int __percpu *pref = READ_ONCE(dev->pcpu_refcnt); >> + >> + if (!pref) { >> + pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle >> %d\n", >> + dev, dev->name, dev->reg_state, dev->dismantle); >> + for (;;) >> + cpu_relax(); >> + } >> + this_cpu_dec(*pref); >> } >> /** >> Thanks a lot Eric for the debug patch. Pawel, I want to confirm with you about the last good commit when you did bisection. You mentioned: > And the last one > > git bisect good > Bisecting: 1 revision left to test after this (roughly 1 step) > [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for > insertion into fib6 tree > > With this have kernel panic same as always > > git bisect bad > Bisecting: 0 revisions left to test after this (roughly 0 steps) > [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and > remove the operation of dst_free() So it breaks right at: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free() Right? If you sync the image to one commit before the above one: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly Does it crash? 
And could you confirm that your config does not have any IPv6 addresses or routes configured? Thanks. Wei 6:03 +0200, Paweł Staszewski wrote: >>> >>> Nit much more after adding this patch >>> >>> https://bugzilla.kernel.org/attachment.cgi?id=258529 >>> >> This is why I suggested to replace the BUG() in another mail >> >> So : >> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >> index >> f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 >> 100644 >> --- a/include/linux/netdevice.h >> +++ b/include/linux/netdevice.h >> @@ -3331,7 +3331,15 @@ void netdev_run_todo(void); >> */ >> static inline void dev_put(struct net_device *dev) >> { >> - this_cpu_dec(*dev->pcpu_refcnt); >> + int __percpu *pref = READ_ONCE(dev->pcpu_refcnt); >> + >> + if (!pref) { >> + pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle >> %d\n", >> + dev, dev->name, dev->reg_state, dev->dismantle); >> + for (;;) >> + cpu_relax(); >> + } >> + this_cpu_dec(*pref); >> } >> /** >> >> >> > > Full panic > > https://bugzilla.kernel.org/attachment.cgi?id=258531 > > > I will change patch and apply but later today cause now cant use backup > router as testlab - Internet rush hours if something happens this will be > bed when second router will have bugged kernel :) > > ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 17:46 ` Wei Wang @ 2017-09-20 17:58 ` Paweł Staszewski 0 siblings, 0 replies; 52+ messages in thread From: Paweł Staszewski @ 2017-09-20 17:58 UTC (permalink / raw) To: Wei Wang; +Cc: Eric Dumazet, Linux Kernel Network Developers, Eric Dumazet W dniu 2017-09-20 o 19:46, Wei Wang pisze: >>> This is why I suggested to replace the BUG() in another mail >>> >>> So : >>> >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>> index >>> f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 >>> 100644 >>> --- a/include/linux/netdevice.h >>> +++ b/include/linux/netdevice.h >>> @@ -3331,7 +3331,15 @@ void netdev_run_todo(void); >>> */ >>> static inline void dev_put(struct net_device *dev) >>> { >>> - this_cpu_dec(*dev->pcpu_refcnt); >>> + int __percpu *pref = READ_ONCE(dev->pcpu_refcnt); >>> + >>> + if (!pref) { >>> + pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle >>> %d\n", >>> + dev, dev->name, dev->reg_state, dev->dismantle); >>> + for (;;) >>> + cpu_relax(); >>> + } >>> + this_cpu_dec(*pref); >>> } >>> /** >>> > Thanks a lot Eric for the debug patch. > > Pawel, > > I want to confirm with you about the last good commit when you did bisection. > You mentioned: > >> And the last one >> >> git bisect good >> Bisecting: 1 revision left to test after this (roughly 1 step) >> [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for >> insertion into fib6 tree >> >> With this have kernel panic same as always >> >> git bisect bad >> Bisecting: 0 revisions left to test after this (roughly 0 steps) >> [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and >> remove the operation of dst_free() > > So it breaks right at: > [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and > remove the operation of dst_free() > Right? 
> If you sync the image to one commit before the above one: > [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly > Does it crash? Later today i will repeat last three steps - in about next 3 hours after rush hours of internet traffic - now i cant touch backup router :) > > And could you confirm that your config does not have any IPv6 > addresses or routes configured? There is ipv6 enabled And yes there are some ipv6 ip's One interface have ipv6 enabled with one static route but no ipv6 bgp sessions - so nt many ipv6 prefixes and ipv6 fib is almost empty ip -6 r ls | wc -l 57 > > Thanks. > Wei > > > 6:03 +0200, Paweł Staszewski wrote: >>>> Nit much more after adding this patch >>>> >>>> https://bugzilla.kernel.org/attachment.cgi?id=258529 >>>> >>> This is why I suggested to replace the BUG() in another mail >>> >>> So : >>> >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>> index >>> f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 >>> 100644 >>> --- a/include/linux/netdevice.h >>> +++ b/include/linux/netdevice.h >>> @@ -3331,7 +3331,15 @@ void netdev_run_todo(void); >>> */ >>> static inline void dev_put(struct net_device *dev) >>> { >>> - this_cpu_dec(*dev->pcpu_refcnt); >>> + int __percpu *pref = READ_ONCE(dev->pcpu_refcnt); >>> + >>> + if (!pref) { >>> + pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle >>> %d\n", >>> + dev, dev->name, dev->reg_state, dev->dismantle); >>> + for (;;) >>> + cpu_relax(); >>> + } >>> + this_cpu_dec(*pref); >>> } >>> /** >>> >>> >>> >> Full panic >> >> https://bugzilla.kernel.org/attachment.cgi?id=258531 >> >> >> I will change patch and apply but later today cause now cant use backup >> router as testlab - Internet rush hours if something happens this will be >> bed when second router will have bugged kernel :) >> >> ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 13:11 ` Eric Dumazet 2017-09-20 13:16 ` Paweł Staszewski @ 2017-09-20 17:50 ` Cong Wang 2017-09-20 17:59 ` Eric Dumazet [not found] ` <3c227be7-a954-a406-1987-24e908cf214c@itcare.pl> 1 sibling, 2 replies; 52+ messages in thread From: Cong Wang @ 2017-09-20 17:50 UTC (permalink / raw) To: Eric Dumazet Cc: Paweł Staszewski, Wei Wang, Linux Kernel Network Developers, Eric Dumazet On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Sorry for top-posting, but this is to give context to Wei, since Pawel > used a top posting way to report his bisection. > > Wei, can you take a look at Pawel report ? > > Crash happens in dst_destroy() at following : > > if (dst->dev) > dev_put(dst->dev); <<CRASH>> > > > dst->dev is not NULL, but netdev->pcpu_refcnt is NULL > > 65 ff 08 decl %gs:(%rax) // CRASH since rax = NULL > > > > Pawel, please share your netdevices and routing setup ? Looks like a double dev_put() on some dev... Pawel, do you have any idea how this is triggered? Does your test try to remove some network device? If so which one? I noticed you have at least multiple vlan, bond and ixgbe devices. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 17:50 ` Cong Wang @ 2017-09-20 17:59 ` Eric Dumazet [not found] ` <3c227be7-a954-a406-1987-24e908cf214c@itcare.pl> 1 sibling, 0 replies; 52+ messages in thread From: Eric Dumazet @ 2017-09-20 17:59 UTC (permalink / raw) To: Cong Wang Cc: Paweł Staszewski, Wei Wang, Linux Kernel Network Developers, Eric Dumazet On Wed, 2017-09-20 at 10:50 -0700, Cong Wang wrote: > On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > Sorry for top-posting, but this is to give context to Wei, since Pawel > > used a top posting way to report his bisection. > > > > Wei, can you take a look at Pawel report ? > > > > Crash happens in dst_destroy() at following : > > > > if (dst->dev) > > dev_put(dst->dev); <<CRASH>> > > > > > > dst->dev is not NULL, but netdev->pcpu_refcnt is NULL > > > > 65 ff 08 decl %gs:(%rax) // CRASH since rax = NULL > > > > > > > > Pawel, please share your netdevices and routing setup ? > > Looks like a double dev_put() on some dev... > > Pawel, do you have any idea how this is triggered? Does your > test try to remove some network device? If so which one? > I noticed you have at least multiple vlan, bond and ixgbe > devices. Or a missing dev_hold() somewhere. ^ permalink raw reply [flat|nested] 52+ messages in thread
[parent not found: <3c227be7-a954-a406-1987-24e908cf214c@itcare.pl>]
* Re: Latest net-next from GIT panic [not found] ` <3c227be7-a954-a406-1987-24e908cf214c@itcare.pl> @ 2017-09-20 18:22 ` Cong Wang 2017-09-20 18:30 ` Eric Dumazet 0 siblings, 1 reply; 52+ messages in thread From: Cong Wang @ 2017-09-20 18:22 UTC (permalink / raw) To: Paweł Staszewski Cc: Eric Dumazet, Wei Wang, Linux Kernel Network Developers, Eric Dumazet On Wed, Sep 20, 2017 at 10:55 AM, Paweł Staszewski <pstaszewski@itcare.pl> wrote: > > > W dniu 2017-09-20 o 19:50, Cong Wang pisze: > > On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet <eric.dumazet@gmail.com> > wrote: > > Sorry for top-posting, but this is to give context to Wei, since Pawel > used a top posting way to report his bisection. > > Wei, can you take a look at Pawel report ? > > Crash happens in dst_destroy() at following : > > if (dst->dev) > dev_put(dst->dev); <<CRASH>> > > > dst->dev is not NULL, but netdev->pcpu_refcnt is NULL > > 65 ff 08 decl %gs:(%rax) // CRASH since rax = NULL > > > > Pawel, please share your netdevices and routing setup ? > > Looks like a double dev_put() on some dev... > > Pawel, do you have any idea how this is triggered? Does your > test try to remove some network device? If so which one? > I noticed you have at least multiple vlan, bond and ixgbe > devices. > > Just after i start bgp sessions > So when host is starting i have all bgp sessions to upstreams shutdown > > To trigger panic i just enable all 6x bgp sessions at once to upstreams - > and zebra is start to pull prefixes and push them to the kernel > > Then some traffic is generated from test hosts thru this backup router and > panic is generated - every time after 10 to 15 seconds after bgp sessions > are connected. > > I'm not removing any interface at this time or do anything with interfaces - > just wait. > > And yes there are vlans attached to the bond devices > but dmesg at this time shows nothing about interfaces or flaps. This is very odd. 
We only free a netdevice in free_netdev(), and that is only called when we unregister a netdevice. Otherwise pcpu_refcnt cannot be NULL. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 18:22 ` Cong Wang @ 2017-09-20 18:30 ` Eric Dumazet 2017-09-20 18:36 ` Cong Wang 0 siblings, 1 reply; 52+ messages in thread From: Eric Dumazet @ 2017-09-20 18:30 UTC (permalink / raw) To: Cong Wang Cc: Paweł Staszewski, Wei Wang, Linux Kernel Network Developers, Eric Dumazet

On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
> but dmesg at this time shows nothing about interfaces or flaps.
>
> This is very odd.
>
> We only free netdevice in free_netdev() and it is only called when
> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
> to be NULL.

If there is a missing dev_hold() or one dev_put() in excess,
this would allow the netdev to be freed too soon.

-> Use after free.
Memory holding netdev could be reallocated-cleared by some other kernel
user.

^ permalink raw reply [flat|nested] 52+ messages in thread
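[Editorial sketch, not part of the original thread.] The reference-count imbalance described in the message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct toy_dev` and every `toy_*` function are hypothetical names invented for illustration, a single heap-allocated integer stands in for the per-CPU `pcpu_refcnt`, and `toy_put()` mimics the NULL check from the debug patch rather than any real kernel code path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct net_device (not the kernel type). */
struct toy_dev {
	int *refcnt;	/* models dev->pcpu_refcnt, reduced to a single counter */
};

struct toy_dev *toy_alloc(void)
{
	struct toy_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcnt = calloc(1, sizeof(*dev->refcnt));
	if (!dev->refcnt) {
		free(dev);
		return NULL;
	}
	return dev;
}

/* Models dev_hold(): take one reference. */
void toy_hold(struct toy_dev *dev)
{
	assert(dev->refcnt != NULL);
	++*dev->refcnt;
}

/* Models the patched dev_put(): report a missing counter instead of crashing. */
bool toy_put(struct toy_dev *dev)
{
	int *ref = dev->refcnt;

	if (!ref) {
		fprintf(stderr, "no refcnt on dev %p\n", (void *)dev);
		return false;
	}
	--*ref;
	return true;
}

/* Models unregister: the counter is released only once it has dropped to zero. */
bool toy_unregister(struct toy_dev *dev)
{
	if (!dev->refcnt || *dev->refcnt != 0)
		return false;
	free(dev->refcnt);
	dev->refcnt = NULL;
	return true;
}

/* Final cleanup of the toy object itself. */
void toy_free(struct toy_dev *dev)
{
	free(dev->refcnt);	/* free(NULL) is a no-op */
	free(dev);
}

/*
 * One user forgets toy_hold(); another holds and puts correctly.
 * Returns true if the late put found the counter already gone.
 */
bool toy_missing_hold_demo(void)
{
	struct toy_dev *dev = toy_alloc();
	bool late_put_ok;

	if (!dev)
		return false;
	toy_hold(dev);			/* balanced user takes a reference */
	toy_put(dev);			/* count is 0 again; the forgetful user is invisible */
	toy_unregister(dev);		/* succeeds: nothing appears to hold the device */
	late_put_ok = toy_put(dev);	/* forgetful user drops a reference it never took */
	toy_free(dev);
	return !late_put_ok;
}
```

In this model the toy object itself is deliberately kept alive until `toy_free()`, so the late put can be observed safely; in the kernel the memory could already have been reused. Without the unregister step, the same excess put only leaves the counter at -1 and nothing is released.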
* Re: Latest net-next from GIT panic 2017-09-20 18:30 ` Eric Dumazet @ 2017-09-20 18:36 ` Cong Wang 2017-09-20 19:13 ` Paweł Staszewski 0 siblings, 1 reply; 52+ messages in thread From: Cong Wang @ 2017-09-20 18:36 UTC (permalink / raw) To: Eric Dumazet Cc: Paweł Staszewski, Wei Wang, Linux Kernel Network Developers, Eric Dumazet On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote: >> but dmesg at this time shows nothing about interfaces or flaps. >> >> This is very odd. >> >> We only free netdevice in free_netdev() and it is only called when >> we unregister a netdevice. Otherwise pcpu_refcnt is impossible >> to be NULL. > > If there is a missing dev_hold() or one dev_put() in excess, > this would allow the netdev to be freed too soon. > > -> Use after free. > memory holding netdev could be reallocated-cleared by some other kernel > user. > Sure, but only unregister could trigger a free. If there is no unregister, like what Pawel claims, then there is no free, the refcnt just goes to 0 but the memory is still there. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 18:36 ` Cong Wang @ 2017-09-20 19:13 ` Paweł Staszewski 2017-09-20 19:23 ` Paweł Staszewski 0 siblings, 1 reply; 52+ messages in thread From: Paweł Staszewski @ 2017-09-20 19:13 UTC (permalink / raw) To: Cong Wang, Eric Dumazet Cc: Wei Wang, Linux Kernel Network Developers, Eric Dumazet

On 2017-09-20 at 20:36, Cong Wang wrote:
> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>
>>> This is very odd.
>>>
>>> We only free netdevice in free_netdev() and it is only called when
>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>> to be NULL.
>> If there is a missing dev_hold() or one dev_put() in excess,
>> this would allow the netdev to be freed too soon.
>>
>> -> Use after free.
>> memory holding netdev could be reallocated-cleared by some other kernel
>> user.
>>
> Sure, but only unregister could trigger a free. If there is no unregister,
> like what Pawel claims, then there is no free, the refcnt just goes to
> 0 but the memory is still there.
>

About a possible mistake on my side with the bisect: I may have judged some steps as good too early.
The path was:

git bisect start
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using stack larger than 1024.
git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 'udp-reduce-cache-pressure'
git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
# bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch 's390-net-updates-part-2'
git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
# good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch 'bpf-ctx-narrow'
git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
# good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove cp_outgoing
git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
# bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix
git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
# good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new function dst_dev_put()

This one has currently been running for about 4 hours without problems.

git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
# bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove DST_NOCACHE flag

Here for sure: panic.

git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
# bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly
git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
# good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly
git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
# bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for insertion into fib6 tree

I'm not 100% sure about the last two.
I will test them again starting from
[95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() properly

git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
# bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
# first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()

^ permalink raw reply [flat|nested] 52+ messages in thread
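The disagreement quoted above (an unbalanced dev_put() versus "only unregister can trigger a free") can be pictured with a small user-space model. This is only a sketch under stated assumptions: `toy_dev`, `toy_hold()`, `toy_put()`, `toy_can_free()` and `toy_scenario()` are hypothetical names invented for illustration, the counter is a plain int rather than the kernel's per-CPU counter, and nothing here is actual kernel code.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for a refcounted device (not kernel code). */
struct toy_dev {
    int refcnt;         /* plain counter; the kernel uses a per-CPU one */
    bool unregistered;  /* set once teardown of the device was requested */
};

static void toy_hold(struct toy_dev *d) { d->refcnt++; }
static void toy_put(struct toy_dev *d)  { d->refcnt--; }

/* Assumed rule for this model: memory is released only after unregister
 * and only once no references remain. */
static bool toy_can_free(const struct toy_dev *d)
{
    return d->unregistered && d->refcnt == 0;
}

/* Plays out one put in excess while user A still uses the device.
 * Returns whether the device would be considered free-able afterwards. */
static bool toy_scenario(bool unregister)
{
    struct toy_dev d = { .refcnt = 0, .unregistered = false };

    toy_hold(&d);   /* user A takes a reference and keeps using d */
    toy_hold(&d);   /* user B takes a reference */
    toy_put(&d);    /* user B drops its reference */
    toy_put(&d);    /* bug: one put in excess, count is now 0 */

    if (unregister)
        d.unregistered = true;

    return toy_can_free(&d);
}
```

In this model `toy_scenario(false)` reports that nothing is freed, which matches the point that without an unregister the count merely reaches 0, while `toy_scenario(true)` reports the device as free-able although user A still holds it, which is the use-after-free case.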
* Re: Latest net-next from GIT panic 2017-09-20 19:13 ` Paweł Staszewski @ 2017-09-20 19:23 ` Paweł Staszewski 2017-09-20 21:10 ` Paweł Staszewski 0 siblings, 1 reply; 52+ messages in thread From: Paweł Staszewski @ 2017-09-20 19:23 UTC (permalink / raw) To: Cong Wang, Eric Dumazet Cc: Wei Wang, Linux Kernel Network Developers, Eric Dumazet

On 2017-09-20 at 21:13, Paweł Staszewski wrote:
> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4:
> mark DST_NOGC and remove the operation of dst_free()
>

What more I can say:
I can reproduce this on any server with a similar configuration.
The differences can be teamd instead of bonding, and ixgbe or i40e and mlx5 as NICs.
Same problems.

The setup is vlans, with more or fewer prefixes learned via bgp -> zebra -> netlink -> kernel.
In the lab, with only plain routing, no bgpd, and about 128 vlans with 128 routes, I can't reproduce this. It appears only with bgp. The minimum at which I could reproduce it was about 130k prefixes with about 286 nexthops.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 19:23 ` Paweł Staszewski @ 2017-09-20 21:10 ` Paweł Staszewski 2017-09-20 21:24 ` Paweł Staszewski 0 siblings, 1 reply; 52+ messages in thread From: Paweł Staszewski @ 2017-09-20 21:10 UTC (permalink / raw) To: Cong Wang, Eric Dumazet Cc: Wei Wang, Linux Kernel Network Developers, Eric Dumazet

On 2017-09-20 at 21:13, Paweł Staszewski wrote:
> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4:
> mark DST_NOGC and remove the operation of dst_free()
>

Bisected again, with the same result:

b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
Author: Wei Wang <weiwan@google.com>
Date:   Sat Jun 17 10:42:32 2017 -0700

    ipv4: mark DST_NOGC and remove the operation of dst_free()

    With the previous preparation patches, we are ready to get rid of the
    dst gc operation in ipv4 code and release dst based on refcnt only.
    So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
    to dst_free().
    At this point, all dst created in ipv4 code do not use the dst gc
    anymore and will be destroyed at the point when refcnt drops to 0.

    Signed-off-by: Wei Wang <weiwan@google.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net

I will now apply version 2 of the patch from Eric and we will see.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 21:10 ` Paweł Staszewski @ 2017-09-20 21:24 ` Paweł Staszewski 2017-09-20 21:25 ` Paweł Staszewski 0 siblings, 1 reply; 52+ messages in thread From: Paweł Staszewski @ 2017-09-20 21:24 UTC (permalink / raw) To: Cong Wang, Eric Dumazet Cc: Wei Wang, Linux Kernel Network Developers, Eric Dumazet

On 2017-09-20 at 23:10, Paweł Staszewski wrote:
> Will add now version 2 of patch from Eric and we will see
>

After adding the patch, perf top caught this:

   PerfTop:   77159 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz cycles],  (all, 40 CPUs)
--------------------------------------------------------------------------------

    60.95%  [kernel]  [k] dev_put.part.6
     4.00%  [kernel]  [k] ixgbe_poll
     3.63%  [kernel]  [k] irq_entries_start
     1.22%  [kernel]  [k] fib_table_lookup
     1.15%  [kernel]  [k] do_raw_spin_lock
     1.05%  [kernel]  [k] ixgbe_xmit_frame_ring
     1.04%  [kernel]  [k] lookup
     0.87%  [kernel]  [k] eth_type_trans

No panic on the console. Rebooting to check the logs.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 21:24 ` Paweł Staszewski @ 2017-09-20 21:25 ` Paweł Staszewski 2017-09-20 21:27 ` Paweł Staszewski 2017-09-20 22:09 ` Wei Wang 0 siblings, 2 replies; 52+ messages in thread From: Paweł Staszewski @ 2017-09-20 21:25 UTC (permalink / raw) To: Cong Wang, Eric Dumazet Cc: Wei Wang, Linux Kernel Network Developers, Eric Dumazet

On 2017-09-20 at 23:24, Paweł Staszewski wrote:
> no panic on console - rebooting to check logs
>

Nothing was logged.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 21:25 ` Paweł Staszewski @ 2017-09-20 21:27 ` Paweł Staszewski 2017-09-20 22:09 ` Wei Wang 1 sibling, 0 replies; 52+ messages in thread From: Paweł Staszewski @ 2017-09-20 21:27 UTC (permalink / raw) To: Cong Wang, Eric Dumazet Cc: Wei Wang, Linux Kernel Network Developers, Eric Dumazet

On 2017-09-20 at 23:25, Paweł Staszewski wrote:
> Nothing logged
>

That was after adding this patch:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,15 @@ void netdev_run_todo(void);
  */
 static inline void dev_put(struct net_device *dev)
 {
-	this_cpu_dec(*dev->pcpu_refcnt);
+	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+	if (!pref) {
+		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+		       dev, dev->name, dev->reg_state, dev->dismantle);
+		for (;;)
+			cpu_relax();
+	}
+	this_cpu_dec(*pref);
 }
 
 /**

The console just halted: no output, no reaction to the keyboard, and nothing in any syslog/log. The only thing I caught was the perf top output.

^ permalink raw reply related [flat|nested] 52+ messages in thread
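The debugging patch above reads the counter pointer once, checks it for NULL, reports the device, and only then decrements. A minimal user-space sketch of that pattern follows, under loud assumptions: `toy_netdev` and `toy_dev_put_checked()` are hypothetical names, a plain `int *` replaces the per-CPU pointer, `fprintf()` replaces `pr_err()`, and the function returns an error code where the patch spins forever in `cpu_relax()`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct net_device; only the fields used here. */
struct toy_netdev {
    const char *name;
    int *refcnt;    /* plain pointer standing in for the per-CPU counter */
};

/* Checked decrement: returns 0 on success, -1 if the counter pointer is
 * missing. The patch in the thread spins forever at that point instead. */
static int toy_dev_put_checked(struct toy_netdev *dev)
{
    int *pref = dev->refcnt;    /* read the pointer once, then use the copy */

    if (!pref) {
        fprintf(stderr, "no refcnt on dev %p(%s)\n", (void *)dev, dev->name);
        return -1;
    }
    (*pref)--;
    return 0;
}
```

Returning an error keeps the sketch testable. Halting in place, as the patch does, has a different goal: it is meant to stop the machine at the first bad call so the offending device can be identified.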
* Re: Latest net-next from GIT panic 2017-09-20 21:25 ` Paweł Staszewski 2017-09-20 21:27 ` Paweł Staszewski @ 2017-09-20 22:09 ` Wei Wang 2017-09-21 1:09 ` Wei Wang 1 sibling, 1 reply; 52+ messages in thread From: Wei Wang @ 2017-09-20 22:09 UTC (permalink / raw) To: Paweł Staszewski Cc: Cong Wang, Eric Dumazet, Linux Kernel Network Developers, Eric Dumazet

>>> bisected again and same result:
>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>> Author: Wei Wang <weiwan@google.com>
>>> Date: Sat Jun 17 10:42:32 2017 -0700
>>>
>>> ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>
>>> With the previous preparation patches, we are ready to get rid of the
>>> dst gc operation in ipv4 code and release dst based on refcnt only.
>>> So this patch adds DST_NOGC flag for all IPv4 dst and remove the
>>> calls
>>> to dst_free().
>>> At this point, all dst created in ipv4 code do not use the dst gc
>>> anymore and will be destroyed at the point when refcnt drops to 0.
>>>
>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>> Acked-by: Martin KaFai Lau <kafai@fb.com>
>>> Signed-off-by: David S. Miller <davem@davemloft.net>
>>>
>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da
>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net
>>>
>>> Will add now version 2 of patch from Eric and we will see
>>>
>> after adding patch
>> perf top catch
>> PerfTop: 77159 irqs/sec kernel:99.7% exact: 0.0% [4000Hz cycles],
>> (all, 40 CPUs)
>>
>> 60.95% [kernel] [k] dev_put.part.6
>> 4.00% [kernel] [k] ixgbe_poll
>> 3.63% [kernel] [k] irq_entries_start
>> 1.22% [kernel] [k] fib_table_lookup
>> 1.15% [kernel] [k] do_raw_spin_lock
>> 1.05% [kernel] [k] ixgbe_xmit_frame_ring
>> 1.04% [kernel] [k] lookup
>> 0.87% [kernel] [k] eth_type_trans
>>
>> no panic on console - rebooting to check logs
>>
> Nothing logged
>

Thanks very much Pawel for the feedback.

I was looking into the code (specifically IPv4 part) and found that in
free_fib_info_rcu(), we call free_nh_exceptions() without holding the
fnhe_lock. I am wondering if that could cause some race condition on
fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the
same dst could be happening.

But as we call free_fib_info_rcu() only after the grace period, and
the lookup code which could potentially modify
fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems
fine...

On Wed, Sep 20, 2017 at 2:25 PM, Paweł Staszewski <pstaszewski@itcare.pl> wrote: > > > On 2017-09-20 at 23:24, Paweł Staszewski wrote: > >> >> >> On 2017-09-20 at 23:10, Paweł Staszewski wrote: >>> >>> >>> >>> On 2017-09-20 at 21:23, Paweł Staszewski wrote: >>>> >>>> >>>> >>>> On 2017-09-20 at 21:13, Paweł Staszewski wrote: >>>>> >>>>> >>>>> >>>>> On 2017-09-20 at 20:36, Cong Wang wrote: >>>>>> >>>>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet >>>>>> <eric.dumazet@gmail.com> wrote: >>>>>>> >>>>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote: >>>>>>>> >>>>>>>> but dmesg at this time shows nothing about interfaces or flaps. >>>>>>>> >>>>>>>> This is very odd. >>>>>>>> >>>>>>>> We only free netdevice in free_netdev() and it is only called when >>>>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible >>>>>>>> to be NULL. >>>>>>> >>>>>>> If there is a missing dev_hold() or one dev_put() in excess, >>>>>>> this would allow the netdev to be freed too soon. >>>>>>> >>>>>>> -> Use after free. >>>>>>> memory holding netdev could be reallocated-cleared by some other >>>>>>> kernel >>>>>>> user. >>>>>>> >>>>>> Sure, but only unregister could trigger a free. If there is no >>>>>> unregister, >>>>>> like what Pawel claims, then there is no free, the refcnt just goes to >>>>>> 0 but the memory is still there. 
>>>>>> >>>>> About possible mistake from my side with bisect - i can judge too early >>>>> that some bisect was good >>>>> the road was: >>>>> git bisect start >>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag >>>>> 'pinctrl-v4.13-1' of >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl >>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075 >>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' >>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using >>>>> stack larger than 1024. >>>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f >>>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch >>>>> 'udp-reduce-cache-pressure' >>>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20 >>>>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch >>>>> 's390-net-updates-part-2' >>>>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230 >>>>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch >>>>> 'bpf-ctx-narrow' >>>>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70 >>>>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove >>>>> cp_outgoing >>>>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2 >>>>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add >>>>> TCP_MD5SIG_EXT socket option to set a key address prefix >>>>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d >>>>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new >>>>> function dst_dev_put() >>>>> >>>>> And currently have this running for about 4 hours without problems. 
>>>>> >>>>> >>>>> >>>>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36 >>>>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove >>>>> DST_NOCACHE flag >>>>> >>>>> Here for sure - panic >>>>> >>>>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2 >>>>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call >>>>> dst_hold_safe() properly >>>>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112 >>>>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call >>>>> dst_hold_safe() properly >>>>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f >>>>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take >>>>> dst->__refcnt for insertion into fib6 tree >>>>> >>>>> im not 100% sure tor last two >>>>> Will test them again starting from >>>>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() >>>>> properly >>>>> >>>>> >>>>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855 >>>>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC >>>>> and remove the operation of dst_free() >>>>> >>>>> >>>>> >>>>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911 >>>>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: >>>>> mark DST_NOGC and remove the operation of dst_free() >>>>> >>>>> >>>>> >>>> What i can say more >>>> I can reproduce this on any server with similar configuration >>>> the difference can be teamd instead of bonding >>>> ixgbe or i40e and mlx5 >>>> Same problems >>>> >>>> vlans - more or less prefixes learned from bgp -> zebra -> netlink -> >>>> kernel >>>> But normally in lab when using only plain routing no bgpd and about 128 >>>> vlans - with 128 routes - cant reproduce this - this apperas only with bgp - >>>> minimum where i can reproduce this was about 130k prefixes with about 286 >>>> nexthops >>>> >>>> >>>> >>>> >>> bisected again and same result: >>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit >>> commit 
b838d5e1c5b6e57b10ec8af2268824041e3ea911 >>> Author: Wei Wang <weiwan@google.com> >>> Date: Sat Jun 17 10:42:32 2017 -0700 >>> >>> ipv4: mark DST_NOGC and remove the operation of dst_free() >>> >>> With the previous preparation patches, we are ready to get rid of the >>> dst gc operation in ipv4 code and release dst based on refcnt only. >>> So this patch adds DST_NOGC flag for all IPv4 dst and remove the >>> calls >>> to dst_free(). >>> At this point, all dst created in ipv4 code do not use the dst gc >>> anymore and will be destroyed at the point when refcnt drops to 0. >>> >>> Signed-off-by: Wei Wang <weiwan@google.com> >>> Acked-by: Martin KaFai Lau <kafai@fb.com> >>> Signed-off-by: David S. Miller <davem@davemloft.net> >>> >>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da >>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net >>> >>> Will add now version 2 of patch from Eric and we will see >>> >>> >> after adding patch >> perf top catch >> PerfTop: 77159 irqs/sec kernel:99.7% exact: 0.0% [4000Hz cycles], >> (all, 40 CPUs) >> >> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> >> 60.95% [kernel] [k] dev_put.part.6 >> 4.00% [kernel] [k] ixgbe_poll >> 3.63% [kernel] [k] irq_entries_start >> 1.22% [kernel] [k] fib_table_lookup >> 1.15% [kernel] [k] do_raw_spin_lock >> 1.05% [kernel] [k] ixgbe_xmit_frame_ring >> 1.04% [kernel] [k] lookup >> 0.87% [kernel] [k] eth_type_trans >> >> >> no panic on console - rebooting to check logs >> >> > Nothing logged > ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-20 22:09 ` Wei Wang @ 2017-09-21 1:09 ` Wei Wang 2017-09-21 1:17 ` Eric Dumazet 0 siblings, 1 reply; 52+ messages in thread From: Wei Wang @ 2017-09-21 1:09 UTC (permalink / raw) To: Paweł Staszewski Cc: Cong Wang, Eric Dumazet, Linux Kernel Network Developers, Eric Dumazet > Thanks very much Pawel for the feedback. > > I was looking into the code (specifically IPv4 part) and found that in > free_fib_info_rcu(), we call free_nh_exceptions() without holding the > fnhe_lock. I am wondering if that could cause some race condition on > fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the > same dst could be happening. > > But as we call free_fib_info_rcu() only after the grace period, and > the lookup code which could potentially modify > fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems > fine... > Hi Pawel, Could you try the following debug patch on top of the net-next branch, reproduce the issue, and check whether any warning messages show up? diff --git a/include/net/dst.h b/include/net/dst.h index 93568bd0a352..82aff41c6f63 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) static inline struct dst_entry *dst_clone(struct dst_entry *dst) { if (dst) - atomic_inc(&dst->__refcnt); + dst_hold(dst); return dst; } Thanks. Wei On Wed, Sep 20, 2017 at 3:09 PM, Wei Wang <weiwan@google.com> wrote: >>>> bisected again and same result: >>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit >>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911 >>>> Author: Wei Wang <weiwan@google.com> >>>> Date: Sat Jun 17 10:42:32 2017 -0700 >>>> >>>> ipv4: mark DST_NOGC and remove the operation of dst_free() >>>> >>>> With the previous preparation patches, we are ready to get rid of the >>>> dst gc operation in ipv4 code and release dst based on refcnt only. 
>>>> So this patch adds DST_NOGC flag for all IPv4 dst and remove the >>>> calls >>>> to dst_free(). >>>> At this point, all dst created in ipv4 code do not use the dst gc >>>> anymore and will be destroyed at the point when refcnt drops to 0. >>>> >>>> Signed-off-by: Wei Wang <weiwan@google.com> >>>> Acked-by: Martin KaFai Lau <kafai@fb.com> >>>> Signed-off-by: David S. Miller <davem@davemloft.net> >>>> >>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da >>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net >>>> >>>> Will add now version 2 of patch from Eric and we will see >>>> >>>> >>> after adding patch >>> perf top catch >>> PerfTop: 77159 irqs/sec kernel:99.7% exact: 0.0% [4000Hz cycles], >>> (all, 40 CPUs) >>> >>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >>> >>> 60.95% [kernel] [k] dev_put.part.6 >>> 4.00% [kernel] [k] ixgbe_poll >>> 3.63% [kernel] [k] irq_entries_start >>> 1.22% [kernel] [k] fib_table_lookup >>> 1.15% [kernel] [k] do_raw_spin_lock >>> 1.05% [kernel] [k] ixgbe_xmit_frame_ring >>> 1.04% [kernel] [k] lookup >>> 0.87% [kernel] [k] eth_type_trans >>> >>> >>> no panic on console - rebooting to check logs >>> >>> >> Nothing logged >> > > Thanks very much Pawel for the feedback. > > I was looking into the code (specifically IPv4 part) and found that in > free_fib_info_rcu(), we call free_nh_exceptions() without holding the > fnhe_lock. I am wondering if that could cause some race condition on > fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the > same dst could be happening. > > But as we call free_fib_info_rcu() only after the grace period, and > the lookup code which could potentially modify > fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems > fine... 
> > > On Wed, Sep 20, 2017 at 2:25 PM, Paweł Staszewski <pstaszewski@itcare.pl> wrote: >> >> >> On 2017-09-20 at 23:24, Paweł Staszewski wrote: >> >>> >>> >>> On 2017-09-20 at 23:10, Paweł Staszewski wrote: >>>> >>>> >>>> >>>> On 2017-09-20 at 21:23, Paweł Staszewski wrote: >>>>> >>>>> >>>>> >>>>> On 2017-09-20 at 21:13, Paweł Staszewski wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 2017-09-20 at 20:36, Cong Wang wrote: >>>>>>> >>>>>>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet >>>>>>> <eric.dumazet@gmail.com> wrote: >>>>>>>> >>>>>>>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote: >>>>>>>>> >>>>>>>>> but dmesg at this time shows nothing about interfaces or flaps. >>>>>>>>> >>>>>>>>> This is very odd. >>>>>>>>> >>>>>>>>> We only free netdevice in free_netdev() and it is only called when >>>>>>>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible >>>>>>>>> to be NULL. >>>>>>>> >>>>>>>> If there is a missing dev_hold() or one dev_put() in excess, >>>>>>>> this would allow the netdev to be freed too soon. >>>>>>>> >>>>>>>> -> Use after free. >>>>>>>> memory holding netdev could be reallocated-cleared by some other >>>>>>>> kernel >>>>>>>> user. >>>>>>>> >>>>>>> Sure, but only unregister could trigger a free. If there is no >>>>>>> unregister, >>>>>>> like what Pawel claims, then there is no free, the refcnt just goes to >>>>>>> 0 but the memory is still there. 
>>>>>>> >>>>>> About possible mistake from my side with bisect - i can judge too early >>>>>> that some bisect was good >>>>>> the road was: >>>>>> git bisect start >>>>>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag >>>>>> 'pinctrl-v4.13-1' of >>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl >>>>>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075 >>>>>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' >>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security >>>>>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 >>>>>> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using >>>>>> stack larger than 1024. >>>>>> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f >>>>>> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch >>>>>> 'udp-reduce-cache-pressure' >>>>>> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20 >>>>>> # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch >>>>>> 's390-net-updates-part-2' >>>>>> git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230 >>>>>> # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch >>>>>> 'bpf-ctx-narrow' >>>>>> git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70 >>>>>> # good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove >>>>>> cp_outgoing >>>>>> git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2 >>>>>> # bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add >>>>>> TCP_MD5SIG_EXT socket option to set a key address prefix >>>>>> git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d >>>>>> # good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new >>>>>> function dst_dev_put() >>>>>> >>>>>> And currently have this running for about 4 hours without problems. 
>>>>>> >>>>>> >>>>>> >>>>>> git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36 >>>>>> # bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove >>>>>> DST_NOCACHE flag >>>>>> >>>>>> Here for sure - panic >>>>>> >>>>>> git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2 >>>>>> # bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call >>>>>> dst_hold_safe() properly >>>>>> git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112 >>>>>> # good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call >>>>>> dst_hold_safe() properly >>>>>> git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f >>>>>> # bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take >>>>>> dst->__refcnt for insertion into fib6 tree >>>>>> >>>>>> im not 100% sure tor last two >>>>>> Will test them again starting from >>>>>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() >>>>>> properly >>>>>> >>>>>> >>>>>> git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855 >>>>>> # bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC >>>>>> and remove the operation of dst_free() >>>>>> >>>>>> >>>>>> >>>>>> git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911 >>>>>> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: >>>>>> mark DST_NOGC and remove the operation of dst_free() >>>>>> >>>>>> >>>>>> >>>>> What i can say more >>>>> I can reproduce this on any server with similar configuration >>>>> the difference can be teamd instead of bonding >>>>> ixgbe or i40e and mlx5 >>>>> Same problems >>>>> >>>>> vlans - more or less prefixes learned from bgp -> zebra -> netlink -> >>>>> kernel >>>>> But normally in lab when using only plain routing no bgpd and about 128 >>>>> vlans - with 128 routes - cant reproduce this - this apperas only with bgp - >>>>> minimum where i can reproduce this was about 130k prefixes with about 286 >>>>> nexthops >>>>> >>>>> >>>>> >>>>> >>>> bisected again and same result: >>>> 
b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit >>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911 >>>> Author: Wei Wang <weiwan@google.com> >>>> Date: Sat Jun 17 10:42:32 2017 -0700 >>>> >>>> ipv4: mark DST_NOGC and remove the operation of dst_free() >>>> >>>> With the previous preparation patches, we are ready to get rid of the >>>> dst gc operation in ipv4 code and release dst based on refcnt only. >>>> So this patch adds DST_NOGC flag for all IPv4 dst and remove the >>>> calls >>>> to dst_free(). >>>> At this point, all dst created in ipv4 code do not use the dst gc >>>> anymore and will be destroyed at the point when refcnt drops to 0. >>>> >>>> Signed-off-by: Wei Wang <weiwan@google.com> >>>> Acked-by: Martin KaFai Lau <kafai@fb.com> >>>> Signed-off-by: David S. Miller <davem@davemloft.net> >>>> >>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da >>>> 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net >>>> >>>> Will add now version 2 of patch from Eric and we will see >>>> >>>> >>> after adding patch >>> perf top catch >>> PerfTop: 77159 irqs/sec kernel:99.7% exact: 0.0% [4000Hz cycles], >>> (all, 40 CPUs) >>> >>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >>> >>> 60.95% [kernel] [k] dev_put.part.6 >>> 4.00% [kernel] [k] ixgbe_poll >>> 3.63% [kernel] [k] irq_entries_start >>> 1.22% [kernel] [k] fib_table_lookup >>> 1.15% [kernel] [k] do_raw_spin_lock >>> 1.05% [kernel] [k] ixgbe_xmit_frame_ring >>> 1.04% [kernel] [k] lookup >>> 0.87% [kernel] [k] eth_type_trans >>> >>> >>> no panic on console - rebooting to check logs >>> >>> >> Nothing logged >> ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-21 1:09 ` Wei Wang @ 2017-09-21 1:17 ` Eric Dumazet 2017-09-21 9:06 ` Paweł Staszewski 0 siblings, 1 reply; 52+ messages in thread From: Eric Dumazet @ 2017-09-21 1:17 UTC (permalink / raw) To: Wei Wang Cc: Paweł Staszewski, Cong Wang, Linux Kernel Network Developers, Eric Dumazet On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote: > > Thanks very much Pawel for the feedback. > > > > I was looking into the code (specifically IPv4 part) and found that in > > free_fib_info_rcu(), we call free_nh_exceptions() without holding the > > fnhe_lock. I am wondering if that could cause some race condition on > > fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the > > same dst could be happening. > > > > But as we call free_fib_info_rcu() only after the grace period, and > > the lookup code which could potentially modify > > fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems > > fine... > > > > Hi Pawel, > > Could you try the following debug patch on top of net-next branch and > reproduce the issue check if there are warning msg showing? > > diff --git a/include/net/dst.h b/include/net/dst.h > index 93568bd0a352..82aff41c6f63 100644 > --- a/include/net/dst.h > +++ b/include/net/dst.h > @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry > *dst, unsigned long time) > static inline struct dst_entry *dst_clone(struct dst_entry *dst) > { > if (dst) > - atomic_inc(&dst->__refcnt); > + dst_hold(dst); > return dst; > } > > Thanks. > Wei > Yes, we believe skb_dst_force() and skb_dst_force_safe() should be unified (to the 'safe' version) We no longer have gc to protect from 0 -> 1 transition of dst refcount. ^ permalink raw reply [flat|nested] 52+ messages in thread
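The 0 -> 1 transition mentioned above can be illustrated with a small user-space sketch. This is not the kernel implementation: `struct fake_dst` and the `fake_dst_*` helpers are hypothetical names, C11 atomics stand in for the kernel's atomic API, and the sketch assumes the "safe" hold behaves like an increment-if-not-zero. It only shows why an unconditional increment is dangerous once nothing else keeps an object with a zero refcount alive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct dst_entry; only the refcount matters. */
struct fake_dst {
    atomic_int refcnt;
};

/*
 * Unconditional hold: fine while something else guarantees refcnt > 0,
 * but it will turn 0 into 1 on an object that is already being destroyed.
 */
static void fake_dst_hold(struct fake_dst *dst)
{
    atomic_fetch_add(&dst->refcnt, 1);
}

/*
 * "Safe" hold: take a reference only if the count is still non-zero.
 * Returns false when the object has already dropped to 0.
 */
static bool fake_dst_hold_safe(struct fake_dst *dst)
{
    int old = atomic_load(&dst->refcnt);

    while (old != 0) {
        /* On failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool fake_dst_release(struct fake_dst *dst)
{
    int old = atomic_fetch_sub(&dst->refcnt, 1);

    assert(old > 0);    /* an excess release would underflow the counter */
    return old == 1;
}
```

With the count at 0, `fake_dst_hold_safe()` refuses the reference, while `fake_dst_hold()` would bring the object back to 1 even though its last owner has already decided to free it.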
* Re: Latest net-next from GIT panic 2017-09-21 1:17 ` Eric Dumazet @ 2017-09-21 9:06 ` Paweł Staszewski 2017-09-21 11:03 ` Eric Dumazet 0 siblings, 1 reply; 52+ messages in thread From: Paweł Staszewski @ 2017-09-21 9:06 UTC (permalink / raw) To: Eric Dumazet, Wei Wang Cc: Cong Wang, Linux Kernel Network Developers, Eric Dumazet On 2017-09-21 at 03:17, Eric Dumazet wrote: > On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote: >>> Thanks very much Pawel for the feedback. >>> >>> I was looking into the code (specifically IPv4 part) and found that in >>> free_fib_info_rcu(), we call free_nh_exceptions() without holding the >>> fnhe_lock. I am wondering if that could cause some race condition on >>> fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the >>> same dst could be happening. >>> >>> But as we call free_fib_info_rcu() only after the grace period, and >>> the lookup code which could potentially modify >>> fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems >>> fine... >>> >> Hi Pawel, >> >> Could you try the following debug patch on top of net-next branch and >> reproduce the issue check if there are warning msg showing? >> >> diff --git a/include/net/dst.h b/include/net/dst.h >> index 93568bd0a352..82aff41c6f63 100644 >> --- a/include/net/dst.h >> +++ b/include/net/dst.h >> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry >> *dst, unsigned long time) >> static inline struct dst_entry *dst_clone(struct dst_entry *dst) >> { >> if (dst) >> - atomic_inc(&dst->__refcnt); >> + dst_hold(dst); >> return dst; >> } >> >> Thanks. >> Wei >> > > Yes, we believe skb_dst_force() and skb_dst_force_safe() should be > unified (to the 'safe' version) > > We no longer have gc to protect from 0 -> 1 transition of dst refcount. > > > > The result after adding the patch from Wei: https://bugzilla.kernel.org/show_bug.cgi?id=197005#c14 ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-21 9:06 ` Paweł Staszewski @ 2017-09-21 11:03 ` Eric Dumazet 2017-09-21 11:12 ` Paweł Staszewski 2017-09-21 11:31 ` Paweł Staszewski 0 siblings, 2 replies; 52+ messages in thread From: Eric Dumazet @ 2017-09-21 11:03 UTC (permalink / raw) To: Paweł Staszewski Cc: Wei Wang, Cong Wang, Linux Kernel Network Developers, Eric Dumazet On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote: > > On 2017-09-21 at 03:17, Eric Dumazet wrote: > > On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote: > >>> Thanks very much Pawel for the feedback. > >>> > >>> I was looking into the code (specifically IPv4 part) and found that in > >>> free_fib_info_rcu(), we call free_nh_exceptions() without holding the > >>> fnhe_lock. I am wondering if that could cause some race condition on > >>> fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the > >>> same dst could be happening. > >>> > >>> But as we call free_fib_info_rcu() only after the grace period, and > >>> the lookup code which could potentially modify > >>> fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems > >>> fine... > >>> > >> Hi Pawel, > >> > >> Could you try the following debug patch on top of net-next branch and > >> reproduce the issue check if there are warning msg showing? > >> > >> diff --git a/include/net/dst.h b/include/net/dst.h > >> index 93568bd0a352..82aff41c6f63 100644 > >> --- a/include/net/dst.h > >> +++ b/include/net/dst.h > >> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry > >> *dst, unsigned long time) > >> static inline struct dst_entry *dst_clone(struct dst_entry *dst) > >> { > >> if (dst) > >> - atomic_inc(&dst->__refcnt); > >> + dst_hold(dst); > >> return dst; > >> } > >> > >> Thanks. > >> Wei > >> > > > > Yes, we believe skb_dst_force() and skb_dst_force_safe() should be > > unified (to the 'safe' version) > > > > We no longer have gc to protect from 0 -> 1 transition of dst refcount. 
> > > > > > > > > > After adding patch from Wei > https://bugzilla.kernel.org/show_bug.cgi?id=197005#c14 > OK we have two problems here 1) We need to unify skb_dst_force() ( for net tree ) 2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from lower device. This will considerably help your performance. For 1), this is what I had in mind, can you try it ? Thanks a lot ! diff --git a/include/net/dst.h b/include/net/dst.h index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..f23851eeaad917e8dafc06b58d23a2575405c894 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) static inline struct dst_entry *dst_clone(struct dst_entry *dst) { if (dst) - atomic_inc(&dst->__refcnt); + dst_hold(dst); return dst; } @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb __skb_dst_copy(nskb, oskb->_skb_refdst); } -/** - * skb_dst_force - makes sure skb dst is refcounted - * @skb: buffer - * - * If dst is not yet refcounted, let's do it - */ -static inline void skb_dst_force(struct sk_buff *skb) -{ - if (skb_dst_is_noref(skb)) { - WARN_ON(!rcu_read_lock_held()); - skb->_skb_refdst &= ~SKB_DST_NOREF; - dst_clone(skb_dst(skb)); - } -} - /** * dst_hold_safe - Take a reference on a dst if possible * @dst: pointer to dst entry @@ -356,6 +341,23 @@ static inline void skb_dst_force_safe(struct sk_buff *skb) } } +/** + * skb_dst_force - makes sure skb dst is refcounted + * @skb: buffer + * + * If dst is not yet refcounted, let's do it + */ +static inline void skb_dst_force(struct sk_buff *skb) +{ + if (skb_dst_is_noref(skb)) { + struct dst_entry *dst = skb_dst(skb); + + WARN_ON(!rcu_read_lock_held()); + if (!dst_hold_safe(dst)) + dst = NULL; + skb->_skb_refdst = (unsigned long)dst; + } +} /** * __skb_tunnel_rx - prepare skb for rx reinsert ^ permalink raw reply related [flat|nested] 52+ messages in thread
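The reworked skb_dst_force() in the patch above turns a borrowed ("noref") dst pointer into an owned one, or clears it when the dst is already dead. The user-space sketch below mimics that logic under stated assumptions: every `fake_*` name and `FAKE_DST_NOREF` is hypothetical, the refcount is a plain int (single-threaded, no atomics, no RCU), and the flag is assumed to live in bit 0 of the stored pointer value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_DST_NOREF ((uintptr_t)1)  /* hypothetical "not refcounted" flag */

struct fake_dst { int refcnt; };        /* single-threaded sketch, no atomics */
struct fake_skb { uintptr_t refdst; };  /* dst pointer plus flag in bit 0 */

/* Take a reference only if the dst is still alive (refcnt != 0). */
static bool fake_hold_safe(struct fake_dst *dst)
{
    if (dst->refcnt == 0)
        return false;
    dst->refcnt++;
    return true;
}

/* Return the dst pointer with the flag bit masked off. */
static struct fake_dst *fake_skb_dst(const struct fake_skb *skb)
{
    return (struct fake_dst *)(skb->refdst & ~FAKE_DST_NOREF);
}

/* Attach a dst without taking a reference (borrowed pointer). */
static void fake_skb_dst_set_noref(struct fake_skb *skb, struct fake_dst *dst)
{
    assert(((uintptr_t)dst & FAKE_DST_NOREF) == 0); /* bit 0 must be free */
    skb->refdst = (uintptr_t)dst | FAKE_DST_NOREF;
}

static bool fake_skb_dst_is_noref(const struct fake_skb *skb)
{
    return (skb->refdst & FAKE_DST_NOREF) && fake_skb_dst(skb) != NULL;
}

/*
 * Same shape as the patched helper: if the dst is borrowed, try to take a
 * real reference; if that fails, drop the pointer instead of keeping a
 * reference to an object whose refcount already reached zero.
 */
static void fake_skb_dst_force(struct fake_skb *skb)
{
    if (fake_skb_dst_is_noref(skb)) {
        struct fake_dst *dst = fake_skb_dst(skb);

        if (!fake_hold_safe(dst))
            dst = NULL;
        skb->refdst = (uintptr_t)dst;   /* flag bit is cleared either way */
    }
}
```

The design point is that the caller ends up either with a counted reference or with no dst at all, never with a pointer to an object that is on its way to being freed.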
* Re: Latest net-next from GIT panic 2017-09-21 11:03 ` Eric Dumazet @ 2017-09-21 11:12 ` Paweł Staszewski 2017-09-21 11:14 ` Paweł Staszewski 2017-09-21 11:31 ` Paweł Staszewski 1 sibling, 1 reply; 52+ messages in thread From: Paweł Staszewski @ 2017-09-21 11:12 UTC (permalink / raw) To: Eric Dumazet Cc: Wei Wang, Cong Wang, Linux Kernel Network Developers, Eric Dumazet On 2017-09-21 at 13:03, Eric Dumazet wrote: > On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote: >> On 2017-09-21 at 03:17, Eric Dumazet wrote: >>> On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote: >>>>> Thanks very much Pawel for the feedback. >>>>> >>>>> I was looking into the code (specifically IPv4 part) and found that in >>>>> free_fib_info_rcu(), we call free_nh_exceptions() without holding the >>>>> fnhe_lock. I am wondering if that could cause some race condition on >>>>> fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the >>>>> same dst could be happening. >>>>> >>>>> But as we call free_fib_info_rcu() only after the grace period, and >>>>> the lookup code which could potentially modify >>>>> fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems >>>>> fine... >>>>> >>>> Hi Pawel, >>>> >>>> Could you try the following debug patch on top of net-next branch and >>>> reproduce the issue check if there are warning msg showing? >>>> >>>> diff --git a/include/net/dst.h b/include/net/dst.h >>>> index 93568bd0a352..82aff41c6f63 100644 >>>> --- a/include/net/dst.h >>>> +++ b/include/net/dst.h >>>> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry >>>> *dst, unsigned long time) >>>> static inline struct dst_entry *dst_clone(struct dst_entry *dst) >>>> { >>>> if (dst) >>>> - atomic_inc(&dst->__refcnt); >>>> + dst_hold(dst); >>>> return dst; >>>> } >>>> >>>> Thanks. 
>>>> Wei >>>> >>> Yes, we believe skb_dst_force() and skb_dst_force_safe() should be >>> unified (to the 'safe' version) >>> >>> We no longer have gc to protect from 0 -> 1 transition of dst refcount. >>> >>> >>> >>> >> After adding patch from Wei >> https://bugzilla.kernel.org/show_bug.cgi?id=197005#c14 >> > OK we have two problems here > > 1) We need to unify skb_dst_force() ( for net tree ) > > 2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from > lower device. This will considerably help your performance. > > > For 1), this is what I had in mind, can you try it ? > > Thanks a lot ! > > diff --git a/include/net/dst.h b/include/net/dst.h > index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..f23851eeaad917e8dafc06b58d23a2575405c894 100644 > --- a/include/net/dst.h > +++ b/include/net/dst.h > @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) > static inline struct dst_entry *dst_clone(struct dst_entry *dst) > { > if (dst) > - atomic_inc(&dst->__refcnt); > + dst_hold(dst); > return dst; > } > > @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb > __skb_dst_copy(nskb, oskb->_skb_refdst); > } > > -/** > - * skb_dst_force - makes sure skb dst is refcounted > - * @skb: buffer > - * > - * If dst is not yet refcounted, let's do it > - */ > -static inline void skb_dst_force(struct sk_buff *skb) > -{ > - if (skb_dst_is_noref(skb)) { > - WARN_ON(!rcu_read_lock_held()); > - skb->_skb_refdst &= ~SKB_DST_NOREF; > - dst_clone(skb_dst(skb)); > - } > -} > - > /** > * dst_hold_safe - Take a reference on a dst if possible > * @dst: pointer to dst entry > @@ -356,6 +341,23 @@ static inline void skb_dst_force_safe(struct sk_buff *skb) > } > } > > +/** > + * skb_dst_force - makes sure skb dst is refcounted > + * @skb: buffer > + * > + * If dst is not yet refcounted, let's do it > + */ > +static inline void skb_dst_force(struct sk_buff *skb) > +{ > + if 
(skb_dst_is_noref(skb)) { > + struct dst_entry *dst = skb_dst(skb); > + > + WARN_ON(!rcu_read_lock_held()); > + if (!dst_hold_safe(dst)) > + dst = NULL; > + skb->_skb_refdst = (unsigned long)dst; > + } > +} > > /** > * __skb_tunnel_rx - prepare skb for rx reinsert > > > Thanks. What is weird is that I already have this part in my net-next from git: /** * skb_dst_force_safe - makes sure skb dst is refcounted * @skb: buffer * * If dst is not yet refcounted and not destroyed, grab a ref on it. */ static inline void skb_dst_force_safe(struct sk_buff *skb) { if (skb_dst_is_noref(skb)) { struct dst_entry *dst = skb_dst(skb); if (!dst_hold_safe(dst)) dst = NULL; skb->_skb_refdst = (unsigned long)dst; } } ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-21 11:12 ` Paweł Staszewski @ 2017-09-21 11:14 ` Paweł Staszewski 0 siblings, 0 replies; 52+ messages in thread From: Paweł Staszewski @ 2017-09-21 11:14 UTC (permalink / raw) To: Eric Dumazet Cc: Wei Wang, Cong Wang, Linux Kernel Network Developers, Eric Dumazet On 2017-09-21 at 13:12, Paweł Staszewski wrote: > > > On 2017-09-21 at 13:03, Eric Dumazet wrote: >> On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote: >>> On 2017-09-21 at 03:17, Eric Dumazet wrote: >>>> On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote: >>>>>> Thanks very much Pawel for the feedback. >>>>>> >>>>>> I was looking into the code (specifically IPv4 part) and found >>>>>> that in >>>>>> free_fib_info_rcu(), we call free_nh_exceptions() without holding >>>>>> the >>>>>> fnhe_lock. I am wondering if that could cause some race condition on >>>>>> fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the >>>>>> same dst could be happening. >>>>>> >>>>>> But as we call free_fib_info_rcu() only after the grace period, and >>>>>> the lookup code which could potentially modify >>>>>> fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems >>>>>> fine... >>>>>> >>>>> Hi Pawel, >>>>> >>>>> Could you try the following debug patch on top of net-next branch and >>>>> reproduce the issue check if there are warning msg showing? >>>>> >>>>> diff --git a/include/net/dst.h b/include/net/dst.h >>>>> index 93568bd0a352..82aff41c6f63 100644 >>>>> --- a/include/net/dst.h >>>>> +++ b/include/net/dst.h >>>>> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry >>>>> *dst, unsigned long time) >>>>> static inline struct dst_entry *dst_clone(struct dst_entry *dst) >>>>> { >>>>> if (dst) >>>>> - atomic_inc(&dst->__refcnt); >>>>> + dst_hold(dst); >>>>> return dst; >>>>> } >>>>> >>>>> Thanks. 
>>>>> Wei >>>>> >>>> Yes, we believe skb_dst_force() and skb_dst_force_safe() should be >>>> unified (to the 'safe' version) >>>> >>>> We no longer have gc to protect from 0 -> 1 transition of dst >>>> refcount. >>>> >>>> >>>> >>>> >>> After adding patch from Wei >>> https://bugzilla.kernel.org/show_bug.cgi?id=197005#c14 >>> >> OK we have two problems here >> >> 1) We need to unify skb_dst_force() ( for net tree ) >> >> 2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from >> lower device. This will considerably help your performance. >> >> >> For 1), this is what I had in mind, can you try it ? >> >> Thanks a lot ! >> >> diff --git a/include/net/dst.h b/include/net/dst.h >> index >> 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..f23851eeaad917e8dafc06b58d23a2575405c894 >> 100644 >> --- a/include/net/dst.h >> +++ b/include/net/dst.h >> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry >> *dst, unsigned long time) >> static inline struct dst_entry *dst_clone(struct dst_entry *dst) >> { >> if (dst) >> - atomic_inc(&dst->__refcnt); >> + dst_hold(dst); >> return dst; >> } >> @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff >> *nskb, const struct sk_buff *oskb >> __skb_dst_copy(nskb, oskb->_skb_refdst); >> } >> -/** >> - * skb_dst_force - makes sure skb dst is refcounted >> - * @skb: buffer >> - * >> - * If dst is not yet refcounted, let's do it >> - */ >> -static inline void skb_dst_force(struct sk_buff *skb) >> -{ >> - if (skb_dst_is_noref(skb)) { >> - WARN_ON(!rcu_read_lock_held()); >> - skb->_skb_refdst &= ~SKB_DST_NOREF; >> - dst_clone(skb_dst(skb)); >> - } >> -} >> - >> /** >> * dst_hold_safe - Take a reference on a dst if possible >> * @dst: pointer to dst entry >> @@ -356,6 +341,23 @@ static inline void skb_dst_force_safe(struct >> sk_buff *skb) >> } >> } >> +/** >> + * skb_dst_force - makes sure skb dst is refcounted >> + * @skb: buffer >> + * >> + * If dst is not yet refcounted, let's do it >> + */ >> 
+static inline void skb_dst_force(struct sk_buff *skb) >> +{ >> + if (skb_dst_is_noref(skb)) { >> + struct dst_entry *dst = skb_dst(skb); >> + >> + WARN_ON(!rcu_read_lock_held()); >> + if (!dst_hold_safe(dst)) >> + dst = NULL; >> + skb->_skb_refdst = (unsigned long)dst; >> + } >> +} >> /** >> * __skb_tunnel_rx - prepare skb for rx reinsert >> >> >> > Thanks > > What is weird is that I have this part in my net-next from git: > /** > * skb_dst_force_safe - makes sure skb dst is refcounted > * @skb: buffer > * > * If dst is not yet refcounted and not destroyed, grab a ref on it. > */ > static inline void skb_dst_force_safe(struct sk_buff *skb) > { > if (skb_dst_is_noref(skb)) { > struct dst_entry *dst = skb_dst(skb); > > if (!dst_hold_safe(dst)) > dst = NULL; > > skb->_skb_refdst = (unsigned long)dst; > } > } > > > OK, the difference is that this is skb_dst_force_safe, not skb_dst_force. ^ permalink raw reply [flat|nested] 52+ messages in thread
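Editorial sketch for readers comparing the two helpers discussed above: a user-space C model of how a "noref" route pointer can be promoted to a counted one, and why the promotion may have to drop the route. It assumes the low bit of the stored word marks a borrowed (not refcounted) pointer, as SKB_DST_NOREF does for skb->_skb_refdst. All names here (struct route, struct packet, packet_route_force() and so on) are hypothetical stand-ins, not kernel API, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: low bit set means "borrowed pointer, no reference held". */
#define REF_NOREF   ((uintptr_t)1)
#define REF_PTRMASK (~REF_NOREF)

struct route  { atomic_int refcnt; };  /* hypothetical stand-in for struct dst_entry */
struct packet { uintptr_t refdst; };   /* hypothetical stand-in for skb->_skb_refdst */

/* Strip the flag bit to recover the real pointer. */
static struct route *packet_route(const struct packet *p)
{
	return (struct route *)(p->refdst & REF_PTRMASK);
}

static bool packet_route_is_noref(const struct packet *p)
{
	return (p->refdst & REF_NOREF) && packet_route(p) != NULL;
}

/* Attach a route without taking a reference (only valid under RCU in the kernel). */
static void packet_set_route_noref(struct packet *p, struct route *r)
{
	p->refdst = (uintptr_t)r | REF_NOREF;
}

/* Take a reference only if the count has not already dropped to zero. */
static bool route_hold_safe(struct route *r)
{
	int old = atomic_load(&r->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&r->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Promote a borrowed pointer to a counted one, or drop it if the route is dying. */
static void packet_route_force(struct packet *p)
{
	if (packet_route_is_noref(p)) {
		struct route *r = packet_route(p);

		if (!route_hold_safe(r))
			r = NULL;
		p->refdst = (uintptr_t)r;  /* flag bit is cleared either way */
	}
}
```

After packet_route_force() the caller has to check packet_route() for NULL, in the same way the quoted ip_route_input() code checks skb_dst(skb) after forcing.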
* Re: Latest net-next from GIT panic 2017-09-21 11:03 ` Eric Dumazet 2017-09-21 11:12 ` Paweł Staszewski @ 2017-09-21 11:31 ` Paweł Staszewski 2017-09-21 13:18 ` Paweł Staszewski 1 sibling, 1 reply; 52+ messages in thread From: Paweł Staszewski @ 2017-09-21 11:31 UTC (permalink / raw) To: Eric Dumazet Cc: Wei Wang, Cong Wang, Linux Kernel Network Developers, Eric Dumazet On 2017-09-21 at 13:03, Eric Dumazet wrote: > OK we have two problems here > > 1) We need to unify skb_dst_force() ( for net tree ) > > 2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from > lower device. This will considerably help your performance. > > > For 1), this is what I had in mind, can you try it ? > > Thanks a lot ! > > diff --git a/include/net/dst.h b/include/net/dst.h > index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..f23851eeaad917e8dafc06b58d23a2575405c894 100644 > --- a/include/net/dst.h > +++ b/include/net/dst.h > @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) > static inline struct dst_entry *dst_clone(struct dst_entry *dst) > { > if (dst) > - atomic_inc(&dst->__refcnt); > + dst_hold(dst); > return dst; > } > > @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb > __skb_dst_copy(nskb, oskb->_skb_refdst); > } > > -/** > - * skb_dst_force - makes sure skb dst is refcounted > - * @skb: buffer > - * > - * If dst is not yet refcounted, let's do it > - */ > -static inline void skb_dst_force(struct sk_buff *skb) > -{ > - if (skb_dst_is_noref(skb)) { > - WARN_ON(!rcu_read_lock_held()); > - skb->_skb_refdst &= ~SKB_DST_NOREF; > - dst_clone(skb_dst(skb)); > - } > -} > - > /** > * dst_hold_safe - Take a reference on a dst if possible > * @dst: pointer to dst entry > @@ -356,6 +341,23 @@ static inline void skb_dst_force_safe(struct sk_buff *skb) > } > } > > +/** > + * skb_dst_force - makes sure skb dst is refcounted > + * @skb: buffer > + * > + * If dst is not yet
refcounted, let's do it > + */ > +static inline void skb_dst_force(struct sk_buff *skb) > +{ > + if (skb_dst_is_noref(skb)) { > + struct dst_entry *dst = skb_dst(skb); > + > + WARN_ON(!rcu_read_lock_held()); > + if (!dst_hold_safe(dst)) > + dst = NULL; > + skb->_skb_refdst = (unsigned long)dst; > + } > +} > > /** > * __skb_tunnel_rx - prepare skb for rx reinsert > > Patch applied. So far no problems, and no warnings in dmesg. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-21 11:31 ` Paweł Staszewski @ 2017-09-21 13:18 ` Paweł Staszewski 2017-09-21 14:56 ` Eric Dumazet 0 siblings, 1 reply; 52+ messages in thread From: Paweł Staszewski @ 2017-09-21 13:18 UTC (permalink / raw) To: Eric Dumazet Cc: Wei Wang, Cong Wang, Linux Kernel Network Developers, Eric Dumazet On 2017-09-21 at 13:31, Paweł Staszewski wrote: > > > On 2017-09-21 at 13:03, Eric Dumazet wrote: >> OK we have two problems here >> >> 1) We need to unify skb_dst_force() ( for net tree ) >> >> 2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from >> lower device. This will considerably help your performance. >> >> >> For 1), this is what I had in mind, can you try it ? >> >> Thanks a lot ! >> >> diff --git a/include/net/dst.h b/include/net/dst.h >> index >> 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..f23851eeaad917e8dafc06b58d23a2575405c894 >> 100644 >> --- a/include/net/dst.h >> +++ b/include/net/dst.h >> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry >> *dst, unsigned long time) >> static inline struct dst_entry *dst_clone(struct dst_entry *dst) >> { >> if (dst) >> - atomic_inc(&dst->__refcnt); >> + dst_hold(dst); >> return dst; >> } >> @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff >> *nskb, const struct sk_buff *oskb >> __skb_dst_copy(nskb, oskb->_skb_refdst); >> } >> -/** >> - * skb_dst_force - makes sure skb dst is refcounted >> - * @skb: buffer >> - * >> - * If dst is not yet refcounted, let's do it >> - */ >> -static inline void skb_dst_force(struct sk_buff *skb) >> -{ >> - if (skb_dst_is_noref(skb)) { >> - WARN_ON(!rcu_read_lock_held()); >> - skb->_skb_refdst &= ~SKB_DST_NOREF; >> - dst_clone(skb_dst(skb)); >> - } >> -} >> - >> /** >> * dst_hold_safe - Take a reference on a dst if possible >> * @dst: pointer to dst entry >> @@ -356,6 +341,23 @@ static inline void skb_dst_force_safe(struct >> sk_buff *skb) >> } >> } >> +/** >> + * skb_dst_force - makes sure
skb dst is refcounted >> + * @skb: buffer >> + * >> + * If dst is not yet refcounted, let's do it >> + */ >> +static inline void skb_dst_force(struct sk_buff *skb) >> +{ >> + if (skb_dst_is_noref(skb)) { >> + struct dst_entry *dst = skb_dst(skb); >> + >> + WARN_ON(!rcu_read_lock_held()); >> + if (!dst_hold_safe(dst)) >> + dst = NULL; >> + skb->_skb_refdst = (unsigned long)dst; >> + } >> +} >> /** >> * __skb_tunnel_rx - prepare skb for rx reinsert >> >> > > Patch applied. So far no problems, and no warnings in dmesg. > > OK, after adding the patch everything has been working for about 1 hour of normal traffic with all BGP sessions connected and about 600k prefixes in the kernel. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: Latest net-next from GIT panic 2017-09-21 13:18 ` Paweł Staszewski @ 2017-09-21 14:56 ` Eric Dumazet 2017-09-21 16:15 ` [PATCH net] net: prevent dst uses after free Eric Dumazet 0 siblings, 1 reply; 52+ messages in thread From: Eric Dumazet @ 2017-09-21 14:56 UTC (permalink / raw) To: Paweł Staszewski Cc: Wei Wang, Cong Wang, Linux Kernel Network Developers, Eric Dumazet On Thu, 2017-09-21 at 15:18 +0200, Paweł Staszewski wrote: > OK, after adding the patch everything has been working for about 1 hour of normal > traffic with all BGP sessions connected and about 600k prefixes in the kernel. Great, I am going to submit an official patch, uniting skb_dst_force() and skb_dst_force_safe() into a single helper. Thanks. ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH net] net: prevent dst uses after free 2017-09-21 14:56 ` Eric Dumazet @ 2017-09-21 16:15 ` Eric Dumazet 2017-09-21 16:49 ` Wei Wang ` (2 more replies) 0 siblings, 3 replies; 52+ messages in thread From: Eric Dumazet @ 2017-09-21 16:15 UTC (permalink / raw) To: Paweł Staszewski Cc: Wei Wang, Cong Wang, Linux Kernel Network Developers, Eric Dumazet From: Eric Dumazet <edumazet@google.com> In linux-4.13, Wei worked hard to convert dst to a traditional refcounted model, removing GC. We now want to make sure a dst refcount can not transition from 0 back to 1. The problem here is that input path attached a not refcounted dst to an skb. Then later, because packet is forwarded and hits skb_dst_force() before exiting RCU section, we might try to take a refcount on one dst that is about to be freed, if another cpu saw 1 -> 0 transition in dst_release() and queued the dst for freeing after one RCU grace period. Lets unify skb_dst_force() and skb_dst_force_safe(), since we should always perform the complete check against dst refcount, and not assume it is not zero. Bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=197005 [ 989.919496] skb_dst_force+0x32/0x34 [ 989.919498] __dev_queue_xmit+0x1ad/0x482 [ 989.919501] ? eth_header+0x28/0xc6 [ 989.919502] dev_queue_xmit+0xb/0xd [ 989.919504] neigh_connected_output+0x9b/0xb4 [ 989.919507] ip_finish_output2+0x234/0x294 [ 989.919509] ? ipt_do_table+0x369/0x388 [ 989.919510] ip_finish_output+0x12c/0x13f [ 989.919512] ip_output+0x53/0x87 [ 989.919513] ip_forward_finish+0x53/0x5a [ 989.919515] ip_forward+0x2cb/0x3e6 [ 989.919516] ? pskb_trim_rcsum.part.9+0x4b/0x4b [ 989.919518] ip_rcv_finish+0x2e2/0x321 [ 989.919519] ip_rcv+0x26f/0x2eb [ 989.919522] ? vlan_do_receive+0x4f/0x289 [ 989.919523] __netif_receive_skb_core+0x467/0x50b [ 989.919526] ? tcp_gro_receive+0x239/0x239 [ 989.919529] ? 
inet_gro_receive+0x226/0x238 [ 989.919530] __netif_receive_skb+0x4d/0x5f [ 989.919532] netif_receive_skb_internal+0x5c/0xaf [ 989.919533] napi_gro_receive+0x45/0x81 [ 989.919536] ixgbe_poll+0xc8a/0xf09 [ 989.919539] ? kmem_cache_free_bulk+0x1b6/0x1f7 [ 989.919540] net_rx_action+0xf4/0x266 [ 989.919543] __do_softirq+0xa8/0x19d [ 989.919545] irq_exit+0x5d/0x6b [ 989.919546] do_IRQ+0x9c/0xb5 [ 989.919548] common_interrupt+0x93/0x93 [ 989.919548] </IRQ> Similarly dst_clone() can use dst_hold() helper to have additional debugging, as a follow up to commit 44ebe79149ff ("net: add debug atomic_inc_not_zero() in dst_hold()") In net-next we will convert dst atomic_t to refcount_t for peace of mind. Fixes: a4c2fd7f7891 ("net: remove DST_NOCACHE flag") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Reported-by: Paweł Staszewski <pstaszewski@itcare.pl> Bisected-by: Paweł Staszewski <pstaszewski@itcare.pl> --- include/net/dst.h | 22 ++++------------------ include/net/route.h | 2 +- include/net/sock.h | 2 +- 3 files changed, 6 insertions(+), 20 deletions(-) diff --git a/include/net/dst.h b/include/net/dst.h index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..06a6765da074449e6f1fe42ee05e711e898ad372 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) static inline struct dst_entry *dst_clone(struct dst_entry *dst) { if (dst) - atomic_inc(&dst->__refcnt); + dst_hold(dst); return dst; } @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb __skb_dst_copy(nskb, oskb->_skb_refdst); } -/** - * skb_dst_force - makes sure skb dst is refcounted - * @skb: buffer - * - * If dst is not yet refcounted, let's do it - */ -static inline void skb_dst_force(struct sk_buff *skb) -{ - if (skb_dst_is_noref(skb)) { - WARN_ON(!rcu_read_lock_held()); - skb->_skb_refdst &= ~SKB_DST_NOREF; - dst_clone(skb_dst(skb)); - } -} - /** 
* dst_hold_safe - Take a reference on a dst if possible * @dst: pointer to dst entry @@ -339,16 +324,17 @@ static inline bool dst_hold_safe(struct dst_entry *dst) } /** - * skb_dst_force_safe - makes sure skb dst is refcounted + * skb_dst_force - makes sure skb dst is refcounted * @skb: buffer * * If dst is not yet refcounted and not destroyed, grab a ref on it. */ -static inline void skb_dst_force_safe(struct sk_buff *skb) +static inline void skb_dst_force(struct sk_buff *skb) { if (skb_dst_is_noref(skb)) { struct dst_entry *dst = skb_dst(skb); + WARN_ON(!rcu_read_lock_held()); if (!dst_hold_safe(dst)) dst = NULL; diff --git a/include/net/route.h b/include/net/route.h index 1b09a9368c68d46f0c5ee8ce3cefe566000c1ec1..57dfc6850d378e4b96f13b140eef554d66c24cdf 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -190,7 +190,7 @@ static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src, rcu_read_lock(); err = ip_route_input_noref(skb, dst, src, tos, devin); if (!err) { - skb_dst_force_safe(skb); + skb_dst_force(skb); if (!skb_dst(skb)) err = -EINVAL; } diff --git a/include/net/sock.h b/include/net/sock.h index 03a362568357acc7278a318423dd3873103f90ca..a6b9a8d1a6df3f72df8f1aac0f577257fa6452d0 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -856,7 +856,7 @@ void sk_stream_write_space(struct sock *sk); static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb) { /* dont let skb dst not refcounted, we are going to leave rcu lock */ - skb_dst_force_safe(skb); + skb_dst_force(skb); if (!sk->sk_backlog.tail) sk->sk_backlog.head = skb; ^ permalink raw reply related [flat|nested] 52+ messages in thread
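Editorial sketch of the race the patch description above is guarding against: once a refcount has been seen going 1 -> 0 and the object is queued for freeing, a plain increment brings the count back to 1 on memory that is about to go away, while an "increment unless zero" operation refuses. The commit text names atomic_inc_not_zero() for this; the code below is only a user-space approximation with C11 atomics, and struct obj and the obj_*() helpers are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted kernel object such as a dst entry. */
struct obj {
	atomic_int refcnt;
};

/* Unconditional increment: revives the object even if refcnt already hit 0. */
static void obj_hold_unsafe(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);
}

/* Increment only while refcnt is non-zero; never performs a 0 -> 1 transition. */
static bool obj_hold_safe(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference; returns true if this call released the last one. */
static bool obj_release(struct obj *o)
{
	return atomic_fetch_sub(&o->refcnt, 1) == 1;
}
```

In this model, a caller that gets false from obj_hold_safe() must treat the object as gone, which corresponds to the patched skb_dst_force() storing NULL in the skb instead of a pointer to a dst that is about to be freed.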
* Re: [PATCH net] net: prevent dst uses after free 2017-09-21 16:15 ` [PATCH net] net: prevent dst uses after free Eric Dumazet @ 2017-09-21 16:49 ` Wei Wang 2017-09-21 17:12 ` Martin KaFai Lau 2017-09-22 3:42 ` David Miller 2 siblings, 0 replies; 52+ messages in thread From: Wei Wang @ 2017-09-21 16:49 UTC (permalink / raw) To: Eric Dumazet Cc: Paweł Staszewski, Cong Wang, Linux Kernel Network Developers, Eric Dumazet On Thu, Sep 21, 2017 at 9:15 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > From: Eric Dumazet <edumazet@google.com> > > In linux-4.13, Wei worked hard to convert dst to a traditional > refcounted model, removing GC. > > We now want to make sure a dst refcount can not transition from 0 back > to 1. > > The problem here is that input path attached a not refcounted dst to an > skb. Then later, because packet is forwarded and hits skb_dst_force() > before exiting RCU section, we might try to take a refcount on one dst > that is about to be freed, if another cpu saw 1 -> 0 transition in > dst_release() and queued the dst for freeing after one RCU grace period. > > Lets unify skb_dst_force() and skb_dst_force_safe(), since we should > always perform the complete check against dst refcount, and not assume > it is not zero. > > Bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=197005 > > [ 989.919496] skb_dst_force+0x32/0x34 > [ 989.919498] __dev_queue_xmit+0x1ad/0x482 > [ 989.919501] ? eth_header+0x28/0xc6 > [ 989.919502] dev_queue_xmit+0xb/0xd > [ 989.919504] neigh_connected_output+0x9b/0xb4 > [ 989.919507] ip_finish_output2+0x234/0x294 > [ 989.919509] ? ipt_do_table+0x369/0x388 > [ 989.919510] ip_finish_output+0x12c/0x13f > [ 989.919512] ip_output+0x53/0x87 > [ 989.919513] ip_forward_finish+0x53/0x5a > [ 989.919515] ip_forward+0x2cb/0x3e6 > [ 989.919516] ? pskb_trim_rcsum.part.9+0x4b/0x4b > [ 989.919518] ip_rcv_finish+0x2e2/0x321 > [ 989.919519] ip_rcv+0x26f/0x2eb > [ 989.919522] ? 
vlan_do_receive+0x4f/0x289 > [ 989.919523] __netif_receive_skb_core+0x467/0x50b > [ 989.919526] ? tcp_gro_receive+0x239/0x239 > [ 989.919529] ? inet_gro_receive+0x226/0x238 > [ 989.919530] __netif_receive_skb+0x4d/0x5f > [ 989.919532] netif_receive_skb_internal+0x5c/0xaf > [ 989.919533] napi_gro_receive+0x45/0x81 > [ 989.919536] ixgbe_poll+0xc8a/0xf09 > [ 989.919539] ? kmem_cache_free_bulk+0x1b6/0x1f7 > [ 989.919540] net_rx_action+0xf4/0x266 > [ 989.919543] __do_softirq+0xa8/0x19d > [ 989.919545] irq_exit+0x5d/0x6b > [ 989.919546] do_IRQ+0x9c/0xb5 > [ 989.919548] common_interrupt+0x93/0x93 > [ 989.919548] </IRQ> > > > Similarly dst_clone() can use dst_hold() helper to have additional > debugging, as a follow up to commit 44ebe79149ff ("net: add debug > atomic_inc_not_zero() in dst_hold()") > > In net-next we will convert dst atomic_t to refcount_t for peace of > mind. > > Fixes: a4c2fd7f7891 ("net: remove DST_NOCACHE flag") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Wei Wang <weiwan@google.com> > Reported-by: Paweł Staszewski <pstaszewski@itcare.pl> > Bisected-by: Paweł Staszewski <pstaszewski@itcare.pl> > --- Thanks a lot for the fix Eric. It makes sense to unify all the usage of skb_dst_force() to always check on the refcnt not being 0. And thank you Pawel for reporting and testing on this. 
Acked-by: Wei Wang <weiwan@google.com> > include/net/dst.h | 22 ++++------------------ > include/net/route.h | 2 +- > include/net/sock.h | 2 +- > 3 files changed, 6 insertions(+), 20 deletions(-) > > diff --git a/include/net/dst.h b/include/net/dst.h > index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..06a6765da074449e6f1fe42ee05e711e898ad372 100644 > --- a/include/net/dst.h > +++ b/include/net/dst.h > @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) > static inline struct dst_entry *dst_clone(struct dst_entry *dst) > { > if (dst) > - atomic_inc(&dst->__refcnt); > + dst_hold(dst); > return dst; > } > > @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb > __skb_dst_copy(nskb, oskb->_skb_refdst); > } > > -/** > - * skb_dst_force - makes sure skb dst is refcounted > - * @skb: buffer > - * > - * If dst is not yet refcounted, let's do it > - */ > -static inline void skb_dst_force(struct sk_buff *skb) > -{ > - if (skb_dst_is_noref(skb)) { > - WARN_ON(!rcu_read_lock_held()); > - skb->_skb_refdst &= ~SKB_DST_NOREF; > - dst_clone(skb_dst(skb)); > - } > -} > - > /** > * dst_hold_safe - Take a reference on a dst if possible > * @dst: pointer to dst entry > @@ -339,16 +324,17 @@ static inline bool dst_hold_safe(struct dst_entry *dst) > } > > /** > - * skb_dst_force_safe - makes sure skb dst is refcounted > + * skb_dst_force - makes sure skb dst is refcounted > * @skb: buffer > * > * If dst is not yet refcounted and not destroyed, grab a ref on it. 
> */ > -static inline void skb_dst_force_safe(struct sk_buff *skb) > +static inline void skb_dst_force(struct sk_buff *skb) > { > if (skb_dst_is_noref(skb)) { > struct dst_entry *dst = skb_dst(skb); > > + WARN_ON(!rcu_read_lock_held()); > if (!dst_hold_safe(dst)) > dst = NULL; > > diff --git a/include/net/route.h b/include/net/route.h > index 1b09a9368c68d46f0c5ee8ce3cefe566000c1ec1..57dfc6850d378e4b96f13b140eef554d66c24cdf 100644 > --- a/include/net/route.h > +++ b/include/net/route.h > @@ -190,7 +190,7 @@ static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src, > rcu_read_lock(); > err = ip_route_input_noref(skb, dst, src, tos, devin); > if (!err) { > - skb_dst_force_safe(skb); > + skb_dst_force(skb); > if (!skb_dst(skb)) > err = -EINVAL; > } > diff --git a/include/net/sock.h b/include/net/sock.h > index 03a362568357acc7278a318423dd3873103f90ca..a6b9a8d1a6df3f72df8f1aac0f577257fa6452d0 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -856,7 +856,7 @@ void sk_stream_write_space(struct sock *sk); > static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb) > { > /* dont let skb dst not refcounted, we are going to leave rcu lock */ > - skb_dst_force_safe(skb); > + skb_dst_force(skb); > > if (!sk->sk_backlog.tail) > sk->sk_backlog.head = skb; > > ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH net] net: prevent dst uses after free 2017-09-21 16:15 ` [PATCH net] net: prevent dst uses after free Eric Dumazet 2017-09-21 16:49 ` Wei Wang @ 2017-09-21 17:12 ` Martin KaFai Lau 2017-09-22 3:42 ` David Miller 2 siblings, 0 replies; 52+ messages in thread From: Martin KaFai Lau @ 2017-09-21 17:12 UTC (permalink / raw) To: Eric Dumazet Cc: Paweł Staszewski, Wei Wang, Cong Wang, Linux Kernel Network Developers, Eric Dumazet On Thu, Sep 21, 2017 at 04:15:46PM +0000, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > In linux-4.13, Wei worked hard to convert dst to a traditional > refcounted model, removing GC. > > We now want to make sure a dst refcount can not transition from 0 back > to 1. > > The problem here is that input path attached a not refcounted dst to an > skb. Then later, because packet is forwarded and hits skb_dst_force() > before exiting RCU section, we might try to take a refcount on one dst > that is about to be freed, if another cpu saw 1 -> 0 transition in > dst_release() and queued the dst for freeing after one RCU grace period. > > Lets unify skb_dst_force() and skb_dst_force_safe(), since we should > always perform the complete check against dst refcount, and not assume > it is not zero. Acked-by: Martin KaFai Lau <kafai@fb.com> > > Bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=197005 > > [ 989.919496] skb_dst_force+0x32/0x34 > [ 989.919498] __dev_queue_xmit+0x1ad/0x482 > [ 989.919501] ? eth_header+0x28/0xc6 > [ 989.919502] dev_queue_xmit+0xb/0xd > [ 989.919504] neigh_connected_output+0x9b/0xb4 > [ 989.919507] ip_finish_output2+0x234/0x294 > [ 989.919509] ? ipt_do_table+0x369/0x388 > [ 989.919510] ip_finish_output+0x12c/0x13f > [ 989.919512] ip_output+0x53/0x87 > [ 989.919513] ip_forward_finish+0x53/0x5a > [ 989.919515] ip_forward+0x2cb/0x3e6 > [ 989.919516] ? pskb_trim_rcsum.part.9+0x4b/0x4b > [ 989.919518] ip_rcv_finish+0x2e2/0x321 > [ 989.919519] ip_rcv+0x26f/0x2eb > [ 989.919522] ? 
vlan_do_receive+0x4f/0x289 > [ 989.919523] __netif_receive_skb_core+0x467/0x50b > [ 989.919526] ? tcp_gro_receive+0x239/0x239 > [ 989.919529] ? inet_gro_receive+0x226/0x238 > [ 989.919530] __netif_receive_skb+0x4d/0x5f > [ 989.919532] netif_receive_skb_internal+0x5c/0xaf > [ 989.919533] napi_gro_receive+0x45/0x81 > [ 989.919536] ixgbe_poll+0xc8a/0xf09 > [ 989.919539] ? kmem_cache_free_bulk+0x1b6/0x1f7 > [ 989.919540] net_rx_action+0xf4/0x266 > [ 989.919543] __do_softirq+0xa8/0x19d > [ 989.919545] irq_exit+0x5d/0x6b > [ 989.919546] do_IRQ+0x9c/0xb5 > [ 989.919548] common_interrupt+0x93/0x93 > [ 989.919548] </IRQ> > > > Similarly dst_clone() can use dst_hold() helper to have additional > debugging, as a follow up to commit 44ebe79149ff ("net: add debug > atomic_inc_not_zero() in dst_hold()") > > In net-next we will convert dst atomic_t to refcount_t for peace of > mind. > > Fixes: a4c2fd7f7891 ("net: remove DST_NOCACHE flag") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Wei Wang <weiwan@google.com> > Reported-by: Paweł Staszewski <pstaszewski@itcare.pl> > Bisected-by: Paweł Staszewski <pstaszewski@itcare.pl> > --- > include/net/dst.h | 22 ++++------------------ > include/net/route.h | 2 +- > include/net/sock.h | 2 +- > 3 files changed, 6 insertions(+), 20 deletions(-) > > diff --git a/include/net/dst.h b/include/net/dst.h > index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..06a6765da074449e6f1fe42ee05e711e898ad372 100644 > --- a/include/net/dst.h > +++ b/include/net/dst.h > @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) > static inline struct dst_entry *dst_clone(struct dst_entry *dst) > { > if (dst) > - atomic_inc(&dst->__refcnt); > + dst_hold(dst); > return dst; > } > > @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb > __skb_dst_copy(nskb, oskb->_skb_refdst); > } > > -/** > - * skb_dst_force - makes sure skb dst is refcounted > - * @skb: buffer > - * > - 
* If dst is not yet refcounted, let's do it > - */ > -static inline void skb_dst_force(struct sk_buff *skb) > -{ > - if (skb_dst_is_noref(skb)) { > - WARN_ON(!rcu_read_lock_held()); > - skb->_skb_refdst &= ~SKB_DST_NOREF; > - dst_clone(skb_dst(skb)); > - } > -} > - > /** > * dst_hold_safe - Take a reference on a dst if possible > * @dst: pointer to dst entry > @@ -339,16 +324,17 @@ static inline bool dst_hold_safe(struct dst_entry *dst) > } > > /** > - * skb_dst_force_safe - makes sure skb dst is refcounted > + * skb_dst_force - makes sure skb dst is refcounted > * @skb: buffer > * > * If dst is not yet refcounted and not destroyed, grab a ref on it. > */ > -static inline void skb_dst_force_safe(struct sk_buff *skb) > +static inline void skb_dst_force(struct sk_buff *skb) > { > if (skb_dst_is_noref(skb)) { > struct dst_entry *dst = skb_dst(skb); > > + WARN_ON(!rcu_read_lock_held()); > if (!dst_hold_safe(dst)) > dst = NULL; > > diff --git a/include/net/route.h b/include/net/route.h > index 1b09a9368c68d46f0c5ee8ce3cefe566000c1ec1..57dfc6850d378e4b96f13b140eef554d66c24cdf 100644 > --- a/include/net/route.h > +++ b/include/net/route.h > @@ -190,7 +190,7 @@ static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src, > rcu_read_lock(); > err = ip_route_input_noref(skb, dst, src, tos, devin); > if (!err) { > - skb_dst_force_safe(skb); > + skb_dst_force(skb); > if (!skb_dst(skb)) > err = -EINVAL; > } > diff --git a/include/net/sock.h b/include/net/sock.h > index 03a362568357acc7278a318423dd3873103f90ca..a6b9a8d1a6df3f72df8f1aac0f577257fa6452d0 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -856,7 +856,7 @@ void sk_stream_write_space(struct sock *sk); > static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb) > { > /* dont let skb dst not refcounted, we are going to leave rcu lock */ > - skb_dst_force_safe(skb); > + skb_dst_force(skb); > > if (!sk->sk_backlog.tail) > sk->sk_backlog.head = skb; > > ^ permalink raw reply 
[flat|nested] 52+ messages in thread
* Re: [PATCH net] net: prevent dst uses after free 2017-09-21 16:15 ` [PATCH net] net: prevent dst uses after free Eric Dumazet 2017-09-21 16:49 ` Wei Wang 2017-09-21 17:12 ` Martin KaFai Lau @ 2017-09-22 3:42 ` David Miller 2 siblings, 0 replies; 52+ messages in thread From: David Miller @ 2017-09-22 3:42 UTC (permalink / raw) To: eric.dumazet; +Cc: pstaszewski, weiwan, xiyou.wangcong, netdev, edumazet From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 21 Sep 2017 09:15:46 -0700 > From: Eric Dumazet <edumazet@google.com> > > In linux-4.13, Wei worked hard to convert dst to a traditional > refcounted model, removing GC. > > We now want to make sure a dst refcount can not transition from 0 back > to 1. > > The problem here is that input path attached a not refcounted dst to an > skb. Then later, because packet is forwarded and hits skb_dst_force() > before exiting RCU section, we might try to take a refcount on one dst > that is about to be freed, if another cpu saw 1 -> 0 transition in > dst_release() and queued the dst for freeing after one RCU grace period. > > Lets unify skb_dst_force() and skb_dst_force_safe(), since we should > always perform the complete check against dst refcount, and not assume > it is not zero. > > Bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=197005 ... > Similarly dst_clone() can use dst_hold() helper to have additional > debugging, as a follow up to commit 44ebe79149ff ("net: add debug > atomic_inc_not_zero() in dst_hold()") > > In net-next we will convert dst atomic_t to refcount_t for peace of > mind. > > Fixes: a4c2fd7f7891 ("net: remove DST_NOCACHE flag") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Wei Wang <weiwan@google.com> > Reported-by: Paweł Staszewski <pstaszewski@itcare.pl> > Bisected-by: Paweł Staszewski <pstaszewski@itcare.pl> Applied and queued up for -stable, thanks Eric. ^ permalink raw reply [flat|nested] 52+ messages in thread