* [GIT PULL] First batch of KVM changes for 4.1
@ 2015-04-10 15:01 Paolo Bonzini
  2015-04-17  8:52 ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-10 15:01 UTC (permalink / raw)
To: torvalds; +Cc: linux-kernel, gleb, kvm, Ralf Baechle

Linus,

The following changes since commit ae705930fca6322600690df9dc1c7d0516145a93:

  arm/arm64: KVM: Keep elrsr/aisr in sync with software model (2015-03-14 13:42:07 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to ca3f0874723fad81d0c701b63ae3a17a408d5f25:

  KVM: use slowpath for cross page cached accesses (2015-04-10 16:04:45 +0200)

Note that this includes some MIPS patches that will also come through
Ralf Baechle's tree (CCed).  As usual, PPC will be sent in a second pull
request.  This time, no child is due during the merge window, and hence
the KVM/PPC pull request shall come normally from me.

Thanks,

Paolo

----------------------------------------------------------------
The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64.

ARM/ARM64: fixes for live migration, irqfd and ioeventfd support
(enabling vhost, too), page aging

s390: interrupt handling rework, allowing to inject all local interrupts
via new ioctl and to get/set the full local irq state for migration and
introspection.  New ioctls to access memory by virtual address, and to
get/set the guest storage keys.  SIMD support.

MIPS: FPU and MIPS SIMD Architecture (MSA) support.  Includes some
patches from Ralf Baechle's MIPS tree.

x86: bugfixes (notably for pvclock, the others are small) and cleanups.
Another small latency improvement for the TSC deadline timer.
---------------------------------------------------------------- Alex Bennée (2): arm/arm64: KVM: export VCPU power state via MP_STATE ioctl arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn Alexander Yarygin (5): KVM: s390: Use the read_guest_abs() in guest debug functions KVM: s390: Fix low-address protection for real addresses KVM: s390: Guest's memory access functions get access registers KVM: s390: Optimize paths where get_vcpu_asce() is invoked KVM: s390: Add access register mode Andre Przywara (10): KVM: move iodev.h from virt/kvm/ to include/kvm KVM: arm/arm64: remove now unneeded include directory from Makefile KVM: x86: remove now unneeded include directory from Makefile KVM: arm/arm64: rename struct kvm_mmio_range to vgic_io_range KVM: arm/arm64: simplify vgic_find_range() and callers KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGIC KVM: arm/arm64: prepare GICv2 emulation to be handled by kvm_io_bus KVM: arm/arm64: merge GICv3 RD_base and SGI_base register frames KVM: arm/arm64: prepare GICv3 emulation to use kvm_io_bus MMIO handling KVM: arm/arm64: rework MMIO abort handling to use KVM MMIO bus Arseny Solokha (1): kvm/ppc/mpic: drop unused IRQ_testbit Bandan Das (1): KVM: SVM: Fix confusing message if no exit handlers are installed Christian Borntraeger (3): KVM: make halt_poll_ns static KVM: MAINTAINERS: add file arch/x86/kernel/kvm.c|kvmclock.c KVM: s390: enable more features that need no hypervisor changes Christoffer Dall (3): arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER} arm/arm64: KVM: support for un-queuing active IRQs arm/arm64: KVM: Fix migration race in the arch timer David Hildenbrand (5): KVM: s390: fix handling of write errors in the tpi handler KVM: s390: reinjection of irqs can fail in the tpi handler KVM: s390: fix instruction interception trace point KVM: s390: store the breaking-event address on pgm interrupts KVM: s390: cpu timer irq priority David Kaplan (3): KVM: SVM: use kvm_register_write()/read() kvm: svm: 
make wbinvd faster x86: svm: use cr_interception for SVM_EXIT_CR0_SEL_WRITE Dominik Dingel (1): KVM: s390: cleanup jump lables in kvm_arch_init_vm Ekaterina Tumanova (2): KVM: s390: Zero out current VMDB of STSI before including level3 data. KVM: s390: introduce post handlers for STSI Eric Auger (5): KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIP KVM: introduce kvm_arch_intc_initialized and use it in irqfd KVM: arm/arm64: implement kvm_arch_intc_initialized KVM: arm/arm64: remove coarse grain dist locking at kvm_vgic_sync_hwstate KVM: arm/arm64: add irqfd support Eric Farman (5): KVM: s390: Allocate and save/restore vector registers KVM: s390: Vector exceptions KVM: s390: Add new SIGP order to kernel counters KVM: s390: Machine Check KVM: s390: Enable vector support for capable guest Eugene Korenevsky (4): KVM: remove useless check of "ret" variable prior to returning the same value KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry KVM: nVMX: remove unnecessary double caching of MAXPHYADDR Geert Uytterhoeven (1): KVM: s390: Spelling s/intance/instance/ James Hogan (24): MIPS: lose_fpu(): Disable FPU when MSA enabled Revert "MIPS: Don't assume 64-bit FP registers for context switch" MIPS: MSA: Fix big-endian FPR_IDX implementation Merge branch '4.1-fp' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr into kvm_mips_queue MIPS: KVM: Handle MSA Disabled exceptions from guest MIPS: Clear [MSA]FPE CSR.Cause after notify_die() MIPS: KVM: Handle TRAP exceptions from guest kernel MIPS: KVM: Implement PRid CP0 register access MIPS: KVM: Sort kvm_mips_get_reg() registers MIPS: KVM: Drop pr_info messages on init/exit MIPS: KVM: Clean up register definitions a little MIPS: KVM: Simplify default guest Config registers MIPS: KVM: Add Config4/5 and writing of Config registers MIPS: KVM: Add vcpu_get_regs/vcpu_set_regs callback MIPS: KVM: Add base guest FPU support MIPS: KVM: Emulate FPU bits in COP0 
interface MIPS: KVM: Add FP exception handling MIPS: KVM: Expose FPU registers MIPS: KVM: Wire up FPU capability MIPS: KVM: Add base guest MSA support MIPS: KVM: Emulate MSA bits in COP0 interface MIPS: KVM: Add MSA exception handling MIPS: KVM: Expose MSA registers MIPS: KVM: Wire up MSA capability Jan Kiszka (3): KVM: x86: Fix re-execution of patched vmmcall KVM: nVMX: Do not emulate #UD while in guest mode KVM: nVMX: Add support for rdtscp Jason J. Herne (1): KVM: s390: Create ioctl for Getting/Setting guest storage keys Jens Freimann (5): KVM: s390: fix get_all_floating_irqs KVM: s390: deliver floating interrupts in order of priority KVM: s390: add ioctl to inject local interrupts KVM: s390: refactor vcpu injection function KVM: s390: migrate vcpu interrupt state Joe Perches (1): x86: Use bool function return values of true/false not 1/0 Joel Schopp (1): kvm: x86: make kvm_emulate_* consistant Kevin Mulvey (2): KVM: white space formatting in kvm_main.c KVM: fix checkpatch.pl errors in kvm/irqchip.c Marc Zyngier (3): arm/arm64: KVM: Allow handle_hva_to_gpa to return a value arm/arm64: KVM: Implement Stage-2 page aging arm/arm64: KVM: Optimize handling of Access Flag faults Marcelo Tosatti (3): Merge tag 'kvm-s390-next-20150306' of git://git.kernel.org/.../kvms390/linux into queue x86: kvm: Revert "remove sched notifier for cross-cpu migrations" Merge tag 'kvm-s390-next-20150318' of git://git.kernel.org/.../kvms390/linux into queue Mark Rutland (1): KVM: vgic: add virt-capable compatible strings Michael Mueller (3): KVM: s390: perform vcpu model setup in a function KVM: s390: drop SIMD bit from kvm_s390_fac_list_mask KVM: s390: represent SIMD cap in kvm facility Nadav Amit (8): KVM: x86: CMOV emulation on legacy mode is wrong KVM: x86: POPA emulation may not clear bits [63:32] KVM: x86: BSF and BSR emulation change register unnecassarily KVM: x86: removing redundant eflags bits definitions KVM: x86: Remove redundant definitions KVM: x86: BSP in MSR_IA32_APICBASE 
is writable KVM: x86: DR0-DR3 are not clear on reset KVM: x86: Clear CR2 on VCPU reset Nikolay Nikolaev (2): KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks. KVM: arm/arm64: enable KVM_CAP_IOEVENTFD Paolo Bonzini (8): KVM: arm/arm64: prefer IS_ENABLED to a static variable Merge tag 'kvm_mips_20150327' of git://git.kernel.org/.../jhogan/kvm-mips into kvm-next Merge tag 'kvm-arm-fixes-4.0-rc5' of git://git.kernel.org/.../kvmarm/kvmarm into 'kvm-next' Merge tag 'kvm-arm-for-4.1' of git://git.kernel.org/.../kvmarm/kvmarm into 'kvm-next' Merge tag 'kvm-s390-next-20150331' of git://git.kernel.org/.../kvms390/linux into HEAD KVM: x86: extract blocking logic from __vcpu_run KVM: x86: optimize delivery of TSC deadline timer interrupt KVM: remove kvm_read_hva and kvm_read_hva_atomic Paul Burton (8): MIPS: Push .set mips64r* into the functions needing it MIPS: assume at as source/dest of MSA copy/insert instructions MIPS: remove MSA macro recursion MIPS: wrap cfcmsa & ctcmsa accesses for toolchains with MSA support MIPS: clear MSACSR cause bits when handling MSA FP exception MIPS: Ensure FCSR cause bits are clear after invoking FPU emulator MIPS: prevent FP context set via ptrace being discarded MIPS: disable FPU if the mode is unsupported Petr Matousek (1): kvm: x86: i8259: return initialized data on invalid-size read Radim Krčmář (8): KVM: x86: inline kvm_ioapic_handles_vector() x86: vdso: fix pvclock races with task migration KVM: vmx: pass error code with internal error #2 KVM: x86: use MDA for interrupt matching KVM: x86: fix mixed APIC mode broadcast KVM: x86: avoid logical_map when it is invalid KVM: x86: simplify kvm_apic_map KVM: use slowpath for cross page cached accesses Takuya Yoshikawa (1): KVM: Eliminate extra function calls in kvm_get_dirty_log_protect() Thomas Huth (5): KVM: s390: Nullify instruction for certain program exceptions KVM: s390: Forward PSW to next instruction for addressing exceptions KVM: s390: Use insn_length() to 
calculate length of instruction KVM: Get rid of kvm_kvfree() KVM: s390: Add MEMOP ioctls for reading/writing guest memory Wanpeng Li (2): kvm: x86: fix x86 eflags fixed bit kvm: mmu: lazy collapse small sptes into large sptes Wincy Van (1): KVM: ioapic: Record edge-triggered interrupts delivery status Xiubo Li (10): KVM: Fix WARNINGs for 'sizeof(X)' instead of 'sizeof X' in kvm_main.c KVM: Fix WARNING: labels should not be indented in kvm_main.c KVM: Fix ERROR: do not initialise statics to 0 or NULL in kvm_main.c KVM: EXPORT_SYMBOL should immediately follow its function KVM: Missing blank line after declarations in kvm_main.c KVM: no space before tabs in kvm_main.c KVM: Fix indentation in kvm_main.c KVM: Use pr_info/pr_err in kvm_main.c KVM: x86: Avoid using plain integer as NULL pointer warning KVM: x86: For the symbols used locally only should be static type Yannick Guerrini (1): KVM: s390: Fix trivial typo in comments Documentation/CodeOfConflict | 27 + Documentation/devicetree/bindings/i2c/i2c-imx.txt | 1 + .../devicetree/bindings/net/apm-xgene-enet.txt | 5 +- .../bindings/serial/snps-dw-apb-uart.txt | 16 + Documentation/power/suspend-and-interrupts.txt | 22 +- Documentation/virtual/kvm/api.txt | 335 +++++- Documentation/virtual/kvm/devices/s390_flic.txt | 3 + MAINTAINERS | 15 +- Makefile | 2 +- arch/arm/include/asm/kvm_arm.h | 1 + arch/arm/include/asm/kvm_host.h | 15 +- arch/arm/include/asm/kvm_mmio.h | 22 - arch/arm/include/uapi/asm/kvm.h | 3 + arch/arm/kernel/asm-offsets.c | 4 - arch/arm/kvm/Kconfig | 30 +- arch/arm/kvm/Makefile | 12 +- arch/arm/kvm/arm.c | 45 +- arch/arm/kvm/guest.c | 18 - arch/arm/kvm/interrupts_head.S | 8 - arch/arm/kvm/mmio.c | 64 +- arch/arm/kvm/mmu.c | 134 ++- arch/arm/kvm/trace.h | 48 + arch/arm/mach-pxa/idp.c | 1 + arch/arm/mach-pxa/lpd270.c | 2 +- arch/arm/mach-sa1100/neponset.c | 4 +- arch/arm/mach-sa1100/pleb.c | 2 +- arch/arm64/boot/dts/apm/apm-storm.dtsi | 4 +- arch/arm64/include/asm/esr.h | 1 + arch/arm64/include/asm/kvm_arm.h 
| 1 + arch/arm64/include/asm/kvm_host.h | 15 +- arch/arm64/include/asm/kvm_mmio.h | 22 - arch/arm64/include/uapi/asm/kvm.h | 3 + arch/arm64/kvm/Kconfig | 18 +- arch/arm64/kvm/Makefile | 20 +- arch/arm64/mm/pageattr.c | 5 +- arch/mips/include/asm/asmmacro-32.h | 128 +-- arch/mips/include/asm/asmmacro.h | 218 ++-- arch/mips/include/asm/fpu.h | 20 +- arch/mips/include/asm/kdebug.h | 3 +- arch/mips/include/asm/kvm_host.h | 125 ++- arch/mips/include/asm/processor.h | 2 +- arch/mips/include/uapi/asm/kvm.h | 164 +-- arch/mips/kernel/asm-offsets.c | 105 +- arch/mips/kernel/genex.S | 15 +- arch/mips/kernel/ptrace.c | 30 +- arch/mips/kernel/r4k_fpu.S | 2 +- arch/mips/kernel/traps.c | 33 +- arch/mips/kvm/Makefile | 8 +- arch/mips/kvm/emulate.c | 332 +++++- arch/mips/kvm/fpu.S | 122 +++ arch/mips/kvm/locore.S | 38 + arch/mips/kvm/mips.c | 472 ++++++++- arch/mips/kvm/msa.S | 161 +++ arch/mips/kvm/stats.c | 4 + arch/mips/kvm/tlb.c | 6 + arch/mips/kvm/trap_emul.c | 199 +++- arch/powerpc/kvm/mpic.c | 17 +- arch/powerpc/kvm/powerpc.c | 4 +- arch/s390/include/asm/kvm_host.h | 46 +- arch/s390/include/asm/mmu_context.h | 2 +- arch/s390/include/asm/page.h | 11 +- arch/s390/include/uapi/asm/kvm.h | 4 + arch/s390/include/uapi/asm/sie.h | 4 +- arch/s390/kernel/asm-offsets.c | 1 + arch/s390/kernel/jump_label.c | 12 +- arch/s390/kernel/module.c | 1 + arch/s390/kernel/processor.c | 2 +- arch/s390/kvm/diag.c | 6 +- arch/s390/kvm/gaccess.c | 296 +++++- arch/s390/kvm/gaccess.h | 21 +- arch/s390/kvm/guestdbg.c | 8 +- arch/s390/kvm/intercept.c | 5 +- arch/s390/kvm/interrupt.c | 1101 +++++++++++++------- arch/s390/kvm/kvm-s390.c | 398 ++++++- arch/s390/kvm/kvm-s390.h | 51 +- arch/s390/kvm/priv.c | 144 ++- arch/s390/kvm/sigp.c | 7 +- arch/s390/pci/pci.c | 28 +- arch/s390/pci/pci_mmio.c | 17 +- arch/x86/Kconfig | 1 + arch/x86/include/asm/kvm_host.h | 28 +- arch/x86/include/asm/kvm_para.h | 2 +- arch/x86/include/asm/pvclock.h | 1 + arch/x86/include/asm/xsave.h | 28 +- arch/x86/include/uapi/asm/vmx.h 
| 1 + arch/x86/kernel/entry_64.S | 13 +- arch/x86/kernel/pvclock.c | 44 + arch/x86/kvm/Makefile | 2 +- arch/x86/kvm/cpuid.c | 33 +- arch/x86/kvm/cpuid.h | 8 +- arch/x86/kvm/emulate.c | 193 ++-- arch/x86/kvm/i8254.c | 14 +- arch/x86/kvm/i8254.h | 2 +- arch/x86/kvm/i8259.c | 13 +- arch/x86/kvm/ioapic.c | 22 +- arch/x86/kvm/ioapic.h | 11 +- arch/x86/kvm/irq.h | 2 +- arch/x86/kvm/lapic.c | 147 +-- arch/x86/kvm/lapic.h | 17 +- arch/x86/kvm/mmu.c | 73 ++ arch/x86/kvm/pmu.c | 2 +- arch/x86/kvm/svm.c | 43 +- arch/x86/kvm/vmx.c | 146 +-- arch/x86/kvm/x86.c | 157 ++- arch/x86/pci/acpi.c | 11 +- arch/x86/vdso/vclock_gettime.c | 34 +- drivers/acpi/resource.c | 4 +- drivers/acpi/video.c | 20 +- drivers/android/binder.c | 10 +- drivers/ata/sata_fsl.c | 2 + drivers/base/power/domain.c | 24 +- drivers/base/power/wakeup.c | 1 + drivers/char/tpm/tpm-chip.c | 34 +- drivers/char/tpm/tpm_ibmvtpm.c | 10 +- drivers/char/tpm/tpm_ibmvtpm.h | 6 +- drivers/clk/at91/pmc.c | 20 +- drivers/clk/at91/pmc.h | 1 + drivers/cpufreq/exynos-cpufreq.c | 21 +- drivers/cpufreq/ppc-corenet-cpufreq.c | 2 + drivers/cpuidle/cpuidle.c | 61 +- drivers/dma/at_xdmac.c | 7 +- drivers/dma/dw/core.c | 2 +- drivers/dma/ioat/dma_v3.c | 4 + drivers/dma/mmp_pdma.c | 10 + drivers/dma/mmp_tdma.c | 31 +- drivers/dma/qcom_bam_dma.c | 10 +- drivers/dma/sh/shdmac.c | 15 +- drivers/firmware/dmi_scan.c | 17 +- drivers/firmware/efi/libstub/efi-stub-helper.c | 8 +- drivers/gpu/drm/drm_mm.c | 152 +-- drivers/gpu/drm/i915/i915_debugfs.c | 4 +- drivers/gpu/drm/i915/i915_drv.c | 30 +- drivers/gpu/drm/i915/i915_gem_gtt.c | 6 +- drivers/gpu/drm/i915/intel_fifo_underrun.c | 18 +- drivers/gpu/drm/imx/dw_hdmi-imx.c | 36 +- drivers/gpu/drm/imx/imx-ldb.c | 28 +- drivers/gpu/drm/imx/parallel-display.c | 5 +- drivers/gpu/drm/msm/mdp/mdp4/mdp4_irq.c | 5 + drivers/gpu/drm/msm/mdp/mdp5/mdp5.xml.h | 15 +- drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c | 99 +- drivers/gpu/drm/msm/mdp/mdp5/mdp5_encoder.c | 6 +- drivers/gpu/drm/msm/mdp/mdp5/mdp5_irq.c | 5 
+ drivers/gpu/drm/msm/msm_atomic.c | 4 +- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 2 +- drivers/gpu/drm/radeon/atombios_crtc.c | 3 + drivers/gpu/drm/radeon/atombios_encoders.c | 30 +- drivers/gpu/drm/radeon/cik.c | 3 + drivers/gpu/drm/radeon/dce6_afmt.c | 68 +- drivers/gpu/drm/radeon/evergreen.c | 3 + drivers/gpu/drm/radeon/evergreen_hdmi.c | 59 +- drivers/gpu/drm/radeon/r100.c | 4 + drivers/gpu/drm/radeon/r600.c | 3 + drivers/gpu/drm/radeon/r600_hdmi.c | 11 - drivers/gpu/drm/radeon/radeon_audio.c | 50 +- drivers/gpu/drm/radeon/radeon_cs.c | 4 +- drivers/gpu/drm/radeon/rs600.c | 4 + drivers/gpu/drm/radeon/si.c | 3 + drivers/gpu/drm/radeon/sid.h | 4 +- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- drivers/gpu/ipu-v3/ipu-di.c | 2 + drivers/i2c/busses/i2c-designware-baytrail.c | 40 +- drivers/iio/adc/mcp3422.c | 17 +- drivers/iio/adc/qcom-spmi-iadc.c | 3 +- drivers/iio/common/ssp_sensors/ssp_dev.c | 2 + drivers/iio/dac/ad5686.c | 2 +- drivers/iio/humidity/dht11.c | 69 +- drivers/iio/humidity/si7020.c | 6 +- drivers/iio/imu/adis16400_core.c | 3 +- drivers/iio/imu/inv_mpu6050/inv_mpu_core.c | 6 +- drivers/iio/light/Kconfig | 2 + drivers/iio/magnetometer/Kconfig | 2 + drivers/input/keyboard/tc3589x-keypad.c | 6 +- drivers/input/misc/mma8450.c | 1 + drivers/input/mouse/alps.c | 4 +- drivers/input/mouse/cyapa_gen3.c | 2 +- drivers/input/mouse/cyapa_gen5.c | 4 +- drivers/input/mouse/focaltech.c | 50 +- drivers/input/mouse/psmouse-base.c | 14 +- drivers/input/mouse/psmouse.h | 6 + drivers/input/touchscreen/Kconfig | 1 + drivers/misc/mei/init.c | 2 + drivers/net/can/dev.c | 8 + drivers/net/can/usb/kvaser_usb.c | 48 +- drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 4 + drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 2 +- drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 4 + drivers/net/ethernet/broadcom/bcm63xx_enet.c | 8 +- drivers/net/ethernet/broadcom/bgmac.c | 7 - drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 + drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c | 6 +- 
drivers/net/ethernet/cadence/macb.c | 8 +- drivers/net/ethernet/cadence/macb.h | 2 +- drivers/net/ethernet/freescale/fec_main.c | 3 +- drivers/net/ethernet/freescale/gianfar.c | 19 +- drivers/net/ethernet/smsc/smc91x.c | 1 + .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 65 +- drivers/net/team/team.c | 6 +- drivers/net/xen-netback/interface.c | 3 +- drivers/net/xen-netback/netback.c | 22 +- drivers/pci/host/pci-versatile.c | 2 +- drivers/regulator/core.c | 7 - drivers/regulator/da9210-regulator.c | 9 + drivers/regulator/rk808-regulator.c | 8 + drivers/rtc/rtc-at91rm9200.c | 62 +- drivers/rtc/rtc-at91sam9.c | 73 +- drivers/s390/block/dcssblk.c | 2 +- drivers/s390/block/scm_blk_cluster.c | 2 +- drivers/spi/spi-atmel.c | 12 +- drivers/spi/spi-dw-mid.c | 6 + drivers/spi/spi-dw-pci.c | 4 +- drivers/spi/spi-dw.c | 4 +- drivers/spi/spi-img-spfi.c | 7 + drivers/spi/spi-pl022.c | 2 +- drivers/spi/spi-ti-qspi.c | 22 + drivers/staging/comedi/drivers/adv_pci1710.c | 3 +- drivers/staging/comedi/drivers/comedi_isadma.c | 5 +- drivers/staging/comedi/drivers/vmk80xx.c | 71 -- drivers/staging/iio/adc/mxs-lradc.c | 207 ++-- drivers/staging/iio/resolver/ad2s1200.c | 3 +- .../thermal/int340x_thermal/int340x_thermal_zone.c | 10 +- drivers/thermal/samsung/exynos_tmu.c | 3 +- drivers/thermal/thermal_core.c | 37 +- drivers/tty/bfin_jtag_comm.c | 13 - drivers/tty/serial/8250/8250_core.c | 11 +- drivers/tty/serial/8250/8250_dw.c | 32 + drivers/tty/serial/8250/8250_pci.c | 20 +- drivers/tty/serial/atmel_serial.c | 49 +- drivers/tty/serial/of_serial.c | 4 - drivers/tty/serial/sprd_serial.c | 4 +- drivers/tty/tty_io.c | 4 +- drivers/tty/tty_ioctl.c | 16 +- drivers/usb/class/cdc-acm.c | 2 + drivers/usb/core/devio.c | 2 + drivers/usb/dwc3/dwc3-omap.c | 30 +- drivers/usb/gadget/configfs.c | 2 - drivers/usb/gadget/function/f_hid.c | 2 +- drivers/usb/gadget/function/f_phonet.c | 5 +- drivers/usb/gadget/function/f_sourcesink.c | 4 +- drivers/usb/gadget/function/f_uac2.c | 34 +- 
drivers/usb/gadget/function/uvc_v4l2.c | 1 + drivers/usb/gadget/function/uvc_video.c | 1 + drivers/usb/gadget/legacy/g_ffs.c | 6 +- drivers/usb/host/xhci-pci.c | 30 + drivers/usb/host/xhci-plat.c | 19 +- drivers/usb/host/xhci-ring.c | 12 +- drivers/usb/host/xhci.c | 100 +- drivers/usb/host/xhci.h | 11 +- drivers/usb/isp1760/isp1760-hcd.c | 6 +- drivers/usb/musb/musb_core.c | 10 +- drivers/usb/musb/musb_dsps.c | 32 +- drivers/usb/musb/musb_host.c | 2 +- drivers/usb/musb/omap2430.c | 7 +- drivers/usb/renesas_usbhs/Kconfig | 1 + drivers/usb/serial/bus.c | 45 +- drivers/usb/serial/ch341.c | 15 +- drivers/usb/serial/console.c | 2 + drivers/usb/serial/cp210x.c | 2 + drivers/usb/serial/ftdi_sio.c | 19 + drivers/usb/serial/ftdi_sio_ids.h | 23 + drivers/usb/serial/generic.c | 5 +- drivers/usb/serial/mxuport.c | 3 +- drivers/usb/serial/pl2303.c | 18 +- drivers/usb/serial/usb-serial.c | 21 +- drivers/usb/storage/unusual_uas.h | 7 + drivers/usb/storage/usb.c | 6 + drivers/video/fbdev/amba-clcd.c | 3 + drivers/video/fbdev/core/fbmon.c | 6 +- drivers/video/fbdev/omap2/dss/display-sysfs.c | 179 ++-- drivers/watchdog/at91sam9_wdt.c | 3 +- fs/btrfs/ctree.c | 8 +- fs/btrfs/extent-tree.c | 16 + fs/btrfs/file.c | 87 +- fs/btrfs/inode.c | 1 - fs/btrfs/ordered-data.c | 7 +- fs/btrfs/send.c | 171 ++- fs/btrfs/transaction.c | 3 - fs/btrfs/tree-log.c | 2 +- fs/btrfs/xattr.c | 8 +- fs/ecryptfs/ecryptfs_kernel.h | 4 +- fs/ecryptfs/file.c | 34 +- fs/ecryptfs/keystore.c | 2 +- fs/ecryptfs/main.c | 2 +- fs/locks.c | 3 +- fs/nfs/client.c | 2 +- fs/nfs/delegation.c | 45 +- fs/nfs/dir.c | 22 +- fs/nfs/file.c | 11 +- fs/nfs/inode.c | 111 +- fs/nfs/internal.h | 1 + fs/nfs/nfs3proc.c | 4 +- fs/nfs/nfs3xdr.c | 5 + fs/nfs/nfs4client.c | 9 +- fs/nfs/nfs4proc.c | 31 +- fs/nfs/nfs4session.h | 1 + fs/nfs/nfs4state.c | 18 +- fs/nfs/proc.c | 6 +- fs/nfs/write.c | 30 + include/drm/drm_mm.h | 52 +- include/drm/ttm/ttm_bo_api.h | 2 +- include/drm/ttm/ttm_bo_driver.h | 2 +- include/kvm/arm_arch_timer.h | 31 +- 
include/kvm/arm_vgic.h | 117 +-- {virt => include}/kvm/iodev.h | 28 +- include/linux/cpuidle.h | 17 +- include/linux/interrupt.h | 9 +- include/linux/irqdesc.h | 1 + include/linux/kvm_host.h | 32 +- include/linux/nfs_fs.h | 5 +- include/linux/sched.h | 8 + include/linux/serial_core.h | 14 +- include/linux/spi/spi.h | 2 +- include/linux/usb/serial.h | 3 +- include/linux/workqueue.h | 3 +- include/net/netfilter/nf_tables.h | 22 +- include/uapi/linux/kvm.h | 65 +- include/uapi/linux/serial.h | 4 + include/video/omapdss.h | 1 + kernel/cpuset.c | 9 +- kernel/irq/manage.c | 7 +- kernel/irq/pm.c | 7 +- kernel/livepatch/core.c | 3 +- kernel/module.c | 2 + kernel/printk/console_cmdline.h | 2 +- kernel/printk/printk.c | 1 + kernel/sched/core.c | 15 + kernel/sched/idle.c | 54 +- kernel/trace/ftrace.c | 40 +- kernel/workqueue.c | 56 +- lib/seq_buf.c | 4 +- net/can/af_can.c | 3 + net/ipv4/ip_fragment.c | 11 +- net/ipv4/ip_sockglue.c | 33 +- net/ipv4/ping.c | 12 +- net/ipv4/tcp.c | 10 +- net/ipv6/datagram.c | 39 +- net/ipv6/ping.c | 5 +- net/irda/ircomm/ircomm_tty.c | 4 +- net/netfilter/ipvs/ip_vs_sync.c | 3 + net/netfilter/nf_tables_api.c | 61 +- net/netfilter/nft_compat.c | 14 +- net/packet/af_packet.c | 22 +- net/rxrpc/ar-error.c | 4 +- net/sunrpc/cache.c | 2 +- net/sunrpc/xprtrdma/rpc_rdma.c | 3 +- net/sunrpc/xprtrdma/xprt_rdma.h | 2 +- net/tipc/link.c | 7 +- sound/drivers/opl3/opl3_midi.c | 2 + sound/firewire/dice/dice-interface.h | 18 +- sound/firewire/dice/dice-proc.c | 4 +- sound/firewire/oxfw/oxfw-stream.c | 5 +- sound/isa/msnd/msnd_pinnacle_mixer.c | 3 +- sound/pci/hda/patch_realtek.c | 7 + sound/soc/atmel/sam9g20_wm8731.c | 68 +- sound/soc/cirrus/Kconfig | 2 +- sound/soc/codecs/Kconfig | 2 +- sound/soc/codecs/max98357a.c | 12 +- sound/soc/codecs/rt5670.c | 7 +- sound/soc/codecs/rt5677.c | 32 +- sound/soc/codecs/sta32x.c | 6 +- sound/soc/fsl/fsl_ssi.c | 11 +- sound/soc/generic/simple-card.c | 5 + sound/soc/intel/sst-atom-controls.h | 2 +- sound/soc/intel/sst/sst.c | 10 
+- sound/soc/omap/omap-hdmi-audio.c | 3 + sound/soc/omap/omap-mcbsp.c | 11 + sound/soc/omap/omap-pcm.c | 2 +- sound/soc/samsung/Kconfig | 10 +- sound/soc/sh/rcar/core.c | 4 +- sound/usb/line6/playback.c | 6 +- virt/kvm/arm/arch_timer.c | 45 +- virt/kvm/arm/vgic-v2-emul.c | 71 +- virt/kvm/arm/vgic-v3-emul.c | 246 +++-- virt/kvm/arm/vgic.c | 479 ++++++--- virt/kvm/arm/vgic.h | 37 +- virt/kvm/coalesced_mmio.c | 7 +- virt/kvm/eventfd.c | 9 +- virt/kvm/irqchip.c | 2 +- virt/kvm/kvm_main.c | 148 ++- 378 files changed, 8320 insertions(+), 3641 deletions(-) create mode 100644 Documentation/CodeOfConflict create mode 100644 arch/mips/kvm/fpu.S create mode 100644 arch/mips/kvm/msa.S rename {virt => include}/kvm/iodev.h (66%)
* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-10 15:01 [GIT PULL] First batch of KVM changes for 4.1 Paolo Bonzini
@ 2015-04-17  8:52 ` Peter Zijlstra
  2015-04-17  9:17   ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2015-04-17 8:52 UTC (permalink / raw)
To: Paolo Bonzini
Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On Fri, Apr 10, 2015 at 05:01:29PM +0200, Paolo Bonzini wrote:
>  include/linux/sched.h | 8 +
>  kernel/sched/core.c   | 15 +

Can you please not puke over the scheduler without Acks from at least
one maintainer?

I complained about this very thing two years ago:

  http://marc.info/?l=linux-kernel&m=137345253916751

And now it magically re-appears WTF!
* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17  8:52 ` Peter Zijlstra
@ 2015-04-17  9:17   ` Peter Zijlstra
  2015-04-17 10:09     ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2015-04-17 9:17 UTC (permalink / raw)
To: Paolo Bonzini
Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On Fri, Apr 17, 2015 at 10:52:38AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 10, 2015 at 05:01:29PM +0200, Paolo Bonzini wrote:
> >  include/linux/sched.h | 8 +
> >  kernel/sched/core.c   | 15 +
>
> Can you please not puke over the scheduler without Acks from at least
> one maintainer?
>
> I complained about this very thing two years ago:
>
>   http://marc.info/?l=linux-kernel&m=137345253916751
>
> And now it magically re-appears WTF!

And I really don't understand _why_ you need that extra callback in the
first place.  You already have preempt notifiers, just track if you came
in on another cpu than you went out on and voila!
* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17  9:17 ` Peter Zijlstra
@ 2015-04-17 10:09   ` Paolo Bonzini
  2015-04-17 10:36     ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-17 10:09 UTC (permalink / raw)
To: Peter Zijlstra
Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On 17/04/2015 11:17, Peter Zijlstra wrote:
> On Fri, Apr 17, 2015 at 10:52:38AM +0200, Peter Zijlstra wrote:
>> On Fri, Apr 10, 2015 at 05:01:29PM +0200, Paolo Bonzini wrote:
>>>  include/linux/sched.h | 8 +
>>>  kernel/sched/core.c   | 15 +
>>
>> Can you please not puke over the scheduler without Acks from at least
>> one maintainer?

Sorry, this was done while I was not handling the KVM tree.  At the very
least the commit message should have included the original hashes of the
commit and the revert.  This way one could have found the original Acks:

    commit 582b336ec2c0f0076f5650a029fcc9abd4a906f7
    Author: Marcelo Tosatti <mtosatti@redhat.com>
    Date:   Tue Nov 27 23:28:54 2012 -0200

        sched: add notifier for cross-cpu migrations

        Originally from Jeremy Fitzhardinge.

        Acked-by: Ingo Molnar <mingo@redhat.com>
        Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

>> I complained about this very thing two years ago:
>>
>>   http://marc.info/?l=linux-kernel&m=137345253916751
>>
>> And now it magically re-appears WTF!
>
> And I really don't understand _why_ you need that extra callback in the
> first place.  You already have preempt notifiers, just track if you came
> in on another cpu than you went out on and voila!

Then you pay for _all_ preemptions of _all_ processes in the guest,
instead of the hopefully rare ones that do a CPU migration.  Preempt
notifiers are registered on current only, this one is global.

Of course, adding a static key is a good idea.  I can also add a config
symbol, selected by paravirt, if you want.

Paolo
* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 10:09 ` Paolo Bonzini
@ 2015-04-17 10:36   ` Peter Zijlstra
  2015-04-17 10:38     ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2015-04-17 10:36 UTC (permalink / raw)
To: Paolo Bonzini
Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On Fri, Apr 17, 2015 at 12:09:49PM +0200, Paolo Bonzini wrote:
> On 17/04/2015 11:17, Peter Zijlstra wrote:
> > On Fri, Apr 17, 2015 at 10:52:38AM +0200, Peter Zijlstra wrote:
> >> On Fri, Apr 10, 2015 at 05:01:29PM +0200, Paolo Bonzini wrote:
> >>>  include/linux/sched.h | 8 +
> >>>  kernel/sched/core.c   | 15 +
> >>
> >> Can you please not puke over the scheduler without Acks from at least
> >> one maintainer?
>
> Sorry, this was done while I was not handling the KVM tree.  At the very
> least the commit message should have included the original hashes of the
> commit and the revert.  This way one could have found the original Acks:
>
>     commit 582b336ec2c0f0076f5650a029fcc9abd4a906f7
>     Author: Marcelo Tosatti <mtosatti@redhat.com>
>     Date:   Tue Nov 27 23:28:54 2012 -0200
>
>         sched: add notifier for cross-cpu migrations
>
>         Originally from Jeremy Fitzhardinge.
>
>         Acked-by: Ingo Molnar <mingo@redhat.com>
>         Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Still not a good reason to sneak it back in now, after I got it taken
out.  There was a reason it was removed; prior acks (esp. 2 year old
ones) do not count one whit _NOW_.

Also, Ingo later agreed that it was a mistake,

  http://marc.info/?l=linux-kernel&m=137346715521978&w=2

which is an effective retract of whatever ACK that was.

It was crap code then and it's crap code now.

> >> I complained about this very thing two years ago:
> >>
> >>   http://marc.info/?l=linux-kernel&m=137345253916751
> >>
> >> And now it magically re-appears WTF!
> >
> > And I really don't understand _why_ you need that extra callback in the
> > first place.  You already have preempt notifiers, just track if you came
> > in on another cpu than you went out on and voila!
>
> Then you pay for _all_ preemptions of _all_ processes in the guest,
> instead of the hopefully rare ones that do a CPU migration.

Now you make everybody pay for your crap, x86-64 paravirt or not.  Keep
the cost by those who need it.

Please take it out, ASAP.
* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 10:36 ` Peter Zijlstra
@ 2015-04-17 10:38   ` Paolo Bonzini
  2015-04-17 10:55     ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-17 10:38 UTC (permalink / raw)
To: Peter Zijlstra
Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On 17/04/2015 12:36, Peter Zijlstra wrote:
> Now you make everybody pay for your crap, x86-64 paravirt or not.  Keep
> the cost by those who need it.
>
> Please take it out, ASAP.

I'll just implement the static key.

Paolo
* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 10:38 ` Paolo Bonzini
@ 2015-04-17 10:55   ` Peter Zijlstra
  2015-04-17 12:46     ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2015-04-17 10:55 UTC (permalink / raw)
To: Paolo Bonzini
Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On Fri, Apr 17, 2015 at 12:38:07PM +0200, Paolo Bonzini wrote:
>
> On 17/04/2015 12:36, Peter Zijlstra wrote:
> > Now you make everybody pay for your crap, x86-64 paravirt or not.  Keep
> > the cost by those who need it.
> >
> > Please take it out, ASAP.
>
> I'll just implement the static key.

Can you first show that:

preempt_out:
	int cpu = smp_processor_id();
	if (vcpu->cpu != cpu)
		vcpu->cpu = cpu;

preempt_in:
	int cpu = smp_processor_id();
	if (unlikely(vcpu->cpu != cpu))
		do_vcpu_migration_callback(cpu);

Is actually a measurable performance hit and we actually _need_ the
migration callback?

Also, it looks like you already do exactly this for other things, look
at:

  kvm_sched_in()
    kvm_arch_vcpu_load()
      if (unlikely(vcpu->cpu != cpu) ... )

So no, I don't believe for one second you need this.
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 10:55 ` Peter Zijlstra @ 2015-04-17 12:46 ` Paolo Bonzini 2015-04-17 13:10 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: Paolo Bonzini @ 2015-04-17 12:46 UTC (permalink / raw) To: Peter Zijlstra Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto On 17/04/2015 12:55, Peter Zijlstra wrote: > Also, it looks like you already do exactly this for other things, look > at: > > kvm_sched_in() > kvm_arch_vcpu_load() > if (unlikely(vcpu->cpu != cpu) ... ) > > So no, I don't believe for one second you need this. You're missing that this snippet is running in the host, while this patch is concerned with the guest (paravirt). This notifier runs for _all_ tasks, not just for the KVM threads. In fact there will most likely be no KVM in the guest. There is no vcpu->cpu where this notifier is run. And frankly, I think the static key is snake oil. The cost of task migration in terms of cache misses and TLB misses is in no way comparable to the cost of filling in a structure on the stack, dereferencing the head of the notifiers list and seeing that it's NULL. If this was a real problem, it would be better solved by adding inlining in kernel/notifier.c: diff --git a/kernel/notifier.c b/kernel/notifier.c index ae9fc7cc360e..9c0cd0f739e0 100644 --- a/kernel/notifier.c +++ b/kernel/notifier.c @@ -59,28 +59,14 @@ static int notifier_chain_unregister(struct notifier_block **nl, return -ENOENT; } -/** - * notifier_call_chain - Informs the registered notifiers about an event. - * @nl: Pointer to head of the blocking notifier chain - * @val: Value passed unmodified to notifier function - * @v: Pointer passed unmodified to notifier function - * @nr_to_call: Number of notifier functions to be called. Don't care - * value of this parameter is -1. - * @nr_calls: Records the number of notifications sent. Don't care - * value of this field is NULL. 
- * @returns: notifier_call_chain returns the value returned by the - * last notifier function called. - */ -static int notifier_call_chain(struct notifier_block **nl, - unsigned long val, void *v, - int nr_to_call, int *nr_calls) +static int __notifier_call_chain(struct notifier_block *nb, + unsigned long val, void *v, + int nr_to_call, int *nr_calls) { int ret = NOTIFY_DONE; - struct notifier_block *nb, *next_nb; - - nb = rcu_dereference_raw(*nl); + struct notifier_block *next_nb; - while (nb && nr_to_call) { + do { next_nb = rcu_dereference_raw(nb->next); #ifdef CONFIG_DEBUG_NOTIFIERS @@ -94,14 +80,38 @@ static int notifier_call_chain(struct notifier_block **nl, if (nr_calls) (*nr_calls)++; - - if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK) - break; - nb = next_nb; - nr_to_call--; - } + } while (!(ret & NOTIFY_STOP_MASK) && + (nb = next_nb) != NULL && + --nr_to_call); return ret; } +NOKPROBE_SYMBOL(__notifier_call_chain); + +/** + * notifier_call_chain - Informs the registered notifiers about an event. + * @nl: Pointer to head of the blocking notifier chain + * @val: Value passed unmodified to notifier function + * @v: Pointer passed unmodified to notifier function + * @nr_to_call: Number of notifier functions to be called. Don't care + * value of this parameter is -1. + * @nr_calls: Records the number of notifications sent. Don't care + * value of this field is NULL. + * @returns: notifier_call_chain returns the value returned by the + * last notifier function called. 
+ */ +static __always_inline int notifier_call_chain(struct notifier_block **nl, + unsigned long val, void *v, + int nr_to_call, int *nr_calls) +{ + struct notifier_block *nb = rcu_dereference_raw(*nl); + if (unlikely(nr_to_call == 0)) + return NOTIFY_DONE; + + if (!nb) + return NOTIFY_DONE; + + return __notifier_call_chain(nb, val, v, nr_to_call, nr_calls); +} NOKPROBE_SYMBOL(notifier_call_chain); /* @@ -190,7 +199,12 @@ NOKPROBE_SYMBOL(__atomic_notifier_call_chain); int atomic_notifier_call_chain(struct atomic_notifier_head *nh, unsigned long val, void *v) { - return __atomic_notifier_call_chain(nh, val, v, -1, NULL); + int ret; + + rcu_read_lock(); + ret = notifier_call_chain(&nh->head, val, v, -1, NULL); + rcu_read_unlock(); + return ret; } EXPORT_SYMBOL_GPL(atomic_notifier_call_chain); NOKPROBE_SYMBOL(atomic_notifier_call_chain); Also, move enough stuff to a header so that the fast path is inlined to a single pointer dereference. Paolo ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 12:46 ` Paolo Bonzini @ 2015-04-17 13:10 ` Peter Zijlstra 2015-04-17 13:38 ` Paolo Bonzini 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2015-04-17 13:10 UTC (permalink / raw) To: Paolo Bonzini Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto On Fri, Apr 17, 2015 at 02:46:57PM +0200, Paolo Bonzini wrote: > On 17/04/2015 12:55, Peter Zijlstra wrote: > > Also, it looks like you already do exactly this for other things, look > > at: > > > > kvm_sched_in() > > kvm_arch_vcpu_load() > > if (unlikely(vcpu->cpu != cpu) ... ) > > > > So no, I don't believe for one second you need this. > > You're missing that this snippet is running in the host, while this > patch is concerned with the guest (paravirt). This doesn't make sense; but it brings us back to where we were last time. There is _0_ justification for this in the patches, that alone is grounds enough to reject it. Why should the guest task care about the physical cpu of the vcpu; that's a layering fail if ever there was one. Furthermore, the only thing that migration handler seems to do is increment a variable that is not actually used in that file. > And frankly, I think the static key is snake oil. The cost of task > migration in terms of cache misses and TLB misses is in no way > comparable to the cost of filling in a structure on the stack, > dereferencing the head of the notifiers list and seeing that it's NULL. The path this notifier is called from has nothing to do with those costs. And the fact you're inflicting these costs on _everyone_ for a single x86_64-paravirt case is insane. I've had enough of this, the below goes into sched/urgent and you can come back with sane patches if and when you're ready. 
--- arch/x86/kernel/pvclock.c | 24 ------------------------ include/linux/sched.h | 8 -------- kernel/sched/core.c | 15 --------------- 3 files changed, 47 deletions(-) diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index e5ecd20e72dd..82f116de3835 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -160,27 +160,6 @@ struct pvclock_vcpu_time_info *pvclock_get_vsyscall_time_info(int cpu) } #ifdef CONFIG_X86_64 -static int pvclock_task_migrate(struct notifier_block *nb, unsigned long l, - void *v) -{ - struct task_migration_notifier *mn = v; - struct pvclock_vsyscall_time_info *pvti; - - pvti = pvclock_get_vsyscall_user_time_info(mn->from_cpu); - - /* this is NULL when pvclock vsyscall is not initialized */ - if (unlikely(pvti == NULL)) - return NOTIFY_DONE; - - pvti->migrate_count++; - - return NOTIFY_DONE; -} - -static struct notifier_block pvclock_migrate = { - .notifier_call = pvclock_task_migrate, -}; - /* * Initialize the generic pvclock vsyscall state. 
This will allocate * a/some page(s) for the per-vcpu pvclock information, set up a @@ -202,9 +181,6 @@ int __init pvclock_init_vsyscall(struct pvclock_vsyscall_time_info *i, PAGE_KERNEL_VVAR); } - - register_task_migration_notifier(&pvclock_migrate); - return 0; } #endif diff --git a/include/linux/sched.h b/include/linux/sched.h index 3f3308824fa4..0eabab99e126 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -176,14 +176,6 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load); extern void calc_global_load(unsigned long ticks); extern void update_cpu_load_nohz(void); -/* Notifier for when a task gets migrated to a new CPU */ -struct task_migration_notifier { - struct task_struct *task; - int from_cpu; - int to_cpu; -}; -extern void register_task_migration_notifier(struct notifier_block *n); - extern unsigned long get_parent_ip(unsigned long addr); extern void dump_cpu_task(int cpu); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 1d0bc4fe266d..dbfc93d40292 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1013,13 +1013,6 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) rq_clock_skip_update(rq, true); } -static ATOMIC_NOTIFIER_HEAD(task_migration_notifier); - -void register_task_migration_notifier(struct notifier_block *n) -{ - atomic_notifier_chain_register(&task_migration_notifier, n); -} - #ifdef CONFIG_SMP void set_task_cpu(struct task_struct *p, unsigned int new_cpu) { @@ -1050,18 +1043,10 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu) trace_sched_migrate_task(p, new_cpu); if (task_cpu(p) != new_cpu) { - struct task_migration_notifier tmn; - if (p->sched_class->migrate_task_rq) p->sched_class->migrate_task_rq(p, new_cpu); p->se.nr_migrations++; perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0); - - tmn.task = p; - tmn.from_cpu = task_cpu(p); - tmn.to_cpu = new_cpu; - - atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn); } 
__set_task_cpu(p, new_cpu); ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 13:10 ` Peter Zijlstra @ 2015-04-17 13:38 ` Paolo Bonzini 2015-04-17 13:43 ` Peter Zijlstra 2015-04-17 19:01 ` Marcelo Tosatti 0 siblings, 2 replies; 34+ messages in thread From: Paolo Bonzini @ 2015-04-17 13:38 UTC (permalink / raw) To: Peter Zijlstra Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto On 17/04/2015 15:10, Peter Zijlstra wrote: > On Fri, Apr 17, 2015 at 02:46:57PM +0200, Paolo Bonzini wrote: >> On 17/04/2015 12:55, Peter Zijlstra wrote: >>> Also, it looks like you already do exactly this for other things, look >>> at: >>> >>> kvm_sched_in() >>> kvm_arch_vcpu_load() >>> if (unlikely(vcpu->cpu != cpu) ... ) >>> >>> So no, I don't believe for one second you need this. > > This [...] brings us back to where we were last > time. There is _0_ justification for this in the patches, that alone is > grounds enough to reject it. Oh, we totally agree on that. I didn't commit that patch, but I already said the commit message was insufficient. > Why should the guest task care about the physical cpu of the vcpu; > that's a layering fail if ever there was one. It's totally within your right to not read the code, but then please don't try commenting at it. This code: kvm_sched_in() kvm_arch_vcpu_load() if (unlikely(vcpu->cpu != cpu) ... ) runs in the host. The hypervisor obviously cares if the physical CPU of the VCPU changes. It has to tell the source processor (vcpu->cpu) to release the VCPU's data structure and only then it can use it in the target processor (cpu). No layering violation here. The task migration notifier runs in the guest, whenever the VCPU of a task changes. > Furthermore, the only thing that migration handler seems to do is > increment a variable that is not actually used in that file. It's used in the vDSO, so you cannot increment it in the file that uses it. >> And frankly, I think the static key is snake oil. 
The cost of task >> migration in terms of cache misses and TLB misses is in no way >> comparable to the cost of filling in a structure on the stack, >> dereferencing the head of the notifiers list and seeing that it's NULL. > > The path this notifier is called from has nothing to do with those > costs. How not? The task is going to incur those costs, it's not like half a dozen extra instructions make any difference. But anyway... > And the fact you're inflicting these costs on _everyone_ for a > single x86_64-paravirt case is insane. ... that's a valid objection. Please look at the patch below. > I've had enough of this, the below goes into sched/urgent and you can > come back with sane patches if and when you're ready. Oh, please, cut the alpha male crap. Paolo ------------------- 8< ---------------- >From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini <pbonzini@redhat.com> Date: Fri, 17 Apr 2015 14:57:34 +0200 Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER The task migration notifier is only used in x86 paravirt. Make it possible to compile it out. While at it, move some code around to ensure tmn is filled from CPU registers. 
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- arch/x86/Kconfig | 1 + init/Kconfig | 3 +++ kernel/sched/core.c | 9 ++++++++- 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d43e7e1c784b..9af252c8698d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -649,6 +649,7 @@ if HYPERVISOR_GUEST config PARAVIRT bool "Enable paravirtualization code" + select TASK_MIGRATION_NOTIFIER ---help--- This changes the kernel so it can modify itself when it is run under a hypervisor, potentially improving performance significantly diff --git a/init/Kconfig b/init/Kconfig index 3b9df1aa35db..891917123338 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -2016,6 +2016,9 @@ source "block/Kconfig" config PREEMPT_NOTIFIERS bool +config TASK_MIGRATION_NOTIFIER + bool + config PADATA depends on SMP bool diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f9123a82cbb6..c07a53aa543c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) rq_clock_skip_update(rq, true); } +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER static ATOMIC_NOTIFIER_HEAD(task_migration_notifier); void register_task_migration_notifier(struct notifier_block *n) { atomic_notifier_chain_register(&task_migration_notifier, n); } +#endif #ifdef CONFIG_SMP void set_task_cpu(struct task_struct *p, unsigned int new_cpu) @@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu) trace_sched_migrate_task(p, new_cpu); if (task_cpu(p) != new_cpu) { +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER struct task_migration_notifier tmn; + int from_cpu = task_cpu(p); +#endif if (p->sched_class->migrate_task_rq) p->sched_class->migrate_task_rq(p, new_cpu); p->se.nr_migrations++; perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0); +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER tmn.task = p; - tmn.from_cpu = task_cpu(p); + tmn.from_cpu = from_cpu; tmn.to_cpu = 
new_cpu; atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn); +#endif } __set_task_cpu(p, new_cpu); -- 2.3.5 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 13:38 ` Paolo Bonzini @ 2015-04-17 13:43 ` Peter Zijlstra 2015-04-17 14:57 ` Paolo Bonzini 2015-04-17 19:01 ` Marcelo Tosatti 1 sibling, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2015-04-17 13:43 UTC (permalink / raw) To: Paolo Bonzini Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto On Fri, Apr 17, 2015 at 03:38:58PM +0200, Paolo Bonzini wrote: > > The path this notifier is called from has nothing to do with those > > costs. > > How not? The task is going to incur those costs, it's not like half > a dozen extra instruction make any difference. But anyway... Its attributed to the entity doing the migration, which can be the wakeup path or a softirq. And we very much do care about the wakeup path. > ... that's a valid objection. Please look at the patch below. Still a NAK on that, distros have no choice but to enable that CONFIG option because people might want to run KVM. CONFIG options are pointless if they end up being mandatory. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 13:43 ` Peter Zijlstra @ 2015-04-17 14:57 ` Paolo Bonzini 0 siblings, 0 replies; 34+ messages in thread From: Paolo Bonzini @ 2015-04-17 14:57 UTC (permalink / raw) To: Peter Zijlstra Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto On 17/04/2015 15:43, Peter Zijlstra wrote: > On Fri, Apr 17, 2015 at 03:38:58PM +0200, Paolo Bonzini wrote: >>> The path this notifier is called from has nothing to do with those >>> costs. > > Its attributed to the entity doing the migration, which can be the > wakeup path or a softirq. And we very much do care about the wakeup > path. It's not run on all wakeups. It's within an "if (task_cpu(p) != new_cpu)". WF_MIGRATED _is_ a slow path for wakeups, even though wakeup itself is of course something we care about. For load balancing, calculate_imbalance alone is orders of magnitudes more expensive than this notifier (which can be optimized to two instructions with at most one cache miss). >> ... that's a valid objection. Please look at the patch below. > > Still a NAK on that, distros have no choice but to enable that CONFIG > option because people might want to run KVM. Again: running virtual machines does not require these notifiers. KVM needs preempt and MMU notifiers, and also enables user return notifiers, but does not need these ones. It's only paravirt that needs them. It's perfectly fine for distros to disable paravirt. Some do, some don't. Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 13:38 ` Paolo Bonzini 2015-04-17 13:43 ` Peter Zijlstra @ 2015-04-17 19:01 ` Marcelo Tosatti 2015-04-17 19:16 ` Andy Lutomirski 2015-04-17 19:57 ` Paolo Bonzini 1 sibling, 2 replies; 34+ messages in thread From: Marcelo Tosatti @ 2015-04-17 19:01 UTC (permalink / raw) To: Paolo Bonzini Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto On Fri, Apr 17, 2015 at 03:38:58PM +0200, Paolo Bonzini wrote: > On 17/04/2015 15:10, Peter Zijlstra wrote: > > On Fri, Apr 17, 2015 at 02:46:57PM +0200, Paolo Bonzini wrote: > >> On 17/04/2015 12:55, Peter Zijlstra wrote: > >>> Also, it looks like you already do exactly this for other things, look > >>> at: > >>> > >>> kvm_sched_in() > >>> kvm_arch_vcpu_load() > >>> if (unlikely(vcpu->cpu != cpu) ... ) > >>> > >>> So no, I don't believe for one second you need this. > > > > This [...] brings us back to where we were last > > time. There is _0_ justification for this in the patches, that alone is > > grounds enough to reject it. > > Oh, we totally agree on that. I didn't commit that patch, but I already > said the commit message was insufficient. > > > Why should the guest task care about the physical cpu of the vcpu; > > that's a layering fail if ever there was one. > > It's totally within your right to not read the code, but then please > don't try commenting at it. > > This code: > > kvm_sched_in() > kvm_arch_vcpu_load() > if (unlikely(vcpu->cpu != cpu) ... ) > > runs in the host. The hypervisor obviously cares if the physical CPU of > the VCPU changes. It has to tell the source processor (vcpu->cpu) to > release the VCPU's data structure and only then it can use it in the > target processor (cpu). No layering violation here. > > The task migration notifier runs in the guest, whenever the VCPU of > a task changes. 
> > > Furthermore, the only thing that migration handler seems to do is > > increment a variable that is not actually used in that file. > > It's used in the vDSO, so you cannot increment it in the file that uses it. > > >> And frankly, I think the static key is snake oil. The cost of task > >> migration in terms of cache misses and TLB misses is in no way > >> comparable to the cost of filling in a structure on the stack, > >> dereferencing the head of the notifiers list and seeing that it's NULL. > > > > The path this notifier is called from has nothing to do with those > > costs. > > How not? The task is going to incur those costs, it's not like half > a dozen extra instruction make any difference. But anyway... > > > And the fact you're inflicting these costs on _everyone_ for a > > single x86_64-paravirt case is insane. > > ... that's a valid objection. Please look at the patch below. > > > I've had enough of this, the below goes into sched/urgent and you can > > come back with sane patches if and when you're ready. > > Oh, please, cut the alpha male crap. > > Paolo > > ------------------- 8< ---------------- > >From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001 > From: Paolo Bonzini <pbonzini@redhat.com> > Date: Fri, 17 Apr 2015 14:57:34 +0200 > Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER > > The task migration notifier is only used in x86 paravirt. Make it > possible to compile it out. > > While at it, move some code around to ensure tmn is filled from CPU > registers. 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > arch/x86/Kconfig | 1 + > init/Kconfig | 3 +++ > kernel/sched/core.c | 9 ++++++++- > 3 files changed, 12 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index d43e7e1c784b..9af252c8698d 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -649,6 +649,7 @@ if HYPERVISOR_GUEST > > config PARAVIRT > bool "Enable paravirtualization code" > + select TASK_MIGRATION_NOTIFIER > ---help--- > This changes the kernel so it can modify itself when it is run > under a hypervisor, potentially improving performance significantly > diff --git a/init/Kconfig b/init/Kconfig > index 3b9df1aa35db..891917123338 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -2016,6 +2016,9 @@ source "block/Kconfig" > config PREEMPT_NOTIFIERS > bool > > +config TASK_MIGRATION_NOTIFIER > + bool > + > config PADATA > depends on SMP > bool > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index f9123a82cbb6..c07a53aa543c 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) > rq_clock_skip_update(rq, true); > } > > +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER > static ATOMIC_NOTIFIER_HEAD(task_migration_notifier); > > void register_task_migration_notifier(struct notifier_block *n) > { > atomic_notifier_chain_register(&task_migration_notifier, n); > } > +#endif > > #ifdef CONFIG_SMP > void set_task_cpu(struct task_struct *p, unsigned int new_cpu) > @@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu) > trace_sched_migrate_task(p, new_cpu); > > if (task_cpu(p) != new_cpu) { > +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER > struct task_migration_notifier tmn; > + int from_cpu = task_cpu(p); > +#endif > > if (p->sched_class->migrate_task_rq) > p->sched_class->migrate_task_rq(p, new_cpu); > p->se.nr_migrations++; > perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 
1, 0); > > +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER > tmn.task = p; > - tmn.from_cpu = task_cpu(p); > + tmn.from_cpu = from_cpu; > tmn.to_cpu = new_cpu; > > atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn); > +#endif > } > > __set_task_cpu(p, new_cpu); > -- > 2.3.5 Paolo, Please revert the patch -- can fix properly in the host which also conforms the KVM guest/host documented protocol. Radim submitted a patch to kvm@ to split the kvm_write_guest in two with a barrier in between, i think. I'll review that patch. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 19:01 ` Marcelo Tosatti @ 2015-04-17 19:16 ` Andy Lutomirski 2015-04-17 19:57 ` Paolo Bonzini 1 sibling, 0 replies; 34+ messages in thread From: Andy Lutomirski @ 2015-04-17 19:16 UTC (permalink / raw) To: Marcelo Tosatti Cc: Paolo Bonzini, Peter Zijlstra, Linus Torvalds, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On Fri, Apr 17, 2015 at 12:01 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote: > On Fri, Apr 17, 2015 at 03:38:58PM +0200, Paolo Bonzini wrote: >> On 17/04/2015 15:10, Peter Zijlstra wrote: >> > On Fri, Apr 17, 2015 at 02:46:57PM +0200, Paolo Bonzini wrote: >> >> On 17/04/2015 12:55, Peter Zijlstra wrote: >> >>> Also, it looks like you already do exactly this for other things, look >> >>> at: >> >>> >> >>> kvm_sched_in() >> >>> kvm_arch_vcpu_load() >> >>> if (unlikely(vcpu->cpu != cpu) ... ) >> >>> >> >>> So no, I don't believe for one second you need this. >> > >> > This [...] brings us back to where we were last >> > time. There is _0_ justification for this in the patches, that alone is >> > grounds enough to reject it. >> >> Oh, we totally agree on that. I didn't commit that patch, but I already >> said the commit message was insufficient. >> >> > Why should the guest task care about the physical cpu of the vcpu; >> > that's a layering fail if ever there was one. >> >> It's totally within your right to not read the code, but then please >> don't try commenting at it. >> >> This code: >> >> kvm_sched_in() >> kvm_arch_vcpu_load() >> if (unlikely(vcpu->cpu != cpu) ... ) >> >> runs in the host. The hypervisor obviously cares if the physical CPU of >> the VCPU changes. It has to tell the source processor (vcpu->cpu) to >> release the VCPU's data structure and only then it can use it in the >> target processor (cpu). No layering violation here. >> >> The task migration notifier runs in the guest, whenever the VCPU of >> a task changes. 
>> >> > Furthermore, the only thing that migration handler seems to do is >> > increment a variable that is not actually used in that file. >> >> It's used in the vDSO, so you cannot increment it in the file that uses it. >> >> >> And frankly, I think the static key is snake oil. The cost of task >> >> migration in terms of cache misses and TLB misses is in no way >> >> comparable to the cost of filling in a structure on the stack, >> >> dereferencing the head of the notifiers list and seeing that it's NULL. >> > >> > The path this notifier is called from has nothing to do with those >> > costs. >> >> How not? The task is going to incur those costs, it's not like half >> a dozen extra instruction make any difference. But anyway... >> >> > And the fact you're inflicting these costs on _everyone_ for a >> > single x86_64-paravirt case is insane. >> >> ... that's a valid objection. Please look at the patch below. >> >> > I've had enough of this, the below goes into sched/urgent and you can >> > come back with sane patches if and when you're ready. >> >> Oh, please, cut the alpha male crap. >> >> Paolo >> >> ------------------- 8< ---------------- >> >From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001 >> From: Paolo Bonzini <pbonzini@redhat.com> >> Date: Fri, 17 Apr 2015 14:57:34 +0200 >> Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER >> >> The task migration notifier is only used in x86 paravirt. Make it >> possible to compile it out. >> >> While at it, move some code around to ensure tmn is filled from CPU >> registers. 
>> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> >> --- >> arch/x86/Kconfig | 1 + >> init/Kconfig | 3 +++ >> kernel/sched/core.c | 9 ++++++++- >> 3 files changed, 12 insertions(+), 1 deletion(-) >> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> index d43e7e1c784b..9af252c8698d 100644 >> --- a/arch/x86/Kconfig >> +++ b/arch/x86/Kconfig >> @@ -649,6 +649,7 @@ if HYPERVISOR_GUEST >> >> config PARAVIRT >> bool "Enable paravirtualization code" >> + select TASK_MIGRATION_NOTIFIER >> ---help--- >> This changes the kernel so it can modify itself when it is run >> under a hypervisor, potentially improving performance significantly >> diff --git a/init/Kconfig b/init/Kconfig >> index 3b9df1aa35db..891917123338 100644 >> --- a/init/Kconfig >> +++ b/init/Kconfig >> @@ -2016,6 +2016,9 @@ source "block/Kconfig" >> config PREEMPT_NOTIFIERS >> bool >> >> +config TASK_MIGRATION_NOTIFIER >> + bool >> + >> config PADATA >> depends on SMP >> bool >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index f9123a82cbb6..c07a53aa543c 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) >> rq_clock_skip_update(rq, true); >> } >> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER >> static ATOMIC_NOTIFIER_HEAD(task_migration_notifier); >> >> void register_task_migration_notifier(struct notifier_block *n) >> { >> atomic_notifier_chain_register(&task_migration_notifier, n); >> } >> +#endif >> >> #ifdef CONFIG_SMP >> void set_task_cpu(struct task_struct *p, unsigned int new_cpu) >> @@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu) >> trace_sched_migrate_task(p, new_cpu); >> >> if (task_cpu(p) != new_cpu) { >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER >> struct task_migration_notifier tmn; >> + int from_cpu = task_cpu(p); >> +#endif >> >> if (p->sched_class->migrate_task_rq) >> p->sched_class->migrate_task_rq(p, new_cpu); >> 
p->se.nr_migrations++; >> perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0); >> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER >> tmn.task = p; >> - tmn.from_cpu = task_cpu(p); >> + tmn.from_cpu = from_cpu; >> tmn.to_cpu = new_cpu; >> >> atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn); >> +#endif >> } >> >> __set_task_cpu(p, new_cpu); >> -- >> 2.3.5 > > Paolo, > > Please revert the patch -- can fix properly in the host > which also conforms the KVM guest/host documented protocol. > > Radim submitted a patch to kvm@ to split > the kvm_write_guest in two with a barrier in between, i think. > > I'll review that patch. > Can you cc me on that? Thanks, Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 19:01 ` Marcelo Tosatti 2015-04-17 19:16 ` Andy Lutomirski @ 2015-04-17 19:57 ` Paolo Bonzini 2015-04-17 20:18 ` Marcelo Tosatti 1 sibling, 1 reply; 34+ messages in thread From: Paolo Bonzini @ 2015-04-17 19:57 UTC (permalink / raw) To: Marcelo Tosatti Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto >> From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001 >> From: Paolo Bonzini <pbonzini@redhat.com> >> Date: Fri, 17 Apr 2015 14:57:34 +0200 >> Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER >> >> The task migration notifier is only used in x86 paravirt. Make it >> possible to compile it out. >> >> While at it, move some code around to ensure tmn is filled from CPU >> registers. >> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> >> --- >> arch/x86/Kconfig | 1 + >> init/Kconfig | 3 +++ >> kernel/sched/core.c | 9 ++++++++- >> 3 files changed, 12 insertions(+), 1 deletion(-) >> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> index d43e7e1c784b..9af252c8698d 100644 >> --- a/arch/x86/Kconfig >> +++ b/arch/x86/Kconfig >> @@ -649,6 +649,7 @@ if HYPERVISOR_GUEST >> >> config PARAVIRT >> bool "Enable paravirtualization code" >> + select TASK_MIGRATION_NOTIFIER >> ---help--- >> This changes the kernel so it can modify itself when it is run >> under a hypervisor, potentially improving performance significantly >> diff --git a/init/Kconfig b/init/Kconfig >> index 3b9df1aa35db..891917123338 100644 >> --- a/init/Kconfig >> +++ b/init/Kconfig >> @@ -2016,6 +2016,9 @@ source "block/Kconfig" >> config PREEMPT_NOTIFIERS >> bool >> >> +config TASK_MIGRATION_NOTIFIER >> + bool >> + >> config PADATA >> depends on SMP >> bool >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index f9123a82cbb6..c07a53aa543c 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct 
task_struct *p, int flags) >> rq_clock_skip_update(rq, true); >> } >> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER >> static ATOMIC_NOTIFIER_HEAD(task_migration_notifier); >> >> void register_task_migration_notifier(struct notifier_block *n) >> { >> atomic_notifier_chain_register(&task_migration_notifier, n); >> } >> +#endif >> >> #ifdef CONFIG_SMP >> void set_task_cpu(struct task_struct *p, unsigned int new_cpu) >> @@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu) >> trace_sched_migrate_task(p, new_cpu); >> >> if (task_cpu(p) != new_cpu) { >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER >> struct task_migration_notifier tmn; >> + int from_cpu = task_cpu(p); >> +#endif >> >> if (p->sched_class->migrate_task_rq) >> p->sched_class->migrate_task_rq(p, new_cpu); >> p->se.nr_migrations++; >> perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0); >> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER >> tmn.task = p; >> - tmn.from_cpu = task_cpu(p); >> + tmn.from_cpu = from_cpu; >> tmn.to_cpu = new_cpu; >> >> atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn); >> +#endif >> } >> >> __set_task_cpu(p, new_cpu); >> -- >> 2.3.5 > > Paolo, > > Please revert the patch -- can fix properly in the host > which also conforms the KVM guest/host documented protocol. > > Radim submitted a patch to kvm@ to split > the kvm_write_guest in two with a barrier in between, i think. > > I'll review that patch. You're thinking of http://article.gmane.org/gmane.linux.kernel.stable/129187, but see Andy's reply: > > I think there are at least two ways that would work: > > a) If KVM incremented version as advertised: > > cpu = getcpu(); > pvti = pvti for cpu; > > ver1 = pvti->version; > check stable bit; > rdtsc_barrier, rdtsc, read scale, shift, etc. 
> if (getcpu() != cpu) retry; > if (pvti->version != ver1) retry; > > I think this is safe because, we're guaranteed that there was an > interval (between the two version reads) in which the vcpu we think > we're on was running and the kvmclock data was valid and marked > stable, and we know that the tsc we read came from that interval. > > Note: rdtscp isn't needed. If we're stable, is makes no difference > which cpu's tsc we actually read. > > b) If version remains buggy but we use this migrations_from hack: > > cpu = getcpu(); > pvti = pvti for cpu; > m1 = pvti->migrations_from; > barrier(); > > ver1 = pvti->version; > check stable bit; > rdtsc_barrier, rdtsc, read scale, shift, etc. > if (getcpu() != cpu) retry; > if (pvti->version != ver1) retry; /* probably not really needed */ > > barrier(); > if (pvti->migrations_from != m1) retry; > > This is just like (a), except that we're using a guest kernel hack to > ensure that no one migrated off the vcpu during the version-protected > critical section and that we were, in fact, on that vcpu at some point > during that critical section. Once we've ensured that we were on > pvti's associated vcpu for the entire time we were reading it, then we > are protected by the existing versioning in the host. (a) is not going to happen until 4.2, and there are too many buggy hosts around so we'd have to define new ABI that lets the guest distinguish a buggy host from a fixed one. (b) works now, is not invasive, and I still maintain that the cost is negligible. I'm going to run for a while with CONFIG_SCHEDSTATS to see how often you have a migration. Anyhow if the task migration notifier is reverted we have to disable the whole vsyscall support altogether. Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 19:57 ` Paolo Bonzini @ 2015-04-17 20:18 ` Marcelo Tosatti 2015-04-17 20:39 ` Andy Lutomirski 2015-04-20 16:59 ` Paolo Bonzini 0 siblings, 2 replies; 34+ messages in thread From: Marcelo Tosatti @ 2015-04-17 20:18 UTC (permalink / raw) To: Paolo Bonzini Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto On Fri, Apr 17, 2015 at 09:57:12PM +0200, Paolo Bonzini wrote: > > > >> From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001 > >> From: Paolo Bonzini <pbonzini@redhat.com> > >> Date: Fri, 17 Apr 2015 14:57:34 +0200 > >> Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER > >> > >> The task migration notifier is only used in x86 paravirt. Make it > >> possible to compile it out. > >> > >> While at it, move some code around to ensure tmn is filled from CPU > >> registers. > >> > >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > >> --- > >> arch/x86/Kconfig | 1 + > >> init/Kconfig | 3 +++ > >> kernel/sched/core.c | 9 ++++++++- > >> 3 files changed, 12 insertions(+), 1 deletion(-) > >> > >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > >> index d43e7e1c784b..9af252c8698d 100644 > >> --- a/arch/x86/Kconfig > >> +++ b/arch/x86/Kconfig > >> @@ -649,6 +649,7 @@ if HYPERVISOR_GUEST > >> > >> config PARAVIRT > >> bool "Enable paravirtualization code" > >> + select TASK_MIGRATION_NOTIFIER > >> ---help--- > >> This changes the kernel so it can modify itself when it is run > >> under a hypervisor, potentially improving performance significantly > >> diff --git a/init/Kconfig b/init/Kconfig > >> index 3b9df1aa35db..891917123338 100644 > >> --- a/init/Kconfig > >> +++ b/init/Kconfig > >> @@ -2016,6 +2016,9 @@ source "block/Kconfig" > >> config PREEMPT_NOTIFIERS > >> bool > >> > >> +config TASK_MIGRATION_NOTIFIER > >> + bool > >> + > >> config PADATA > >> depends on SMP > >> bool > >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c > >> index 
f9123a82cbb6..c07a53aa543c 100644 > >> --- a/kernel/sched/core.c > >> +++ b/kernel/sched/core.c > >> @@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) > >> rq_clock_skip_update(rq, true); > >> } > >> > >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER > >> static ATOMIC_NOTIFIER_HEAD(task_migration_notifier); > >> > >> void register_task_migration_notifier(struct notifier_block *n) > >> { > >> atomic_notifier_chain_register(&task_migration_notifier, n); > >> } > >> +#endif > >> > >> #ifdef CONFIG_SMP > >> void set_task_cpu(struct task_struct *p, unsigned int new_cpu) > >> @@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu) > >> trace_sched_migrate_task(p, new_cpu); > >> > >> if (task_cpu(p) != new_cpu) { > >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER > >> struct task_migration_notifier tmn; > >> + int from_cpu = task_cpu(p); > >> +#endif > >> > >> if (p->sched_class->migrate_task_rq) > >> p->sched_class->migrate_task_rq(p, new_cpu); > >> p->se.nr_migrations++; > >> perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0); > >> > >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER > >> tmn.task = p; > >> - tmn.from_cpu = task_cpu(p); > >> + tmn.from_cpu = from_cpu; > >> tmn.to_cpu = new_cpu; > >> > >> atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn); > >> +#endif > >> } > >> > >> __set_task_cpu(p, new_cpu); > >> -- > >> 2.3.5 > > > > Paolo, > > > > Please revert the patch -- can fix properly in the host > > which also conforms the KVM guest/host documented protocol. > > > > Radim submitted a patch to kvm@ to split > > the kvm_write_guest in two with a barrier in between, i think. > > > > I'll review that patch. 
> > You're thinking of > http://article.gmane.org/gmane.linux.kernel.stable/129187, but see > Andy's reply: > > > > > I think there are at least two ways that would work: > > > > a) If KVM incremented version as advertised: > > > > cpu = getcpu(); > > pvti = pvti for cpu; > > > > ver1 = pvti->version; > > check stable bit; > > rdtsc_barrier, rdtsc, read scale, shift, etc. > > if (getcpu() != cpu) retry; > > if (pvti->version != ver1) retry; > > > > I think this is safe because, we're guaranteed that there was an > > interval (between the two version reads) in which the vcpu we think > > we're on was running and the kvmclock data was valid and marked > > stable, and we know that the tsc we read came from that interval. > > > > Note: rdtscp isn't needed. If we're stable, is makes no difference > > which cpu's tsc we actually read. > > > > b) If version remains buggy but we use this migrations_from hack: > > > > cpu = getcpu(); > > pvti = pvti for cpu; > > m1 = pvti->migrations_from; > > barrier(); > > > > ver1 = pvti->version; > > check stable bit; > > rdtsc_barrier, rdtsc, read scale, shift, etc. > > if (getcpu() != cpu) retry; > > if (pvti->version != ver1) retry; /* probably not really needed */ > > > > barrier(); > > if (pvti->migrations_from != m1) retry; > > > > This is just like (a), except that we're using a guest kernel hack to > > ensure that no one migrated off the vcpu during the version-protected > > critical section and that we were, in fact, on that vcpu at some point > > during that critical section. Once we've ensured that we were on > > pvti's associated vcpu for the entire time we were reading it, then we > > are protected by the existing versioning in the host. > > (a) is not going to happen until 4.2, and there are too many buggy hosts > around so we'd have to define new ABI that lets the guest distinguish a > buggy host from a fixed one. > > (b) works now, is not invasive, and I still maintain that the cost is > negligible. 
I'm going to run for a while with CONFIG_SCHEDSTATS to see > how often you have a migration. > > Anyhow if the task migration notifier is reverted we have to disable the > whole vsyscall support altogether. The bug which this is fixing is very rare, have no memory of a report. In fact, its even difficult to create a synthetic reproducer. You need: 1) update of kvmclock data structure (happens once every 5 minutes). 2) migration of task from vcpu1 to vcpu2 back to vcpu1. 3) a data race between kvm_write_guest (string copy) and 2 above. At the same time. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 20:18 ` Marcelo Tosatti @ 2015-04-17 20:39 ` Andy Lutomirski 2015-04-17 21:28 ` Linus Torvalds 2015-04-20 16:59 ` Paolo Bonzini 1 sibling, 1 reply; 34+ messages in thread From: Andy Lutomirski @ 2015-04-17 20:39 UTC (permalink / raw) To: Marcelo Tosatti Cc: Paolo Bonzini, Peter Zijlstra, Linus Torvalds, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On Fri, Apr 17, 2015 at 1:18 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote: > On Fri, Apr 17, 2015 at 09:57:12PM +0200, Paolo Bonzini wrote: >> >> >> >> From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001 >> >> From: Paolo Bonzini <pbonzini@redhat.com> >> >> Date: Fri, 17 Apr 2015 14:57:34 +0200 >> >> Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER >> >> >> >> The task migration notifier is only used in x86 paravirt. Make it >> >> possible to compile it out. >> >> >> >> While at it, move some code around to ensure tmn is filled from CPU >> >> registers. 
>> >> >> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> >> >> --- >> >> arch/x86/Kconfig | 1 + >> >> init/Kconfig | 3 +++ >> >> kernel/sched/core.c | 9 ++++++++- >> >> 3 files changed, 12 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> >> index d43e7e1c784b..9af252c8698d 100644 >> >> --- a/arch/x86/Kconfig >> >> +++ b/arch/x86/Kconfig >> >> @@ -649,6 +649,7 @@ if HYPERVISOR_GUEST >> >> >> >> config PARAVIRT >> >> bool "Enable paravirtualization code" >> >> + select TASK_MIGRATION_NOTIFIER >> >> ---help--- >> >> This changes the kernel so it can modify itself when it is run >> >> under a hypervisor, potentially improving performance significantly >> >> diff --git a/init/Kconfig b/init/Kconfig >> >> index 3b9df1aa35db..891917123338 100644 >> >> --- a/init/Kconfig >> >> +++ b/init/Kconfig >> >> @@ -2016,6 +2016,9 @@ source "block/Kconfig" >> >> config PREEMPT_NOTIFIERS >> >> bool >> >> >> >> +config TASK_MIGRATION_NOTIFIER >> >> + bool >> >> + >> >> config PADATA >> >> depends on SMP >> >> bool >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> >> index f9123a82cbb6..c07a53aa543c 100644 >> >> --- a/kernel/sched/core.c >> >> +++ b/kernel/sched/core.c >> >> @@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) >> >> rq_clock_skip_update(rq, true); >> >> } >> >> >> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER >> >> static ATOMIC_NOTIFIER_HEAD(task_migration_notifier); >> >> >> >> void register_task_migration_notifier(struct notifier_block *n) >> >> { >> >> atomic_notifier_chain_register(&task_migration_notifier, n); >> >> } >> >> +#endif >> >> >> >> #ifdef CONFIG_SMP >> >> void set_task_cpu(struct task_struct *p, unsigned int new_cpu) >> >> @@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu) >> >> trace_sched_migrate_task(p, new_cpu); >> >> >> >> if (task_cpu(p) != new_cpu) { >> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER >> >> struct 
task_migration_notifier tmn; >> >> + int from_cpu = task_cpu(p); >> >> +#endif >> >> >> >> if (p->sched_class->migrate_task_rq) >> >> p->sched_class->migrate_task_rq(p, new_cpu); >> >> p->se.nr_migrations++; >> >> perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0); >> >> >> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER >> >> tmn.task = p; >> >> - tmn.from_cpu = task_cpu(p); >> >> + tmn.from_cpu = from_cpu; >> >> tmn.to_cpu = new_cpu; >> >> >> >> atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn); >> >> +#endif >> >> } >> >> >> >> __set_task_cpu(p, new_cpu); >> >> -- >> >> 2.3.5 >> > >> > Paolo, >> > >> > Please revert the patch -- can fix properly in the host >> > which also conforms the KVM guest/host documented protocol. >> > >> > Radim submitted a patch to kvm@ to split >> > the kvm_write_guest in two with a barrier in between, i think. >> > >> > I'll review that patch. >> >> You're thinking of >> http://article.gmane.org/gmane.linux.kernel.stable/129187, but see >> Andy's reply: >> >> > >> > I think there are at least two ways that would work: >> > >> > a) If KVM incremented version as advertised: >> > >> > cpu = getcpu(); >> > pvti = pvti for cpu; >> > >> > ver1 = pvti->version; >> > check stable bit; >> > rdtsc_barrier, rdtsc, read scale, shift, etc. >> > if (getcpu() != cpu) retry; >> > if (pvti->version != ver1) retry; >> > >> > I think this is safe because, we're guaranteed that there was an >> > interval (between the two version reads) in which the vcpu we think >> > we're on was running and the kvmclock data was valid and marked >> > stable, and we know that the tsc we read came from that interval. >> > >> > Note: rdtscp isn't needed. If we're stable, is makes no difference >> > which cpu's tsc we actually read. 
>> > >> > b) If version remains buggy but we use this migrations_from hack: >> > >> > cpu = getcpu(); >> > pvti = pvti for cpu; >> > m1 = pvti->migrations_from; >> > barrier(); >> > >> > ver1 = pvti->version; >> > check stable bit; >> > rdtsc_barrier, rdtsc, read scale, shift, etc. >> > if (getcpu() != cpu) retry; >> > if (pvti->version != ver1) retry; /* probably not really needed */ >> > >> > barrier(); >> > if (pvti->migrations_from != m1) retry; >> > >> > This is just like (a), except that we're using a guest kernel hack to >> > ensure that no one migrated off the vcpu during the version-protected >> > critical section and that we were, in fact, on that vcpu at some point >> > during that critical section. Once we've ensured that we were on >> > pvti's associated vcpu for the entire time we were reading it, then we >> > are protected by the existing versioning in the host. >> >> (a) is not going to happen until 4.2, and there are too many buggy hosts >> around so we'd have to define new ABI that lets the guest distinguish a >> buggy host from a fixed one. >> >> (b) works now, is not invasive, and I still maintain that the cost is >> negligible. I'm going to run for a while with CONFIG_SCHEDSTATS to see >> how often you have a migration. >> >> Anyhow if the task migration notifier is reverted we have to disable the >> whole vsyscall support altogether. > > The bug which this is fixing is very rare, have no memory of a report. > > In fact, its even difficult to create a synthetic reproducer. You need: > > 1) update of kvmclock data structure (happens once every 5 minutes). > 2) migration of task from vcpu1 to vcpu2 back to vcpu1. > 3) a data race between kvm_write_guest (string copy) and > 2 above. > > At the same time. Something maybe worth considering: On my box, vclock_gettime using kvm-clock is about 40 ns. An empty syscall is about 33 ns. clock_gettime *should* be around 17 ns. The clock_gettime syscall is about 73 ns. 
Could we figure out why clock_gettime (the syscall) is so slow, fix it, and then not be so sad about removing the existing kvm-clock vdso code? Once we fix the host for real (add a new feature bit and make it global instead of per-cpu), then we could have a really fast vdso implementation, too. --Andy > > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 20:39 ` Andy Lutomirski @ 2015-04-17 21:28 ` Linus Torvalds 2015-04-17 21:42 ` Andy Lutomirski 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2015-04-17 21:28 UTC (permalink / raw) To: Andy Lutomirski Cc: Marcelo Tosatti, Paolo Bonzini, Peter Zijlstra, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On Fri, Apr 17, 2015 at 4:39 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > On my box, vclock_gettime using kvm-clock is about 40 ns. An empty > syscall is about 33 ns. clock_gettime *should* be around 17 ns. > > The clock_gettime syscall is about 73 ns. > > Could we figure out why clock_gettime (the syscall) is so slow If we only could profile some random program (let's call it "a.out" that did the syscall(__NR_gettime_syscall) a couple million times. Oh, lookie here, Santa came around: 21.83% [k] system_call 12.85% [.] syscall 9.76% [k] __audit_syscall_exit 9.55% [k] copy_user_enhanced_fast_string 4.68% [k] __getnstimeofday64 4.08% [k] syscall_trace_enter_phase1 3.85% [k] __audit_syscall_entry 3.77% [k] unroll_tree_refs 3.15% [k] sys_clock_gettime 2.92% [k] int_very_careful 2.73% [.] main 2.35% [k] syscall_trace_leave 2.28% [k] read_tsc 1.73% [k] int_restore_rest 1.73% [k] int_with_check 1.48% [k] syscall_return 1.32% [k] dput 1.24% [k] system_call_fastpath 1.21% [k] syscall_return_via_sysret 1.21% [k] tracesys 0.81% [k] do_audit_syscall_entry 0.80% [k] current_kernel_time 0.73% [k] getnstimeofday64 0.68% [k] path_put 0.66% [k] posix_clock_realtime_get 0.61% [k] int_careful 0.60% [k] mntput 0.49% [k] kfree 0.36% [k] _copy_to_user 0.31% [k] int_ret_from_sys_call_irqs_off looks to me like it's spending a lot of time in system call auditing. Which makes no sense to me, since none of this should be triggering any auditing. And there's a lot of time in low-level kernel system call assembly code. 
If I only remembered the name of the crazy person who said "Ok" when I suggest he just be the maintainer of the code since he has spent a lot of time sending patches for it. Something like Amdy Letorsky. No, that wasn't it. Hmm. It's on the tip of my tongue. Oh well. Maybe somebody can remember the guys name. It's familiar for some reason. Andy? Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 21:28 ` Linus Torvalds @ 2015-04-17 21:42 ` Andy Lutomirski 2015-04-17 22:04 ` Linus Torvalds 0 siblings, 1 reply; 34+ messages in thread From: Andy Lutomirski @ 2015-04-17 21:42 UTC (permalink / raw) To: Linus Torvalds, John Stultz Cc: Marcelo Tosatti, Paolo Bonzini, Peter Zijlstra, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On Fri, Apr 17, 2015 at 2:28 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, Apr 17, 2015 at 4:39 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> On my box, vclock_gettime using kvm-clock is about 40 ns. An empty >> syscall is about 33 ns. clock_gettime *should* be around 17 ns. >> >> The clock_gettime syscall is about 73 ns. >> >> Could we figure out why clock_gettime (the syscall) is so slow > > If we only could profile some random program (let's call it "a.out" > that did the syscall(__NR_gettime_syscall) a couple million times. > > Oh, lookie here, Santa came around: > > 21.83% [k] system_call > 12.85% [.] syscall > 9.76% [k] __audit_syscall_exit > 9.55% [k] copy_user_enhanced_fast_string > 4.68% [k] __getnstimeofday64 > 4.08% [k] syscall_trace_enter_phase1 > 3.85% [k] __audit_syscall_entry > 3.77% [k] unroll_tree_refs > 3.15% [k] sys_clock_gettime > 2.92% [k] int_very_careful > 2.73% [.] main > 2.35% [k] syscall_trace_leave > 2.28% [k] read_tsc > 1.73% [k] int_restore_rest > 1.73% [k] int_with_check > 1.48% [k] syscall_return > 1.32% [k] dput > 1.24% [k] system_call_fastpath > 1.21% [k] syscall_return_via_sysret > 1.21% [k] tracesys > 0.81% [k] do_audit_syscall_entry > 0.80% [k] current_kernel_time > 0.73% [k] getnstimeofday64 > 0.68% [k] path_put > 0.66% [k] posix_clock_realtime_get > 0.61% [k] int_careful > 0.60% [k] mntput > 0.49% [k] kfree > 0.36% [k] _copy_to_user > 0.31% [k] int_ret_from_sys_call_irqs_off > > looks to me like it's spending a lot of time in system call auditing. 
> Which makes no sense to me, since none of this should be triggering > any auditing. And there's a lot of time in low-level kernel system > call assembly code. Muahaha. The auditors have invaded your system. (I did my little benchmark with a more sensible configuration -- see way below). Can you send the output of: # auditctl -s # auditctl -l Are you, perchance, using Fedora? I lobbied rather heavily, and successfully, to get the default configuration to stop auditing. Unfortunately, the fix wasn't retroactive, so, unless you have a very fresh install, you might want to apply the fix yourself: https://fedorahosted.org/fesco/ticket/1311 > > If I only remembered the name of the crazy person who said "Ok" when I > suggest he just be the maintainer of the code since he has spent a lot > of time sending patches for it. Something like Amdy Letorsky. No, that > wasn't it. Hmm. It's on the tip of my tongue. > > Oh well. Maybe somebody can remember the guys name. It's familiar for > some reason. Andy? Amdy Lumirtowsky thinks he meant to attach a condition to his maintainerish activities: he will do his best to keep the audit code *out* of the low-level stuff, but he's going to try to avoid ever touching the audit code itself, because if he ever had to change it, he might accidentally delete the entire file. Seriously, wasn't there a TAINT_PERFORMANCE thing proposed at some point? I would love auditing to set some really loud global warning that you've just done a Bad Thing (tm) performance-wise by enabling it. Back to timing. With kvm-clock, I see: 23.80% timing_test_64 [kernel.kallsyms] [k] pvclock_clocksource_read 15.57% timing_test_64 libc-2.20.so [.] 
syscall 12.39% timing_test_64 [kernel.kallsyms] [k] system_call 12.35% timing_test_64 [kernel.kallsyms] [k] copy_user_generic_string 10.95% timing_test_64 [kernel.kallsyms] [k] system_call_after_swapgs 7.35% timing_test_64 [kernel.kallsyms] [k] ktime_get_ts64 6.20% timing_test_64 [kernel.kallsyms] [k] sys_clock_gettime 3.62% timing_test_64 [kernel.kallsyms] [k] system_call_fastpath 2.08% timing_test_64 timing_test_64 [.] main 1.72% timing_test_64 timing_test_64 [.] syscall@plt 1.58% timing_test_64 [kernel.kallsyms] [k] kvm_clock_get_cycles 1.22% timing_test_64 [kernel.kallsyms] [k] _copy_to_user 0.65% timing_test_64 [kernel.kallsyms] [k] posix_ktime_get_ts 0.13% timing_test_64 [kernel.kallsyms] [k] apic_timer_interrupt We've got some silly indirection, a uaccess that probably didn't get optimized very well, and the terrifying function pvclock_clocksource_read. By comparison, using tsc: 19.51% timing_test_64 libc-2.20.so [.] syscall 15.52% timing_test_64 [kernel.kallsyms] [k] system_call 15.25% timing_test_64 [kernel.kallsyms] [k] copy_user_generic_string 14.34% timing_test_64 [kernel.kallsyms] [k] system_call_after_swapgs 8.66% timing_test_64 [kernel.kallsyms] [k] ktime_get_ts64 6.95% timing_test_64 [kernel.kallsyms] [k] sys_clock_gettime 5.93% timing_test_64 [kernel.kallsyms] [k] native_read_tsc 5.12% timing_test_64 [kernel.kallsyms] [k] system_call_fastpath 2.62% timing_test_64 timing_test_64 [.] main That's better, although the uaccess silliness is still there. (No PEBS -- I don't think it works right in KVM.) --Andy > > Linus -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 21:42 ` Andy Lutomirski @ 2015-04-17 22:04 ` Linus Torvalds 2015-04-17 22:25 ` Andy Lutomirski 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2015-04-17 22:04 UTC (permalink / raw) To: Andy Lutomirski Cc: John Stultz, Marcelo Tosatti, Paolo Bonzini, Peter Zijlstra, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On Fri, Apr 17, 2015 at 5:42 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > Muahaha. The auditors have invaded your system. (I did my little > benchmark with a more sensible configuration -- see way below). > > Can you send the output of: > > # auditctl -s > # auditctl -l # auditctl -s enabled 1 flag 1 pid 822 rate_limit 0 backlog_limit 320 lost 0 backlog 0 backlog_wait_time 60000 loginuid_immutable 0 unlocked # auditctl -l No rules > Are you, perchance, using Fedora? F21. Yup. I used to just disable auditing in the kernel entirely, but then I ended up deciding that I need to run something closer to the broken Fedora config (selinux in particular) in order to actually optimize the real-world pathname handling situation rather than the _sane_ one. Oh well. I think audit support got enabled at the same time in my kernels because I ended up using the default config and then taking out the truly crazy stuff without noticing AUDITSYSCALL. > I lobbied rather heavily, and > successfully, to get the default configuration to stop auditing. > Unfortunately, the fix wasn't retroactive, so, unless you have a very > fresh install, you might want to apply the fix yourself: Is that fix happening in Fedora going forward, though? Like F22? > Amdy Lumirtowsky thinks he meant to attach a condition to his > maintainerish activities: he will do his best to keep the audit code > *out* of the low-level stuff, but he's going to try to avoid ever > touching the audit code itself, because if he ever had to change it, > he might accidentally delete the entire file. Oooh. 
That would be _such_ a shame. Can we please do it by mistake? "Oops, my fingers slipped" > Seriously, wasn't there a TAINT_PERFORMANCE thing proposed at some > point? I would love auditing to set some really loud global warning > that you've just done a Bad Thing (tm) performance-wise by enabling > it. Or even just a big fat warning in dmesg the first time auditing triggers. > Back to timing. With kvm-clock, I see: > > 23.80% timing_test_64 [kernel.kallsyms] [k] pvclock_clocksource_read Oh wow. How can that possibly be sane? Isn't the *whole* point of pvclock_clocksource_read() to be a native rdtsc with scaling? How does it cause that kind of insane pain? Oh well. Some paravirt person would need to look and care. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 22:04 ` Linus Torvalds @ 2015-04-17 22:25 ` Andy Lutomirski 2015-04-17 23:39 ` Marcelo Tosatti 2015-04-18 16:20 ` Paolo Bonzini 0 siblings, 2 replies; 34+ messages in thread From: Andy Lutomirski @ 2015-04-17 22:25 UTC (permalink / raw) To: Linus Torvalds Cc: John Stultz, Marcelo Tosatti, Paolo Bonzini, Peter Zijlstra, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On Fri, Apr 17, 2015 at 3:04 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, Apr 17, 2015 at 5:42 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> Muahaha. The auditors have invaded your system. (I did my little >> benchmark with a more sensible configuration -- see way below). >> >> Can you send the output of: >> >> # auditctl -s >> # auditctl -l > > # auditctl -s > enabled 1 > flag 1 > pid 822 > rate_limit 0 > backlog_limit 320 > lost 0 > backlog 0 > backlog_wait_time 60000 > loginuid_immutable 0 unlocked > # auditctl -l > No rules Yes, "No rules" doesn't mean "don't audit". > >> Are you, perchance, using Fedora? > > F21. Yup. > > I used to just disable auditing in the kernel entirely, but then I > ended up deciding that I need to run something closer to the broken > Fedora config (selinux in particular) in order to actually optimize > the real-world pathname handling situation rather than the _sane_ one. > Oh well. I think audit support got enabled at the same time in my > kernels because I ended up using the default config and then taking > out the truly crazy stuff without noticing AUDITSYSCALL. > >> I lobbied rather heavily, and >> successfully, to get the default configuration to stop auditing. >> Unfortunately, the fix wasn't retroactive, so, unless you have a very >> fresh install, you might want to apply the fix yourself: > > Is that fix happening in Fedora going forward, though? Like F22? It's in F21, actually. Unfortunately, the audit package is really weird. 
It installs /etc/audit/rules.d/audit.rules, containing: # This file contains the auditctl rules that are loaded # whenever the audit daemon is started via the initscripts. # The rules are simply the parameters that would be passed # to auditctl. # First rule - delete all -D # This suppresses syscall auditing for all tasks started # with this rule in effect. Remove it if you need syscall # auditing. -a task,never Then, if it's a fresh install (i.e. /etc/audit/audit.rules doesn't exist) it copies that file to /etc/audit/audit.rules post-install. (No, I don't know why it works this way.) IOW, you might want to copy your /etc/audit/rules.d/audit.rules to /etc/audit/audit.rules. I think you need to reboot to get the full effect. You could apply the rule manually (or maybe restart the audit service), but it will only affect newly-started tasks. > >> Amdy Lumirtowsky thinks he meant to attach a condition to his >> maintainerish activities: he will do his best to keep the audit code >> *out* of the low-level stuff, but he's going to try to avoid ever >> touching the audit code itself, because if he ever had to change it, >> he might accidentally delete the entire file. > > Oooh. That would be _such_ a shame. > > Can we please do it by mistake? "Oops, my fingers slipped" > >> Seriously, wasn't there a TAINT_PERFORMANCE thing proposed at some >> point? I would love auditing to set some really loud global warning >> that you've just done a Bad Thing (tm) performance-wise by enabling >> it. > > Or even just a big fat warning in dmesg the first time auditing triggers. > >> Back to timing. With kvm-clock, I see: >> >> 23.80% timing_test_64 [kernel.kallsyms] [k] pvclock_clocksource_read > > Oh wow. How can that possibly be sane? > > Isn't the *whole* point of pvclock_clocksource_read() to be a native > rdtsc with scaling? How does it cause that kind of insane pain? 
An unnecessarily complicated protocol, a buggy host implementation, and an unnecessarily complicated guest implementation :( > > Oh well. Some paravirt person would need to look and care. The code there is a bit scary. --Andy > > Linus -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 34+ messages in thread
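For readers hitting the same audit overhead, the fix Andy describes above boils down to loading the packaged default rule set. A sketch, assuming the Fedora audit package layout described in the thread (run as root; restarting the audit service or rebooting is needed for already-running tasks):

```shell
# Inspect the current audit state and loaded rules
auditctl -s
auditctl -l

# Copy the packaged default so existing installs match fresh ones
cp /etc/audit/rules.d/audit.rules /etc/audit/audit.rules

# Or apply the key rule immediately (affects newly started tasks only)
auditctl -D              # delete all rules
auditctl -a task,never   # suppress syscall auditing for new tasks
```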
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 22:25 ` Andy Lutomirski @ 2015-04-17 23:39 ` Marcelo Tosatti 2015-04-18 16:20 ` Paolo Bonzini 1 sibling, 0 replies; 34+ messages in thread From: Marcelo Tosatti @ 2015-04-17 23:39 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, John Stultz, Paolo Bonzini, Peter Zijlstra, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On Fri, Apr 17, 2015 at 03:25:28PM -0700, Andy Lutomirski wrote: > On Fri, Apr 17, 2015 at 3:04 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > On Fri, Apr 17, 2015 at 5:42 PM, Andy Lutomirski <luto@amacapital.net> wrote: > >> > >> Muahaha. The auditors have invaded your system. (I did my little > >> benchmark with a more sensible configuration -- see way below). > >> > >> Can you send the output of: > >> > >> # auditctl -s > >> # auditctl -l > > > > # auditctl -s > > enabled 1 > > flag 1 > > pid 822 > > rate_limit 0 > > backlog_limit 320 > > lost 0 > > backlog 0 > > backlog_wait_time 60000 > > loginuid_immutable 0 unlocked > > # auditctl -l > > No rules > > Yes, "No rules" doesn't mean "don't audit". > > > > >> Are you, perchance, using Fedora? > > > > F21. Yup. > > > > I used to just disable auditing in the kernel entirely, but then I > > ended up deciding that I need to run something closer to the broken > > Fedora config (selinux in particular) in order to actually optimize > > the real-world pathname handling situation rather than the _sane_ one. > > Oh well. I think audit support got enabled at the same time in my > > kernels because I ended up using the default config and then taking > > out the truly crazy stuff without noticing AUDITSYSCALL. > > > >> I lobbied rather heavily, and > >> successfully, to get the default configuration to stop auditing. 
> >> Unfortunately, the fix wasn't retroactive, so, unless you have a very > >> fresh install, you might want to apply the fix yourself: > > > > Is that fix happening in Fedora going forward, though? Like F22? > > It's in F21, actually. Unfortunately, the audit package is really > weird. It installs /etc/audit/rules.d/audit.rules, containing: > > # This file contains the auditctl rules that are loaded > # whenever the audit daemon is started via the initscripts. > # The rules are simply the parameters that would be passed > # to auditctl. > > # First rule - delete all > -D > > # This suppresses syscall auditing for all tasks started > # with this rule in effect. Remove it if you need syscall > # auditing. > -a task,never > > Then, if it's a fresh install (i.e. /etc/audit/audit.rules doesn't > exist) it copies that file to /etc/audit/audit.rules post-install. > (No, I don't know why it works this way.) > > IOW, you might want to copy your /etc/audit/rules.d/audit.rules to > /etc/audit/audit.rules. I think you need to reboot to get the full > effect. You could apply the rule manually (or maybe restart the audit > service), but it will only affect newly-started tasks. > > > > >> Amdy Lumirtowsky thinks he meant to attach a condition to his > >> maintainerish activities: he will do his best to keep the audit code > >> *out* of the low-level stuff, but he's going to try to avoid ever > >> touching the audit code itself, because if he ever had to change it, > >> he might accidentally delete the entire file. > > > > Oooh. That would be _such_ a shame. > > > > Can we please do it by mistake? "Oops, my fingers slipped" > > > >> Seriously, wasn't there a TAINT_PERFORMANCE thing proposed at some > >> point? I would love auditing to set some really loud global warning > >> that you've just done a Bad Thing (tm) performance-wise by enabling > >> it. > > > > Or even just a big fat warning in dmesg the first time auditing triggers. > > > >> Back to timing. 
With kvm-clock, I see: > >> > >> 23.80% timing_test_64 [kernel.kallsyms] [k] pvclock_clocksource_read > > > > Oh wow. How can that possibly be sane? > > > > Isn't the *whole* point of pvclock_clocksource_read() to be a native > > rdtsc with scaling? How does it cause that kind of insane pain? > > An unnecessarily complicated protocol, a buggy host implementation, > and an unnecessarily complicated guest implementation :( How about starting by removing the unnecessary rdtsc barrier? ^ permalink raw reply [flat|nested] 34+ messages in thread
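For context, the "unnecessarily complicated protocol" under discussion is pvclock's version-counter (seqcount-like) read loop. The following is a simplified userspace model of the idea, not the kernel's actual code: the TSC is replaced by a plain parameter and the struct layout only mirrors the real one loosely.

```c
#include <stdint.h>

/* Toy model of the per-vCPU time info the host updates.
 * Field names mirror the real struct, but this is a simplification. */
struct pvclock_info {
    volatile uint32_t version;   /* odd while the host is mid-update */
    uint64_t tsc_timestamp;
    uint64_t system_time;        /* ns at tsc_timestamp */
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
};

/* Scale a TSC delta to nanoseconds: (delta << shift) * mul >> 32. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift >= 0)
        delta <<= shift;
    else
        delta >>= -shift;
    return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/* Seqcount-style read: retry if the host updated the area meanwhile.
 * The real guest code also has rdtsc_barrier() (lfence) around the
 * TSC read -- one of the suspected performance killers in this thread. */
static uint64_t pvclock_read(const struct pvclock_info *info, uint64_t tsc)
{
    uint32_t version;
    uint64_t time;

    do {
        version = info->version;
        time = info->system_time +
               scale_delta(tsc - info->tsc_timestamp,
                           info->tsc_to_system_mul, info->tsc_shift);
    } while ((version & 1) || version != info->version);

    return time;
}
```

The retry loop alone is cheap; the costs being debated come from the barriers and, as the next message explains, from the fallback path taken when the TSC is not marked stable.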
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 22:25 ` Andy Lutomirski 2015-04-17 23:39 ` Marcelo Tosatti @ 2015-04-18 16:20 ` Paolo Bonzini 1 sibling, 0 replies; 34+ messages in thread From: Paolo Bonzini @ 2015-04-18 16:20 UTC (permalink / raw) To: Andy Lutomirski, Linus Torvalds Cc: John Stultz, Marcelo Tosatti, Peter Zijlstra, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On 18/04/2015 00:25, Andy Lutomirski wrote: >> Isn't the *whole* point of pvclock_clocksource_read() to be a native >> rdtsc with scaling? How does it cause that kind of insane pain? It's possible that your machine ends up with PVCLOCK_TSC_STABLE_BIT clear, so you get an atomic cmpxchg in addition (and associated cacheline bouncing, since anything reading the clocksource in the kernel will cause that variable to bounce). But that's not too common on recent machines. Is the vsyscall faster for you or does it degenerate to the syscall? If so, you have PVCLOCK_TSC_STABLE_BIT clear. > An unnecessarily complicated protocol, a buggy host implementation, > and an unnecessarily complicated guest implementation :( pvclock_clocksource_read() itself is not scary and need not worry about the buggy host implementation (preempt_disable makes things easy). It's the vDSO stuff that has the nice things. There's a few micro-optimizations that we could do (the guest implementation _is_ unnecessarily baroque), but it may not be enough if the rdtsc_barrier()s (lfence) are the performance killers. Will look closely on Monday. Paolo >> Oh well. Some paravirt person would need to look and care. > > The code there is a bit scary. > > --Andy > >> >> Linus > > > ^ permalink raw reply [flat|nested] 34+ messages in thread
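The cmpxchg fallback Paolo describes (used when PVCLOCK_TSC_STABLE_BIT is clear) can be modeled in userspace with C11 atomics. This is a sketch of the idea only, not the kernel's implementation: a single global "last value" that every reader atomically advances, which is exactly why the cacheline bounces between CPUs.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Global last-returned time; every reader does an atomic RMW on it,
 * so the cacheline ping-pongs between all CPUs reading the clock. */
static _Atomic uint64_t last_value;

/* Clamp a raw per-vCPU reading so the clock never appears to go
 * backwards across vCPUs: return max(raw, last_value). */
static uint64_t monotonic_read(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);

    while (raw > last) {
        /* On failure, 'last' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&last_value, &last, raw))
            return raw;
    }
    return last;   /* another CPU already returned a later time */
}
```

With a stable, synchronized TSC the per-vCPU reading is monotonic by itself and this shared-state clamp (and its cacheline traffic) can be skipped entirely.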
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-17 20:18 ` Marcelo Tosatti 2015-04-17 20:39 ` Andy Lutomirski @ 2015-04-20 16:59 ` Paolo Bonzini 2015-04-20 20:27 ` Andy Lutomirski 2015-04-22 20:56 ` Marcelo Tosatti 1 sibling, 2 replies; 34+ messages in thread From: Paolo Bonzini @ 2015-04-20 16:59 UTC (permalink / raw) To: Marcelo Tosatti Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto On 17/04/2015 22:18, Marcelo Tosatti wrote: > The bug which this is fixing is very rare, have no memory of a report. > > In fact, its even difficult to create a synthetic reproducer. But then why was the task migration notifier even in Jeremy's original code for Xen? Was it supposed to work even on non-synchronized TSC? If that's the case, then it could be reverted indeed; but then why did you commit this patch to 4.1? Did you think of something that would cause the seqcount-like protocol to fail, and that turned out not to be the case later? I was only following the mailing list sparsely in March. Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-20 16:59 ` Paolo Bonzini @ 2015-04-20 20:27 ` Andy Lutomirski 2015-04-22 21:21 ` Marcelo Tosatti 2015-04-22 20:56 ` Marcelo Tosatti 1 sibling, 1 reply; 34+ messages in thread From: Andy Lutomirski @ 2015-04-20 20:27 UTC (permalink / raw) To: Paolo Bonzini Cc: Marcelo Tosatti, Peter Zijlstra, Linus Torvalds, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On Mon, Apr 20, 2015 at 9:59 AM, Paolo Bonzini <pbonzini@redhat.com> wrote: > > > On 17/04/2015 22:18, Marcelo Tosatti wrote: >> The bug which this is fixing is very rare, have no memory of a report. >> >> In fact, its even difficult to create a synthetic reproducer. > > But then why was the task migration notifier even in Jeremy's original > code for Xen? Was it supposed to work even on non-synchronized TSC? > > If that's the case, then it could be reverted indeed; but then why did > you commit this patch to 4.1? Did you think of something that would > cause the seqcount-like protocol to fail, and that turned out not to be > the case later? I was only following the mailing list sparsely in March. I don't think anyone ever tried that hard to test this stuff. There was an infinte loop that Firefox was triggering as a KVM guest somewhat reliably until a couple months ago in the same vdso code. :( --Andy ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-20 20:27 ` Andy Lutomirski @ 2015-04-22 21:21 ` Marcelo Tosatti 2015-04-23 9:13 ` Paolo Bonzini 0 siblings, 1 reply; 34+ messages in thread From: Marcelo Tosatti @ 2015-04-22 21:21 UTC (permalink / raw) To: Andy Lutomirski Cc: Paolo Bonzini, Peter Zijlstra, Linus Torvalds, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On Mon, Apr 20, 2015 at 01:27:58PM -0700, Andy Lutomirski wrote: > On Mon, Apr 20, 2015 at 9:59 AM, Paolo Bonzini <pbonzini@redhat.com> wrote: > > > > > > On 17/04/2015 22:18, Marcelo Tosatti wrote: > >> The bug which this is fixing is very rare, have no memory of a report. > >> > >> In fact, its even difficult to create a synthetic reproducer. > > > > But then why was the task migration notifier even in Jeremy's original > > code for Xen? Was it supposed to work even on non-synchronized TSC? > > > > If that's the case, then it could be reverted indeed; but then why did > > you commit this patch to 4.1? Did you think of something that would > > cause the seqcount-like protocol to fail, and that turned out not to be > > the case later? I was only following the mailing list sparsely in March. > > I don't think anyone ever tried that hard to test this stuff. There > was an infinte loop that Firefox was triggering as a KVM guest > somewhat reliably until a couple months ago in the same vdso code. 
:( https://bugzilla.redhat.com/show_bug.cgi?id=1174664 --- Comment #5 from Juan Quintela <quintela@redhat.com> --- Another round # dmesg | grep msr [ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00 [ 0.000000] kvm-clock: cpu 0, msr 1:1ffd8001, primary cpu clock [ 0.000000] kvm-stealtime: cpu 0, msr 11fc0d100 [ 0.041174] kvm-clock: cpu 1, msr 1:1ffd8041, secondary cpu clock [ 0.053011] kvm-stealtime: cpu 1, msr 11fc8d100 After start: [root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser 'xp /8x 0x1ffd8000' 000000001ffd8000: 0x3b401060 0xfffc7f4b 0x3b42d040 0xfffc7f4b 000000001ffd8010: 0x3b42d460 0xfffc7f4b 0x3b42d4c0 0xfffc7f4b [root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser 'xp /8x 0x1ffd8040' 000000001ffd8040: 0x3b42d700 0xfffc7f4b 0x3b42d760 0xfffc7f4b 000000001ffd8050: 0x3b42d7c0 0xfffc7f4b 0x3b42d820 0xfffc7f4b When firefox hangs [root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser 'xp /8x 0x1ffd8000' 000000001ffd8000: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 000000001ffd8010: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a [root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser 'xp /8x 0x1ffd8040' 000000001ffd8040: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 000000001ffd8050: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-22 21:21 ` Marcelo Tosatti @ 2015-04-23 9:13 ` Paolo Bonzini 2015-04-23 11:51 ` Marcelo Tosatti 0 siblings, 1 reply; 34+ messages in thread From: Paolo Bonzini @ 2015-04-23 9:13 UTC (permalink / raw) To: Marcelo Tosatti, Andy Lutomirski Cc: Peter Zijlstra, Linus Torvalds, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On 22/04/2015 23:21, Marcelo Tosatti wrote: > On Mon, Apr 20, 2015 at 01:27:58PM -0700, Andy Lutomirski wrote: >> On Mon, Apr 20, 2015 at 9:59 AM, Paolo Bonzini <pbonzini@redhat.com> wrote: >>> >>> >>> On 17/04/2015 22:18, Marcelo Tosatti wrote: >>>> The bug which this is fixing is very rare, have no memory of a report. >>>> >>>> In fact, its even difficult to create a synthetic reproducer. >>> >>> But then why was the task migration notifier even in Jeremy's original >>> code for Xen? Was it supposed to work even on non-synchronized TSC? >>> >>> If that's the case, then it could be reverted indeed; but then why did >>> you commit this patch to 4.1? Did you think of something that would >>> cause the seqcount-like protocol to fail, and that turned out not to be >>> the case later? I was only following the mailing list sparsely in March. >> >> I don't think anyone ever tried that hard to test this stuff. There >> was an infinte loop that Firefox was triggering as a KVM guest >> somewhat reliably until a couple months ago in the same vdso code. :( > > https://bugzilla.redhat.com/show_bug.cgi?id=1174664 That was the missing volatile in an asm. Older compilers didn't catch it. :( Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-23 9:13 ` Paolo Bonzini @ 2015-04-23 11:51 ` Marcelo Tosatti 2015-04-23 12:02 ` Paolo Bonzini 0 siblings, 1 reply; 34+ messages in thread From: Marcelo Tosatti @ 2015-04-23 11:51 UTC (permalink / raw) To: Paolo Bonzini Cc: Andy Lutomirski, Peter Zijlstra, Linus Torvalds, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On Thu, Apr 23, 2015 at 11:13:23AM +0200, Paolo Bonzini wrote: > > > On 22/04/2015 23:21, Marcelo Tosatti wrote: > > On Mon, Apr 20, 2015 at 01:27:58PM -0700, Andy Lutomirski wrote: > >> On Mon, Apr 20, 2015 at 9:59 AM, Paolo Bonzini <pbonzini@redhat.com> wrote: > >>> > >>> > >>> On 17/04/2015 22:18, Marcelo Tosatti wrote: > >>>> The bug which this is fixing is very rare, have no memory of a report. > >>>> > >>>> In fact, its even difficult to create a synthetic reproducer. > >>> > >>> But then why was the task migration notifier even in Jeremy's original > >>> code for Xen? Was it supposed to work even on non-synchronized TSC? > >>> > >>> If that's the case, then it could be reverted indeed; but then why did > >>> you commit this patch to 4.1? Did you think of something that would > >>> cause the seqcount-like protocol to fail, and that turned out not to be > >>> the case later? I was only following the mailing list sparsely in March. > >> > >> I don't think anyone ever tried that hard to test this stuff. There > >> was an infinte loop that Firefox was triggering as a KVM guest > >> somewhat reliably until a couple months ago in the same vdso code. :( > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1174664 > > That was the missing volatile in an asm. Older compilers didn't catch > it. :( How do you know that? It looks like memory corruption (look at the pattern at the end). ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-23 11:51 ` Marcelo Tosatti @ 2015-04-23 12:02 ` Paolo Bonzini 2015-04-23 17:06 ` Marcelo Tosatti 0 siblings, 1 reply; 34+ messages in thread From: Paolo Bonzini @ 2015-04-23 12:02 UTC (permalink / raw) To: Marcelo Tosatti Cc: Andy Lutomirski, Peter Zijlstra, Linus Torvalds, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On 23/04/2015 13:51, Marcelo Tosatti wrote: >>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1174664 >> > >> > That was the missing volatile in an asm. Older compilers didn't catch >> > it. :( > How do you know that? It looks like memory corruption (look at the > pattern at the end). I suspect some kind of operator error there, it makes no sense. On the other hand, bug 1178975 is much clearer and the symptoms are the same. In that bug, you can see that the same kernel source works on f20 (package version 3.17.7-200.fc20.x86_64) and fails on f21 (package version 3.17.7-300.fc21.x86_64). Of course the compiler is different. The newer one hoists the lsl out of the loop; if you get a CPU migration at the wrong time, the cpu != cpu1 condition will always be true the loop will never exit. Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread
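A userspace analogue of the hoisting bug Paolo describes: if the compiler is allowed to assume the "which CPU am I on" read never changes (the effect of the missing volatile on the asm), the retry loop's exit condition is computed once and the loop can spin forever. The sketch below uses an atomic flag flipped by a second thread to stand in for migration; names and timing are illustrative only, and the atomic load plays the role the volatile asm plays in the vDSO.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

static _Atomic int current_cpu = 1;

/* Stand-in for a task migration happening while the loop runs. */
static void *migrate(void *arg)
{
    (void)arg;
    usleep(10000);
    atomic_store(&current_cpu, 2);
    return NULL;
}

int wait_for_migration(void)
{
    pthread_t t;
    int cpu1 = 1;

    pthread_create(&t, NULL, migrate, NULL);
    /* The atomic load forces a fresh read every iteration.  If the
     * read were hoisted out (as the non-volatile lsl asm was), the
     * condition would be evaluated once and this would spin forever. */
    while (atomic_load(&current_cpu) == cpu1)
        ;
    pthread_join(t, NULL);
    return atomic_load(&current_cpu);
}
```

The fix for the real bug was correspondingly small: marking the inline asm volatile so newer compilers could no longer move it out of the loop.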
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-23 12:02 ` Paolo Bonzini @ 2015-04-23 17:06 ` Marcelo Tosatti 0 siblings, 0 replies; 34+ messages in thread From: Marcelo Tosatti @ 2015-04-23 17:06 UTC (permalink / raw) To: Paolo Bonzini Cc: Andy Lutomirski, Peter Zijlstra, Linus Torvalds, linux-kernel, Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski On Thu, Apr 23, 2015 at 02:02:29PM +0200, Paolo Bonzini wrote: > > > On 23/04/2015 13:51, Marcelo Tosatti wrote: > >>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1174664 > >> > > >> > That was the missing volatile in an asm. Older compilers didn't catch > >> > it. :( > > How do you know that? It looks like memory corruption (look at the > > pattern at the end). > > I suspect some kind of operator error there, it makes no sense. if (unlikely(s->flags & SLAB_POISON)) memset(start, POISON_INUSE, PAGE_SIZE << order); * Padding is done using 0x5a (POISON_INUSE) > On the other hand, bug 1178975 is much clearer and the symptoms are the > same. In that bug, you can see that the same kernel source works on f20 > (package version 3.17.7-200.fc20.x86_64) and fails on f21 (package > version 3.17.7-300.fc21.x86_64). Of course the compiler is different. > The newer one hoists the lsl out of the loop; if you get a CPU migration > at the wrong time, the cpu != cpu1 condition will always be true the > loop will never exit. > > Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread
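Marcelo's observation is that the repeating 0x5a5a5a5a words in the qemu monitor dump are exactly what SLUB's POISON_INUSE byte (0x5a, from include/linux/poison.h) produces when a page is poisoned byte-wise. A trivial demonstration:

```c
#include <string.h>
#include <stdint.h>

#define POISON_INUSE 0x5a  /* SLUB poison byte, include/linux/poison.h */

/* Filling memory byte-wise with 0x5a yields the 0x5a5a5a5a words
 * seen in the monitor dump, regardless of endianness. */
uint32_t first_poisoned_word(void)
{
    unsigned char page[64];
    uint32_t word;

    memset(page, POISON_INUSE, sizeof(page));
    memcpy(&word, page, sizeof(word));
    return word;
}
```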
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-20 16:59 ` Paolo Bonzini 2015-04-20 20:27 ` Andy Lutomirski @ 2015-04-22 20:56 ` Marcelo Tosatti 2015-04-22 21:01 ` Paolo Bonzini 1 sibling, 1 reply; 34+ messages in thread From: Marcelo Tosatti @ 2015-04-22 20:56 UTC (permalink / raw) To: Paolo Bonzini Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto On Mon, Apr 20, 2015 at 06:59:04PM +0200, Paolo Bonzini wrote: > > > On 17/04/2015 22:18, Marcelo Tosatti wrote: > > The bug which this is fixing is very rare, have no memory of a report. > > > > In fact, its even difficult to create a synthetic reproducer. > > But then why was the task migration notifier even in Jeremy's original > code for Xen? To cover for the vcpu1 -> vcpu2 -> vcpu1 case, i believe. > Was it supposed to work even on non-synchronized TSC? Yes it is supposed to work on non-synchronized TSC. > If that's the case, then it could be reverted indeed; but then why did > you commit this patch to 4.1? Because it fixes the problem Andy reported (see Subject: KVM: x86: fix kvmclock write race (v2) on kvm@). As long as you have Radim's fix on top. > Did you think of something that would > cause the seqcount-like protocol to fail, and that turned out not to be > the case later? I was only following the mailing list sparsely in March. No. ^ permalink raw reply [flat|nested] 34+ messages in thread
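The vcpu1 -> vcpu2 -> vcpu1 case can be illustrated with a toy model: if each vCPU publishes clock parameters derived from its own unsynchronized TSC, the raw readings of a task that migrates between them need not be monotonic, which is what the migration notifier (or the cmpxchg clamp) guards against. All structures and numbers below are made up for illustration.

```c
#include <stdint.h>

/* Toy model: each vCPU's clock is distorted by its own TSC offset. */
struct vcpu_clock {
    int64_t tsc_offset;   /* this vCPU's TSC relative to "true" time */
};

/* Raw reading a task gets while running on this vCPU. */
static uint64_t read_clock(const struct vcpu_clock *v, uint64_t true_ns)
{
    return true_ns + v->tsc_offset;
}
```

A task reading the clock on vcpu1, migrating to vcpu2 whose TSC lags, and reading again can observe time going backwards even though "true" time advanced.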
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-22 20:56 ` Marcelo Tosatti @ 2015-04-22 21:01 ` Paolo Bonzini 2015-04-22 22:55 ` Marcelo Tosatti 0 siblings, 1 reply; 34+ messages in thread From: Paolo Bonzini @ 2015-04-22 21:01 UTC (permalink / raw) To: Marcelo Tosatti Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto On 22/04/2015 22:56, Marcelo Tosatti wrote: >> > But then why was the task migration notifier even in Jeremy's original >> > code for Xen? > To cover for the vcpu1 -> vcpu2 -> vcpu1 case, i believe. Ok, to cover it for non-synchronized TSC. While KVM requires synchronized TSC. > > If that's the case, then it could be reverted indeed; but then why did > > you commit this patch to 4.1? > > Because it fixes the problem Andy reported (see Subject: KVM: x86: fix > kvmclock write race (v2) on kvm@). As long as you have Radim's > fix on top. But if it's so rare, and it was known that fixing the host protocol was just as good a solution, why was the guest fix committed? I'm just trying to understand. I am worried that this patch was rushed in; so far I had assumed it wasn't (a revert of a revert is rare enough that you don't do it lightly...) but maybe I was wrong. Right now I cannot even decide whether to revert it (and please Peter in the process :)) or submit the Kconfig symbol patch officially. Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-22 21:01 ` Paolo Bonzini @ 2015-04-22 22:55 ` Marcelo Tosatti 2015-04-23 11:29 ` Paolo Bonzini 0 siblings, 1 reply; 34+ messages in thread From: Marcelo Tosatti @ 2015-04-22 22:55 UTC (permalink / raw) To: Paolo Bonzini Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto On Wed, Apr 22, 2015 at 11:01:49PM +0200, Paolo Bonzini wrote: > > > On 22/04/2015 22:56, Marcelo Tosatti wrote: > >> > But then why was the task migration notifier even in Jeremy's original > >> > code for Xen? > > To cover for the vcpu1 -> vcpu2 -> vcpu1 case, i believe. > > Ok, to cover it for non-synchronized TSC. While KVM requires > synchronized TSC. > > > > If that's the case, then it could be reverted indeed; but then why did > > > you commit this patch to 4.1? > > > > Because it fixes the problem Andy reported (see Subject: KVM: x86: fix > > kvmclock write race (v2) on kvm@). As long as you have Radim's > > fix on top. > > But if it's so rare, and it was known that fixing the host protocol was > just as good a solution, why was the guest fix committed? I don't know. Should have fixed the host protocol. > I'm just trying to understand. I am worried that this patch was rushed > in; so far I had assumed it wasn't (a revert of a revert is rare enough > that you don't do it lightly...) but maybe I was wrong. Yes it was rushed in. > Right now I cannot even decide whether to revert it (and please Peter in > the process :)) or submit the Kconfig symbol patch officially. > > Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] First batch of KVM changes for 4.1 2015-04-22 22:55 ` Marcelo Tosatti @ 2015-04-23 11:29 ` Paolo Bonzini 0 siblings, 0 replies; 34+ messages in thread From: Paolo Bonzini @ 2015-04-23 11:29 UTC (permalink / raw) To: Marcelo Tosatti Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto On 23/04/2015 00:55, Marcelo Tosatti wrote: > On Wed, Apr 22, 2015 at 11:01:49PM +0200, Paolo Bonzini wrote: >> >> >> On 22/04/2015 22:56, Marcelo Tosatti wrote: >>>>> But then why was the task migration notifier even in Jeremy's original >>>>> code for Xen? >>> To cover for the vcpu1 -> vcpu2 -> vcpu1 case, i believe. >> >> Ok, to cover it for non-synchronized TSC. While KVM requires >> synchronized TSC. >> >>>> If that's the case, then it could be reverted indeed; but then why did >>>> you commit this patch to 4.1? >>> >>> Because it fixes the problem Andy reported (see Subject: KVM: x86: fix >>> kvmclock write race (v2) on kvm@). As long as you have Radim's >>> fix on top. >> >> But if it's so rare, and it was known that fixing the host protocol was >> just as good a solution, why was the guest fix committed? > > I don't know. Should have fixed the host protocol. No problem. Let's do the right thing now. >> I'm just trying to understand. I am worried that this patch was rushed >> in; so far I had assumed it wasn't (a revert of a revert is rare enough >> that you don't do it lightly...) but maybe I was wrong. > > Yes it was rushed in. Ok, so re-reverted it will be. Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread