kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] First batch of KVM changes for 4.1
@ 2015-04-10 15:01 Paolo Bonzini
  2015-04-17  8:52 ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-10 15:01 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, gleb, kvm, Ralf Baechle

Linus,

The following changes since commit ae705930fca6322600690df9dc1c7d0516145a93:

  arm/arm64: KVM: Keep elrsr/aisr in sync with software model (2015-03-14 13:42:07 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to ca3f0874723fad81d0c701b63ae3a17a408d5f25:

  KVM: use slowpath for cross page cached accesses (2015-04-10 16:04:45 +0200)

Note that this includes some MIPS patches that will also come through
Ralf Baechle's tree (CCed).  As usual, PPC will be sent in a second
pull request.  This time, no child is due during the merge window,
and hence the KVM/PPC pull request shall come normally from me.

Thanks,

Paolo

----------------------------------------------------------------
The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64.

ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling
vhost, too), page aging

s390: interrupt handling rework, allowing to inject all local interrupts
via new ioctl and to get/set the full local irq state for migration
and introspection.  New ioctls to access memory by virtual address,
and to get/set the guest storage keys.  SIMD support.

MIPS: FPU and MIPS SIMD Architecture (MSA) support.  Includes some patches
from Ralf Baechle's MIPS tree.

x86: bugfixes (notably for pvclock, the others are small) and cleanups.
Another small latency improvement for the TSC deadline timer.

----------------------------------------------------------------
Alex Bennée (2):
      arm/arm64: KVM: export VCPU power state via MP_STATE ioctl
      arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn

Alexander Yarygin (5):
      KVM: s390: Use the read_guest_abs() in guest debug functions
      KVM: s390: Fix low-address protection for real addresses
      KVM: s390: Guest's memory access functions get access registers
      KVM: s390: Optimize paths where get_vcpu_asce() is invoked
      KVM: s390: Add access register mode

Andre Przywara (10):
      KVM: move iodev.h from virt/kvm/ to include/kvm
      KVM: arm/arm64: remove now unneeded include directory from Makefile
      KVM: x86: remove now unneeded include directory from Makefile
      KVM: arm/arm64: rename struct kvm_mmio_range to vgic_io_range
      KVM: arm/arm64: simplify vgic_find_range() and callers
      KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGIC
      KVM: arm/arm64: prepare GICv2 emulation to be handled by kvm_io_bus
      KVM: arm/arm64: merge GICv3 RD_base and SGI_base register frames
      KVM: arm/arm64: prepare GICv3 emulation to use kvm_io_bus MMIO handling
      KVM: arm/arm64: rework MMIO abort handling to use KVM MMIO bus

Arseny Solokha (1):
      kvm/ppc/mpic: drop unused IRQ_testbit

Bandan Das (1):
      KVM: SVM: Fix confusing message if no exit handlers are installed

Christian Borntraeger (3):
      KVM: make halt_poll_ns static
      KVM: MAINTAINERS: add file arch/x86/kernel/kvm.c|kvmclock.c
      KVM: s390: enable more features that need no hypervisor changes

Christoffer Dall (3):
      arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}
      arm/arm64: KVM: support for un-queuing active IRQs
      arm/arm64: KVM: Fix migration race in the arch timer

David Hildenbrand (5):
      KVM: s390: fix handling of write errors in the tpi handler
      KVM: s390: reinjection of irqs can fail in the tpi handler
      KVM: s390: fix instruction interception trace point
      KVM: s390: store the breaking-event address on pgm interrupts
      KVM: s390: cpu timer irq priority

David Kaplan (3):
      KVM: SVM: use kvm_register_write()/read()
      kvm: svm: make wbinvd faster
      x86: svm: use cr_interception for SVM_EXIT_CR0_SEL_WRITE

Dominik Dingel (1):
      KVM: s390: cleanup jump lables in kvm_arch_init_vm

Ekaterina Tumanova (2):
      KVM: s390: Zero out current VMDB of STSI before including level3 data.
      KVM: s390: introduce post handlers for STSI

Eric Auger (5):
      KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIP
      KVM: introduce kvm_arch_intc_initialized and use it in irqfd
      KVM: arm/arm64: implement kvm_arch_intc_initialized
      KVM: arm/arm64: remove coarse grain dist locking at kvm_vgic_sync_hwstate
      KVM: arm/arm64: add irqfd support

Eric Farman (5):
      KVM: s390: Allocate and save/restore vector registers
      KVM: s390: Vector exceptions
      KVM: s390: Add new SIGP order to kernel counters
      KVM: s390: Machine Check
      KVM: s390: Enable vector support for capable guest

Eugene Korenevsky (4):
      KVM: remove useless check of "ret" variable prior to returning the same value
      KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
      KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
      KVM: nVMX: remove unnecessary double caching of MAXPHYADDR

Geert Uytterhoeven (1):
      KVM: s390: Spelling s/intance/instance/

James Hogan (24):
      MIPS: lose_fpu(): Disable FPU when MSA enabled
      Revert "MIPS: Don't assume 64-bit FP registers for context switch"
      MIPS: MSA: Fix big-endian FPR_IDX implementation
      Merge branch '4.1-fp' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr into kvm_mips_queue
      MIPS: KVM: Handle MSA Disabled exceptions from guest
      MIPS: Clear [MSA]FPE CSR.Cause after notify_die()
      MIPS: KVM: Handle TRAP exceptions from guest kernel
      MIPS: KVM: Implement PRid CP0 register access
      MIPS: KVM: Sort kvm_mips_get_reg() registers
      MIPS: KVM: Drop pr_info messages on init/exit
      MIPS: KVM: Clean up register definitions a little
      MIPS: KVM: Simplify default guest Config registers
      MIPS: KVM: Add Config4/5 and writing of Config registers
      MIPS: KVM: Add vcpu_get_regs/vcpu_set_regs callback
      MIPS: KVM: Add base guest FPU support
      MIPS: KVM: Emulate FPU bits in COP0 interface
      MIPS: KVM: Add FP exception handling
      MIPS: KVM: Expose FPU registers
      MIPS: KVM: Wire up FPU capability
      MIPS: KVM: Add base guest MSA support
      MIPS: KVM: Emulate MSA bits in COP0 interface
      MIPS: KVM: Add MSA exception handling
      MIPS: KVM: Expose MSA registers
      MIPS: KVM: Wire up MSA capability

Jan Kiszka (3):
      KVM: x86: Fix re-execution of patched vmmcall
      KVM: nVMX: Do not emulate #UD while in guest mode
      KVM: nVMX: Add support for rdtscp

Jason J. Herne (1):
      KVM: s390: Create ioctl for Getting/Setting guest storage keys

Jens Freimann (5):
      KVM: s390: fix get_all_floating_irqs
      KVM: s390: deliver floating interrupts in order of priority
      KVM: s390: add ioctl to inject local interrupts
      KVM: s390: refactor vcpu injection function
      KVM: s390: migrate vcpu interrupt state

Joe Perches (1):
      x86: Use bool function return values of true/false not 1/0

Joel Schopp (1):
      kvm: x86: make kvm_emulate_* consistant

Kevin Mulvey (2):
      KVM: white space formatting in kvm_main.c
      KVM: fix checkpatch.pl errors in kvm/irqchip.c

Marc Zyngier (3):
      arm/arm64: KVM: Allow handle_hva_to_gpa to return a value
      arm/arm64: KVM: Implement Stage-2 page aging
      arm/arm64: KVM: Optimize handling of Access Flag faults

Marcelo Tosatti (3):
      Merge tag 'kvm-s390-next-20150306' of git://git.kernel.org/.../kvms390/linux into queue
      x86: kvm: Revert "remove sched notifier for cross-cpu migrations"
      Merge tag 'kvm-s390-next-20150318' of git://git.kernel.org/.../kvms390/linux into queue

Mark Rutland (1):
      KVM: vgic: add virt-capable compatible strings

Michael Mueller (3):
      KVM: s390: perform vcpu model setup in a function
      KVM: s390: drop SIMD bit from kvm_s390_fac_list_mask
      KVM: s390: represent SIMD cap in kvm facility

Nadav Amit (8):
      KVM: x86: CMOV emulation on legacy mode is wrong
      KVM: x86: POPA emulation may not clear bits [63:32]
      KVM: x86: BSF and BSR emulation change register unnecassarily
      KVM: x86: removing redundant eflags bits definitions
      KVM: x86: Remove redundant definitions
      KVM: x86: BSP in MSR_IA32_APICBASE is writable
      KVM: x86: DR0-DR3 are not clear on reset
      KVM: x86: Clear CR2 on VCPU reset

Nikolay Nikolaev (2):
      KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks.
      KVM: arm/arm64: enable KVM_CAP_IOEVENTFD

Paolo Bonzini (8):
      KVM: arm/arm64: prefer IS_ENABLED to a static variable
      Merge tag 'kvm_mips_20150327' of git://git.kernel.org/.../jhogan/kvm-mips into kvm-next
      Merge tag 'kvm-arm-fixes-4.0-rc5' of git://git.kernel.org/.../kvmarm/kvmarm into 'kvm-next'
      Merge tag 'kvm-arm-for-4.1' of git://git.kernel.org/.../kvmarm/kvmarm into 'kvm-next'
      Merge tag 'kvm-s390-next-20150331' of git://git.kernel.org/.../kvms390/linux into HEAD
      KVM: x86: extract blocking logic from __vcpu_run
      KVM: x86: optimize delivery of TSC deadline timer interrupt
      KVM: remove kvm_read_hva and kvm_read_hva_atomic

Paul Burton (8):
      MIPS: Push .set mips64r* into the functions needing it
      MIPS: assume at as source/dest of MSA copy/insert instructions
      MIPS: remove MSA macro recursion
      MIPS: wrap cfcmsa & ctcmsa accesses for toolchains with MSA support
      MIPS: clear MSACSR cause bits when handling MSA FP exception
      MIPS: Ensure FCSR cause bits are clear after invoking FPU emulator
      MIPS: prevent FP context set via ptrace being discarded
      MIPS: disable FPU if the mode is unsupported

Petr Matousek (1):
      kvm: x86: i8259: return initialized data on invalid-size read

Radim Krčmář (8):
      KVM: x86: inline kvm_ioapic_handles_vector()
      x86: vdso: fix pvclock races with task migration
      KVM: vmx: pass error code with internal error #2
      KVM: x86: use MDA for interrupt matching
      KVM: x86: fix mixed APIC mode broadcast
      KVM: x86: avoid logical_map when it is invalid
      KVM: x86: simplify kvm_apic_map
      KVM: use slowpath for cross page cached accesses

Takuya Yoshikawa (1):
      KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()

Thomas Huth (5):
      KVM: s390: Nullify instruction for certain program exceptions
      KVM: s390: Forward PSW to next instruction for addressing exceptions
      KVM: s390: Use insn_length() to calculate length of instruction
      KVM: Get rid of kvm_kvfree()
      KVM: s390: Add MEMOP ioctls for reading/writing guest memory

Wanpeng Li (2):
      kvm: x86: fix x86 eflags fixed bit
      kvm: mmu: lazy collapse small sptes into large sptes

Wincy Van (1):
      KVM: ioapic: Record edge-triggered interrupts delivery status

Xiubo Li (10):
      KVM: Fix WARNINGs for 'sizeof(X)' instead of 'sizeof X' in kvm_main.c
      KVM: Fix WARNING: labels should not be indented in kvm_main.c
      KVM: Fix ERROR: do not initialise statics to 0 or NULL in kvm_main.c
      KVM: EXPORT_SYMBOL should immediately follow its function
      KVM: Missing blank line after declarations in kvm_main.c
      KVM: no space before tabs in kvm_main.c
      KVM: Fix indentation in kvm_main.c
      KVM: Use pr_info/pr_err in kvm_main.c
      KVM: x86: Avoid using plain integer as NULL pointer warning
      KVM: x86: For the symbols used locally only should be static type

Yannick Guerrini (1):
      KVM: s390: Fix trivial typo in comments

 Documentation/CodeOfConflict                       |   27 +
 Documentation/devicetree/bindings/i2c/i2c-imx.txt  |    1 +
 .../devicetree/bindings/net/apm-xgene-enet.txt     |    5 +-
 .../bindings/serial/snps-dw-apb-uart.txt           |   16 +
 Documentation/power/suspend-and-interrupts.txt     |   22 +-
 Documentation/virtual/kvm/api.txt                  |  335 +++++-
 Documentation/virtual/kvm/devices/s390_flic.txt    |    3 +
 MAINTAINERS                                        |   15 +-
 Makefile                                           |    2 +-
 arch/arm/include/asm/kvm_arm.h                     |    1 +
 arch/arm/include/asm/kvm_host.h                    |   15 +-
 arch/arm/include/asm/kvm_mmio.h                    |   22 -
 arch/arm/include/uapi/asm/kvm.h                    |    3 +
 arch/arm/kernel/asm-offsets.c                      |    4 -
 arch/arm/kvm/Kconfig                               |   30 +-
 arch/arm/kvm/Makefile                              |   12 +-
 arch/arm/kvm/arm.c                                 |   45 +-
 arch/arm/kvm/guest.c                               |   18 -
 arch/arm/kvm/interrupts_head.S                     |    8 -
 arch/arm/kvm/mmio.c                                |   64 +-
 arch/arm/kvm/mmu.c                                 |  134 ++-
 arch/arm/kvm/trace.h                               |   48 +
 arch/arm/mach-pxa/idp.c                            |    1 +
 arch/arm/mach-pxa/lpd270.c                         |    2 +-
 arch/arm/mach-sa1100/neponset.c                    |    4 +-
 arch/arm/mach-sa1100/pleb.c                        |    2 +-
 arch/arm64/boot/dts/apm/apm-storm.dtsi             |    4 +-
 arch/arm64/include/asm/esr.h                       |    1 +
 arch/arm64/include/asm/kvm_arm.h                   |    1 +
 arch/arm64/include/asm/kvm_host.h                  |   15 +-
 arch/arm64/include/asm/kvm_mmio.h                  |   22 -
 arch/arm64/include/uapi/asm/kvm.h                  |    3 +
 arch/arm64/kvm/Kconfig                             |   18 +-
 arch/arm64/kvm/Makefile                            |   20 +-
 arch/arm64/mm/pageattr.c                           |    5 +-
 arch/mips/include/asm/asmmacro-32.h                |  128 +--
 arch/mips/include/asm/asmmacro.h                   |  218 ++--
 arch/mips/include/asm/fpu.h                        |   20 +-
 arch/mips/include/asm/kdebug.h                     |    3 +-
 arch/mips/include/asm/kvm_host.h                   |  125 ++-
 arch/mips/include/asm/processor.h                  |    2 +-
 arch/mips/include/uapi/asm/kvm.h                   |  164 +--
 arch/mips/kernel/asm-offsets.c                     |  105 +-
 arch/mips/kernel/genex.S                           |   15 +-
 arch/mips/kernel/ptrace.c                          |   30 +-
 arch/mips/kernel/r4k_fpu.S                         |    2 +-
 arch/mips/kernel/traps.c                           |   33 +-
 arch/mips/kvm/Makefile                             |    8 +-
 arch/mips/kvm/emulate.c                            |  332 +++++-
 arch/mips/kvm/fpu.S                                |  122 +++
 arch/mips/kvm/locore.S                             |   38 +
 arch/mips/kvm/mips.c                               |  472 ++++++++-
 arch/mips/kvm/msa.S                                |  161 +++
 arch/mips/kvm/stats.c                              |    4 +
 arch/mips/kvm/tlb.c                                |    6 +
 arch/mips/kvm/trap_emul.c                          |  199 +++-
 arch/powerpc/kvm/mpic.c                            |   17 +-
 arch/powerpc/kvm/powerpc.c                         |    4 +-
 arch/s390/include/asm/kvm_host.h                   |   46 +-
 arch/s390/include/asm/mmu_context.h                |    2 +-
 arch/s390/include/asm/page.h                       |   11 +-
 arch/s390/include/uapi/asm/kvm.h                   |    4 +
 arch/s390/include/uapi/asm/sie.h                   |    4 +-
 arch/s390/kernel/asm-offsets.c                     |    1 +
 arch/s390/kernel/jump_label.c                      |   12 +-
 arch/s390/kernel/module.c                          |    1 +
 arch/s390/kernel/processor.c                       |    2 +-
 arch/s390/kvm/diag.c                               |    6 +-
 arch/s390/kvm/gaccess.c                            |  296 +++++-
 arch/s390/kvm/gaccess.h                            |   21 +-
 arch/s390/kvm/guestdbg.c                           |    8 +-
 arch/s390/kvm/intercept.c                          |    5 +-
 arch/s390/kvm/interrupt.c                          | 1101 +++++++++++++-------
 arch/s390/kvm/kvm-s390.c                           |  398 ++++++-
 arch/s390/kvm/kvm-s390.h                           |   51 +-
 arch/s390/kvm/priv.c                               |  144 ++-
 arch/s390/kvm/sigp.c                               |    7 +-
 arch/s390/pci/pci.c                                |   28 +-
 arch/s390/pci/pci_mmio.c                           |   17 +-
 arch/x86/Kconfig                                   |    1 +
 arch/x86/include/asm/kvm_host.h                    |   28 +-
 arch/x86/include/asm/kvm_para.h                    |    2 +-
 arch/x86/include/asm/pvclock.h                     |    1 +
 arch/x86/include/asm/xsave.h                       |   28 +-
 arch/x86/include/uapi/asm/vmx.h                    |    1 +
 arch/x86/kernel/entry_64.S                         |   13 +-
 arch/x86/kernel/pvclock.c                          |   44 +
 arch/x86/kvm/Makefile                              |    2 +-
 arch/x86/kvm/cpuid.c                               |   33 +-
 arch/x86/kvm/cpuid.h                               |    8 +-
 arch/x86/kvm/emulate.c                             |  193 ++--
 arch/x86/kvm/i8254.c                               |   14 +-
 arch/x86/kvm/i8254.h                               |    2 +-
 arch/x86/kvm/i8259.c                               |   13 +-
 arch/x86/kvm/ioapic.c                              |   22 +-
 arch/x86/kvm/ioapic.h                              |   11 +-
 arch/x86/kvm/irq.h                                 |    2 +-
 arch/x86/kvm/lapic.c                               |  147 +--
 arch/x86/kvm/lapic.h                               |   17 +-
 arch/x86/kvm/mmu.c                                 |   73 ++
 arch/x86/kvm/pmu.c                                 |    2 +-
 arch/x86/kvm/svm.c                                 |   43 +-
 arch/x86/kvm/vmx.c                                 |  146 +--
 arch/x86/kvm/x86.c                                 |  157 ++-
 arch/x86/pci/acpi.c                                |   11 +-
 arch/x86/vdso/vclock_gettime.c                     |   34 +-
 drivers/acpi/resource.c                            |    4 +-
 drivers/acpi/video.c                               |   20 +-
 drivers/android/binder.c                           |   10 +-
 drivers/ata/sata_fsl.c                             |    2 +
 drivers/base/power/domain.c                        |   24 +-
 drivers/base/power/wakeup.c                        |    1 +
 drivers/char/tpm/tpm-chip.c                        |   34 +-
 drivers/char/tpm/tpm_ibmvtpm.c                     |   10 +-
 drivers/char/tpm/tpm_ibmvtpm.h                     |    6 +-
 drivers/clk/at91/pmc.c                             |   20 +-
 drivers/clk/at91/pmc.h                             |    1 +
 drivers/cpufreq/exynos-cpufreq.c                   |   21 +-
 drivers/cpufreq/ppc-corenet-cpufreq.c              |    2 +
 drivers/cpuidle/cpuidle.c                          |   61 +-
 drivers/dma/at_xdmac.c                             |    7 +-
 drivers/dma/dw/core.c                              |    2 +-
 drivers/dma/ioat/dma_v3.c                          |    4 +
 drivers/dma/mmp_pdma.c                             |   10 +
 drivers/dma/mmp_tdma.c                             |   31 +-
 drivers/dma/qcom_bam_dma.c                         |   10 +-
 drivers/dma/sh/shdmac.c                            |   15 +-
 drivers/firmware/dmi_scan.c                        |   17 +-
 drivers/firmware/efi/libstub/efi-stub-helper.c     |    8 +-
 drivers/gpu/drm/drm_mm.c                           |  152 +--
 drivers/gpu/drm/i915/i915_debugfs.c                |    4 +-
 drivers/gpu/drm/i915/i915_drv.c                    |   30 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c                |    6 +-
 drivers/gpu/drm/i915/intel_fifo_underrun.c         |   18 +-
 drivers/gpu/drm/imx/dw_hdmi-imx.c                  |   36 +-
 drivers/gpu/drm/imx/imx-ldb.c                      |   28 +-
 drivers/gpu/drm/imx/parallel-display.c             |    5 +-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_irq.c            |    5 +
 drivers/gpu/drm/msm/mdp/mdp5/mdp5.xml.h            |   15 +-
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c           |   99 +-
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_encoder.c        |    6 +-
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_irq.c            |    5 +
 drivers/gpu/drm/msm/msm_atomic.c                   |    4 +-
 drivers/gpu/drm/nouveau/nouveau_fbcon.c            |    2 +-
 drivers/gpu/drm/radeon/atombios_crtc.c             |    3 +
 drivers/gpu/drm/radeon/atombios_encoders.c         |   30 +-
 drivers/gpu/drm/radeon/cik.c                       |    3 +
 drivers/gpu/drm/radeon/dce6_afmt.c                 |   68 +-
 drivers/gpu/drm/radeon/evergreen.c                 |    3 +
 drivers/gpu/drm/radeon/evergreen_hdmi.c            |   59 +-
 drivers/gpu/drm/radeon/r100.c                      |    4 +
 drivers/gpu/drm/radeon/r600.c                      |    3 +
 drivers/gpu/drm/radeon/r600_hdmi.c                 |   11 -
 drivers/gpu/drm/radeon/radeon_audio.c              |   50 +-
 drivers/gpu/drm/radeon/radeon_cs.c                 |    4 +-
 drivers/gpu/drm/radeon/rs600.c                     |    4 +
 drivers/gpu/drm/radeon/si.c                        |    3 +
 drivers/gpu/drm/radeon/sid.h                       |    4 +-
 drivers/gpu/drm/ttm/ttm_bo.c                       |    2 +-
 drivers/gpu/ipu-v3/ipu-di.c                        |    2 +
 drivers/i2c/busses/i2c-designware-baytrail.c       |   40 +-
 drivers/iio/adc/mcp3422.c                          |   17 +-
 drivers/iio/adc/qcom-spmi-iadc.c                   |    3 +-
 drivers/iio/common/ssp_sensors/ssp_dev.c           |    2 +
 drivers/iio/dac/ad5686.c                           |    2 +-
 drivers/iio/humidity/dht11.c                       |   69 +-
 drivers/iio/humidity/si7020.c                      |    6 +-
 drivers/iio/imu/adis16400_core.c                   |    3 +-
 drivers/iio/imu/inv_mpu6050/inv_mpu_core.c         |    6 +-
 drivers/iio/light/Kconfig                          |    2 +
 drivers/iio/magnetometer/Kconfig                   |    2 +
 drivers/input/keyboard/tc3589x-keypad.c            |    6 +-
 drivers/input/misc/mma8450.c                       |    1 +
 drivers/input/mouse/alps.c                         |    4 +-
 drivers/input/mouse/cyapa_gen3.c                   |    2 +-
 drivers/input/mouse/cyapa_gen5.c                   |    4 +-
 drivers/input/mouse/focaltech.c                    |   50 +-
 drivers/input/mouse/psmouse-base.c                 |   14 +-
 drivers/input/mouse/psmouse.h                      |    6 +
 drivers/input/touchscreen/Kconfig                  |    1 +
 drivers/misc/mei/init.c                            |    2 +
 drivers/net/can/dev.c                              |    8 +
 drivers/net/can/usb/kvaser_usb.c                   |   48 +-
 drivers/net/can/usb/peak_usb/pcan_usb_fd.c         |    4 +
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c     |    2 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   |    4 +
 drivers/net/ethernet/broadcom/bcm63xx_enet.c       |    8 +-
 drivers/net/ethernet/broadcom/bgmac.c              |    7 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   |    3 +
 drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c |    6 +-
 drivers/net/ethernet/cadence/macb.c                |    8 +-
 drivers/net/ethernet/cadence/macb.h                |    2 +-
 drivers/net/ethernet/freescale/fec_main.c          |    3 +-
 drivers/net/ethernet/freescale/gianfar.c           |   19 +-
 drivers/net/ethernet/smsc/smc91x.c                 |    1 +
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   65 +-
 drivers/net/team/team.c                            |    6 +-
 drivers/net/xen-netback/interface.c                |    3 +-
 drivers/net/xen-netback/netback.c                  |   22 +-
 drivers/pci/host/pci-versatile.c                   |    2 +-
 drivers/regulator/core.c                           |    7 -
 drivers/regulator/da9210-regulator.c               |    9 +
 drivers/regulator/rk808-regulator.c                |    8 +
 drivers/rtc/rtc-at91rm9200.c                       |   62 +-
 drivers/rtc/rtc-at91sam9.c                         |   73 +-
 drivers/s390/block/dcssblk.c                       |    2 +-
 drivers/s390/block/scm_blk_cluster.c               |    2 +-
 drivers/spi/spi-atmel.c                            |   12 +-
 drivers/spi/spi-dw-mid.c                           |    6 +
 drivers/spi/spi-dw-pci.c                           |    4 +-
 drivers/spi/spi-dw.c                               |    4 +-
 drivers/spi/spi-img-spfi.c                         |    7 +
 drivers/spi/spi-pl022.c                            |    2 +-
 drivers/spi/spi-ti-qspi.c                          |   22 +
 drivers/staging/comedi/drivers/adv_pci1710.c       |    3 +-
 drivers/staging/comedi/drivers/comedi_isadma.c     |    5 +-
 drivers/staging/comedi/drivers/vmk80xx.c           |   71 --
 drivers/staging/iio/adc/mxs-lradc.c                |  207 ++--
 drivers/staging/iio/resolver/ad2s1200.c            |    3 +-
 .../thermal/int340x_thermal/int340x_thermal_zone.c |   10 +-
 drivers/thermal/samsung/exynos_tmu.c               |    3 +-
 drivers/thermal/thermal_core.c                     |   37 +-
 drivers/tty/bfin_jtag_comm.c                       |   13 -
 drivers/tty/serial/8250/8250_core.c                |   11 +-
 drivers/tty/serial/8250/8250_dw.c                  |   32 +
 drivers/tty/serial/8250/8250_pci.c                 |   20 +-
 drivers/tty/serial/atmel_serial.c                  |   49 +-
 drivers/tty/serial/of_serial.c                     |    4 -
 drivers/tty/serial/sprd_serial.c                   |    4 +-
 drivers/tty/tty_io.c                               |    4 +-
 drivers/tty/tty_ioctl.c                            |   16 +-
 drivers/usb/class/cdc-acm.c                        |    2 +
 drivers/usb/core/devio.c                           |    2 +
 drivers/usb/dwc3/dwc3-omap.c                       |   30 +-
 drivers/usb/gadget/configfs.c                      |    2 -
 drivers/usb/gadget/function/f_hid.c                |    2 +-
 drivers/usb/gadget/function/f_phonet.c             |    5 +-
 drivers/usb/gadget/function/f_sourcesink.c         |    4 +-
 drivers/usb/gadget/function/f_uac2.c               |   34 +-
 drivers/usb/gadget/function/uvc_v4l2.c             |    1 +
 drivers/usb/gadget/function/uvc_video.c            |    1 +
 drivers/usb/gadget/legacy/g_ffs.c                  |    6 +-
 drivers/usb/host/xhci-pci.c                        |   30 +
 drivers/usb/host/xhci-plat.c                       |   19 +-
 drivers/usb/host/xhci-ring.c                       |   12 +-
 drivers/usb/host/xhci.c                            |  100 +-
 drivers/usb/host/xhci.h                            |   11 +-
 drivers/usb/isp1760/isp1760-hcd.c                  |    6 +-
 drivers/usb/musb/musb_core.c                       |   10 +-
 drivers/usb/musb/musb_dsps.c                       |   32 +-
 drivers/usb/musb/musb_host.c                       |    2 +-
 drivers/usb/musb/omap2430.c                        |    7 +-
 drivers/usb/renesas_usbhs/Kconfig                  |    1 +
 drivers/usb/serial/bus.c                           |   45 +-
 drivers/usb/serial/ch341.c                         |   15 +-
 drivers/usb/serial/console.c                       |    2 +
 drivers/usb/serial/cp210x.c                        |    2 +
 drivers/usb/serial/ftdi_sio.c                      |   19 +
 drivers/usb/serial/ftdi_sio_ids.h                  |   23 +
 drivers/usb/serial/generic.c                       |    5 +-
 drivers/usb/serial/mxuport.c                       |    3 +-
 drivers/usb/serial/pl2303.c                        |   18 +-
 drivers/usb/serial/usb-serial.c                    |   21 +-
 drivers/usb/storage/unusual_uas.h                  |    7 +
 drivers/usb/storage/usb.c                          |    6 +
 drivers/video/fbdev/amba-clcd.c                    |    3 +
 drivers/video/fbdev/core/fbmon.c                   |    6 +-
 drivers/video/fbdev/omap2/dss/display-sysfs.c      |  179 ++--
 drivers/watchdog/at91sam9_wdt.c                    |    3 +-
 fs/btrfs/ctree.c                                   |    8 +-
 fs/btrfs/extent-tree.c                             |   16 +
 fs/btrfs/file.c                                    |   87 +-
 fs/btrfs/inode.c                                   |    1 -
 fs/btrfs/ordered-data.c                            |    7 +-
 fs/btrfs/send.c                                    |  171 ++-
 fs/btrfs/transaction.c                             |    3 -
 fs/btrfs/tree-log.c                                |    2 +-
 fs/btrfs/xattr.c                                   |    8 +-
 fs/ecryptfs/ecryptfs_kernel.h                      |    4 +-
 fs/ecryptfs/file.c                                 |   34 +-
 fs/ecryptfs/keystore.c                             |    2 +-
 fs/ecryptfs/main.c                                 |    2 +-
 fs/locks.c                                         |    3 +-
 fs/nfs/client.c                                    |    2 +-
 fs/nfs/delegation.c                                |   45 +-
 fs/nfs/dir.c                                       |   22 +-
 fs/nfs/file.c                                      |   11 +-
 fs/nfs/inode.c                                     |  111 +-
 fs/nfs/internal.h                                  |    1 +
 fs/nfs/nfs3proc.c                                  |    4 +-
 fs/nfs/nfs3xdr.c                                   |    5 +
 fs/nfs/nfs4client.c                                |    9 +-
 fs/nfs/nfs4proc.c                                  |   31 +-
 fs/nfs/nfs4session.h                               |    1 +
 fs/nfs/nfs4state.c                                 |   18 +-
 fs/nfs/proc.c                                      |    6 +-
 fs/nfs/write.c                                     |   30 +
 include/drm/drm_mm.h                               |   52 +-
 include/drm/ttm/ttm_bo_api.h                       |    2 +-
 include/drm/ttm/ttm_bo_driver.h                    |    2 +-
 include/kvm/arm_arch_timer.h                       |   31 +-
 include/kvm/arm_vgic.h                             |  117 +--
 {virt => include}/kvm/iodev.h                      |   28 +-
 include/linux/cpuidle.h                            |   17 +-
 include/linux/interrupt.h                          |    9 +-
 include/linux/irqdesc.h                            |    1 +
 include/linux/kvm_host.h                           |   32 +-
 include/linux/nfs_fs.h                             |    5 +-
 include/linux/sched.h                              |    8 +
 include/linux/serial_core.h                        |   14 +-
 include/linux/spi/spi.h                            |    2 +-
 include/linux/usb/serial.h                         |    3 +-
 include/linux/workqueue.h                          |    3 +-
 include/net/netfilter/nf_tables.h                  |   22 +-
 include/uapi/linux/kvm.h                           |   65 +-
 include/uapi/linux/serial.h                        |    4 +
 include/video/omapdss.h                            |    1 +
 kernel/cpuset.c                                    |    9 +-
 kernel/irq/manage.c                                |    7 +-
 kernel/irq/pm.c                                    |    7 +-
 kernel/livepatch/core.c                            |    3 +-
 kernel/module.c                                    |    2 +
 kernel/printk/console_cmdline.h                    |    2 +-
 kernel/printk/printk.c                             |    1 +
 kernel/sched/core.c                                |   15 +
 kernel/sched/idle.c                                |   54 +-
 kernel/trace/ftrace.c                              |   40 +-
 kernel/workqueue.c                                 |   56 +-
 lib/seq_buf.c                                      |    4 +-
 net/can/af_can.c                                   |    3 +
 net/ipv4/ip_fragment.c                             |   11 +-
 net/ipv4/ip_sockglue.c                             |   33 +-
 net/ipv4/ping.c                                    |   12 +-
 net/ipv4/tcp.c                                     |   10 +-
 net/ipv6/datagram.c                                |   39 +-
 net/ipv6/ping.c                                    |    5 +-
 net/irda/ircomm/ircomm_tty.c                       |    4 +-
 net/netfilter/ipvs/ip_vs_sync.c                    |    3 +
 net/netfilter/nf_tables_api.c                      |   61 +-
 net/netfilter/nft_compat.c                         |   14 +-
 net/packet/af_packet.c                             |   22 +-
 net/rxrpc/ar-error.c                               |    4 +-
 net/sunrpc/cache.c                                 |    2 +-
 net/sunrpc/xprtrdma/rpc_rdma.c                     |    3 +-
 net/sunrpc/xprtrdma/xprt_rdma.h                    |    2 +-
 net/tipc/link.c                                    |    7 +-
 sound/drivers/opl3/opl3_midi.c                     |    2 +
 sound/firewire/dice/dice-interface.h               |   18 +-
 sound/firewire/dice/dice-proc.c                    |    4 +-
 sound/firewire/oxfw/oxfw-stream.c                  |    5 +-
 sound/isa/msnd/msnd_pinnacle_mixer.c               |    3 +-
 sound/pci/hda/patch_realtek.c                      |    7 +
 sound/soc/atmel/sam9g20_wm8731.c                   |   68 +-
 sound/soc/cirrus/Kconfig                           |    2 +-
 sound/soc/codecs/Kconfig                           |    2 +-
 sound/soc/codecs/max98357a.c                       |   12 +-
 sound/soc/codecs/rt5670.c                          |    7 +-
 sound/soc/codecs/rt5677.c                          |   32 +-
 sound/soc/codecs/sta32x.c                          |    6 +-
 sound/soc/fsl/fsl_ssi.c                            |   11 +-
 sound/soc/generic/simple-card.c                    |    5 +
 sound/soc/intel/sst-atom-controls.h                |    2 +-
 sound/soc/intel/sst/sst.c                          |   10 +-
 sound/soc/omap/omap-hdmi-audio.c                   |    3 +
 sound/soc/omap/omap-mcbsp.c                        |   11 +
 sound/soc/omap/omap-pcm.c                          |    2 +-
 sound/soc/samsung/Kconfig                          |   10 +-
 sound/soc/sh/rcar/core.c                           |    4 +-
 sound/usb/line6/playback.c                         |    6 +-
 virt/kvm/arm/arch_timer.c                          |   45 +-
 virt/kvm/arm/vgic-v2-emul.c                        |   71 +-
 virt/kvm/arm/vgic-v3-emul.c                        |  246 +++--
 virt/kvm/arm/vgic.c                                |  479 ++++++---
 virt/kvm/arm/vgic.h                                |   37 +-
 virt/kvm/coalesced_mmio.c                          |    7 +-
 virt/kvm/eventfd.c                                 |    9 +-
 virt/kvm/irqchip.c                                 |    2 +-
 virt/kvm/kvm_main.c                                |  148 ++-
 378 files changed, 8320 insertions(+), 3641 deletions(-)
 create mode 100644 Documentation/CodeOfConflict
 create mode 100644 arch/mips/kvm/fpu.S
 create mode 100644 arch/mips/kvm/msa.S
 rename {virt => include}/kvm/iodev.h (66%)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-10 15:01 [GIT PULL] First batch of KVM changes for 4.1 Paolo Bonzini
@ 2015-04-17  8:52 ` Peter Zijlstra
  2015-04-17  9:17   ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2015-04-17  8:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On Fri, Apr 10, 2015 at 05:01:29PM +0200, Paolo Bonzini wrote:
>  include/linux/sched.h                              |    8 +
>  kernel/sched/core.c                                |   15 +

Can you please not puke over the scheduler without Acks from at least
one maintainer?

I complained about this very thing two years ago:

  http://marc.info/?l=linux-kernel&m=137345253916751

And now it magically re-appears WTF!


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17  8:52 ` Peter Zijlstra
@ 2015-04-17  9:17   ` Peter Zijlstra
  2015-04-17 10:09     ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2015-04-17  9:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On Fri, Apr 17, 2015 at 10:52:38AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 10, 2015 at 05:01:29PM +0200, Paolo Bonzini wrote:
> >  include/linux/sched.h                              |    8 +
> >  kernel/sched/core.c                                |   15 +
> 
> Can you please not puke over the scheduler without Acks from at least
> one maintainer?
> 
> I complained about this very thing two years ago:
> 
>   http://marc.info/?l=linux-kernel&m=137345253916751
> 
> And now it magically re-appears WTF!

And I really don't understand _why_ you need that extra callback in the
first place. You already have preempt notifiers, just track if you came
in on another cpu than you went out on and voila!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17  9:17   ` Peter Zijlstra
@ 2015-04-17 10:09     ` Paolo Bonzini
  2015-04-17 10:36       ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-17 10:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto



On 17/04/2015 11:17, Peter Zijlstra wrote:
> On Fri, Apr 17, 2015 at 10:52:38AM +0200, Peter Zijlstra wrote:
>> On Fri, Apr 10, 2015 at 05:01:29PM +0200, Paolo Bonzini wrote:
>>>  include/linux/sched.h                              |    8 +
>>>  kernel/sched/core.c                                |   15 +
>>
>> Can you please not puke over the scheduler without Acks from at least
>> one maintainer?

Sorry, this was done while I was not handling the KVM tree.  At the very
least the commit message should have included the original hashes of the
commit and the revert.  This way one could have found the original Acks:

    commit 582b336ec2c0f0076f5650a029fcc9abd4a906f7
    Author: Marcelo Tosatti <mtosatti@redhat.com>
    Date:   Tue Nov 27 23:28:54 2012 -0200

    sched: add notifier for cross-cpu migrations

    Originally from Jeremy Fitzhardinge.

    Acked-by: Ingo Molnar <mingo@redhat.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>


>> I complained about this very thing two years ago:
>>
>>   http://marc.info/?l=linux-kernel&m=137345253916751
>>
>> And now it magically re-appears WTF!
> 
> And I really don't understand _why_ you need that extra callback in the
> first place. You already have preempt notifiers, just track if you came
> in on another cpu than you went out on and voila!

Then you pay for _all_ preemptions of _all_ processes in the guest,
instead of the hopefully rare ones that do a CPU migration.

Preempt notifiers are registered on current only, this one is global.

Of course, adding a static key is a good idea.  I can also add a config
symbol, selected by paravirt, if you want.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 10:09     ` Paolo Bonzini
@ 2015-04-17 10:36       ` Peter Zijlstra
  2015-04-17 10:38         ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2015-04-17 10:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On Fri, Apr 17, 2015 at 12:09:49PM +0200, Paolo Bonzini wrote:
> 
> 
> On 17/04/2015 11:17, Peter Zijlstra wrote:
> > On Fri, Apr 17, 2015 at 10:52:38AM +0200, Peter Zijlstra wrote:
> >> On Fri, Apr 10, 2015 at 05:01:29PM +0200, Paolo Bonzini wrote:
> >>>  include/linux/sched.h                              |    8 +
> >>>  kernel/sched/core.c                                |   15 +
> >>
> >> Can you please not puke over the scheduler without Acks from at least
> >> one maintainer?
> 
> Sorry, this was done while I was not handling the KVM tree.  At the very
> least the commit message should have included the original hashes of the
> commit and the revert.  This way one could have found the original Acks:
> 
>     commit 582b336ec2c0f0076f5650a029fcc9abd4a906f7
>     Author: Marcelo Tosatti <mtosatti@redhat.com>
>     Date:   Tue Nov 27 23:28:54 2012 -0200
> 
>     sched: add notifier for cross-cpu migrations
> 
>     Originally from Jeremy Fitzhardinge.
> 
>     Acked-by: Ingo Molnar <mingo@redhat.com>
>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Still not a good reason to sneak it back it now, after I got it taken
out. There was a reason it was removed, prior acks (esp. 2 year old
ones) do not count one whit _NOW_.

Also, Ingo later agreed that is was a mistake,

  http://marc.info/?l=linux-kernel&m=137346715521978&w=2

which is an effective retract of whatever ACK that was.

It was crap code then and its crap code now.

> >> I complained about this very thing two years ago:
> >>
> >>   http://marc.info/?l=linux-kernel&m=137345253916751
> >>
> >> And now it magically re-appears WTF!
> > 
> > And I really don't understand _why_ you need that extra callback in the
> > first place. You already have preempt notifiers, just track if you came
> > in on another cpu than you went out on and voila!
> 
> Then you pay for _all_ preemptions of _all_ processes in the guest,
> instead of the hopefully rare ones that do a CPU migration.

Now you make everybody pay for your crap, x86-64 paravirt or not. Keep
the cost by those who need it.

Please take it out, ASAP.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 10:36       ` Peter Zijlstra
@ 2015-04-17 10:38         ` Paolo Bonzini
  2015-04-17 10:55           ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-17 10:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto



On 17/04/2015 12:36, Peter Zijlstra wrote:
> Now you make everybody pay for your crap, x86-64 paravirt or not. Keep
> the cost by those who need it.
> 
> Please take it out, ASAP.

I'll just implement the static key.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 10:38         ` Paolo Bonzini
@ 2015-04-17 10:55           ` Peter Zijlstra
  2015-04-17 12:46             ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2015-04-17 10:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On Fri, Apr 17, 2015 at 12:38:07PM +0200, Paolo Bonzini wrote:
> 
> 
> On 17/04/2015 12:36, Peter Zijlstra wrote:
> > Now you make everybody pay for your crap, x86-64 paravirt or not. Keep
> > the cost by those who need it.
> > 
> > Please take it out, ASAP.
> 
> I'll just implement the static key.

Can you first show that:

preempt_out:
	int cpu = smp_processor_id();
	if (vcpu->cpu != cpu)
		vcpu->cpu = cpu;

preempt_in:
	int cpu = smp_processor_id();
	if (unlikely(vcpu->cpu != cpu))
		do_vcpu_migration_callback(cpu);

Is actually a measurable performance hit and we actually _need_ the
migration callback?

Also, it looks like you already do exactly this for other things, look
at:

	kvm_sched_in()
	  kvm_arch_vcpu_load()
	    if (unlikely(vcpu->cpu != cpu) ... )

So no, I don't believe for one second you need this.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 10:55           ` Peter Zijlstra
@ 2015-04-17 12:46             ` Paolo Bonzini
  2015-04-17 13:10               ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-17 12:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto



On 17/04/2015 12:55, Peter Zijlstra wrote:
> Also, it looks like you already do exactly this for other things, look
> at:
> 
> 	kvm_sched_in()
> 	  kvm_arch_vcpu_load()
> 	    if (unlikely(vcpu->cpu != cpu) ... )
> 
> So no, I don't believe for one second you need this.

You're missing that this snippet is running in the host, while this 
patch is concerned with the guest (paravirt).

This notifier runs for _all_ tasks, not just for the KVM threads.  In 
fact there will most likely be no KVM in the guest.

There is no vcpu->cpu where this notifier is run.

And frankly, I think the static key is snake oil.  The cost of task 
migration in terms of cache misses and TLB misses is in no way 
comparable to the cost of filling in a structure on the stack, 
dereferencing the head of the notifiers list and seeing that it's NULL.

If this was a real problem, it would be better solved by adding 
inlining in kernel/notifier.c:

diff --git a/kernel/notifier.c b/kernel/notifier.c
index ae9fc7cc360e..9c0cd0f739e0 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -59,28 +59,14 @@ static int notifier_chain_unregister(struct notifier_block **nl,
 	return -ENOENT;
 }
 
-/**
- * notifier_call_chain - Informs the registered notifiers about an event.
- *	@nl:		Pointer to head of the blocking notifier chain
- *	@val:		Value passed unmodified to notifier function
- *	@v:		Pointer passed unmodified to notifier function
- *	@nr_to_call:	Number of notifier functions to be called. Don't care
- *			value of this parameter is -1.
- *	@nr_calls:	Records the number of notifications sent. Don't care
- *			value of this field is NULL.
- *	@returns:	notifier_call_chain returns the value returned by the
- *			last notifier function called.
- */
-static int notifier_call_chain(struct notifier_block **nl,
-			       unsigned long val, void *v,
-			       int nr_to_call, int *nr_calls)
+static int __notifier_call_chain(struct notifier_block *nb,
+			         unsigned long val, void *v,
+			         int nr_to_call, int *nr_calls)
 {
 	int ret = NOTIFY_DONE;
-	struct notifier_block *nb, *next_nb;
-
-	nb = rcu_dereference_raw(*nl);
+	struct notifier_block *next_nb;
 
-	while (nb && nr_to_call) {
+	do {
 		next_nb = rcu_dereference_raw(nb->next);
 
 #ifdef CONFIG_DEBUG_NOTIFIERS
@@ -94,14 +80,38 @@ static int notifier_call_chain(struct notifier_block **nl,
 
 		if (nr_calls)
 			(*nr_calls)++;
-
-		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
-			break;
-		nb = next_nb;
-		nr_to_call--;
-	}
+	} while (!(ret & NOTIFY_STOP_MASK) &&
+		 (nb = next_nb) != NULL &&
+		 --nr_to_call);
 	return ret;
 }
+NOKPROBE_SYMBOL(__notifier_call_chain);
+
+/**
+ * notifier_call_chain - Informs the registered notifiers about an event.
+ *	@nl:		Pointer to head of the blocking notifier chain
+ *	@val:		Value passed unmodified to notifier function
+ *	@v:		Pointer passed unmodified to notifier function
+ *	@nr_to_call:	Number of notifier functions to be called. Don't care
+ *			value of this parameter is -1.
+ *	@nr_calls:	Records the number of notifications sent. Don't care
+ *			value of this field is NULL.
+ *	@returns:	notifier_call_chain returns the value returned by the
+ *			last notifier function called.
+ */
+static __always_inline int notifier_call_chain(struct notifier_block **nl,
+			      		       unsigned long val, void *v,
+					       int nr_to_call, int *nr_calls)
+{
+	struct notifier_block *nb = rcu_dereference_raw(*nl);
+	if (unlikely(nr_to_call == 0))
+		return NOTIFY_DONE;
+
+	if (!nb)
+		return NOTIFY_DONE;
+
+	return __notifier_call_chain(nb, val, v, nr_to_call, nr_calls);
+}
 NOKPROBE_SYMBOL(notifier_call_chain);
 
 /*
@@ -190,7 +199,12 @@ NOKPROBE_SYMBOL(__atomic_notifier_call_chain);
 int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
 			       unsigned long val, void *v)
 {
-	return __atomic_notifier_call_chain(nh, val, v, -1, NULL);
+	int ret;
+
+	rcu_read_lock();
+	ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
+	rcu_read_unlock();
+	return ret;
 }
 EXPORT_SYMBOL_GPL(atomic_notifier_call_chain);
 NOKPROBE_SYMBOL(atomic_notifier_call_chain);


Also, move enough stuff to a header so that the fast path is inlined to
a single pointer derefrence.

Paolo

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 12:46             ` Paolo Bonzini
@ 2015-04-17 13:10               ` Peter Zijlstra
  2015-04-17 13:38                 ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2015-04-17 13:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On Fri, Apr 17, 2015 at 02:46:57PM +0200, Paolo Bonzini wrote:
> On 17/04/2015 12:55, Peter Zijlstra wrote:
> > Also, it looks like you already do exactly this for other things, look
> > at:
> > 
> > 	kvm_sched_in()
> > 	  kvm_arch_vcpu_load()
> > 	    if (unlikely(vcpu->cpu != cpu) ... )
> > 
> > So no, I don't believe for one second you need this.
> 
> You're missing that this snippet is running in the host, while this 
> patch is concerned with the guest (paravirt).

This doesn't make sense; but it brings us back to where we were last
time. There is _0_ justification for this in the patches, that alone is
grounds enough to reject it.

Why should the guest task care about the physical cpu of the vcpu;
that's a layering fail if ever there was one.

Furthermore, the only thing that migration handler seems to do is
increment a variable that is not actually used in that file.

> And frankly, I think the static key is snake oil.  The cost of task 
> migration in terms of cache misses and TLB misses is in no way 
> comparable to the cost of filling in a structure on the stack, 
> dereferencing the head of the notifiers list and seeing that it's NULL.

The path this notifier is called from has nothing to do with those
costs. And the fact you're inflicting these costs on _everyone_ for a
single x86_64-paravirt case is insane.

I've had enough of this, the below goes into sched/urgent and you can
come back with sane patches if and when you're ready.

---
 arch/x86/kernel/pvclock.c | 24 ------------------------
 include/linux/sched.h     |  8 --------
 kernel/sched/core.c       | 15 ---------------
 3 files changed, 47 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index e5ecd20e72dd..82f116de3835 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -160,27 +160,6 @@ struct pvclock_vcpu_time_info *pvclock_get_vsyscall_time_info(int cpu)
 }
 
 #ifdef CONFIG_X86_64
-static int pvclock_task_migrate(struct notifier_block *nb, unsigned long l,
-			        void *v)
-{
-	struct task_migration_notifier *mn = v;
-	struct pvclock_vsyscall_time_info *pvti;
-
-	pvti = pvclock_get_vsyscall_user_time_info(mn->from_cpu);
-
-	/* this is NULL when pvclock vsyscall is not initialized */
-	if (unlikely(pvti == NULL))
-		return NOTIFY_DONE;
-
-	pvti->migrate_count++;
-
-	return NOTIFY_DONE;
-}
-
-static struct notifier_block pvclock_migrate = {
-	.notifier_call = pvclock_task_migrate,
-};
-
 /*
  * Initialize the generic pvclock vsyscall state.  This will allocate
  * a/some page(s) for the per-vcpu pvclock information, set up a
@@ -202,9 +181,6 @@ int __init pvclock_init_vsyscall(struct pvclock_vsyscall_time_info *i,
 			     PAGE_KERNEL_VVAR);
 	}
 
-
-	register_task_migration_notifier(&pvclock_migrate);
-
 	return 0;
 }
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3f3308824fa4..0eabab99e126 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -176,14 +176,6 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 extern void update_cpu_load_nohz(void);
 
-/* Notifier for when a task gets migrated to a new CPU */
-struct task_migration_notifier {
-	struct task_struct *task;
-	int from_cpu;
-	int to_cpu;
-};
-extern void register_task_migration_notifier(struct notifier_block *n);
-
 extern unsigned long get_parent_ip(unsigned long addr);
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1d0bc4fe266d..dbfc93d40292 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1013,13 +1013,6 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 		rq_clock_skip_update(rq, true);
 }
 
-static ATOMIC_NOTIFIER_HEAD(task_migration_notifier);
-
-void register_task_migration_notifier(struct notifier_block *n)
-{
-	atomic_notifier_chain_register(&task_migration_notifier, n);
-}
-
 #ifdef CONFIG_SMP
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
@@ -1050,18 +1043,10 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	trace_sched_migrate_task(p, new_cpu);
 
 	if (task_cpu(p) != new_cpu) {
-		struct task_migration_notifier tmn;
-
 		if (p->sched_class->migrate_task_rq)
 			p->sched_class->migrate_task_rq(p, new_cpu);
 		p->se.nr_migrations++;
 		perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0);
-
-		tmn.task = p;
-		tmn.from_cpu = task_cpu(p);
-		tmn.to_cpu = new_cpu;
-
-		atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
 	}
 
 	__set_task_cpu(p, new_cpu);

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 13:10               ` Peter Zijlstra
@ 2015-04-17 13:38                 ` Paolo Bonzini
  2015-04-17 13:43                   ` Peter Zijlstra
  2015-04-17 19:01                   ` Marcelo Tosatti
  0 siblings, 2 replies; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-17 13:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On 17/04/2015 15:10, Peter Zijlstra wrote:
> On Fri, Apr 17, 2015 at 02:46:57PM +0200, Paolo Bonzini wrote:
>> On 17/04/2015 12:55, Peter Zijlstra wrote:
>>> Also, it looks like you already do exactly this for other things, look
>>> at:
>>>
>>> 	kvm_sched_in()
>>> 	  kvm_arch_vcpu_load()
>>> 	    if (unlikely(vcpu->cpu != cpu) ... )
>>>
>>> So no, I don't believe for one second you need this.
> 
> This [...] brings us back to where we were last
> time. There is _0_ justification for this in the patches, that alone is
> grounds enough to reject it.

Oh, we totally agree on that.  I didn't commit that patch, but I already
said the commit message was insufficient.

> Why should the guest task care about the physical cpu of the vcpu;
> that's a layering fail if ever there was one.

It's totally within your right to not read the code, but then please
don't try commenting at it.

This code:

	kvm_sched_in()
	  kvm_arch_vcpu_load()
 	    if (unlikely(vcpu->cpu != cpu) ... )

runs in the host.  The hypervisor obviously cares if the physical CPU of
the VCPU changes.  It has to tell the source processor (vcpu->cpu) to
release the VCPU's data structure and only then it can use it in the
target processor (cpu).  No layering violation here.

The task migration notifier runs in the guest, whenever the VCPU of
a task changes.

> Furthermore, the only thing that migration handler seems to do is
> increment a variable that is not actually used in that file.

It's used in the vDSO, so you cannot increment it in the file that uses it.

>> And frankly, I think the static key is snake oil.  The cost of task 
>> migration in terms of cache misses and TLB misses is in no way 
>> comparable to the cost of filling in a structure on the stack, 
>> dereferencing the head of the notifiers list and seeing that it's NULL.
> 
> The path this notifier is called from has nothing to do with those
> costs.

How not?  The task is going to incur those costs, it's not like half
a dozen extra instruction make any difference.  But anyway...

> And the fact you're inflicting these costs on _everyone_ for a
> single x86_64-paravirt case is insane.

... that's a valid objection.  Please look at the patch below.

> I've had enough of this, the below goes into sched/urgent and you can
> come back with sane patches if and when you're ready.

Oh, please, cut the alpha male crap.

Paolo

------------------- 8< ----------------
>From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri, 17 Apr 2015 14:57:34 +0200
Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER

The task migration notifier is only used in x86 paravirt.  Make it
possible to compile it out.

While at it, move some code around to ensure tmn is filled from CPU
registers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/Kconfig    | 1 +
 init/Kconfig        | 3 +++
 kernel/sched/core.c | 9 ++++++++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d43e7e1c784b..9af252c8698d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -649,6 +649,7 @@ if HYPERVISOR_GUEST
 
 config PARAVIRT
 	bool "Enable paravirtualization code"
+	select TASK_MIGRATION_NOTIFIER
 	---help---
 	  This changes the kernel so it can modify itself when it is run
 	  under a hypervisor, potentially improving performance significantly
diff --git a/init/Kconfig b/init/Kconfig
index 3b9df1aa35db..891917123338 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2016,6 +2016,9 @@ source "block/Kconfig"
 config PREEMPT_NOTIFIERS
 	bool
 
+config TASK_MIGRATION_NOTIFIER
+	bool
+
 config PADATA
 	depends on SMP
 	bool
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9123a82cbb6..c07a53aa543c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 		rq_clock_skip_update(rq, true);
 }
 
+#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
 static ATOMIC_NOTIFIER_HEAD(task_migration_notifier);
 
 void register_task_migration_notifier(struct notifier_block *n)
 {
 	atomic_notifier_chain_register(&task_migration_notifier, n);
 }
+#endif
 
 #ifdef CONFIG_SMP
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
@@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	trace_sched_migrate_task(p, new_cpu);
 
 	if (task_cpu(p) != new_cpu) {
+#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
 		struct task_migration_notifier tmn;
+		int from_cpu = task_cpu(p);
+#endif
 
 		if (p->sched_class->migrate_task_rq)
 			p->sched_class->migrate_task_rq(p, new_cpu);
 		p->se.nr_migrations++;
 		perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0);
 
+#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
 		tmn.task = p;
-		tmn.from_cpu = task_cpu(p);
+		tmn.from_cpu = from_cpu;
 		tmn.to_cpu = new_cpu;
 
 		atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
+#endif
 	}
 
 	__set_task_cpu(p, new_cpu);
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 13:38                 ` Paolo Bonzini
@ 2015-04-17 13:43                   ` Peter Zijlstra
  2015-04-17 14:57                     ` Paolo Bonzini
  2015-04-17 19:01                   ` Marcelo Tosatti
  1 sibling, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2015-04-17 13:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto

On Fri, Apr 17, 2015 at 03:38:58PM +0200, Paolo Bonzini wrote:
> > The path this notifier is called from has nothing to do with those
> > costs.
> 
> How not?  The task is going to incur those costs, it's not like half
> a dozen extra instruction make any difference.  But anyway...

Its attributed to the entity doing the migration, which can be the
wakeup path or a softirq. And we very much do care about the wakeup
path.

> ... that's a valid objection.  Please look at the patch below.

Still a NAK on that, distros have no choice but to enable that CONFIG
option because people might want to run KVM.

CONFIG options are pointless if they end up being mandatory.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 13:43                   ` Peter Zijlstra
@ 2015-04-17 14:57                     ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-17 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, linux-kernel, gleb, kvm, Ralf Baechle, mtosatti, luto



On 17/04/2015 15:43, Peter Zijlstra wrote:
> On Fri, Apr 17, 2015 at 03:38:58PM +0200, Paolo Bonzini wrote:
>>> The path this notifier is called from has nothing to do with those
>>> costs.
>
> Its attributed to the entity doing the migration, which can be the
> wakeup path or a softirq. And we very much do care about the wakeup
> path.

It's not run on all wakeups.  It's within an "if (task_cpu(p) !=
new_cpu)".  WF_MIGRATED _is_ a slow path for wakeups, even though wakeup
itself is of course something we care about.

For load balancing, calculate_imbalance alone is orders of magnitudes
more expensive than this notifier (which can be optimized to two
instructions with at most one cache miss).

>> ... that's a valid objection.  Please look at the patch below.
> 
> Still a NAK on that, distros have no choice but to enable that CONFIG
> option because people might want to run KVM.

Again: running virtual machines does not require these notifiers.  KVM
needs preempt and MMU notifiers, and also enables user return notifiers,
but does not need these ones.

It's only paravirt that needs them.  It's perfectly fine for distros to
disable paravirt.  Some do, some don't.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 13:38                 ` Paolo Bonzini
  2015-04-17 13:43                   ` Peter Zijlstra
@ 2015-04-17 19:01                   ` Marcelo Tosatti
  2015-04-17 19:16                     ` Andy Lutomirski
  2015-04-17 19:57                     ` Paolo Bonzini
  1 sibling, 2 replies; 34+ messages in thread
From: Marcelo Tosatti @ 2015-04-17 19:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto

On Fri, Apr 17, 2015 at 03:38:58PM +0200, Paolo Bonzini wrote:
> On 17/04/2015 15:10, Peter Zijlstra wrote:
> > On Fri, Apr 17, 2015 at 02:46:57PM +0200, Paolo Bonzini wrote:
> >> On 17/04/2015 12:55, Peter Zijlstra wrote:
> >>> Also, it looks like you already do exactly this for other things, look
> >>> at:
> >>>
> >>> 	kvm_sched_in()
> >>> 	  kvm_arch_vcpu_load()
> >>> 	    if (unlikely(vcpu->cpu != cpu) ... )
> >>>
> >>> So no, I don't believe for one second you need this.
> > 
> > This [...] brings us back to where we were last
> > time. There is _0_ justification for this in the patches, that alone is
> > grounds enough to reject it.
> 
> Oh, we totally agree on that.  I didn't commit that patch, but I already
> said the commit message was insufficient.
> 
> > Why should the guest task care about the physical cpu of the vcpu;
> > that's a layering fail if ever there was one.
> 
> It's totally within your right to not read the code, but then please
> don't try commenting at it.
> 
> This code:
> 
> 	kvm_sched_in()
> 	  kvm_arch_vcpu_load()
>  	    if (unlikely(vcpu->cpu != cpu) ... )
> 
> runs in the host.  The hypervisor obviously cares if the physical CPU of
> the VCPU changes.  It has to tell the source processor (vcpu->cpu) to
> release the VCPU's data structure and only then it can use it in the
> target processor (cpu).  No layering violation here.
> 
> The task migration notifier runs in the guest, whenever the VCPU of
> a task changes.
> 
> > Furthermore, the only thing that migration handler seems to do is
> > increment a variable that is not actually used in that file.
> 
> It's used in the vDSO, so you cannot increment it in the file that uses it.
> 
> >> And frankly, I think the static key is snake oil.  The cost of task 
> >> migration in terms of cache misses and TLB misses is in no way 
> >> comparable to the cost of filling in a structure on the stack, 
> >> dereferencing the head of the notifiers list and seeing that it's NULL.
> > 
> > The path this notifier is called from has nothing to do with those
> > costs.
> 
> How not?  The task is going to incur those costs, it's not like half
> a dozen extra instruction make any difference.  But anyway...
> 
> > And the fact you're inflicting these costs on _everyone_ for a
> > single x86_64-paravirt case is insane.
> 
> ... that's a valid objection.  Please look at the patch below.
> 
> > I've had enough of this, the below goes into sched/urgent and you can
> > come back with sane patches if and when you're ready.
> 
> Oh, please, cut the alpha male crap.
> 
> Paolo
> 
> ------------------- 8< ----------------
> >From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001
> From: Paolo Bonzini <pbonzini@redhat.com>
> Date: Fri, 17 Apr 2015 14:57:34 +0200
> Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER
> 
> The task migration notifier is only used in x86 paravirt.  Make it
> possible to compile it out.
> 
> While at it, move some code around to ensure tmn is filled from CPU
> registers.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/Kconfig    | 1 +
>  init/Kconfig        | 3 +++
>  kernel/sched/core.c | 9 ++++++++-
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index d43e7e1c784b..9af252c8698d 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -649,6 +649,7 @@ if HYPERVISOR_GUEST
>  
>  config PARAVIRT
>  	bool "Enable paravirtualization code"
> +	select TASK_MIGRATION_NOTIFIER
>  	---help---
>  	  This changes the kernel so it can modify itself when it is run
>  	  under a hypervisor, potentially improving performance significantly
> diff --git a/init/Kconfig b/init/Kconfig
> index 3b9df1aa35db..891917123338 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -2016,6 +2016,9 @@ source "block/Kconfig"
>  config PREEMPT_NOTIFIERS
>  	bool
>  
> +config TASK_MIGRATION_NOTIFIER
> +	bool
> +
>  config PADATA
>  	depends on SMP
>  	bool
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f9123a82cbb6..c07a53aa543c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
>  		rq_clock_skip_update(rq, true);
>  }
>  
> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>  static ATOMIC_NOTIFIER_HEAD(task_migration_notifier);
>  
>  void register_task_migration_notifier(struct notifier_block *n)
>  {
>  	atomic_notifier_chain_register(&task_migration_notifier, n);
>  }
> +#endif
>  
>  #ifdef CONFIG_SMP
>  void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
> @@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  	trace_sched_migrate_task(p, new_cpu);
>  
>  	if (task_cpu(p) != new_cpu) {
> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>  		struct task_migration_notifier tmn;
> +		int from_cpu = task_cpu(p);
> +#endif
>  
>  		if (p->sched_class->migrate_task_rq)
>  			p->sched_class->migrate_task_rq(p, new_cpu);
>  		p->se.nr_migrations++;
>  		perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0);
>  
> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>  		tmn.task = p;
> -		tmn.from_cpu = task_cpu(p);
> +		tmn.from_cpu = from_cpu;
>  		tmn.to_cpu = new_cpu;
>  
>  		atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
> +#endif
>  	}
>  
>  	__set_task_cpu(p, new_cpu);
> -- 
> 2.3.5

Paolo, 

Please revert the patch -- can fix properly in the host
which also conforms the KVM guest/host documented protocol.

Radim submitted a patch to kvm@ to split 
the kvm_write_guest in two with a barrier in between, i think.

I'll review that patch.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 19:01                   ` Marcelo Tosatti
@ 2015-04-17 19:16                     ` Andy Lutomirski
  2015-04-17 19:57                     ` Paolo Bonzini
  1 sibling, 0 replies; 34+ messages in thread
From: Andy Lutomirski @ 2015-04-17 19:16 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Paolo Bonzini, Peter Zijlstra, Linus Torvalds, linux-kernel,
	Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski

On Fri, Apr 17, 2015 at 12:01 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Fri, Apr 17, 2015 at 03:38:58PM +0200, Paolo Bonzini wrote:
>> On 17/04/2015 15:10, Peter Zijlstra wrote:
>> > On Fri, Apr 17, 2015 at 02:46:57PM +0200, Paolo Bonzini wrote:
>> >> On 17/04/2015 12:55, Peter Zijlstra wrote:
>> >>> Also, it looks like you already do exactly this for other things, look
>> >>> at:
>> >>>
>> >>>   kvm_sched_in()
>> >>>     kvm_arch_vcpu_load()
>> >>>       if (unlikely(vcpu->cpu != cpu) ... )
>> >>>
>> >>> So no, I don't believe for one second you need this.
>> >
>> > This [...] brings us back to where we were last
>> > time. There is _0_ justification for this in the patches, that alone is
>> > grounds enough to reject it.
>>
>> Oh, we totally agree on that.  I didn't commit that patch, but I already
>> said the commit message was insufficient.
>>
>> > Why should the guest task care about the physical cpu of the vcpu;
>> > that's a layering fail if ever there was one.
>>
>> It's totally within your right to not read the code, but then please
>> don't try commenting at it.
>>
>> This code:
>>
>>       kvm_sched_in()
>>         kvm_arch_vcpu_load()
>>           if (unlikely(vcpu->cpu != cpu) ... )
>>
>> runs in the host.  The hypervisor obviously cares if the physical CPU of
>> the VCPU changes.  It has to tell the source processor (vcpu->cpu) to
>> release the VCPU's data structure and only then it can use it in the
>> target processor (cpu).  No layering violation here.
>>
>> The task migration notifier runs in the guest, whenever the VCPU of
>> a task changes.
>>
>> > Furthermore, the only thing that migration handler seems to do is
>> > increment a variable that is not actually used in that file.
>>
>> It's used in the vDSO, so you cannot increment it in the file that uses it.
>>
>> >> And frankly, I think the static key is snake oil.  The cost of task
>> >> migration in terms of cache misses and TLB misses is in no way
>> >> comparable to the cost of filling in a structure on the stack,
>> >> dereferencing the head of the notifiers list and seeing that it's NULL.
>> >
>> > The path this notifier is called from has nothing to do with those
>> > costs.
>>
>> How not?  The task is going to incur those costs, it's not like half
>> a dozen extra instruction make any difference.  But anyway...
>>
>> > And the fact you're inflicting these costs on _everyone_ for a
>> > single x86_64-paravirt case is insane.
>>
>> ... that's a valid objection.  Please look at the patch below.
>>
>> > I've had enough of this, the below goes into sched/urgent and you can
>> > come back with sane patches if and when you're ready.
>>
>> Oh, please, cut the alpha male crap.
>>
>> Paolo
>>
>> ------------------- 8< ----------------
>> >From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Date: Fri, 17 Apr 2015 14:57:34 +0200
>> Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER
>>
>> The task migration notifier is only used in x86 paravirt.  Make it
>> possible to compile it out.
>>
>> While at it, move some code around to ensure tmn is filled from CPU
>> registers.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  arch/x86/Kconfig    | 1 +
>>  init/Kconfig        | 3 +++
>>  kernel/sched/core.c | 9 ++++++++-
>>  3 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index d43e7e1c784b..9af252c8698d 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -649,6 +649,7 @@ if HYPERVISOR_GUEST
>>
>>  config PARAVIRT
>>       bool "Enable paravirtualization code"
>> +     select TASK_MIGRATION_NOTIFIER
>>       ---help---
>>         This changes the kernel so it can modify itself when it is run
>>         under a hypervisor, potentially improving performance significantly
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 3b9df1aa35db..891917123338 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -2016,6 +2016,9 @@ source "block/Kconfig"
>>  config PREEMPT_NOTIFIERS
>>       bool
>>
>> +config TASK_MIGRATION_NOTIFIER
>> +     bool
>> +
>>  config PADATA
>>       depends on SMP
>>       bool
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index f9123a82cbb6..c07a53aa543c 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
>>               rq_clock_skip_update(rq, true);
>>  }
>>
>> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>>  static ATOMIC_NOTIFIER_HEAD(task_migration_notifier);
>>
>>  void register_task_migration_notifier(struct notifier_block *n)
>>  {
>>       atomic_notifier_chain_register(&task_migration_notifier, n);
>>  }
>> +#endif
>>
>>  #ifdef CONFIG_SMP
>>  void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>> @@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>>       trace_sched_migrate_task(p, new_cpu);
>>
>>       if (task_cpu(p) != new_cpu) {
>> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>>               struct task_migration_notifier tmn;
>> +             int from_cpu = task_cpu(p);
>> +#endif
>>
>>               if (p->sched_class->migrate_task_rq)
>>                       p->sched_class->migrate_task_rq(p, new_cpu);
>>               p->se.nr_migrations++;
>>               perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0);
>>
>> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>>               tmn.task = p;
>> -             tmn.from_cpu = task_cpu(p);
>> +             tmn.from_cpu = from_cpu;
>>               tmn.to_cpu = new_cpu;
>>
>>               atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
>> +#endif
>>       }
>>
>>       __set_task_cpu(p, new_cpu);
>> --
>> 2.3.5
>
> Paolo,
>
> Please revert the patch -- can fix properly in the host
> which also conforms the KVM guest/host documented protocol.
>
> Radim submitted a patch to kvm@ to split
> the kvm_write_guest in two with a barrier in between, i think.
>
> I'll review that patch.
>

Can you cc me on that?

Thanks,
Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 19:01                   ` Marcelo Tosatti
  2015-04-17 19:16                     ` Andy Lutomirski
@ 2015-04-17 19:57                     ` Paolo Bonzini
  2015-04-17 20:18                       ` Marcelo Tosatti
  1 sibling, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-17 19:57 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto



>> From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Date: Fri, 17 Apr 2015 14:57:34 +0200
>> Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER
>>
>> The task migration notifier is only used in x86 paravirt.  Make it
>> possible to compile it out.
>>
>> While at it, move some code around to ensure tmn is filled from CPU
>> registers.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  arch/x86/Kconfig    | 1 +
>>  init/Kconfig        | 3 +++
>>  kernel/sched/core.c | 9 ++++++++-
>>  3 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index d43e7e1c784b..9af252c8698d 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -649,6 +649,7 @@ if HYPERVISOR_GUEST
>>  
>>  config PARAVIRT
>>  	bool "Enable paravirtualization code"
>> +	select TASK_MIGRATION_NOTIFIER
>>  	---help---
>>  	  This changes the kernel so it can modify itself when it is run
>>  	  under a hypervisor, potentially improving performance significantly
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 3b9df1aa35db..891917123338 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -2016,6 +2016,9 @@ source "block/Kconfig"
>>  config PREEMPT_NOTIFIERS
>>  	bool
>>  
>> +config TASK_MIGRATION_NOTIFIER
>> +	bool
>> +
>>  config PADATA
>>  	depends on SMP
>>  	bool
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index f9123a82cbb6..c07a53aa543c 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
>>  		rq_clock_skip_update(rq, true);
>>  }
>>  
>> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>>  static ATOMIC_NOTIFIER_HEAD(task_migration_notifier);
>>  
>>  void register_task_migration_notifier(struct notifier_block *n)
>>  {
>>  	atomic_notifier_chain_register(&task_migration_notifier, n);
>>  }
>> +#endif
>>  
>>  #ifdef CONFIG_SMP
>>  void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>> @@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>>  	trace_sched_migrate_task(p, new_cpu);
>>  
>>  	if (task_cpu(p) != new_cpu) {
>> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>>  		struct task_migration_notifier tmn;
>> +		int from_cpu = task_cpu(p);
>> +#endif
>>  
>>  		if (p->sched_class->migrate_task_rq)
>>  			p->sched_class->migrate_task_rq(p, new_cpu);
>>  		p->se.nr_migrations++;
>>  		perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0);
>>  
>> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>>  		tmn.task = p;
>> -		tmn.from_cpu = task_cpu(p);
>> +		tmn.from_cpu = from_cpu;
>>  		tmn.to_cpu = new_cpu;
>>  
>>  		atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
>> +#endif
>>  	}
>>  
>>  	__set_task_cpu(p, new_cpu);
>> -- 
>> 2.3.5
> 
> Paolo, 
> 
> Please revert the patch -- can fix properly in the host
> which also conforms the KVM guest/host documented protocol.
> 
> Radim submitted a patch to kvm@ to split 
> the kvm_write_guest in two with a barrier in between, i think.
> 
> I'll review that patch.

You're thinking of
http://article.gmane.org/gmane.linux.kernel.stable/129187, but see
Andy's reply:

> 
> I think there are at least two ways that would work:
> 
> a) If KVM incremented version as advertised:
> 
> cpu = getcpu();
> pvti = pvti for cpu;
> 
> ver1 = pvti->version;
> check stable bit;
> rdtsc_barrier, rdtsc, read scale, shift, etc.
> if (getcpu() != cpu) retry;
> if (pvti->version != ver1) retry;
> 
> I think this is safe because, we're guaranteed that there was an
> interval (between the two version reads) in which the vcpu we think
> we're on was running and the kvmclock data was valid and marked
> stable, and we know that the tsc we read came from that interval.
> 
> Note: rdtscp isn't needed. If we're stable, is makes no difference
> which cpu's tsc we actually read.
> 
> b) If version remains buggy but we use this migrations_from hack:
> 
> cpu = getcpu();
> pvti = pvti for cpu;
> m1 = pvti->migrations_from;
> barrier();
> 
> ver1 = pvti->version;
> check stable bit;
> rdtsc_barrier, rdtsc, read scale, shift, etc.
> if (getcpu() != cpu) retry;
> if (pvti->version != ver1) retry;  /* probably not really needed */
> 
> barrier();
> if (pvti->migrations_from != m1) retry;
> 
> This is just like (a), except that we're using a guest kernel hack to
> ensure that no one migrated off the vcpu during the version-protected
> critical section and that we were, in fact, on that vcpu at some point
> during that critical section.  Once we've ensured that we were on
> pvti's associated vcpu for the entire time we were reading it, then we
> are protected by the existing versioning in the host.

(a) is not going to happen until 4.2, and there are too many buggy hosts
around so we'd have to define new ABI that lets the guest distinguish a
buggy host from a fixed one.

(b) works now, is not invasive, and I still maintain that the cost is
negligible.  I'm going to run for a while with CONFIG_SCHEDSTATS to see
how often you have a migration.

Anyhow if the task migration notifier is reverted we have to disable the
whole vsyscall support altogether.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 19:57                     ` Paolo Bonzini
@ 2015-04-17 20:18                       ` Marcelo Tosatti
  2015-04-17 20:39                         ` Andy Lutomirski
  2015-04-20 16:59                         ` Paolo Bonzini
  0 siblings, 2 replies; 34+ messages in thread
From: Marcelo Tosatti @ 2015-04-17 20:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto

On Fri, Apr 17, 2015 at 09:57:12PM +0200, Paolo Bonzini wrote:
> 
> 
> >> From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001
> >> From: Paolo Bonzini <pbonzini@redhat.com>
> >> Date: Fri, 17 Apr 2015 14:57:34 +0200
> >> Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER
> >>
> >> The task migration notifier is only used in x86 paravirt.  Make it
> >> possible to compile it out.
> >>
> >> While at it, move some code around to ensure tmn is filled from CPU
> >> registers.
> >>
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >> ---
> >>  arch/x86/Kconfig    | 1 +
> >>  init/Kconfig        | 3 +++
> >>  kernel/sched/core.c | 9 ++++++++-
> >>  3 files changed, 12 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> >> index d43e7e1c784b..9af252c8698d 100644
> >> --- a/arch/x86/Kconfig
> >> +++ b/arch/x86/Kconfig
> >> @@ -649,6 +649,7 @@ if HYPERVISOR_GUEST
> >>  
> >>  config PARAVIRT
> >>  	bool "Enable paravirtualization code"
> >> +	select TASK_MIGRATION_NOTIFIER
> >>  	---help---
> >>  	  This changes the kernel so it can modify itself when it is run
> >>  	  under a hypervisor, potentially improving performance significantly
> >> diff --git a/init/Kconfig b/init/Kconfig
> >> index 3b9df1aa35db..891917123338 100644
> >> --- a/init/Kconfig
> >> +++ b/init/Kconfig
> >> @@ -2016,6 +2016,9 @@ source "block/Kconfig"
> >>  config PREEMPT_NOTIFIERS
> >>  	bool
> >>  
> >> +config TASK_MIGRATION_NOTIFIER
> >> +	bool
> >> +
> >>  config PADATA
> >>  	depends on SMP
> >>  	bool
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index f9123a82cbb6..c07a53aa543c 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
> >>  		rq_clock_skip_update(rq, true);
> >>  }
> >>  
> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
> >>  static ATOMIC_NOTIFIER_HEAD(task_migration_notifier);
> >>  
> >>  void register_task_migration_notifier(struct notifier_block *n)
> >>  {
> >>  	atomic_notifier_chain_register(&task_migration_notifier, n);
> >>  }
> >> +#endif
> >>  
> >>  #ifdef CONFIG_SMP
> >>  void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
> >> @@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
> >>  	trace_sched_migrate_task(p, new_cpu);
> >>  
> >>  	if (task_cpu(p) != new_cpu) {
> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
> >>  		struct task_migration_notifier tmn;
> >> +		int from_cpu = task_cpu(p);
> >> +#endif
> >>  
> >>  		if (p->sched_class->migrate_task_rq)
> >>  			p->sched_class->migrate_task_rq(p, new_cpu);
> >>  		p->se.nr_migrations++;
> >>  		perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0);
> >>  
> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
> >>  		tmn.task = p;
> >> -		tmn.from_cpu = task_cpu(p);
> >> +		tmn.from_cpu = from_cpu;
> >>  		tmn.to_cpu = new_cpu;
> >>  
> >>  		atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
> >> +#endif
> >>  	}
> >>  
> >>  	__set_task_cpu(p, new_cpu);
> >> -- 
> >> 2.3.5
> > 
> > Paolo, 
> > 
> > Please revert the patch -- can fix properly in the host
> > which also conforms the KVM guest/host documented protocol.
> > 
> > Radim submitted a patch to kvm@ to split 
> > the kvm_write_guest in two with a barrier in between, i think.
> > 
> > I'll review that patch.
> 
> You're thinking of
> http://article.gmane.org/gmane.linux.kernel.stable/129187, but see
> Andy's reply:
> 
> > 
> > I think there are at least two ways that would work:
> > 
> > a) If KVM incremented version as advertised:
> > 
> > cpu = getcpu();
> > pvti = pvti for cpu;
> > 
> > ver1 = pvti->version;
> > check stable bit;
> > rdtsc_barrier, rdtsc, read scale, shift, etc.
> > if (getcpu() != cpu) retry;
> > if (pvti->version != ver1) retry;
> > 
> > I think this is safe because, we're guaranteed that there was an
> > interval (between the two version reads) in which the vcpu we think
> > we're on was running and the kvmclock data was valid and marked
> > stable, and we know that the tsc we read came from that interval.
> > 
> > Note: rdtscp isn't needed. If we're stable, is makes no difference
> > which cpu's tsc we actually read.
> > 
> > b) If version remains buggy but we use this migrations_from hack:
> > 
> > cpu = getcpu();
> > pvti = pvti for cpu;
> > m1 = pvti->migrations_from;
> > barrier();
> > 
> > ver1 = pvti->version;
> > check stable bit;
> > rdtsc_barrier, rdtsc, read scale, shift, etc.
> > if (getcpu() != cpu) retry;
> > if (pvti->version != ver1) retry;  /* probably not really needed */
> > 
> > barrier();
> > if (pvti->migrations_from != m1) retry;
> > 
> > This is just like (a), except that we're using a guest kernel hack to
> > ensure that no one migrated off the vcpu during the version-protected
> > critical section and that we were, in fact, on that vcpu at some point
> > during that critical section.  Once we've ensured that we were on
> > pvti's associated vcpu for the entire time we were reading it, then we
> > are protected by the existing versioning in the host.
> 
> (a) is not going to happen until 4.2, and there are too many buggy hosts
> around so we'd have to define new ABI that lets the guest distinguish a
> buggy host from a fixed one.
> 
> (b) works now, is not invasive, and I still maintain that the cost is
> negligible.  I'm going to run for a while with CONFIG_SCHEDSTATS to see
> how often you have a migration.
> 
> Anyhow if the task migration notifier is reverted we have to disable the
> whole vsyscall support altogether.

The bug which this is fixing is very rare, have no memory of a report.

In fact, its even difficult to create a synthetic reproducer. You need:

1) update of kvmclock data structure (happens once every 5 minutes).
2) migration of task from vcpu1 to vcpu2 back to vcpu1.
3) a data race between kvm_write_guest (string copy) and 
2 above.

At the same time.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 20:18                       ` Marcelo Tosatti
@ 2015-04-17 20:39                         ` Andy Lutomirski
  2015-04-17 21:28                           ` Linus Torvalds
  2015-04-20 16:59                         ` Paolo Bonzini
  1 sibling, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2015-04-17 20:39 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Paolo Bonzini, Peter Zijlstra, Linus Torvalds, linux-kernel,
	Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski

On Fri, Apr 17, 2015 at 1:18 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Fri, Apr 17, 2015 at 09:57:12PM +0200, Paolo Bonzini wrote:
>>
>>
>> >> From 4eb9d7132e1990c0586f28af3103675416d38974 Mon Sep 17 00:00:00 2001
>> >> From: Paolo Bonzini <pbonzini@redhat.com>
>> >> Date: Fri, 17 Apr 2015 14:57:34 +0200
>> >> Subject: [PATCH] sched: add CONFIG_TASK_MIGRATION_NOTIFIER
>> >>
>> >> The task migration notifier is only used in x86 paravirt.  Make it
>> >> possible to compile it out.
>> >>
>> >> While at it, move some code around to ensure tmn is filled from CPU
>> >> registers.
>> >>
>> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> >> ---
>> >>  arch/x86/Kconfig    | 1 +
>> >>  init/Kconfig        | 3 +++
>> >>  kernel/sched/core.c | 9 ++++++++-
>> >>  3 files changed, 12 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> >> index d43e7e1c784b..9af252c8698d 100644
>> >> --- a/arch/x86/Kconfig
>> >> +++ b/arch/x86/Kconfig
>> >> @@ -649,6 +649,7 @@ if HYPERVISOR_GUEST
>> >>
>> >>  config PARAVIRT
>> >>    bool "Enable paravirtualization code"
>> >> +  select TASK_MIGRATION_NOTIFIER
>> >>    ---help---
>> >>      This changes the kernel so it can modify itself when it is run
>> >>      under a hypervisor, potentially improving performance significantly
>> >> diff --git a/init/Kconfig b/init/Kconfig
>> >> index 3b9df1aa35db..891917123338 100644
>> >> --- a/init/Kconfig
>> >> +++ b/init/Kconfig
>> >> @@ -2016,6 +2016,9 @@ source "block/Kconfig"
>> >>  config PREEMPT_NOTIFIERS
>> >>    bool
>> >>
>> >> +config TASK_MIGRATION_NOTIFIER
>> >> +  bool
>> >> +
>> >>  config PADATA
>> >>    depends on SMP
>> >>    bool
>> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> >> index f9123a82cbb6..c07a53aa543c 100644
>> >> --- a/kernel/sched/core.c
>> >> +++ b/kernel/sched/core.c
>> >> @@ -1016,12 +1016,14 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
>> >>            rq_clock_skip_update(rq, true);
>> >>  }
>> >>
>> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>> >>  static ATOMIC_NOTIFIER_HEAD(task_migration_notifier);
>> >>
>> >>  void register_task_migration_notifier(struct notifier_block *n)
>> >>  {
>> >>    atomic_notifier_chain_register(&task_migration_notifier, n);
>> >>  }
>> >> +#endif
>> >>
>> >>  #ifdef CONFIG_SMP
>> >>  void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>> >> @@ -1053,18 +1055,23 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>> >>    trace_sched_migrate_task(p, new_cpu);
>> >>
>> >>    if (task_cpu(p) != new_cpu) {
>> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>> >>            struct task_migration_notifier tmn;
>> >> +          int from_cpu = task_cpu(p);
>> >> +#endif
>> >>
>> >>            if (p->sched_class->migrate_task_rq)
>> >>                    p->sched_class->migrate_task_rq(p, new_cpu);
>> >>            p->se.nr_migrations++;
>> >>            perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0);
>> >>
>> >> +#ifdef CONFIG_TASK_MIGRATION_NOTIFIER
>> >>            tmn.task = p;
>> >> -          tmn.from_cpu = task_cpu(p);
>> >> +          tmn.from_cpu = from_cpu;
>> >>            tmn.to_cpu = new_cpu;
>> >>
>> >>            atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
>> >> +#endif
>> >>    }
>> >>
>> >>    __set_task_cpu(p, new_cpu);
>> >> --
>> >> 2.3.5
>> >
>> > Paolo,
>> >
>> > Please revert the patch -- can fix properly in the host
>> > which also conforms the KVM guest/host documented protocol.
>> >
>> > Radim submitted a patch to kvm@ to split
>> > the kvm_write_guest in two with a barrier in between, i think.
>> >
>> > I'll review that patch.
>>
>> You're thinking of
>> http://article.gmane.org/gmane.linux.kernel.stable/129187, but see
>> Andy's reply:
>>
>> >
>> > I think there are at least two ways that would work:
>> >
>> > a) If KVM incremented version as advertised:
>> >
>> > cpu = getcpu();
>> > pvti = pvti for cpu;
>> >
>> > ver1 = pvti->version;
>> > check stable bit;
>> > rdtsc_barrier, rdtsc, read scale, shift, etc.
>> > if (getcpu() != cpu) retry;
>> > if (pvti->version != ver1) retry;
>> >
>> > I think this is safe because, we're guaranteed that there was an
>> > interval (between the two version reads) in which the vcpu we think
>> > we're on was running and the kvmclock data was valid and marked
>> > stable, and we know that the tsc we read came from that interval.
>> >
>> > Note: rdtscp isn't needed. If we're stable, is makes no difference
>> > which cpu's tsc we actually read.
>> >
>> > b) If version remains buggy but we use this migrations_from hack:
>> >
>> > cpu = getcpu();
>> > pvti = pvti for cpu;
>> > m1 = pvti->migrations_from;
>> > barrier();
>> >
>> > ver1 = pvti->version;
>> > check stable bit;
>> > rdtsc_barrier, rdtsc, read scale, shift, etc.
>> > if (getcpu() != cpu) retry;
>> > if (pvti->version != ver1) retry;  /* probably not really needed */
>> >
>> > barrier();
>> > if (pvti->migrations_from != m1) retry;
>> >
>> > This is just like (a), except that we're using a guest kernel hack to
>> > ensure that no one migrated off the vcpu during the version-protected
>> > critical section and that we were, in fact, on that vcpu at some point
>> > during that critical section.  Once we've ensured that we were on
>> > pvti's associated vcpu for the entire time we were reading it, then we
>> > are protected by the existing versioning in the host.
>>
>> (a) is not going to happen until 4.2, and there are too many buggy hosts
>> around so we'd have to define new ABI that lets the guest distinguish a
>> buggy host from a fixed one.
>>
>> (b) works now, is not invasive, and I still maintain that the cost is
>> negligible.  I'm going to run for a while with CONFIG_SCHEDSTATS to see
>> how often you have a migration.
>>
>> Anyhow if the task migration notifier is reverted we have to disable the
>> whole vsyscall support altogether.
>
> The bug which this is fixing is very rare, have no memory of a report.
>
> In fact, its even difficult to create a synthetic reproducer. You need:
>
> 1) update of kvmclock data structure (happens once every 5 minutes).
> 2) migration of task from vcpu1 to vcpu2 back to vcpu1.
> 3) a data race between kvm_write_guest (string copy) and
> 2 above.
>
> At the same time.

Something maybe worth considering:

On my box, vclock_gettime using kvm-clock is about 40 ns.  An empty
syscall is about 33 ns.  clock_gettime *should* be around 17 ns.

The clock_gettime syscall is about 73 ns.

Could we figure out why clock_gettime (the syscall) is so slow, fix
it, and then not be so sad about removing the existing kvm-clock vdso
code?  Once we fix the host for real (add a new feature bit and make
it global instead of per-cpu), then we could have a really fast vdso
implementation, too.

--Andy

>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 20:39                         ` Andy Lutomirski
@ 2015-04-17 21:28                           ` Linus Torvalds
  2015-04-17 21:42                             ` Andy Lutomirski
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2015-04-17 21:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Marcelo Tosatti, Paolo Bonzini, Peter Zijlstra, linux-kernel,
	Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski

On Fri, Apr 17, 2015 at 4:39 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> On my box, vclock_gettime using kvm-clock is about 40 ns.  An empty
> syscall is about 33 ns.  clock_gettime *should* be around 17 ns.
>
> The clock_gettime syscall is about 73 ns.
>
> Could we figure out why clock_gettime (the syscall) is so slow

If we only could profile some random program (let's call it "a.out"
that did the syscall(__NR_gettime_syscall) a couple million times.

Oh, lookie here, Santa came around:

  21.83%   [k] system_call
  12.85%   [.] syscall
   9.76%   [k] __audit_syscall_exit
   9.55%   [k] copy_user_enhanced_fast_string
   4.68%   [k] __getnstimeofday64
   4.08%   [k] syscall_trace_enter_phase1
   3.85%   [k] __audit_syscall_entry
   3.77%   [k] unroll_tree_refs
   3.15%   [k] sys_clock_gettime
   2.92%   [k] int_very_careful
   2.73%   [.] main
   2.35%   [k] syscall_trace_leave
   2.28%   [k] read_tsc
   1.73%   [k] int_restore_rest
   1.73%   [k] int_with_check
   1.48%   [k] syscall_return
   1.32%   [k] dput
   1.24%   [k] system_call_fastpath
   1.21%   [k] syscall_return_via_sysret
   1.21%   [k] tracesys
   0.81%   [k] do_audit_syscall_entry
   0.80%   [k] current_kernel_time
   0.73%   [k] getnstimeofday64
   0.68%   [k] path_put
   0.66%   [k] posix_clock_realtime_get
   0.61%   [k] int_careful
   0.60%   [k] mntput
   0.49%   [k] kfree
   0.36%   [k] _copy_to_user
   0.31%   [k] int_ret_from_sys_call_irqs_off

looks to me like it's spending a lot of time in system call auditing.
Which makes no sense to me, since none of this should be triggering
any auditing. And there's a lot of time in low-level kernel system
call assembly code.

If I only remembered the name of the crazy person who said "Ok" when I
suggest he just be the maintainer of the code since he has spent a lot
of time sending patches for it. Something like Amdy Letorsky. No, that
wasn't it. Hmm. It's on the tip of my tongue.

Oh well. Maybe somebody can remember the guys name. It's familiar for
some reason. Andy?

             Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 21:28                           ` Linus Torvalds
@ 2015-04-17 21:42                             ` Andy Lutomirski
  2015-04-17 22:04                               ` Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2015-04-17 21:42 UTC (permalink / raw)
  To: Linus Torvalds, John Stultz
  Cc: Marcelo Tosatti, Paolo Bonzini, Peter Zijlstra, linux-kernel,
	Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski

On Fri, Apr 17, 2015 at 2:28 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Apr 17, 2015 at 4:39 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> On my box, vclock_gettime using kvm-clock is about 40 ns.  An empty
>> syscall is about 33 ns.  clock_gettime *should* be around 17 ns.
>>
>> The clock_gettime syscall is about 73 ns.
>>
>> Could we figure out why clock_gettime (the syscall) is so slow
>
> If we only could profile some random program (let's call it "a.out"
> that did the syscall(__NR_gettime_syscall) a couple million times.
>
> Oh, lookie here, Santa came around:
>
>   21.83%   [k] system_call
>   12.85%   [.] syscall
>    9.76%   [k] __audit_syscall_exit
>    9.55%   [k] copy_user_enhanced_fast_string
>    4.68%   [k] __getnstimeofday64
>    4.08%   [k] syscall_trace_enter_phase1
>    3.85%   [k] __audit_syscall_entry
>    3.77%   [k] unroll_tree_refs
>    3.15%   [k] sys_clock_gettime
>    2.92%   [k] int_very_careful
>    2.73%   [.] main
>    2.35%   [k] syscall_trace_leave
>    2.28%   [k] read_tsc
>    1.73%   [k] int_restore_rest
>    1.73%   [k] int_with_check
>    1.48%   [k] syscall_return
>    1.32%   [k] dput
>    1.24%   [k] system_call_fastpath
>    1.21%   [k] syscall_return_via_sysret
>    1.21%   [k] tracesys
>    0.81%   [k] do_audit_syscall_entry
>    0.80%   [k] current_kernel_time
>    0.73%   [k] getnstimeofday64
>    0.68%   [k] path_put
>    0.66%   [k] posix_clock_realtime_get
>    0.61%   [k] int_careful
>    0.60%   [k] mntput
>    0.49%   [k] kfree
>    0.36%   [k] _copy_to_user
>    0.31%   [k] int_ret_from_sys_call_irqs_off
>
> looks to me like it's spending a lot of time in system call auditing.
> Which makes no sense to me, since none of this should be triggering
> any auditing. And there's a lot of time in low-level kernel system
> call assembly code.

Muahaha.  The auditors have invaded your system.  (I did my little
benchmark with a more sensible configuration -- see way below).

Can you send the output of:

# auditctl -s
# auditctl -l

Are you, perchance, using Fedora?  I lobbied rather heavily, and
successfully, to get the default configuration to stop auditing.
Unfortunately, the fix wasn't retroactive, so, unless you have a very
fresh install, you might want to apply the fix yourself:

https://fedorahosted.org/fesco/ticket/1311

>
> If I only remembered the name of the crazy person who said "Ok" when I
> suggest he just be the maintainer of the code since he has spent a lot
> of time sending patches for it. Something like Amdy Letorsky. No, that
> wasn't it. Hmm. It's on the tip of my tongue.
>
> Oh well. Maybe somebody can remember the guys name. It's familiar for
> some reason. Andy?

Amdy Lumirtowsky thinks he meant to attach a condition to his
maintainerish activities: he will do his best to keep the audit code
*out* of the low-level stuff, but he's going to try to avoid ever
touching the audit code itself, because if he ever had to change it,
he might accidentally delete the entire file.

Seriously, wasn't there a TAINT_PERFORMANCE thing proposed at some
point?  I would love auditing to set some really loud global warning
that you've just done a Bad Thing (tm) performance-wise by enabling
it.

Back to timing.  With kvm-clock, I see:

  23.80%  timing_test_64  [kernel.kallsyms]   [k] pvclock_clocksource_read
  15.57%  timing_test_64  libc-2.20.so        [.] syscall
  12.39%  timing_test_64  [kernel.kallsyms]   [k] system_call
  12.35%  timing_test_64  [kernel.kallsyms]   [k] copy_user_generic_string
  10.95%  timing_test_64  [kernel.kallsyms]   [k] system_call_after_swapgs
   7.35%  timing_test_64  [kernel.kallsyms]   [k] ktime_get_ts64
   6.20%  timing_test_64  [kernel.kallsyms]   [k] sys_clock_gettime
   3.62%  timing_test_64  [kernel.kallsyms]   [k] system_call_fastpath
   2.08%  timing_test_64  timing_test_64      [.] main
   1.72%  timing_test_64  timing_test_64      [.] syscall@plt
   1.58%  timing_test_64  [kernel.kallsyms]   [k] kvm_clock_get_cycles
   1.22%  timing_test_64  [kernel.kallsyms]   [k] _copy_to_user
   0.65%  timing_test_64  [kernel.kallsyms]   [k] posix_ktime_get_ts
   0.13%  timing_test_64  [kernel.kallsyms]   [k] apic_timer_interrupt

We've got some silly indirection, a uaccess that probably didn't get
optimized very well, and the terrifying function
pvclock_clocksource_read.

By comparison, using tsc:

  19.51%  timing_test_64  libc-2.20.so       [.] syscall
  15.52%  timing_test_64  [kernel.kallsyms]  [k] system_call
  15.25%  timing_test_64  [kernel.kallsyms]  [k] copy_user_generic_string
  14.34%  timing_test_64  [kernel.kallsyms]  [k] system_call_after_swapgs
   8.66%  timing_test_64  [kernel.kallsyms]  [k] ktime_get_ts64
   6.95%  timing_test_64  [kernel.kallsyms]  [k] sys_clock_gettime
   5.93%  timing_test_64  [kernel.kallsyms]  [k] native_read_tsc
   5.12%  timing_test_64  [kernel.kallsyms]  [k] system_call_fastpath
   2.62%  timing_test_64  timing_test_64     [.] main

That's better, although the uaccess silliness is still there.

(No PEBS -- I don't think it works right in KVM.)

--Andy

>
>              Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 21:42                             ` Andy Lutomirski
@ 2015-04-17 22:04                               ` Linus Torvalds
  2015-04-17 22:25                                 ` Andy Lutomirski
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2015-04-17 22:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: John Stultz, Marcelo Tosatti, Paolo Bonzini, Peter Zijlstra,
	linux-kernel, Gleb Natapov, kvm list, Ralf Baechle,
	Andrew Lutomirski

On Fri, Apr 17, 2015 at 5:42 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Muahaha.  The auditors have invaded your system.  (I did my little
> benchmark with a more sensible configuration -- see way below).
>
> Can you send the output of:
>
> # auditctl -s
> # auditctl -l

  # auditctl -s
  enabled 1
  flag 1
  pid 822
  rate_limit 0
  backlog_limit 320
  lost 0
  backlog 0
  backlog_wait_time 60000
  loginuid_immutable 0 unlocked
  # auditctl -l
  No rules

> Are you, perchance, using Fedora?

F21. Yup.

I used to just disable auditing in the kernel entirely, but then I
ended up deciding that I need to run something closer to the broken
Fedora config (selinux in particular) in order to actually optimize
the real-world pathname handling situation rather than the _sane_ one.
Oh well. I think audit support got enabled at the same time in my
kernels because I ended up using the default config and then taking
out the truly crazy stuff without noticing AUDITSYSCALL.

> I lobbied rather heavily, and
> successfully, to get the default configuration to stop auditing.
> Unfortunately, the fix wasn't retroactive, so, unless you have a very
> fresh install, you might want to apply the fix yourself:

Is that fix happening in Fedora going forward, though? Like F22?

> Amdy Lumirtowsky thinks he meant to attach a condition to his
> maintainerish activities: he will do his best to keep the audit code
> *out* of the low-level stuff, but he's going to try to avoid ever
> touching the audit code itself, because if he ever had to change it,
> he might accidentally delete the entire file.

Oooh. That would be _such_ a shame.

Can we please do it by mistake? "Oops, my fingers slipped"

> Seriously, wasn't there a TAINT_PERFORMANCE thing proposed at some
> point?  I would love auditing to set some really loud global warning
> that you've just done a Bad Thing (tm) performance-wise by enabling
> it.

Or even just a big fat warning in dmesg the first time auditing triggers.

> Back to timing.  With kvm-clock, I see:
>
>   23.80%  timing_test_64  [kernel.kallsyms]   [k] pvclock_clocksource_read

Oh wow. How can that possibly be sane?

Isn't the *whole* point of pvclock_clocksource_read() to be a native
rdtsc with scaling? How does it cause that kind of insane pain?

Oh well. Some paravirt person would need to look and care.

           Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 22:04                               ` Linus Torvalds
@ 2015-04-17 22:25                                 ` Andy Lutomirski
  2015-04-17 23:39                                   ` Marcelo Tosatti
  2015-04-18 16:20                                   ` Paolo Bonzini
  0 siblings, 2 replies; 34+ messages in thread
From: Andy Lutomirski @ 2015-04-17 22:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: John Stultz, Marcelo Tosatti, Paolo Bonzini, Peter Zijlstra,
	linux-kernel, Gleb Natapov, kvm list, Ralf Baechle,
	Andrew Lutomirski

On Fri, Apr 17, 2015 at 3:04 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Apr 17, 2015 at 5:42 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> Muahaha.  The auditors have invaded your system.  (I did my little
>> benchmark with a more sensible configuration -- see way below).
>>
>> Can you send the output of:
>>
>> # auditctl -s
>> # auditctl -l
>
>   # auditctl -s
>   enabled 1
>   flag 1
>   pid 822
>   rate_limit 0
>   backlog_limit 320
>   lost 0
>   backlog 0
>   backlog_wait_time 60000
>   loginuid_immutable 0 unlocked
>   # auditctl -l
>   No rules

Yes, "No rules" doesn't mean "don't audit".

>
>> Are you, perchance, using Fedora?
>
> F21. Yup.
>
> I used to just disable auditing in the kernel entirely, but then I
> ended up deciding that I need to run something closer to the broken
> Fedora config (selinux in particular) in order to actually optimize
> the real-world pathname handling situation rather than the _sane_ one.
> Oh well. I think audit support got enabled at the same time in my
> kernels because I ended up using the default config and then taking
> out the truly crazy stuff without noticing AUDITSYSCALL.
>
>> I lobbied rather heavily, and
>> successfully, to get the default configuration to stop auditing.
>> Unfortunately, the fix wasn't retroactive, so, unless you have a very
>> fresh install, you might want to apply the fix yourself:
>
> Is that fix happening in Fedora going forward, though? Like F22?

It's in F21, actually.  Unfortunately, the audit package is really
weird.  It installs /etc/audit/rules.d/audit.rules, containing:

# This file contains the auditctl rules that are loaded
# whenever the audit daemon is started via the initscripts.
# The rules are simply the parameters that would be passed
# to auditctl.

# First rule - delete all
-D

# This suppresses syscall auditing for all tasks started
# with this rule in effect.  Remove it if you need syscall
# auditing.
-a task,never

Then, if it's a fresh install (i.e. /etc/audit/audit.rules doesn't
exist) it copies that file to /etc/audit/audit.rules post-install.
(No, I don't know why it works this way.)

IOW, you might want to copy your /etc/audit/rules.d/audit.rules to
/etc/audit/audit.rules.  I think you need to reboot to get the full
effect.  You could apply the rule manually (or maybe restart the audit
service), but it will only affect newly-started tasks.

>
>> Amdy Lumirtowsky thinks he meant to attach a condition to his
>> maintainerish activities: he will do his best to keep the audit code
>> *out* of the low-level stuff, but he's going to try to avoid ever
>> touching the audit code itself, because if he ever had to change it,
>> he might accidentally delete the entire file.
>
> Oooh. That would be _such_ a shame.
>
> Can we please do it by mistake? "Oops, my fingers slipped"
>
>> Seriously, wasn't there a TAINT_PERFORMANCE thing proposed at some
>> point?  I would love auditing to set some really loud global warning
>> that you've just done a Bad Thing (tm) performance-wise by enabling
>> it.
>
> Or even just a big fat warning in dmesg the first time auditing triggers.
>
>> Back to timing.  With kvm-clock, I see:
>>
>>   23.80%  timing_test_64  [kernel.kallsyms]   [k] pvclock_clocksource_read
>
> Oh wow. How can that possibly be sane?
>
> Isn't the *whole* point of pvclock_clocksource_read() to be a native
> rdtsc with scaling? How does it cause that kind of insane pain?

An unnecessarily complicated protocol, a buggy host implementation,
and an unnecessarily complicated guest implementation :(

>
> Oh well. Some paravirt person would need to look and care.

The code there is a bit scary.

--Andy

>
>            Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 22:25                                 ` Andy Lutomirski
@ 2015-04-17 23:39                                   ` Marcelo Tosatti
  2015-04-18 16:20                                   ` Paolo Bonzini
  1 sibling, 0 replies; 34+ messages in thread
From: Marcelo Tosatti @ 2015-04-17 23:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, John Stultz, Paolo Bonzini, Peter Zijlstra,
	linux-kernel, Gleb Natapov, kvm list, Ralf Baechle,
	Andrew Lutomirski

On Fri, Apr 17, 2015 at 03:25:28PM -0700, Andy Lutomirski wrote:
> On Fri, Apr 17, 2015 at 3:04 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Fri, Apr 17, 2015 at 5:42 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >>
> >> Muahaha.  The auditors have invaded your system.  (I did my little
> >> benchmark with a more sensible configuration -- see way below).
> >>
> >> Can you send the output of:
> >>
> >> # auditctl -s
> >> # auditctl -l
> >
> >   # auditctl -s
> >   enabled 1
> >   flag 1
> >   pid 822
> >   rate_limit 0
> >   backlog_limit 320
> >   lost 0
> >   backlog 0
> >   backlog_wait_time 60000
> >   loginuid_immutable 0 unlocked
> >   # auditctl -l
> >   No rules
> 
> Yes, "No rules" doesn't mean "don't audit".
> 
> >
> >> Are you, perchance, using Fedora?
> >
> > F21. Yup.
> >
> > I used to just disable auditing in the kernel entirely, but then I
> > ended up deciding that I need to run something closer to the broken
> > Fedora config (selinux in particular) in order to actually optimize
> > the real-world pathname handling situation rather than the _sane_ one.
> > Oh well. I think audit support got enabled at the same time in my
> > kernels because I ended up using the default config and then taking
> > out the truly crazy stuff without noticing AUDITSYSCALL.
> >
> >> I lobbied rather heavily, and
> >> successfully, to get the default configuration to stop auditing.
> >> Unfortunately, the fix wasn't retroactive, so, unless you have a very
> >> fresh install, you might want to apply the fix yourself:
> >
> > Is that fix happening in Fedora going forward, though? Like F22?
> 
> It's in F21, actually.  Unfortunately, the audit package is really
> weird.  It installs /etc/audit/rules.d/audit.rules, containing:
> 
> # This file contains the auditctl rules that are loaded
> # whenever the audit daemon is started via the initscripts.
> # The rules are simply the parameters that would be passed
> # to auditctl.
> 
> # First rule - delete all
> -D
> 
> # This suppresses syscall auditing for all tasks started
> # with this rule in effect.  Remove it if you need syscall
> # auditing.
> -a task,never
> 
> Then, if it's a fresh install (i.e. /etc/audit/audit.rules doesn't
> exist) it copies that file to /etc/audit/audit.rules post-install.
> (No, I don't know why it works this way.)
> 
> IOW, you might want to copy your /etc/audit/rules.d/audit.rules to
> /etc/audit/audit.rules.  I think you need to reboot to get the full
> effect.  You could apply the rule manually (or maybe restart the audit
> service), but it will only affect newly-started tasks.
> 
> >
> >> Amdy Lumirtowsky thinks he meant to attach a condition to his
> >> maintainerish activities: he will do his best to keep the audit code
> >> *out* of the low-level stuff, but he's going to try to avoid ever
> >> touching the audit code itself, because if he ever had to change it,
> >> he might accidentally delete the entire file.
> >
> > Oooh. That would be _such_ a shame.
> >
> > Can we please do it by mistake? "Oops, my fingers slipped"
> >
> >> Seriously, wasn't there a TAINT_PERFORMANCE thing proposed at some
> >> point?  I would love auditing to set some really loud global warning
> >> that you've just done a Bad Thing (tm) performance-wise by enabling
> >> it.
> >
> > Or even just a big fat warning in dmesg the first time auditing triggers.
> >
> >> Back to timing.  With kvm-clock, I see:
> >>
> >>   23.80%  timing_test_64  [kernel.kallsyms]   [k] pvclock_clocksource_read
> >
> > Oh wow. How can that possibly be sane?
> >
> > Isn't the *whole* point of pvclock_clocksource_read() to be a native
> > rdtsc with scaling? How does it cause that kind of insane pain?
> 
> An unnecessarily complicated protocol, a buggy host implementation,
> and an unnecessarily complicated guest implementation :(

How about start by removing the unnecessary rdtsc-barrier? 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 22:25                                 ` Andy Lutomirski
  2015-04-17 23:39                                   ` Marcelo Tosatti
@ 2015-04-18 16:20                                   ` Paolo Bonzini
  1 sibling, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-18 16:20 UTC (permalink / raw)
  To: Andy Lutomirski, Linus Torvalds
  Cc: John Stultz, Marcelo Tosatti, Peter Zijlstra, linux-kernel,
	Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski



On 18/04/2015 00:25, Andy Lutomirski wrote:
>> Isn't the *whole* point of pvclock_clocksource_read() to be a native
>> rdtsc with scaling? How does it cause that kind of insane pain?

It's possible that your machine ends up with PVCLOCK_TSC_STABLE_BIT
clear, so you get an atomic cmpxchg in addition (and associated
cacheline bouncing, since anything reading the clocksource in the kernel
will cause that variable to bounce).  But that's not too common on
recent machines.

Is the vsyscall faster for you or does it degenerate to the syscall?  If
so, you have PVCLOCK_TSC_STABLE_BIT clear.

> An unnecessarily complicated protocol, a buggy host implementation,
> and an unnecessarily complicated guest implementation :(

pvclock_clocksource_read() itself is not scary and need not worry about
the buggy host implementation (preempt_disable makes things easy).  It's
the vDSO stuff that has the nice things.

There's a few micro-optimizations that we could do (the guest
implementation _is_ unnecessarily baroque), but it may not be enough if
the rdtsc_barrier()s (lfence) are the performance killers.  Will look
closely on Monday.

Paolo

>> Oh well. Some paravirt person would need to look and care.
> 
> The code there is a bit scary.
> 
> --Andy
> 
>>
>>            Linus
> 
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-17 20:18                       ` Marcelo Tosatti
  2015-04-17 20:39                         ` Andy Lutomirski
@ 2015-04-20 16:59                         ` Paolo Bonzini
  2015-04-20 20:27                           ` Andy Lutomirski
  2015-04-22 20:56                           ` Marcelo Tosatti
  1 sibling, 2 replies; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-20 16:59 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto



On 17/04/2015 22:18, Marcelo Tosatti wrote:
> The bug which this is fixing is very rare, have no memory of a report.
> 
> In fact, its even difficult to create a synthetic reproducer.

But then why was the task migration notifier even in Jeremy's original
code for Xen?  Was it supposed to work even on non-synchronized TSC?

If that's the case, then it could be reverted indeed; but then why did
you commit this patch to 4.1?  Did you think of something that would
cause the seqcount-like protocol to fail, and that turned out not to be
the case later?  I was only following the mailing list sparsely in March.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-20 16:59                         ` Paolo Bonzini
@ 2015-04-20 20:27                           ` Andy Lutomirski
  2015-04-22 21:21                             ` Marcelo Tosatti
  2015-04-22 20:56                           ` Marcelo Tosatti
  1 sibling, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2015-04-20 20:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marcelo Tosatti, Peter Zijlstra, Linus Torvalds, linux-kernel,
	Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski

On Mon, Apr 20, 2015 at 9:59 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 17/04/2015 22:18, Marcelo Tosatti wrote:
>> The bug which this is fixing is very rare, have no memory of a report.
>>
>> In fact, its even difficult to create a synthetic reproducer.
>
> But then why was the task migration notifier even in Jeremy's original
> code for Xen?  Was it supposed to work even on non-synchronized TSC?
>
> If that's the case, then it could be reverted indeed; but then why did
> you commit this patch to 4.1?  Did you think of something that would
> cause the seqcount-like protocol to fail, and that turned out not to be
> the case later?  I was only following the mailing list sparsely in March.

I don't think anyone ever tried that hard to test this stuff.  There
was an infinte loop that Firefox was triggering as a KVM guest
somewhat reliably until a couple months ago in the same vdso code.  :(

--Andy

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-20 16:59                         ` Paolo Bonzini
  2015-04-20 20:27                           ` Andy Lutomirski
@ 2015-04-22 20:56                           ` Marcelo Tosatti
  2015-04-22 21:01                             ` Paolo Bonzini
  1 sibling, 1 reply; 34+ messages in thread
From: Marcelo Tosatti @ 2015-04-22 20:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto

On Mon, Apr 20, 2015 at 06:59:04PM +0200, Paolo Bonzini wrote:
> 
> 
> On 17/04/2015 22:18, Marcelo Tosatti wrote:
> > The bug which this is fixing is very rare, have no memory of a report.
> > 
> > In fact, its even difficult to create a synthetic reproducer.
> 
> But then why was the task migration notifier even in Jeremy's original
> code for Xen? 

To cover for the vcpu1 -> vcpu2 -> vcpu1 case, i believe.

> Was it supposed to work even on non-synchronized TSC?

Yes it is supposed to work on non-synchronized TSC.

> If that's the case, then it could be reverted indeed; but then why did
> you commit this patch to 4.1? 

Because it fixes the problem Andy reported (see Subject: KVM: x86: fix
kvmclock write race (v2) on kvm@). As long as you have Radim's
fix on top.

> Did you think of something that would
> cause the seqcount-like protocol to fail, and that turned out not to be
> the case later?  I was only following the mailing list sparsely in March.

No.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-22 20:56                           ` Marcelo Tosatti
@ 2015-04-22 21:01                             ` Paolo Bonzini
  2015-04-22 22:55                               ` Marcelo Tosatti
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-22 21:01 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto



On 22/04/2015 22:56, Marcelo Tosatti wrote:
>> > But then why was the task migration notifier even in Jeremy's original
>> > code for Xen? 
> To cover for the vcpu1 -> vcpu2 -> vcpu1 case, i believe.

Ok, to cover it for non-synchronized TSC.  While KVM requires
synchronized TSC.

> > If that's the case, then it could be reverted indeed; but then why did
> > you commit this patch to 4.1? 
> 
> Because it fixes the problem Andy reported (see Subject: KVM: x86: fix
> kvmclock write race (v2) on kvm@). As long as you have Radim's
> fix on top.

But if it's so rare, and it was known that fixing the host protocol was
just as good a solution, why was the guest fix committed?

I'm just trying to understand.  I am worried that this patch was rushed
in; so far I had assumed it wasn't (a revert of a revert is rare enough
that you don't do it lightly...) but maybe I was wrong.

Right now I cannot even decide whether to revert it (and please Peter in
the process :)) or submit the Kconfig symbol patch officially.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-20 20:27                           ` Andy Lutomirski
@ 2015-04-22 21:21                             ` Marcelo Tosatti
  2015-04-23  9:13                               ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Marcelo Tosatti @ 2015-04-22 21:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Paolo Bonzini, Peter Zijlstra, Linus Torvalds, linux-kernel,
	Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski

On Mon, Apr 20, 2015 at 01:27:58PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 20, 2015 at 9:59 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> >
> > On 17/04/2015 22:18, Marcelo Tosatti wrote:
> >> The bug which this is fixing is very rare, have no memory of a report.
> >>
> >> In fact, its even difficult to create a synthetic reproducer.
> >
> > But then why was the task migration notifier even in Jeremy's original
> > code for Xen?  Was it supposed to work even on non-synchronized TSC?
> >
> > If that's the case, then it could be reverted indeed; but then why did
> > you commit this patch to 4.1?  Did you think of something that would
> > cause the seqcount-like protocol to fail, and that turned out not to be
> > the case later?  I was only following the mailing list sparsely in March.
> 
> I don't think anyone ever tried that hard to test this stuff.  There
> was an infinte loop that Firefox was triggering as a KVM guest
> somewhat reliably until a couple months ago in the same vdso code.  :(

https://bugzilla.redhat.com/show_bug.cgi?id=1174664

--- Comment #5 from Juan Quintela <quintela@redhat.com> ---

Another round

# dmesg | grep msr
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 1:1ffd8001, primary cpu clock
[    0.000000] kvm-stealtime: cpu 0, msr 11fc0d100
[    0.041174] kvm-clock: cpu 1, msr 1:1ffd8041, secondary cpu clock
[    0.053011] kvm-stealtime: cpu 1, msr 11fc8d100


After start:

[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp
/8x
0x1ffd8000'
000000001ffd8000: 0x3b401060 0xfffc7f4b 0x3b42d040 0xfffc7f4b
000000001ffd8010: 0x3b42d460 0xfffc7f4b 0x3b42d4c0 0xfffc7f4b


[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp /8x 0x1ffd8040'
000000001ffd8040: 0x3b42d700 0xfffc7f4b 0x3b42d760 0xfffc7f4b
000000001ffd8050: 0x3b42d7c0 0xfffc7f4b 0x3b42d820 0xfffc7f4b

When firefox hangs

[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp
/8x
0x1ffd8000'
000000001ffd8000: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
000000001ffd8010: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a


[root@trasno yum.repos.d]# virsh qemu-monitor-command --hmp browser  'xp
/8x
0x1ffd8040'
000000001ffd8040: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a
000000001ffd8050: 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a 0x5a5a5a5a



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-22 21:01                             ` Paolo Bonzini
@ 2015-04-22 22:55                               ` Marcelo Tosatti
  2015-04-23 11:29                                 ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Marcelo Tosatti @ 2015-04-22 22:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto

On Wed, Apr 22, 2015 at 11:01:49PM +0200, Paolo Bonzini wrote:
> 
> 
> On 22/04/2015 22:56, Marcelo Tosatti wrote:
> >> > But then why was the task migration notifier even in Jeremy's original
> >> > code for Xen? 
> > To cover for the vcpu1 -> vcpu2 -> vcpu1 case, i believe.
> 
> Ok, to cover it for non-synchronized TSC.  While KVM requires
> synchronized TSC.
> 
> > > If that's the case, then it could be reverted indeed; but then why did
> > > you commit this patch to 4.1? 
> > 
> > Because it fixes the problem Andy reported (see Subject: KVM: x86: fix
> > kvmclock write race (v2) on kvm@). As long as you have Radim's
> > fix on top.
> 
> But if it's so rare, and it was known that fixing the host protocol was
> just as good a solution, why was the guest fix committed?

I don't know. Should have fixed the host protocol.

> I'm just trying to understand.  I am worried that this patch was rushed
> in; so far I had assumed it wasn't (a revert of a revert is rare enough
> that you don't do it lightly...) but maybe I was wrong.

Yes it was rushed in.

> Right now I cannot even decide whether to revert it (and please Peter in
> the process :)) or submit the Kconfig symbol patch officially.
> 
> Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-22 21:21                             ` Marcelo Tosatti
@ 2015-04-23  9:13                               ` Paolo Bonzini
  2015-04-23 11:51                                 ` Marcelo Tosatti
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-23  9:13 UTC (permalink / raw)
  To: Marcelo Tosatti, Andy Lutomirski
  Cc: Peter Zijlstra, Linus Torvalds, linux-kernel, Gleb Natapov,
	kvm list, Ralf Baechle, Andrew Lutomirski



On 22/04/2015 23:21, Marcelo Tosatti wrote:
> On Mon, Apr 20, 2015 at 01:27:58PM -0700, Andy Lutomirski wrote:
>> On Mon, Apr 20, 2015 at 9:59 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>>
>>> On 17/04/2015 22:18, Marcelo Tosatti wrote:
>>>> The bug which this is fixing is very rare, have no memory of a report.
>>>>
>>>> In fact, its even difficult to create a synthetic reproducer.
>>>
>>> But then why was the task migration notifier even in Jeremy's original
>>> code for Xen?  Was it supposed to work even on non-synchronized TSC?
>>>
>>> If that's the case, then it could be reverted indeed; but then why did
>>> you commit this patch to 4.1?  Did you think of something that would
>>> cause the seqcount-like protocol to fail, and that turned out not to be
>>> the case later?  I was only following the mailing list sparsely in March.
>>
>> I don't think anyone ever tried that hard to test this stuff.  There
>> was an infinte loop that Firefox was triggering as a KVM guest
>> somewhat reliably until a couple months ago in the same vdso code.  :(
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1174664

That was the missing volatile in an asm.  Older compilers didn't catch
it. :(

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-22 22:55                               ` Marcelo Tosatti
@ 2015-04-23 11:29                                 ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-23 11:29 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Peter Zijlstra, torvalds, linux-kernel, gleb, kvm, Ralf Baechle, luto



On 23/04/2015 00:55, Marcelo Tosatti wrote:
> On Wed, Apr 22, 2015 at 11:01:49PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 22/04/2015 22:56, Marcelo Tosatti wrote:
>>>>> But then why was the task migration notifier even in Jeremy's original
>>>>> code for Xen? 
>>> To cover for the vcpu1 -> vcpu2 -> vcpu1 case, i believe.
>>
>> Ok, to cover it for non-synchronized TSC.  While KVM requires
>> synchronized TSC.
>>
>>>> If that's the case, then it could be reverted indeed; but then why did
>>>> you commit this patch to 4.1? 
>>>
>>> Because it fixes the problem Andy reported (see Subject: KVM: x86: fix
>>> kvmclock write race (v2) on kvm@). As long as you have Radim's
>>> fix on top.
>>
>> But if it's so rare, and it was known that fixing the host protocol was
>> just as good a solution, why was the guest fix committed?
> 
> I don't know. Should have fixed the host protocol.

No problem.  Let's do the right thing now.

>> I'm just trying to understand.  I am worried that this patch was rushed
>> in; so far I had assumed it wasn't (a revert of a revert is rare enough
>> that you don't do it lightly...) but maybe I was wrong.
> 
> Yes it was rushed in.

Ok, so re-reverted it will be.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-23  9:13                               ` Paolo Bonzini
@ 2015-04-23 11:51                                 ` Marcelo Tosatti
  2015-04-23 12:02                                   ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: Marcelo Tosatti @ 2015-04-23 11:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andy Lutomirski, Peter Zijlstra, Linus Torvalds, linux-kernel,
	Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski

On Thu, Apr 23, 2015 at 11:13:23AM +0200, Paolo Bonzini wrote:
> 
> 
> On 22/04/2015 23:21, Marcelo Tosatti wrote:
> > On Mon, Apr 20, 2015 at 01:27:58PM -0700, Andy Lutomirski wrote:
> >> On Mon, Apr 20, 2015 at 9:59 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>>
> >>>
> >>> On 17/04/2015 22:18, Marcelo Tosatti wrote:
> >>>> The bug which this is fixing is very rare, have no memory of a report.
> >>>>
> >>>> In fact, its even difficult to create a synthetic reproducer.
> >>>
> >>> But then why was the task migration notifier even in Jeremy's original
> >>> code for Xen?  Was it supposed to work even on non-synchronized TSC?
> >>>
> >>> If that's the case, then it could be reverted indeed; but then why did
> >>> you commit this patch to 4.1?  Did you think of something that would
> >>> cause the seqcount-like protocol to fail, and that turned out not to be
> >>> the case later?  I was only following the mailing list sparsely in March.
> >>
> >> I don't think anyone ever tried that hard to test this stuff.  There
> >> was an infinte loop that Firefox was triggering as a KVM guest
> >> somewhat reliably until a couple months ago in the same vdso code.  :(
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1174664
> 
> That was the missing volatile in an asm.  Older compilers didn't catch
> it. :(

How do you know that? It looks like memory corruption (look at the
pattern at the end).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-23 11:51                                 ` Marcelo Tosatti
@ 2015-04-23 12:02                                   ` Paolo Bonzini
  2015-04-23 17:06                                     ` Marcelo Tosatti
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2015-04-23 12:02 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Andy Lutomirski, Peter Zijlstra, Linus Torvalds, linux-kernel,
	Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski



On 23/04/2015 13:51, Marcelo Tosatti wrote:
>>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1174664
>> > 
>> > That was the missing volatile in an asm.  Older compilers didn't catch
>> > it. :(
> How do you know that? It looks like memory corruption (look at the
> pattern at the end).

I suspect some kind of operator error there, it makes no sense.

On the other hand, bug 1178975 is much clearer and the symptoms are the
same.  In that bug, you can see that the same kernel source works on f20
(package version 3.17.7-200.fc20.x86_64) and fails on f21 (package
version 3.17.7-300.fc21.x86_64).  Of course the compiler is different.
The newer one hoists the lsl out of the loop; if you get a CPU migration
at the wrong time, the cpu != cpu1 condition will always be true the
loop will never exit.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] First batch of KVM changes for 4.1
  2015-04-23 12:02                                   ` Paolo Bonzini
@ 2015-04-23 17:06                                     ` Marcelo Tosatti
  0 siblings, 0 replies; 34+ messages in thread
From: Marcelo Tosatti @ 2015-04-23 17:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andy Lutomirski, Peter Zijlstra, Linus Torvalds, linux-kernel,
	Gleb Natapov, kvm list, Ralf Baechle, Andrew Lutomirski

On Thu, Apr 23, 2015 at 02:02:29PM +0200, Paolo Bonzini wrote:
> 
> 
> On 23/04/2015 13:51, Marcelo Tosatti wrote:
> >>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1174664
> >> > 
> >> > That was the missing volatile in an asm.  Older compilers didn't catch
> >> > it. :(
> > How do you know that? It looks like memory corruption (look at the
> > pattern at the end).
> 
> I suspect some kind of operator error there, it makes no sense.


        if (unlikely(s->flags & SLAB_POISON))
                memset(start, POISON_INUSE, PAGE_SIZE << order);

 *      Padding is done using 0x5a (POISON_INUSE)

> On the other hand, bug 1178975 is much clearer and the symptoms are the
> same.  In that bug, you can see that the same kernel source works on f20
> (package version 3.17.7-200.fc20.x86_64) and fails on f21 (package
> version 3.17.7-300.fc21.x86_64).  Of course the compiler is different.
> The newer one hoists the lsl out of the loop; if you get a CPU migration
> at the wrong time, the cpu != cpu1 condition will always be true the
> loop will never exit.
> 
> Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2015-04-23 17:07 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-10 15:01 [GIT PULL] First batch of KVM changes for 4.1 Paolo Bonzini
2015-04-17  8:52 ` Peter Zijlstra
2015-04-17  9:17   ` Peter Zijlstra
2015-04-17 10:09     ` Paolo Bonzini
2015-04-17 10:36       ` Peter Zijlstra
2015-04-17 10:38         ` Paolo Bonzini
2015-04-17 10:55           ` Peter Zijlstra
2015-04-17 12:46             ` Paolo Bonzini
2015-04-17 13:10               ` Peter Zijlstra
2015-04-17 13:38                 ` Paolo Bonzini
2015-04-17 13:43                   ` Peter Zijlstra
2015-04-17 14:57                     ` Paolo Bonzini
2015-04-17 19:01                   ` Marcelo Tosatti
2015-04-17 19:16                     ` Andy Lutomirski
2015-04-17 19:57                     ` Paolo Bonzini
2015-04-17 20:18                       ` Marcelo Tosatti
2015-04-17 20:39                         ` Andy Lutomirski
2015-04-17 21:28                           ` Linus Torvalds
2015-04-17 21:42                             ` Andy Lutomirski
2015-04-17 22:04                               ` Linus Torvalds
2015-04-17 22:25                                 ` Andy Lutomirski
2015-04-17 23:39                                   ` Marcelo Tosatti
2015-04-18 16:20                                   ` Paolo Bonzini
2015-04-20 16:59                         ` Paolo Bonzini
2015-04-20 20:27                           ` Andy Lutomirski
2015-04-22 21:21                             ` Marcelo Tosatti
2015-04-23  9:13                               ` Paolo Bonzini
2015-04-23 11:51                                 ` Marcelo Tosatti
2015-04-23 12:02                                   ` Paolo Bonzini
2015-04-23 17:06                                     ` Marcelo Tosatti
2015-04-22 20:56                           ` Marcelo Tosatti
2015-04-22 21:01                             ` Paolo Bonzini
2015-04-22 22:55                               ` Marcelo Tosatti
2015-04-23 11:29                                 ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).