All of lore.kernel.org
 help / color / mirror / Atom feed
* Linux 4.15-rc2
@ 2017-12-03 16:22 Linus Torvalds
  2017-12-04 22:25 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Rafael J. Wysocki
  2018-02-21 18:36 ` Linux 4.15-rc2 Eugene Syromiatnikov
  0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-03 16:22 UTC (permalink / raw)
  To: Linux Kernel Mailing List

It's Sunday, but a few hours earlier than usual, since I'm on the east
coast, three hours ahead of my normal release schedule.

It's a slightly bigger rc2 than I would have wished for, but this
early in the release process I don't worry about it. The appended
shortlog gives the details, it's fixes all over the place -
architectures, drivers, filesystems, networking, core kernel.

One thing I'll point out is that I'm trying to get some kernel ASLR
leaks plugged, and as part of that we now hash any pointers printed by
"%p' by default. That won't affect a lot of people, but where it is a
debugging problem (rather than leaking interesting kernel pointers),
we will have to fix things up.

It can be a small annoyance, but the alternatives (trying to actually
find all the cases where we might be leaking) were worse. But let's
see if anybody even notices - a lot of the pointer printouts are stale
debug information from when some driver was originally written, and
aren't actually really interesting.

There will probably be some more leak fixes during this rc process,
we'll see how that all sorts out.

                 Linus

---

Abhishek Goel (2):
      cpupowerutils: bench - Fix cpu online check
      cpupower : Fix cpupower working when cpu0 is offline

Adrian Hunter (4):
      mmc: block: Fix missing blk_put_request()
      mmc: block: Check return value of blk_get_request()
      mmc: core: Do not leave the block driver in a suspended state
      mmc: block: Ensure that debugfs files are removed

Ahmad Fatoum (1):
      e1000: Fix off-by-one in debug message

Alex Deucher (4):
      drm/amdgpu/gfx7: cache raster_config values
      drm/amdgpu: used cached gca values for cik_read_register
      Revert "drm/amdgpu: fix rmmod KCQ disable failed error"
      drm/amdgpu: drop experimental flag for raven

Amritha Nambiar (1):
      i40e: Fix reporting incorrect error codes

Andrew Elble (2):
      nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
      nfsd: check for use of the closed special stateid

Andrew Jiang (1):
      drm/amd/display: Don't reject 3D timings

Andrew Waterman (3):
      RISC-V: Add VDSO entries for clock_get/gettimeofday/getcpu
      RISC-V: Flush I$ when making a dirty page executable
      RISC-V: Allow userspace to flush the instruction cache

Andrey Grodzovsky (1):
      drm/amd/display: Switch to drm_atomic_helper_wait_for_flip_done

Andrey Gusakov (6):
      drm/bridge: tc358767: do no fail on hi-res displays
      drm/bridge: tc358767: filter out too high modes
      drm/bridge: tc358767: fix DP0_MISC register set
      drm/bridge: tc358767: fix timing calculations
      drm/bridge: tc358767: fix AUXDATAn registers access
      drm/bridge: tc358767: fix 1-lane behavior

Andy Shevchenko (1):
      scripts/bloat-o-meter: don't fail with division by 0

Antoine Tenart (4):
      net: mvpp2: fix the txq_init error path
      net: mvpp2: cleanup probed ports in the probe error path
      net: mvpp2: check ethtool sets the Tx ring size is to a valid min value
      net: phy: marvell10g: fix the PHY id mask

Ard Biesheuvel (2):
      arm64: module-plts: factor out PLT generation code for ftrace
      arm64: ftrace: emit ftrace-mod.o contents through code

Arnd Bergmann (2):
      drm/i915: fix intel_backlight_device_register declaration
      drm/omap: displays: panel-dpi: add backlight dependency

Bartosz Golaszewski (1):
      eeprom: at24: correctly set the size for at24mac402

Bastian Stender (2):
      mmc: core: prepend 0x to pre_eol_info entry in sysfs
      mmc: core: prepend 0x to OCR entry in sysfs

Benjamin Gaignard (1):
      ethernet: dwmac-stm32: Fix copyright

Bhawanpreet Lakha (1):
      drm/amd/display: Add null check for 24BPP (xfm and dpp)

Bhumika Goyal (3):
      sunrpc: make the function arg as const
      NFSD: make cache_detail structures const
      SUNRPC: make cache_detail structures const

Bjorn Andersson (1):
      mmc: sdhci-msm: Optionally wait for signal level changes

Carlos Maiolino (1):
      xfs: Properly retry failed dquot items in case of error during
buffer writeback

Changbin Du (1):
      drm/i915/gvt: Fix unsafe locking caused by spin_unlock_bh

Chao Yu (1):
      quota: propagate error from __dquot_initialize

Charlene Liu (2):
      drm/amd/display: fix seq issue: turn on clock before programming afmt.
      drm/amd/display: try to find matching audio inst for enc inst first

Chris Wilson (3):
      drm/i915: Clear breadcrumb node when cancelling signaling
      drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM
      drm/i915/fbdev: Serialise early hotplug events with async fbdev config

Christian Borntraeger (1):
      s390/debug: use pK for kernel pointers

Christian König (2):
      drm/amdgpu: don't try to move pinned BOs
      drm/ttm: fix populate_and_map() functions once more

Christoph Hellwig (1):
      move libgcc.h to include/linux

Christophe JAILLET (2):
      bnxt_en: Fix an error handling path in 'bnxt_get_module_eeprom()'
      drm/omap: Fix error handling path in 'omap_dmm_probe()'

Chun-Yeow Yeoh (1):
      mac80211: fix the update of path metric for RANN frame

Cihangir Akturk (1):
      drm: mali-dp: switch to drm_*_get(), drm_*_put() helpers

Colin Ian King (9):
      nvme: fix spelling mistake: "requeing" -> "requeuing"
      drm/i915/gvt: ensure -ve return value is handled correctly
      i2c: i2c-boardinfo: fix memory leaks on devinfo
      ambassador: fix incorrect indentation of assignment statement
      atm: fore200e: use %pK to format kernel addresses instead of %x
      atm: lanai: use %p to format kernel addresses instead of %x
      atm: suni: remove extraneous space to fix indentation
      drm/amd/display: fix memory leaks on error exit return
      net: via: via-rhine: use %p to format void * address instead of %x

Dan Carpenter (1):
      omapdrm: hdmi4_cec: signedness bug in hdmi4_cec_init()

Dan Williams (11):
      mm: fix device-dax pud write-faults triggered by get_user_pages()
      mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE
      mm: replace pud_write with pud_access_permitted in fault + gup paths
      mm: replace pmd_write with pmd_access_permitted in fault + gup paths
      mm: replace pte_write with pte_access_permitted in fault + gup paths
      mm, hugetlbfs: introduce ->split() to vm_operations_struct
      device-dax: implement ->split() to catch invalid munmap attempts
      mm: introduce get_user_pages_longterm
      mm: fail get_vaddr_frames() for filesystem-dax mappings
      v4l2: disable filesystem-dax mapping support
      IB/core: disable memory registration of filesystem-dax vmas

Darrick J. Wong (5):
      xfs: always free inline data before resetting inode fork during ifree
      xfs: log recovery should replay deferred ops in order
      xfs: ubsan fixes
      xfs: remove unused parameter from xfs_writepage_map
      xfs: scrub inode mode properly

Dave Airlie (1):
      drm/ttm: don't attempt to use hugepages if dma32 requested (v2)

Dave Martin (1):
      arm64: fpsimd: Fix failure to restore FPSIMD state after signals

David Disseldorp (1):
      null_blk: fix dev->badblocks leak

David Hildenbrand (1):
      KVM: x86: fix em_fxstor() sleeping while in atomic

David Howells (14):
      rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing
      rxrpc: Don't set upgrade by default in sendmsg()
      rxrpc: Provide a different lockdep key for call->user_mutex for
kernel calls
      rxrpc: Delay terminal ACK transmission on a client call
      rxrpc: Split the call params from the operation params
      rxrpc: Fix call timeouts
      rxrpc: Don't transmit DELAY ACKs immediately on proposal
      rxrpc: Express protocol timeouts in terms of RTT
      rxrpc: Add a timeout for detecting lost ACKs/lost DATA
      rxrpc: Add keepalive for a call
      rxrpc: Fix service endpoint expiry
      rxrpc: Fix conn expiry timers
      afs: Fix permit refcounting
      afs: Properly reset afs_vnode (inode) fields

David S. Miller (1):
      sparc64: Fix boot on T4 and later.

David Sterba (2):
      btrfs: add missing device::flush_bio puts
      btrfs: dev_alloc_list is not protected by RCU, use normal list_del

Dmitry V. Levin (1):
      uapi: fix linux/kfd_ioctl.h userspace compilation errors

Dmytro Laktyushkin (3):
      drm/amd/display: fix split recout calculation
      drm/amd/display: fix split recout offset
      drm/amd/display: fix split viewport rounding error

Dr. David Alan Gilbert (2):
      KVM: lapic: Split out x2apic ldr calculation
      KVM: lapic: Fixup LDR on load in x2apic

Eduardo Otubo (1):
      xen-netfront: remove warning when unloading module

Eric Anholt (1):
      drm/bridge: Fix lvds-encoder since the panel_bridge rework.

Eric Dumazet (1):
      net/packet: fix a race in packet_bind() and packet_notifier()

Eric Sandeen (3):
      xfs: fix leaks on corruption errors in xfs_bmap.c
      xfs: fix uninitialized variable in xfs_scrub_quota
      xfs: calculate correct offset in xfs_scrub_quota_item

Eric Yang (1):
      drm/amd/display: Add timing validation against dongle cap

Eyal Moscovici (1):
      KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs

Felix Kuehling (2):
      drm/amdgpu: Fix SDMA load/unload sequence on HWS disabled mode
      drm/amdkfd: Fix SDMA oversubsription handling

Filipe Manana (3):
      Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
      Btrfs: fix reported number of inode blocks after buffered append writes
      Btrfs: incremental send, fix wrong unlink path after renaming file

Geert Uytterhoeven (1):
      net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit

Greg Kroah-Hartman (19):
      s390: block: add SPDX identifiers to the remaining files
      s390: crypto: add SPDX identifiers to the remaining files
      s390: cio: add SPDX identifiers to the remaining files
      s390: char: add SPDX identifiers to the remaining files
      s390: net: add SPDX identifiers to the remaining files
      s390: scsi: zfcp_aux: add SPDX identifier
      s390: virtio: add SPDX identifiers to the remaining files
      s390: crypto: Remove redundant license text
      s390: drivers: Remove redundant license text
      s390: kernel: add SPDX identifiers to the remaining files
      s390: crypto: add SPDX identifiers to the remaining files
      s390: mm: add SPDX identifiers to the remaining files
      s390: pci: add SPDX identifiers to the remaining files
      s390: appldata: add SPDX identifiers to the remaining files
      s390: add SPDX identifiers to the remaining files
      s390: kernel: Remove redundant license text
      s390: include: Remove redundant license text
      s390: crypto: Remove redundant license text
      s390: Remove redundant license text

Gustavo A R Silva (1):
      i40e/virtchnl: fix application of sizeof to pointer

Gustavo A. R. Silva (1):
      net: openvswitch: datapath: fix data type in queue_gso_packets

Hans Verkuil (1):
      drm/bridge: adv7511/33: Fix adv7511_cec_init() failure handling

Hans de Goede (4):
      ACPI / bus: Leave modalias empty for devices which are not present
      drm/i915: Fix false-positive assert_rpm_wakelock_held in
i915_pmic_bus_access_notifier v2
      drm/i915: Re-register PMIC bus access notifier on runtime resume
      i2c: i801: Fix Failed to allocate irq -2147483648 error

Harald Freudenberger (1):
      s390/zcrypt: Fix wrong comparison leading to strange load balancing

Harry Wentland (6):
      drm/amd/display: Fix amdgpu_dm bugs found by smatch
      drm/amd/display: Bunch of smatch error and warning fixes in DC
      drm/amd/display: Fix use before NULL check in validate_timing
      drm/amd/display: Fix hubp check in set_cursor_position
      drm/amd/display: Fix potential NULL and mem leak in create_links
      drm/amd/display: Fix couple more inconsistent NULL checks in dc_resource

Heiko Carstens (2):
      s390: rework __switch_to() to allow larger task_struct offsets
      s390/disassembler: remove confusing code

Heiner Kallweit (2):
      eeprom: at24: fix reading from 24MAC402/24MAC602
      eeprom: at24: check at24_read/write arguments

Hersen Wu (2):
      drm/amd/display: Handle as MST first and then DP dongle if sink
support both
      drm/amd/display: USB-C / thunderbolt dock specific workaround

Huacai Chen (1):
      bcache: Fix building error on MIPS

Hyong-Youb Kim (1):
      myri10ge: Update MAINTAINERS

Ian Kent (2):
      autofs: revert "autofs: take more care to not update last_used
on path walk"
      autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"

Israel Rukshin (1):
      nvme-rdma: Use mr pool

Jakub Kicinski (1):
      cls_bpf: don't decrement net's refcount when offload fails

James Hogan (1):
      cpufreq: Add Loongson machine dependencies

James Smart (1):
      nvmet-fc: correct ref counting error when deferred rcv used

Jan H. Schönherr (1):
      KVM: Let KVM_SET_SIGNAL_MASK work as advertised

Janakarajan Natarajan (1):
      KVM: x86: Fix CPUID function for word 6 (80000001_ECX)

Jean Delvare (1):
      hwmon: Drop reference to Jean's tree

Jeff Layton (1):
      reiserfs: remove unneeded i_version bump

Jeff Lien (1):
      nvme-pci: add quirk for delay before CHK RDY for WDC SN200

Jens Axboe (2):
      nvme-fc: don't use bit masks for set/test_bit() numbers
      blktrace: fix trace mutex deadlock

Jerry (Fangzhi) Zuo (1):
      drm/amd/display: Check aux channel before MST resume

Jesse Chan (1):
      cpufreq: mediatek: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE

Jiang Biao (1):
      fs/mbcache.c: make count_objects() more robust

Jinbum Park (1):
      arm64: pgd: Mark pgd_cache as __ro_after_init

Jiri Pirko (1):
      net: sched: cbq: create block for q->link.block

Johannes Berg (2):
      cfg80211: select CRYPTO_SHA256 if needed
      mac80211: use QoS NDP for AP probing

John Johansen (1):
      apparmor: fix oops in audit_signal_cb hook

Jon Maloy (1):
      tipc: eliminate access after delete in group_filter_msg()

Joonas Lahtinen (1):
      drm/i915: Disable THP until we have a GPU read BW W/A

Jordan Lazare (1):
      drm/amd/display: Revert noisy assert messages

Jorgen Hansen (2):
      VSOCK: Don't call vsock_stream_has_data in atomic context
      VSOCK: Don't set sk_state to TCP_CLOSE before testing it

Josef Bacik (2):
      btrfs: clear space cache inode generation always
      btrfs: fix deadlock when writing out space cache

Kai-Heng Feng (1):
      nvme-pci: disable APST on Samsung SSD 960 EVO + ASUS PRIME B350M-A

Kees Cook (1):
      exec: avoid RLIMIT_STACK races with prlimit()

Keith Busch (2):
      nvme: Fix NULL dereference on reservation request
      nvme: Suppress static analyis warning

Kirill A. Shutemov (3):
      mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
      mm, thp: Do not make pmd/pud dirty without a reason
      mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine

Laurent Pinchart (1):
      drm: omapdrm: Fix DPI on platforms using the DSI VDDS

Leo (Sunpeng) Li (3):
      drm/amd/display: Should disable when new stream is null
      drm/amd/display: Do DC mode-change check when adding CRTCs
      drm/amd/display: Do not put drm_atomic_state on resume

Leo Liu (1):
      drm/amdgpu: move UVD/VCE and VCN structure out from union

Linus Torvalds (6):
      Rename superblock flags (MS_xyz -> SB_xyz)
      proc: don't report kernel addresses in /proc/<pid>/stack
      Revert "mm, thp: Do not make pmd/pud dirty without a reason"
      kallsyms: take advantage of the new '%px' format
      vsprintf: don't use 'restricted_pointer()' when not restricting
      Linux 4.15-rc2

Liran Alon (6):
      KVM: x86: pvclock: Handle first-time write to pvclock-page
contains random junk
      KVM: nVMX/nSVM: Don't intercept #UD when running L2
      KVM: x86: Exit to user-mode on #UD intercept when emulator requires
      KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure
      KVM: x86: Don't re-execute instruction when not passing CR2 value
      KVM: nVMX: Fix vmx_check_nested_events() return value in case an
event was reinjected to L2

Liu Bo (3):
      Btrfs: add write_flags for compression bio
      Btrfs: bail out gracefully rather than BUG_ON
      Btrfs: fix list_add corruption and soft lockups in fsync

Liu, Changcheng (1):
      scripts/faddr2line: extend usage on generic arch

Liviu Dudau (3):
      drm: hdlcd: Update PM code to save/restore console.
      drm: mali-dp: Separate static internal data into a read-only structure.
      drm: mali-dp: Disable planes when their CRTC gets disabled.

Lucas Stach (2):
      drm/atomic: make drm_atomic_helper_wait_for_vblanks more agressive
      drm/imx: always call wait_for_flip_done in commit_tail

Lv Zheng (1):
      ACPI / EC: Fix regression related to PM ops support in ECDT device

Maarten Lankhorst (2):
      drm/vblank: Pass crtc_id to page_flip_ioctl.
      drm/fb_helper: Disable all crtc's when initial setup fails.

Mahesh Salgaonkar (1):
      powerpc/powernv: Fix kexec crashes caused by tlbie tracing

Marcos Paulo de Souza (1):
      blktrace: Use blk_trace_bio_get_cgid inside blk_add_trace_bio

Mark Rutland (1):
      arm64: mm: cleanup stale AIVIVT references

Martin Schwidefsky (4):
      s390: fix alloc_pgste check in init_new_context again
      s390: sthyi: add SPDX identifiers to the remaining files
      s390: revert ELF_ET_DYN_BASE base changes
      s390/gs: add compat regset for the guarded storage broadcast control block

Max Gurtovoy (1):
      nvme-rdma: fix memory leak during queue allocation

Michael Ellerman (1):
      powerpc/kexec: Fix kexec/kdump in P9 guest kernels

Michael Lyle (1):
      bcache: check return value of register_shrinker

Michal Hocko (3):
      xfs: fortify xfs_alloc_buftarg error handling
      mm, memory_hotplug: do not back off draining pcp free pages from
kworker context
      Revert "mm/page-writeback.c: print a warning if the vm dirtiness
settings are illogical"

Michel Dänzer (2):
      drm/amdgpu: Set adev->vcn.irq.num_types for VCN
      drm/amdgpu: Use unsigned ring indices in amdgpu_queue_mgr_map

Mika Westerberg (1):
      net: thunderbolt: Stop using zero to mean no valid DMA mapping

Mike Kravetz (1):
      mm/cma: fix alloc_contig_range ret code/potential leak

Mike Maloney (1):
      packet: fix crash in fanout_demux_rollover()

Mikulas Patocka (1):
      block: remove useless assignment in bio_split

Minwoo Im (2):
      nvme-pci: avoid hmb desc array idx out-of-bound when hmmaxd set.
      nvme-pci: fix NULL pointer dereference in nvme_free_host_mem()

Mirza Krak (1):
      drm/rockchip: dw-mipi-dsi: fix possible un-balanced runtime PM enable

Nadav Amit (1):
      fs/hugetlbfs/inode.c: change put_page/unlock_page order in
hugetlbfs_fallocate()

Naofumi Honda (1):
      nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat

Nikita Leshenko (5):
      KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
      KVM: x86: ioapic: Don't fire level irq when Remote IRR set
      KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
      KVM: x86: ioapic: Clear Remote IRR when entry is switched to
edge-triggered
      KVM: x86: ioapic: Preserve read-only values in the redirection table

Nikolay Borisov (1):
      btrfs: Fix transaction abort during failure in btrfs_rm_dev_item

OGAWA Hirofumi (1):
      fs/fat/inode.c: fix sb_rdonly() change

Oded Gabbay (2):
      microblaze: add missing include to mmu_context_mm.h
      drm/radeon: remove init of CIK VMIDs 8-16 for amdkfd

Olof Johansson (8):
      RISC-V: use generic serial.h
      RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros
      RISC-V: io.h: type fixes for warnings
      RISC-V: move empty_zero_page definition to C and export it
      RISC-V: Export some expected symbols for modules
      RISC-V: Provide stub of setup_profiling_timer()
      RISC-V: Use define for get_cycles like other architectures
      RISC-V: Add missing include

Ondrej Mosnáček (1):
      crypto: skcipher - Fix skcipher_walk_aead_common

Palmer Dabbelt (11):
      RISC-V: Remove __vdso_cmpxchg{32,64} symbol versions
      RISC-V: Remove unused arguments from ATOMIC_OP
      RISC-V: Comment on why {,cmp}xchg is ordered how it is
      RISC-V: Remove __smp_bp__{before,after}_atomic
      RISC-V: Remove smb_mb__{before,after}_spinlock()
      RISC-V: __test_and_op_bit_ord should be strongly ordered
      RISC-V: Add READ_ONCE in arch_spin_is_locked()
      RISC-V: `sfence.vma` orderes the instruction cache
      RISC-V: remove spin_unlock_wait()
      RISC-V: Clean up an unused include
      RISC-V: __io_writes should respect the length argument

Paolo Abeni (1):
      sch_sfq: fix null pointer dereference at timer expiration

Paolo Bonzini (2):
      KVM: x86: inject exceptions produced by x86_decode_insn
      KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP

Paul Mackerras (1):
      KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT
guests on radix hosts

Peter Rosin (1):
      hwmon: (jc42) optionally try to disable the SMBUS timeout

Peter Ujfalusi (1):
      omapdrm: hdmi4: Correct the SoC revision matching

Petr Machata (4):
      mlxsw: spectrum_router: Offload decap only for up tunnels
      mlxsw: spectrum_router: Demote tunnels on VRF migration
      mlxsw: spectrum_router: Handle encap to demoted tunnels
      mlxsw: spectrum_router: Update nexthop RIF on update

Pierre-Hugues Husson (1):
      drm/bridge: synopsys/dw-hdmi: Enable cec clock

Qu Wenruo (2):
      btrfs: Fix wild memory access in compression level parser
      btrfs: tree-checker: Fix false panic for sanity test

Randy Dunlap (2):
      block: genhd.c: fix message typo
      drm/amdkfd: fix amdkfd use-after-free GP fault

Robert Lippert (1):
      hwmon: (pmbus) Use 64bit math for DIRECT format values

Roman Kapl (1):
      net: sched: crash on blocks with goto chain action

Roman Li (2):
      drm/amd/display: Fix S3 topology change
      drm/amd/display: fix gamma setting

Rui Hua (1):
      bcache: recover data from backing when data is clean

Russell King (1):
      ARM: avoid faulting on qemu

Sagi Grimberg (7):
      nvme-fabrics: introduce init command check for a queue that is not alive
      nvme-fc: check if queue is ready in queue_rq
      nvme-loop: check if queue is ready in queue_rq
      nvme-rdma: don't suppress send completions
      nvme-rdma: don't complete requests before a send work request
has completed
      nvme-rdma: wait for local invalidation before completing a request
      nvme-rdma: Check remotely invalidated rkey matches our expected rkey

Sara Sharon (1):
      mac80211: tear down RX aggregations first

Sasha Neftin (1):
      e1000e: fix the use of magic numbers for buffer overrun issue

Shakeel Butt (1):
      mm, memcg: fix mem_cgroup_swapout() for THPs

Shirish S (1):
      drm/amd/display: check plane state before validating fbc

Srishti Sharma (2):
      drm/arm: Replace instances of drm_dev_unref with drm_dev_put.
      drm/arm: Replace instances of drm_dev_unref with drm_dev_put.

Stefan Schake (1):
      drm/vc4: Account for interrupts in flight

Stephan Mueller (2):
      crypto: algif_aead - skip SGL entries with NULL page
      crypto: af_alg - remove locking in async callback

Stephen Hemminger (1):
      uapi: add SPDX identifier to vm_sockets_diag.h

Sunil Goutham (1):
      net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts

Takashi Iwai (1):
      Revert "ALSA: usb-audio: Fix potential zero-division at parsing FU"

Tang Junhui (1):
      bcache: add a comment in journal bucket reading

Tetsuo Handa (1):
      quota: Check for register_shrinker() failure.

Thomas Meyer (1):
      auxdisplay: img-ascii-lcd: Only build on archs that have IOMEM

Thomas Richter (1):
      s390/topology: fix compile error in file arch/s390/kernel/smp.c

Tobin C. Harding (5):
      docs: correct documentation for %pK
      vsprintf: refactor %pK code out of pointer()
      printk: hash addresses printed with %p
      vsprintf: add printk specifier %px
      kasan: use %px to print addresses instead of %p

Trond Myklebust (11):
      nfsd: Fix stateid races between OPEN and CLOSE
      nfsd: Fix another OPEN stateid race
      nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)
      nfsd: Ensure we don't recognise lock stateids after freeing them
      nfsd4: move find_lock_stateid
      nfsd: Fix race in lock stateid creation
      nfsd: Ensure we check stateid validity in the seqid operation checks
      nfsd: Fix races with check_stateid_generation()
      NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"
      SUNRPC: Allow connect to return EHOSTUNREACH
      SUNRPC: Handle ENETDOWN errors

Ulf Hansson (1):
      mmc: sdhci: Avoid swiotlb buffer being full

Vaibhav Jain (3):
      cxl: Check if vphb exists before iterating over AFU devices
      powerpc: Avoid signed to unsigned conversion in set_thread_tidr()
      powerpc: Do not assign thread.tidr if already assigned

Vasily Averin (9):
      nfsd: remove net pointer from debug messages
      lockd: remove net pointer from messages
      grace: replace BUG_ON by WARN_ONCE in exit_net hook
      lockd: added cleanup checks in exit_net hook
      lockd: lost rollback of set_grace_period() in lockd_down_net()
      race of lockd inetaddr notifiers vs nlmsvc_rqst change
      race of nfsd inetaddr notifiers vs nn->nfsd_serv change
      nlm_shutdown_hosts_net() cleanup
      lockd: fix "list_add double add" caused by legacy signal interface

Vasily Gorbik (1):
      s390/disassembler: correct disassembly lines alignment

Vasyl Gomonovych (1):
      lmc: Use memdup_user() as a cleanup

Ville Syrjälä (4):
      drm/edid: Don't send non-zero YQ in AVI infoframe for HDMI 1.x sinks
      drm/i915: Fix init_clock_gating for resume
      drm/i915: Don't try indexed reads to alternate slave addresses
      drm/i915: Prevent zero length "index" write

Vitor Massaru Iha (1):
      drm: Fix checkpatch issue: "WARNING: braces {} are not necessary
for single statement blocks."

Vivien Didelot (1):
      net: dsa: fix 'increment on 0' warning

Wang Nan (1):
      mm, oom_reaper: gather each vma to prevent leaking TLB entry

Wanpeng Li (6):
      KVM: X86: Fix operand/address-size during instruction decoding
      KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
      KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
      KVM: X86: Fix softlockup when get the current kvmclock
      KVM: VMX: Fix rflags cache during vCPU reset
      KVM: VMX: Fix vmx->nested freeing when no SMI handler

Weinan Li (1):
      drm/i915/gvt: remove skl_misc_ctl_write handler

Will Deacon (1):
      arm64: context: Fix comments and remove pointless smp_wmb()

Xiaolin Zhang (1):
      drm/i915/gvt: enabled pipe A default on creating vgpu

Xin Long (11):
      sctp: use sizeof(__u16) for each stream number length instead of
magic number
      sctp: only allow the out stream reset when the stream outq is empty
      sctp: only allow the asoc reset when the asoc outq is empty
      sctp: avoid flushing unsent queue when doing asoc reset
      sctp: set sender next_tsn for the old result with ctsn_ack_point plus 1
      sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail
      sctp: force the params with right types for sctp csum apis
      sctp: remove extern from stream sched
      sctp: use right member as the param of list_for_each_entry
      bonding: use nla_get_u64 to extract the value for
IFLA_BOND_AD_ACTOR_SYSTEM
      vxlan: use __be32 type for the param vni in __vxlan_fdb_delete

Xiong Zhang (1):
      drm/i915/gvt: Correct ADDR_4K/2M/1G_MASK definition

Xu YiPing (1):
      arm64: perf: remove unsupported events for Cortex-A73

Yan Markman (1):
      net: mvpp2: do not disable GMAC padding

Yisheng Xie (1):
      kmemleak: add scheduling point to kmemleak_scan()

Yury Norov (1):
      arm64: cpu_ops: Add missing 'const' qualifiers

Zhu Yanjun (1):
      forcedeth: replace pci_unmap_page with dma_unmap_page

Zi Yan (1):
      mm: migrate: fix an incorrect call of prep_transhuge_page()

chenjie (1):
      mm/madvise.c: fix madvise() infinite loop under special circumstances

fred gao (1):
      drm/i915/gvt: Move request alloc to dispatch_workload path only

shaoyunl (1):
      drm/amdkfd: Fix SDMA ring buffer size calculation

weiping zhang (7):
      bdi: convert bdi_debug_register to int
      bdi: add error handle for bdi_debug_register
      block: add WARN_ON if bdi register fail
      blk-wbt: remove duplicated setting in wbt_init
      blk-sysfs: remove NULL pointer checking in queue_wb_lat_store
      blk-wbt: move wbt_clear_stat to common place in wbt_done
      blk-wbt: fix comments typo

zhangliping (1):
      openvswitch: fix the incorrect flow action alloc size

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-03 16:22 Linux 4.15-rc2 Linus Torvalds
@ 2017-12-04 22:25 ` Rafael J. Wysocki
  2017-12-04 22:36   ` Linus Torvalds
  2018-02-21 18:36 ` Linux 4.15-rc2 Eugene Syromiatnikov
  1 sibling, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-04 22:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, x86

On Sunday, December 3, 2017 5:22:56 PM CET Linus Torvalds wrote:
> It's Sunday, but a few hours earlier than usual, since I'm on the east
> coast, three hours ahead of my normal release schedule.
> 
> It's a slightly bigger rc2 than I would have wished for, but this
> early in the release process I don't worry about it. The appended
> shortlog gives the details, it's fixes all over the place -
> architectures, drivers, filesystems, networking, core kernel.
> 
> One thing I'll point out is that I'm trying to get some kernel ASLR
> leaks plugged, and as part of that we now hash any pointers printed by
> "%p' by default. That won't affect a lot of people, but where it is a
> debugging problem (rather than leaking interesting kernel pointers),
> we will have to fix things up.
> 
> It can be a small annoyance, but the alternatives (trying to actually
> find all the cases where we might be leaking) were worse. But let's
> see if anybody even notices - a lot of the pointer printouts are stale
> debug information from when some driver was originally written, and
> aren't actually really interesting.
> 
> There will probably be some more leak fixes during this rc process,
> we'll see how that all sorts out.

So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
systems I have tested, so it is probably safe to assume it to be
broken everywhere.

I'm quite confident that this is not something that went in through the
PM tree, because I was running those changes on the systems that turn
out to be broken now.

It looks like the the ACPI waking vector mechanism stopped working, so
I'm suspecting some x86 changes having to do with virtual-to-physical
address mapping.

I've just started bisection.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-04 22:25 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Rafael J. Wysocki
@ 2017-12-04 22:36   ` Linus Torvalds
  2017-12-04 22:38     ` Thomas Gleixner
  2017-12-06 12:15     ` Michal Hocko
  0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-04 22:36 UTC (permalink / raw)
  To: Rafael J. Wysocki, Andy Lutomirski
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> systems I have tested, so it is probably safe to assume it to be
> broken everywhere.

Oh, it's definitely not broken everywhere, because I use it myself,
and was traveling last week due to my mom's bday.

HOWEVER.

Some of the x86 work seems to have broken it for some configurations.
In particular, do you have a big "everything enabled" kernel config -
particularly lockdep and irqflags tracing enabled?

Andy has a patch, but it hasn't made it to me yet (probably because
the x86 people are very busy with the kaiser work):

    https://lkml.org/lkml/2017/11/30/546

(also note his follow-up "fix the commit message" note, but that one
doesn't actually affect the code itself).

Does that patch fix it for  you?

          Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-04 22:36   ` Linus Torvalds
@ 2017-12-04 22:38     ` Thomas Gleixner
  2017-12-04 22:41       ` Rafael J. Wysocki
  2017-12-06 12:15     ` Michal Hocko
  1 sibling, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-04 22:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Mon, 4 Dec 2017, Linus Torvalds wrote:

> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > systems I have tested, so it is probably safe to assume it to be
> > broken everywhere.
> 
> Oh, it's definitely not broken everywhere, because I use it myself,
> and was traveling last week due to my mom's bday.
> 
> HOWEVER.
> 
> Some of the x86 work seems to have broken it for some configurations.
> In particular, do you have a big "everything enabled" kernel config -
> particularly lockdep and irqflags tracing enabled?
> 
> Andy has a patch, but it hasn't made it to me yet (probably because
> the x86 people are very busy with the kaiser work):

Picking it up right now.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-04 22:38     ` Thomas Gleixner
@ 2017-12-04 22:41       ` Rafael J. Wysocki
  2017-12-05  0:25         ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-04 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Linus Torvalds
  Cc: Andy Lutomirski, Linux Kernel Mailing List, the arch/x86 maintainers

On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> On Mon, 4 Dec 2017, Linus Torvalds wrote:
> 
> > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > systems I have tested, so it is probably safe to assume it to be
> > > broken everywhere.
> > 
> > Oh, it's definitely not broken everywhere, because I use it myself,
> > and was traveling last week due to my mom's bday.
> > 
> > HOWEVER.
> > 
> > Some of the x86 work seems to have broken it for some configurations.
> > In particular, do you have a big "everything enabled" kernel config -
> > particularly lockdep and irqflags tracing enabled?
> > 
> > Andy has a patch, but it hasn't made it to me yet (probably because
> > the x86 people are very busy with the kaiser work):

This definitely fixes the problem at least on one of the affected machines.
 
> Picking it up right now.

Cool, thanks!

Rafael

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-04 22:41       ` Rafael J. Wysocki
@ 2017-12-05  0:25         ` Rafael J. Wysocki
  2017-12-09 10:33           ` Pavel Machek
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-05  0:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> > 
> > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >
> > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > systems I have tested, so it is probably safe to assume it to be
> > > > broken everywhere.
> > > 
> > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > and was traveling last week due to my mom's bday.
> > > 
> > > HOWEVER.
> > > 
> > > Some of the x86 work seems to have broken it for some configurations.
> > > In particular, do you have a big "everything enabled" kernel config -
> > > particularly lockdep and irqflags tracing enabled?
> > > 
> > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > the x86 people are very busy with the kaiser work):
> 
> This definitely fixes the problem at least on one of the affected machines.

I can confirm that the Andy's patch fixes it on all systems that had this
issue here.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-04 22:36   ` Linus Torvalds
  2017-12-04 22:38     ` Thomas Gleixner
@ 2017-12-06 12:15     ` Michal Hocko
  2017-12-06 12:23       ` Thomas Gleixner
                         ` (2 more replies)
  1 sibling, 3 replies; 83+ messages in thread
From: Michal Hocko @ 2017-12-06 12:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > systems I have tested, so it is probably safe to assume it to be
> > broken everywhere.
> 
> Oh, it's definitely not broken everywhere, because I use it myself,
> and was traveling last week due to my mom's bday.
> 
> HOWEVER.
> 
> Some of the x86 work seems to have broken it for some configurations.
> In particular, do you have a big "everything enabled" kernel config -
> particularly lockdep and irqflags tracing enabled?
> 
> Andy has a patch, but it hasn't made it to me yet (probably because
> the x86 people are very busy with the kaiser work):
> 
>     https://lkml.org/lkml/2017/11/30/546
> 
> (also note his follow-up "fix the commit message" note, but that one
> doesn't actually affect the code itself).

merging tip/x86/urgent on top of your tree fixed this problem for me,
but I am seeing something else
[  131.711412] ACPI: Preparing to enter system sleep state S3
[  131.755328] ACPI: EC: event blocked
[  131.755328] ACPI: EC: EC stopped
[  131.755328] PM: Saving platform NVS memory
[  131.755344] Disabling non-boot CPUs ...
[  131.779330] IRQ 124: no longer affine to CPU1
[  131.780334] smpboot: CPU 1 is now offline
[  131.804465] smpboot: CPU 2 is now offline
[  131.827291] IRQ 122: no longer affine to CPU3
[  131.827292] IRQ 123: no longer affine to CPU3
[  131.828293] smpboot: CPU 3 is now offline
[  131.830991] ACPI: Low-level resume complete
[  131.831092] ACPI: EC: EC started
[  131.831093] PM: Restoring platform NVS memory
[  131.831864] do_IRQ: 0.55 No irq handler for vector
[  131.831884] Enabling non-boot CPUs ...
[  131.831909] x86: Booting SMP configuration:
[  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  131.832913]  cache: parent cpu1 should not be sleeping
[  131.833058] CPU1 is up
[  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
[  131.833864]  cache: parent cpu2 should not be sleeping
[  131.833983] CPU2 is up
[  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  131.834776]  cache: parent cpu3 should not be sleeping
[  131.834923] CPU3 is up

"No irq handler" part looks a bit scary (maybe related to lost affinity
messages?) but the following messages look quite as well. Is this
something known? The system seems to be up and running without any
visible issues.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-06 12:15     ` Michal Hocko
@ 2017-12-06 12:23       ` Thomas Gleixner
  2017-12-06 14:04         ` Rafael J. Wysocki
  2017-12-06 12:31       ` Maarten Lankhorst
  2017-12-07  7:55       ` Michal Hocko
  2 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-06 12:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, 6 Dec 2017, Michal Hocko wrote:
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [  131.711412] ACPI: Preparing to enter system sleep state S3
> [  131.755328] ACPI: EC: event blocked
> [  131.755328] ACPI: EC: EC stopped
> [  131.755328] PM: Saving platform NVS memory
> [  131.755344] Disabling non-boot CPUs ...
> [  131.779330] IRQ 124: no longer affine to CPU1
> [  131.780334] smpboot: CPU 1 is now offline
> [  131.804465] smpboot: CPU 2 is now offline
> [  131.827291] IRQ 122: no longer affine to CPU3
> [  131.827292] IRQ 123: no longer affine to CPU3
> [  131.828293] smpboot: CPU 3 is now offline
> [  131.830991] ACPI: Low-level resume complete
> [  131.831092] ACPI: EC: EC started
> [  131.831093] PM: Restoring platform NVS memory
> [  131.831864] do_IRQ: 0.55 No irq handler for vector

Hmm, that's really odd.

> [  131.831884] Enabling non-boot CPUs ...
> [  131.831909] x86: Booting SMP configuration:
> [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [  131.832913]  cache: parent cpu1 should not be sleeping

This is an old one. 

> [  131.833058] CPU1 is up
> [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [  131.833864]  cache: parent cpu2 should not be sleeping
> [  131.833983] CPU2 is up
> [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [  131.834776]  cache: parent cpu3 should not be sleeping
> [  131.834923] CPU3 is up
> 
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.

I assume it's due to the affinity break, just that we don't know right now
on which CPU that do_IRQ() message triggered. I assume it's CPU0 because
the others are offline already, but ....

I'll think about it how we can figure out what's going on.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-06 12:15     ` Michal Hocko
  2017-12-06 12:23       ` Thomas Gleixner
@ 2017-12-06 12:31       ` Maarten Lankhorst
  2017-12-06 12:46         ` Thomas Gleixner
  2017-12-07  7:55       ` Michal Hocko
  2 siblings, 1 reply; 83+ messages in thread
From: Maarten Lankhorst @ 2017-12-06 12:31 UTC (permalink / raw)
  To: Michal Hocko, Linus Torvalds
  Cc: Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers

Op 06-12-17 om 13:15 schreef Michal Hocko:
> On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
>> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
>>> systems I have tested, so it is probably safe to assume it to be
>>> broken everywhere.
>> Oh, it's definitely not broken everywhere, because I use it myself,
>> and was traveling last week due to my mom's bday.
>>
>> HOWEVER.
>>
>> Some of the x86 work seems to have broken it for some configurations.
>> In particular, do you have a big "everything enabled" kernel config -
>> particularly lockdep and irqflags tracing enabled?
>>
>> Andy has a patch, but it hasn't made it to me yet (probably because
>> the x86 people are very busy with the kaiser work):
>>
>>     https://lkml.org/lkml/2017/11/30/546
>>
>> (also note his follow-up "fix the commit message" note, but that one
>> doesn't actually affect the code itself).
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [  131.711412] ACPI: Preparing to enter system sleep state S3
> [  131.755328] ACPI: EC: event blocked
> [  131.755328] ACPI: EC: EC stopped
> [  131.755328] PM: Saving platform NVS memory
> [  131.755344] Disabling non-boot CPUs ...
> [  131.779330] IRQ 124: no longer affine to CPU1
> [  131.780334] smpboot: CPU 1 is now offline
> [  131.804465] smpboot: CPU 2 is now offline
> [  131.827291] IRQ 122: no longer affine to CPU3
> [  131.827292] IRQ 123: no longer affine to CPU3
> [  131.828293] smpboot: CPU 3 is now offline
> [  131.830991] ACPI: Low-level resume complete
> [  131.831092] ACPI: EC: EC started
> [  131.831093] PM: Restoring platform NVS memory
> [  131.831864] do_IRQ: 0.55 No irq handler for vector
> [  131.831884] Enabling non-boot CPUs ...
> [  131.831909] x86: Booting SMP configuration:
> [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [  131.832913]  cache: parent cpu1 should not be sleeping
> [  131.833058] CPU1 is up
> [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [  131.833864]  cache: parent cpu2 should not be sleeping
> [  131.833983] CPU2 is up
> [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [  131.834776]  cache: parent cpu3 should not be sleeping
> [  131.834923] CPU3 is up
>
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.

Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
Symptoms are similar..

~Maarten

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-06 12:31       ` Maarten Lankhorst
@ 2017-12-06 12:46         ` Thomas Gleixner
  2017-12-06 13:09           ` Maarten Lankhorst
  0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-06 12:46 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 13:15 schreef Michal Hocko:
> >
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
> 
> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
> Symptoms are similar..

Well, the spurious interrupt is one thing, but you obviously lose
interrupts for some reason.

Did you ever manage to get the data out which I asked for?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-06 12:46         ` Thomas Gleixner
@ 2017-12-06 13:09           ` Maarten Lankhorst
  2017-12-06 14:15             ` Thomas Gleixner
  0 siblings, 1 reply; 83+ messages in thread
From: Maarten Lankhorst @ 2017-12-06 13:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>> Op 06-12-17 om 13:15 schreef Michal Hocko:
>>> "No irq handler" part looks a bit scary (maybe related to lost affinity
>>> messages?) but the following messages look quite as well. Is this
>>> something known? The system seems to be up and running without any
>>> visible issues.
>> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
>> Symptoms are similar..
> Well, the spurious interrupt is one thing, but you obviously lose
> interrupts for some reason.
>
> Did you ever manage to get the data out which I asked for?
>
> Thanks,
>
> 	tglx
>
Yes, sent this out about an hour ago

https://lkml.org/lkml/2017/12/6/215

Cheers,

Maarten

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-06 12:23       ` Thomas Gleixner
@ 2017-12-06 14:04         ` Rafael J. Wysocki
  0 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-06 14:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Linus Torvalds, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Wednesday, December 6, 2017 1:23:34 PM CET Thomas Gleixner wrote:
> On Wed, 6 Dec 2017, Michal Hocko wrote:
> > merging tip/x86/urgent on top of your tree fixed this problem for me,
> > but I am seeing something else
> > [  131.711412] ACPI: Preparing to enter system sleep state S3
> > [  131.755328] ACPI: EC: event blocked
> > [  131.755328] ACPI: EC: EC stopped
> > [  131.755328] PM: Saving platform NVS memory
> > [  131.755344] Disabling non-boot CPUs ...
> > [  131.779330] IRQ 124: no longer affine to CPU1
> > [  131.780334] smpboot: CPU 1 is now offline
> > [  131.804465] smpboot: CPU 2 is now offline
> > [  131.827291] IRQ 122: no longer affine to CPU3
> > [  131.827292] IRQ 123: no longer affine to CPU3
> > [  131.828293] smpboot: CPU 3 is now offline
> > [  131.830991] ACPI: Low-level resume complete
> > [  131.831092] ACPI: EC: EC started
> > [  131.831093] PM: Restoring platform NVS memory
> > [  131.831864] do_IRQ: 0.55 No irq handler for vector
> 
> Hmm, that's really odd.
> 
> > [  131.831884] Enabling non-boot CPUs ...
> > [  131.831909] x86: Booting SMP configuration:
> > [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [  131.832913]  cache: parent cpu1 should not be sleeping
> 
> This is an old one. 
> 
> > [  131.833058] CPU1 is up
> > [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> > [  131.833864]  cache: parent cpu2 should not be sleeping
> > [  131.833983] CPU2 is up
> > [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > [  131.834776]  cache: parent cpu3 should not be sleeping
> > [  131.834923] CPU3 is up
> > 
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
> 
> I assume it's due to the affinity break, just that we don't know right now
> on which CPU that do_IRQ() message triggered. I assume it's CPU0 because
> the others are offline already, but ....

This is resume from S3, so the firmware might do something odd to the other
CPUs, but in case it didn't (which is quite likely or we would have seen more
of these messages), they are offline and in mwait_play_dead(), so IMO it is
safe to assume that this was CPU0.

And this appears to have happened at the atch_suspend_enable_irqs() time,
which is just local_irq_enable() on x86 running on CPU0.

> I'll think about it how we can figure out what's going on.

It looks like an interrupt that have triggered right after we've enabled
interrupts on the boot CPU.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-06 13:09           ` Maarten Lankhorst
@ 2017-12-06 14:15             ` Thomas Gleixner
  2017-12-07 13:33               ` Maarten Lankhorst
  0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-06 14:15 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> > On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> >> Op 06-12-17 om 13:15 schreef Michal Hocko:
> >>> "No irq handler" part looks a bit scary (maybe related to lost affinity
> >>> messages?) but the following messages look quite as well. Is this
> >>> something known? The system seems to be up and running without any
> >>> visible issues.
> >> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
> >> Symptoms are similar..
> > Well, the spurious interrupt is one thing, but you obviously lose
> > interrupts for some reason.
> >
> > Did you ever manage to get the data out which I asked for?
> >
> > Thanks,
> >
> > 	tglx
> >
> Yes, sent this out about an hour ago
> 
> https://lkml.org/lkml/2017/12/6/215

Weird. Did not reach me

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-06 12:15     ` Michal Hocko
  2017-12-06 12:23       ` Thomas Gleixner
  2017-12-06 12:31       ` Maarten Lankhorst
@ 2017-12-07  7:55       ` Michal Hocko
  2017-12-10 20:30         ` Michal Hocko
  2 siblings, 1 reply; 83+ messages in thread
From: Michal Hocko @ 2017-12-07  7:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed 06-12-17 13:14:52, Michal Hocko wrote:
> On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > systems I have tested, so it is probably safe to assume it to be
> > > broken everywhere.
> > 
> > Oh, it's definitely not broken everywhere, because I use it myself,
> > and was traveling last week due to my mom's bday.
> > 
> > HOWEVER.
> > 
> > Some of the x86 work seems to have broken it for some configurations.
> > In particular, do you have a big "everything enabled" kernel config -
> > particularly lockdep and irqflags tracing enabled?
> > 
> > Andy has a patch, but it hasn't made it to me yet (probably because
> > the x86 people are very busy with the kaiser work):
> > 
> >     https://lkml.org/lkml/2017/11/30/546
> > 
> > (also note his follow-up "fix the commit message" note, but that one
> > doesn't actually affect the code itself).
> 
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [  131.711412] ACPI: Preparing to enter system sleep state S3
> [  131.755328] ACPI: EC: event blocked
> [  131.755328] ACPI: EC: EC stopped
> [  131.755328] PM: Saving platform NVS memory
> [  131.755344] Disabling non-boot CPUs ...
> [  131.779330] IRQ 124: no longer affine to CPU1
> [  131.780334] smpboot: CPU 1 is now offline
> [  131.804465] smpboot: CPU 2 is now offline
> [  131.827291] IRQ 122: no longer affine to CPU3
> [  131.827292] IRQ 123: no longer affine to CPU3
> [  131.828293] smpboot: CPU 3 is now offline
> [  131.830991] ACPI: Low-level resume complete
> [  131.831092] ACPI: EC: EC started
> [  131.831093] PM: Restoring platform NVS memory
> [  131.831864] do_IRQ: 0.55 No irq handler for vector
> [  131.831884] Enabling non-boot CPUs ...
> [  131.831909] x86: Booting SMP configuration:
> [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [  131.832913]  cache: parent cpu1 should not be sleeping
> [  131.833058] CPU1 is up
> [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [  131.833864]  cache: parent cpu2 should not be sleeping
> [  131.833983] CPU2 is up
> [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [  131.834776]  cache: parent cpu3 should not be sleeping
> [  131.834923] CPU3 is up
> 
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.

Hmm, there is still something bad going on during resume. My laptop
haven't woken up from s2ram this morning. The screen was powered on
but the system hasn't come up.

The last thing that made it into the kernel log on fs is this

Dec  6 19:32:29 tiehlicka kernel: [21898.084685] PM: suspend entry (deep)

which won't tell us much I suspect. I've tried dozen s2ram cycles and it
hasn't reproduced so it smells like a timing issue.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-06 14:15             ` Thomas Gleixner
@ 2017-12-07 13:33               ` Maarten Lankhorst
  2017-12-08 10:30                 ` Thomas Gleixner
  0 siblings, 1 reply; 83+ messages in thread
From: Maarten Lankhorst @ 2017-12-07 13:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

Op 06-12-17 om 15:15 schreef Thomas Gleixner:
> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
>>> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>>>> Op 06-12-17 om 13:15 schreef Michal Hocko:
>>>>> "No irq handler" part looks a bit scary (maybe related to lost affinity
>>>>> messages?) but the following messages look quite as well. Is this
>>>>> something known? The system seems to be up and running without any
>>>>> visible issues.
>>>> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
>>>> Symptoms are similar..
>>> Well, the spurious interrupt is one thing, but you obviously lose
>>> interrupts for some reason.
>>>
>>> Did you ever manage to get the data out which I asked for?
>>>
>>> Thanks,
>>>
>>> 	tglx
>>>
>> Yes, sent this out about an hour ago
>>
>> https://lkml.org/lkml/2017/12/6/215
> Weird. Did not reach me
>
But do you have any idea?

~Maarten

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-07 13:33               ` Maarten Lankhorst
@ 2017-12-08 10:30                 ` Thomas Gleixner
  2017-12-13 15:57                   ` Thomas Gleixner
  0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-08 10:30 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Thu, 7 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 15:15 schreef Thomas Gleixner:
> > On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> >> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> >>> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> >>>> Op 06-12-17 om 13:15 schreef Michal Hocko:
> >>>>> "No irq handler" part looks a bit scary (maybe related to lost affinity
> >>>>> messages?) but the following messages look quite as well. Is this
> >>>>> something known? The system seems to be up and running without any
> >>>>> visible issues.
> >>>> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
> >>>> Symptoms are similar..
> >>> Well, the spurious interrupt is one thing, but you obviously lose
> >>> interrupts for some reason.
> >>>
> >>> Did you ever manage to get the data out which I asked for?
> >>>
> >>> Thanks,
> >>>
> >>> 	tglx
> >>>
> >> Yes, sent this out about an hour ago
> >>
> >> https://lkml.org/lkml/2017/12/6/215
> > Weird. Did not reach me
> >
> But do you have any idea?

Can you please provide the full trace, dmesg and the full output of
.../debug/irq/... ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-05  0:25         ` Rafael J. Wysocki
@ 2017-12-09 10:33           ` Pavel Machek
  2017-12-09 11:41             ` Pavel Machek
       [not found]             ` <CA+55aFw8tuoJ2gcXx3K2sKFf2Y9hXX4naMVQNqGOUivnjwhjkg@mail.gmail.com>
  0 siblings, 2 replies; 83+ messages in thread
From: Pavel Machek @ 2017-12-09 10:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, Linus Torvalds, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 1556 bytes --]

On Tue 2017-12-05 01:25:55, Rafael J. Wysocki wrote:
> On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> > On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> > > 
> > > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > >
> > > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > > systems I have tested, so it is probably safe to assume it to be
> > > > > broken everywhere.
> > > > 
> > > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > > and was traveling last week due to my mom's bday.
> > > > 
> > > > HOWEVER.
> > > > 
> > > > Some of the x86 work seems to have broken it for some configurations.
> > > > In particular, do you have a big "everything enabled" kernel config -
> > > > particularly lockdep and irqflags tracing enabled?
> > > > 
> > > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > > the x86 people are very busy with the kaiser work):
> > 
> > This definitely fixes the problem at least on one of the affected machines.
> 
> I can confirm that the Andy's patch fixes it on all systems that had this
> issue here.

I believe I have the issue here, too (-next on thinkpad x60). Which
patch is expected to fix it? Let me try recent -next...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-09 10:33           ` Pavel Machek
@ 2017-12-09 11:41             ` Pavel Machek
       [not found]             ` <CA+55aFw8tuoJ2gcXx3K2sKFf2Y9hXX4naMVQNqGOUivnjwhjkg@mail.gmail.com>
  1 sibling, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2017-12-09 11:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, Linus Torvalds, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 1690 bytes --]

On Sat 2017-12-09 11:33:25, Pavel Machek wrote:
> On Tue 2017-12-05 01:25:55, Rafael J. Wysocki wrote:
> > On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> > > On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > > > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> > > > 
> > > > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > > >
> > > > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > > > systems I have tested, so it is probably safe to assume it to be
> > > > > > broken everywhere.
> > > > > 
> > > > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > > > and was traveling last week due to my mom's bday.
> > > > > 
> > > > > HOWEVER.
> > > > > 
> > > > > Some of the x86 work seems to have broken it for some configurations.
> > > > > In particular, do you have a big "everything enabled" kernel config -
> > > > > particularly lockdep and irqflags tracing enabled?
> > > > > 
> > > > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > > > the x86 people are very busy with the kaiser work):
> > > 
> > > This definitely fixes the problem at least on one of the affected machines.
> > 
> > I can confirm that the Andy's patch fixes it on all systems that had this
> > issue here.
> 
> I believe I have the issue here, too (-next on thinkpad x60). Which
> patch is expected to fix it? Let me try recent -next...

Still there AFAICT.
									Pavel


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
       [not found]             ` <CA+55aFw8tuoJ2gcXx3K2sKFf2Y9hXX4naMVQNqGOUivnjwhjkg@mail.gmail.com>
@ 2017-12-09 22:01               ` Pavel Machek
       [not found]                 ` <CA+55aFySAdiBZhZ0PSDjH5PuvPPcMsBRXbxCkObfm1eY7gHDbQ@mail.gmail.com>
  0 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-09 22:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Thomas Gleixner, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 820 bytes --]

Hi!

On Sat 2017-12-09 10:55:14, Linus Torvalds wrote:
> On Dec 9, 2017 02:33, "Pavel Machek" <pavel@ucw.cz> wrote:
> 
> > I believe I have the issue here, too (-next on thinkpad x60). Which
> > patch is expected to fix it? Let me try recent -next...
> 
> 
> It should be fixed in mainline. I don't know if next has picked it
> up yet.

Strange. I was at 4.15-rc1, and suspend worked there (thinkpad x60,
32-bit). It was broken in -next. I updated to current mainline (
4ded3bec65a07343258ed8fd9d46483f032d866f ) and suspend is broken.

It suspends ok, I press Fn button to make it resume, fans spin up but
moon LED is still lit.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
       [not found]                 ` <CA+55aFySAdiBZhZ0PSDjH5PuvPPcMsBRXbxCkObfm1eY7gHDbQ@mail.gmail.com>
@ 2017-12-10 16:23                   ` Pavel Machek
  2017-12-10 16:37                     ` Linus Torvalds
  0 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 16:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Thomas Gleixner, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 964 bytes --]

On Sat 2017-12-09 14:47:41, Linus Torvalds wrote:
> On Dec 9, 2017 14:01, "Pavel Machek" <pavel@ucw.cz> wrote:
> 
> 
> Strange. I was at 4.15-rc1, and suspend worked there (thinkpad x60,
> 32-bit). It was broken in -next. I updated to current mainline (
> 4ded3bec65a07343258ed8fd9d46483f032d866f ) and suspend is broken.
> 

Can you do something about html emails? Quoting them doesn't work too well.

> Any chance to bisect it? This doesn't sound like the other problem
> we had.

Let me try...

v4.15-rc2: suspends/resumes ok.
4ded3be: hangs on resume.

Given that between those, there was supposed "fix" for suspend, I
believe I should try reverting that one first. .. if someone can tell
me commit id, that would help.

And yes, if everything else fails, I can probably bisect.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-10 16:23                   ` Pavel Machek
@ 2017-12-10 16:37                     ` Linus Torvalds
  2017-12-10 18:56                       ` Pavel Machek
  0 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-10 16:37 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rafael J. Wysocki, Thomas Gleixner, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Dec 10, 2017 at 8:23 AM, Pavel Machek <pavel@ucw.cz> wrote:
>
> Can you do something about html emails? Quoting them doesn't work too well.

Yeah, and they don't show up onlkml either because of rules. I try to
avoid them, but have been more on mobile for various reasons lately
than usual. That should be over and done with after today, though.

>> Any chance to bisect it? This doesn't sound like the other problem
>> we had.
>
> Let me try...
>
> v4.15-rc2: suspends/resumes ok.
> 4ded3be: hangs on resume.
>
> Given that between those, there was supposed "fix" for suspend, I
> believe I should try reverting that one first. .. if someone can tell
> me commit id, that would help.

The fix in there should be 5b06bbcfc2c6 ("x86/power: Fix some ordering
bugs in __restore_processor_context()") so you can certainly see if it
works before that (or just reverting it).

But there are also a few other x86 low-level things there, and that
fix really looks very safe, so I'd almost expect something else to
have triggered your problem. There's less than 500 commits in that
range you have, so a few bisections should narrow it down a lot.

            Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-10 16:37                     ` Linus Torvalds
@ 2017-12-10 18:56                       ` Pavel Machek
  2017-12-10 20:30                         ` Linus Torvalds
  0 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 18:56 UTC (permalink / raw)
  To: Linus Torvalds, luto, tglx, jarkko.nikula
  Cc: Rafael J. Wysocki, Thomas Gleixner, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 1950 bytes --]

On Sun 2017-12-10 08:37:56, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 8:23 AM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> > Can you do something about html emails? Quoting them doesn't work too well.
> 
> Yeah, and they don't show up onlkml either because of rules. I try to
> avoid them, but have been more on mobile for various reasons lately
> than usual. That should be over and done with after today, though.
> 
> >> Any chance to bisect it? This doesn't sound like the other problem
> >> we had.
> >
> > Let me try...
> >
> > v4.15-rc2: suspends/resumes ok.
> > 4ded3be: hangs on resume.
> >
> > Given that between those, there was supposed "fix" for suspend, I
> > believe I should try reverting that one first. .. if someone can tell
> > me commit id, that would help.
> 
> The fix in there should be 5b06bbcfc2c6 ("x86/power: Fix some ordering
> bugs in __restore_processor_context()") so you can certainly see if it
> works before that (or just reverting it).

Revert is easier.

> But there are also a few other x86 low-level things there, and that
> fix really looks very safe, so I'd almost expect something else to
> have triggered your problem. There's less than 500 commits in that
> range you have, so a few bisections should narrow it down a lot.

No, that commit does _look_ pretty suspect to me...

Confirmed, revert fixes it. You see how it moves fix_processor_context
around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
machines exist? Aha.

Which brings me to .. various people do automated testing of
kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit for
boot and suspend would be very nice. The last item is not hard, either:

sudo rtcwake -l -m mem -s 5

...should take 10 seconds or so. 

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-07  7:55       ` Michal Hocko
@ 2017-12-10 20:30         ` Michal Hocko
  0 siblings, 0 replies; 83+ messages in thread
From: Michal Hocko @ 2017-12-10 20:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Thu 07-12-17 08:55:08, Michal Hocko wrote:
> On Wed 06-12-17 13:14:52, Michal Hocko wrote:
> > On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >
> > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > systems I have tested, so it is probably safe to assume it to be
> > > > broken everywhere.
> > > 
> > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > and was traveling last week due to my mom's bday.
> > > 
> > > HOWEVER.
> > > 
> > > Some of the x86 work seems to have broken it for some configurations.
> > > In particular, do you have a big "everything enabled" kernel config -
> > > particularly lockdep and irqflags tracing enabled?
> > > 
> > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > the x86 people are very busy with the kaiser work):
> > > 
> > >     https://lkml.org/lkml/2017/11/30/546
> > > 
> > > (also note his follow-up "fix the commit message" note, but that one
> > > doesn't actually affect the code itself).
> > 
> > merging tip/x86/urgent on top of your tree fixed this problem for me,
> > but I am seeing something else
> > [  131.711412] ACPI: Preparing to enter system sleep state S3
> > [  131.755328] ACPI: EC: event blocked
> > [  131.755328] ACPI: EC: EC stopped
> > [  131.755328] PM: Saving platform NVS memory
> > [  131.755344] Disabling non-boot CPUs ...
> > [  131.779330] IRQ 124: no longer affine to CPU1
> > [  131.780334] smpboot: CPU 1 is now offline
> > [  131.804465] smpboot: CPU 2 is now offline
> > [  131.827291] IRQ 122: no longer affine to CPU3
> > [  131.827292] IRQ 123: no longer affine to CPU3
> > [  131.828293] smpboot: CPU 3 is now offline
> > [  131.830991] ACPI: Low-level resume complete
> > [  131.831092] ACPI: EC: EC started
> > [  131.831093] PM: Restoring platform NVS memory
> > [  131.831864] do_IRQ: 0.55 No irq handler for vector
> > [  131.831884] Enabling non-boot CPUs ...
> > [  131.831909] x86: Booting SMP configuration:
> > [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [  131.832913]  cache: parent cpu1 should not be sleeping
> > [  131.833058] CPU1 is up
> > [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> > [  131.833864]  cache: parent cpu2 should not be sleeping
> > [  131.833983] CPU2 is up
> > [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > [  131.834776]  cache: parent cpu3 should not be sleeping
> > [  131.834923] CPU3 is up
> > 
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
> 
> Hmm, there is still something bad going on during resume. My laptop
> haven't woken up from s2ram this morning. The screen was powered on
> but the system hasn't come up.

It's been few days and I haven't seen this problem again. And I am doing
s2ram all the time...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-10 18:56                       ` Pavel Machek
@ 2017-12-10 20:30                         ` Linus Torvalds
  2017-12-10 20:43                           ` Pavel Machek
                                             ` (2 more replies)
  0 siblings, 3 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-10 20:30 UTC (permalink / raw)
  To: Pavel Machek, Zhang Rui
  Cc: Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
>
> Confirmed, revert fixes it. You see how it moves fix_processor_context
> around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> machines exist? Aha.

Yeah, people do.

Andy?

> Which brings me to .. various people do automated testing of
> kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit for
> boot and suspend would be very nice. The last item is not hard, either:
>
> sudo rtcwake -l -m mem -s 5
>
> ...should take 10 seconds or so.

I'm told 0day does *some* suspend/resume testing, but I think it's
pretty limited, partly because the kinds of machines it primarily
works on don't really support suspend/resume at all. I'm also not sure
just how many of those machines are 32-bit at all..

But I'm adding Zhang Rui to the cc, to see if my recollection is right.

Because you're right, more suspend/resume automated testing would be
good to have. And yes, people test mainly 64-bit these days.

Also, I'm not even sure what the 0day rules are for just plain
mainline. I don't tend to see a lot of breakage reports, even though
I'd expect to. This came in from the x86 trees (and those do their own
tests too, but probably not suspend/resume either), but it hit my tree
fairly soon after going into the x86 -tip trees.

            Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-10 20:30                         ` Linus Torvalds
@ 2017-12-10 20:43                           ` Pavel Machek
  2017-12-10 21:28                             ` Linus Torvalds
  2017-12-10 21:38                           ` [PATCH] Fix resume on x86-32 machines Pavel Machek
  2017-12-11 14:09                           ` Linux 4.15-rc2: Regression in resume from ACPI S3 Zhang Rui
  2 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 20:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Zhang Rui, Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 1875 bytes --]

On Sun 2017-12-10 12:30:52, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> > Confirmed, revert fixes it. You see how it moves fix_processor_context
> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> > machines exist? Aha.
> 
> Yeah, people do.
> 
> Andy?

For the record... this should fix it. Tested on x60. More tests pending.

Signed-off-by: Pavel Machek <pavel@ucw.cz>


diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 5191de1..d59f05f 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	write_cr0(ctxt->cr0);
 
 	/*
-	 * now restore the descriptor tables to their proper values
-	 * ltr is done i fix_processor_context().
+	 * Now restore the descriptor tables to their proper values
+	 * ltr is done in fix_processor_context().
 	 */
 #ifdef CONFIG_X86_32
 	load_idt(&ctxt->idt);
@@ -235,8 +235,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	wrmsrl(MSR_GS_BASE, ctxt->gs_base);
 #endif
 
-	fix_processor_context();
-
 	/*
 	 * Restore segment registers.  This happens after restoring the GDT
 	 * and LDT, which happen in fix_processor_context().
@@ -252,8 +250,12 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	 */
 	if (boot_cpu_has(X86_FEATURE_SEP))
 		enable_sep_cpu();
+
+	fix_processor_context();
 #else
 /* CONFIG_X86_64 */
+	fix_processor_context();
+
 	asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
 	asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
 	asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-10 20:43                           ` Pavel Machek
@ 2017-12-10 21:28                             ` Linus Torvalds
  2017-12-10 21:35                               ` Pavel Machek
  2017-12-12 17:27                               ` Linus Torvalds
  0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-10 21:28 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Zhang Rui, Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Sun, Dec 10, 2017 at 12:43 PM, Pavel Machek <pavel@ucw.cz> wrote:
>
> For the record... this should fix it. Tested on x60. More tests pending.

This can't be right.

At the very least, now the comment is wrong. And the comment does seem
relevant for 32-bit too:

> -       fix_processor_context();
> -
>         /*
>          * Restore segment registers.  This happens after restoring the GDT
>          * and LDT, which happen in fix_processor_context().

Notice? You've moved down the 32-bit fix_processor_context() call to
after the loadsegment() calls, which smells wrong.

That said, this *all* smells wrong. Why is there a special
fix_processor_context() function at all with different 32-bit and
64-bit behavior? This code is all written to be maximally confusing.

I think this could do with some re-org to make it more logical. That
"some random things done in fix_processor_context(), other random
things done directly in __restore_processor_state()" makes no sense at
all to me. There's no logic to what is done where.

                Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-10 21:28                             ` Linus Torvalds
@ 2017-12-10 21:35                               ` Pavel Machek
  2017-12-12 17:27                               ` Linus Torvalds
  1 sibling, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 21:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Zhang Rui, Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 1760 bytes --]

On Sun 2017-12-10 13:28:50, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 12:43 PM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> > For the record... this should fix it. Tested on x60. More tests pending.
> 
> This can't be right.
> 
> At the very least, now the comment is wrong. And the comment does seem
> relevant for 32-bit too:

Well, take a look at orignal patch. I'm reverting 32-bit code to
v4.15-rc1 version, while keeping 64-bit code at v4.15-rc3
version. Yes, my brain hurts from looking at the code :-(.

In the meantime, I did short test on 64-bit machine. No ill effect observed.

Hmm. Aha. Yes, the comment is wrong... as it was in wrong in -rc1.

> > -       fix_processor_context();
> > -
> >         /*
> >          * Restore segment registers.  This happens after restoring the GDT
> >          * and LDT, which happen in fix_processor_context().
> 
> Notice? You've moved down the 32-bit fix_processor_context() call to
> after the loadsegment() calls, which smells wrong.

Yeah, I did. There's where it was in v4.15-rc1, and that's what ws
working for me. 

> That said, this *all* smells wrong. Why is there a special
> fix_processor_context() function at all with different 32-bit and
> 64-bit behavior? This code is all written to be maximally confusing.
> 
> I think this could do with some re-org to make it more logical. That
> "some random things done in fix_processor_context(), other random
> things done directly in __restore_processor_state()" makes no sense at
> all to me. There's no logic to what is done where.

I have to agree.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH] Fix resume on x86-32 machines
  2017-12-10 20:30                         ` Linus Torvalds
  2017-12-10 20:43                           ` Pavel Machek
@ 2017-12-10 21:38                           ` Pavel Machek
  2017-12-10 21:58                             ` Andy Lutomirski
  2017-12-11 14:09                           ` Linux 4.15-rc2: Regression in resume from ACPI S3 Zhang Rui
  2 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 21:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Zhang Rui, Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 2072 bytes --]


After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
(unintentionally?) reordered stuff with respect to
fix_processor_context() on 32-bit and 64-bit. We undo that change on
32-bit.

While we are at it, fix a comment.

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Fixes: 5b06bbcfc2c621da3009da8decb7511500c293ed

diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 5191de1..af7b613 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	write_cr0(ctxt->cr0);
 
 	/*
-	 * now restore the descriptor tables to their proper values
-	 * ltr is done i fix_processor_context().
+	 * Now restore the descriptor tables to their proper values
+	 * ltr is done in fix_processor_context().
 	 */
 #ifdef CONFIG_X86_32
 	load_idt(&ctxt->idt);
@@ -235,13 +235,11 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	wrmsrl(MSR_GS_BASE, ctxt->gs_base);
 #endif
 
-	fix_processor_context();
-
+#ifdef CONFIG_X86_32
 	/*
-	 * Restore segment registers.  This happens after restoring the GDT
-	 * and LDT, which happen in fix_processor_context().
+	 * Restore segment registers.
 	 */
-#ifdef CONFIG_X86_32
+
 	loadsegment(es, ctxt->es);
 	loadsegment(fs, ctxt->fs);
 	loadsegment(gs, ctxt->gs);
@@ -252,8 +250,17 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	 */
 	if (boot_cpu_has(X86_FEATURE_SEP))
 		enable_sep_cpu();
+
+	fix_processor_context();
 #else
 /* CONFIG_X86_64 */
+	/*
+	 * Restore segment registers.  This happens after restoring the GDT
+	 * and LDT, which happen in fix_processor_context().
+	 */
+	
+	fix_processor_context();
+
 	asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
 	asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
 	asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-10 21:38                           ` [PATCH] Fix resume on x86-32 machines Pavel Machek
@ 2017-12-10 21:58                             ` Andy Lutomirski
  2017-12-10 22:20                               ` Pavel Machek
                                                 ` (2 more replies)
  0 siblings, 3 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-10 21:58 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Zhang Rui, Andrew Lutomirski, Thomas Gleixner,
	Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers



> On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
> 
> 
> After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> (unintentionally?) reordered stuff with respect to
> fix_processor_context() on 32-bit and 64-bit. We undo that change on
> 32-bit.
> 

Can you explain what was wrong with the reordering?  Your patch certainly *looks* incorrect.

I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.

> While we are at it, fix a comment.
> 
> Signed-off-by: Pavel Machek <pavel@ucw.cz>
> Fixes: 5b06bbcfc2c621da3009da8decb7511500c293ed
> 
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index 5191de1..af7b613 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>    write_cr0(ctxt->cr0);
> 
>    /*
> -     * now restore the descriptor tables to their proper values
> -     * ltr is done i fix_processor_context().
> +     * Now restore the descriptor tables to their proper values
> +     * ltr is done in fix_processor_context().
>     */
> #ifdef CONFIG_X86_32
>    load_idt(&ctxt->idt);
> @@ -235,13 +235,11 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>    wrmsrl(MSR_GS_BASE, ctxt->gs_base);
> #endif
> 
> -    fix_processor_context();
> -
> +#ifdef CONFIG_X86_32
>    /*
> -     * Restore segment registers.  This happens after restoring the GDT
> -     * and LDT, which happen in fix_processor_context().
> +     * Restore segment registers.
>     */
> -#ifdef CONFIG_X86_32
> +
>    loadsegment(es, ctxt->es);
>    loadsegment(fs, ctxt->fs);
>    loadsegment(gs, ctxt->gs);
> @@ -252,8 +250,17 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>     */
>    if (boot_cpu_has(X86_FEATURE_SEP))
>        enable_sep_cpu();
> +
> +    fix_processor_context();
> #else
> /* CONFIG_X86_64 */
> +    /*
> +     * Restore segment registers.  This happens after restoring the GDT
> +     * and LDT, which happen in fix_processor_context().
> +     */
> +    
> +    fix_processor_context();
> +
>    asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
>    asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
>    asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
> 
> 
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-10 21:58                             ` Andy Lutomirski
@ 2017-12-10 22:20                               ` Pavel Machek
  2017-12-11  9:25                                 ` Jarkko Nikula
  2017-12-11 14:22                               ` Rafael J. Wysocki
  2017-12-11 15:13                               ` Ingo Molnar
  2 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 22:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Zhang Rui, Andrew Lutomirski, Thomas Gleixner,
	Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 3062 bytes --]

On Sun 2017-12-10 13:58:23, Andy Lutomirski wrote:
> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
> > 
> > 
> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > (unintentionally?) reordered stuff with respect to
> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > 32-bit.
> > 
> 
> Can you explain what was wrong with the reordering?  Your patch certainly *looks* incorrect.
> 

No, I can't, sorry.

> I'm guessing that the real issue is that 32-bit needs %fs restored
>  early for TLS.

Maybe. I can test patches...

I don't think it would be good idea to revert
5b06bbcfc2c621da3009da8decb7511500c293ed, but since it introduced
regression in -rc2, I believe we should fix the regression now, and
then we can try to provide cleaner solution.

As Linus noted, the code is quite "interesting".
									Pavel


> 
> > While we are at it, fix a comment.
> > 
> > Signed-off-by: Pavel Machek <pavel@ucw.cz>
> > Fixes: 5b06bbcfc2c621da3009da8decb7511500c293ed
> > 
> > diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> > index 5191de1..af7b613 100644
> > --- a/arch/x86/power/cpu.c
> > +++ b/arch/x86/power/cpu.c
> > @@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> >    write_cr0(ctxt->cr0);
> > 
> >    /*
> > -     * now restore the descriptor tables to their proper values
> > -     * ltr is done i fix_processor_context().
> > +     * Now restore the descriptor tables to their proper values
> > +     * ltr is done in fix_processor_context().
> >     */
> > #ifdef CONFIG_X86_32
> >    load_idt(&ctxt->idt);
> > @@ -235,13 +235,11 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> >    wrmsrl(MSR_GS_BASE, ctxt->gs_base);
> > #endif
> > 
> > -    fix_processor_context();
> > -
> > +#ifdef CONFIG_X86_32
> >    /*
> > -     * Restore segment registers.  This happens after restoring the GDT
> > -     * and LDT, which happen in fix_processor_context().
> > +     * Restore segment registers.
> >     */
> > -#ifdef CONFIG_X86_32
> > +
> >    loadsegment(es, ctxt->es);
> >    loadsegment(fs, ctxt->fs);
> >    loadsegment(gs, ctxt->gs);
> > @@ -252,8 +250,17 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> >     */
> >    if (boot_cpu_has(X86_FEATURE_SEP))
> >        enable_sep_cpu();
> > +
> > +    fix_processor_context();
> > #else
> > /* CONFIG_X86_64 */
> > +    /*
> > +     * Restore segment registers.  This happens after restoring the GDT
> > +     * and LDT, which happen in fix_processor_context().
> > +     */
> > +    
> > +    fix_processor_context();
> > +
> >    asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
> >    asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
> >    asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
> > 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-10 22:20                               ` Pavel Machek
@ 2017-12-11  9:25                                 ` Jarkko Nikula
  0 siblings, 0 replies; 83+ messages in thread
From: Jarkko Nikula @ 2017-12-11  9:25 UTC (permalink / raw)
  To: Pavel Machek, Andy Lutomirski
  Cc: Linus Torvalds, Zhang Rui, Andrew Lutomirski, Thomas Gleixner,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On 12/11/2017 12:20 AM, Pavel Machek wrote:
> On Sun 2017-12-10 13:58:23, Andy Lutomirski wrote:
>>> On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
>>>
>>>
>>> After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
>>> (unintentionally?) reordered stuff with respect to
>>> fix_processor_context() on 32-bit and 64-bit. We undo that change on
>>> 32-bit.
>>>
>>
>> Can you explain what was wrong with the reordering?  Your patch certainly *looks* incorrect.
>>
> 
> No, I can't, sorry.
> 
>> I'm guessing that the real issue is that 32-bit needs %fs restored
>>   early for TLS.
> 
> Maybe. I can test patches...
> 
> I don't think it would be good idea to revert
> 5b06bbcfc2c621da3009da8decb7511500c293ed, but since it introduced
> regression in -rc2, I believe we should fix the regression now, and
> then we can try to provide cleaner solution.
> 
I can confirm Pavel's findings. Commit 5b06bbcfc2c6 ("x86/power: Fix 
some ordering bugs in __restore_processor_context()") broke the 
suspend/resume on 32-bit kernel.

v4.15-rc3 works either by reverting the commit or by Pavel's patch. 
Fortunately Pavel's patch still keeps the 64-bit suspend/resume ok.

-- 
Jarkko

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-10 20:30                         ` Linus Torvalds
  2017-12-10 20:43                           ` Pavel Machek
  2017-12-10 21:38                           ` [PATCH] Fix resume on x86-32 machines Pavel Machek
@ 2017-12-11 14:09                           ` Zhang Rui
  2017-12-11 16:28                             ` Andy Lutomirski
  2017-12-12  8:00                             ` Pavel Machek
  2 siblings, 2 replies; 83+ messages in thread
From: Zhang Rui @ 2017-12-11 14:09 UTC (permalink / raw)
  To: Linus Torvalds, Pavel Machek
  Cc: Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Sun, 2017-12-10 at 12:30 -0800, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
> > 
> > 
> > Confirmed, revert fixes it. You see how it moves
> > fix_processor_context
> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> > machines exist? Aha.
> Yeah, people do.
> 
> Andy?
> 
> > 
> > Which brings me to .. various people do automated testing of
> > kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit
> > for
> > boot and suspend would be very nice. The last item is not hard,
> > either:
> > 
> > sudo rtcwake -l -m mem -s 5
> > 
> > ...should take 10 seconds or so.
> I'm told 0day does *some* suspend/resume testing, but I think it's
> pretty limited, partly because the kinds of machines it primarily
> works on don't really support suspend/resume at all.

currently, we're running suspend test on 1 platform only, with 64 bit
kernel. suspend test will be enabled on more platforms (laptops) in
next two weeks.

I will check why it does not find the first regression introduced by
ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to
native_load_gs_index()").

>  I'm also not sure
> just how many of those machines are 32-bit at all..

for this, I suppose it can be reproduced if we use 32-bit kernel and
rootfs, right? Then it's easier to enable this in 0Day.

thanks,
rui
> 
> But I'm adding Zhang Rui to the cc, to see if my recollection is
> right.
> 
> Because you're right, more suspend/resume automated testing would be
> good to have. And yes, people test mainly 64-bit these days.
> 
> Also, I'm not even sure what the 0day rules are for just plain
> mainline. I don't tend to see a lot of breakage reports, even though
> I'd expect to. This came in from the x86 trees (and those do their
> own
> tests too, but probably not suspend/resume either), but it hit my
> tree
> fairly soon after going into the x86 -tip trees.
> 
>             Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-10 21:58                             ` Andy Lutomirski
  2017-12-10 22:20                               ` Pavel Machek
@ 2017-12-11 14:22                               ` Rafael J. Wysocki
  2017-12-11 14:43                                 ` Rafael J. Wysocki
                                                   ` (2 more replies)
  2017-12-11 15:13                               ` Ingo Molnar
  2 siblings, 3 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-11 14:22 UTC (permalink / raw)
  To: Pavel Machek, Andrew Lutomirski
  Cc: Andy Lutomirski, Linus Torvalds, Zhang Rui, Thomas Gleixner,
	Jarkko Nikula, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
> 
> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
> > 
> > 
> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > (unintentionally?) reordered stuff with respect to
> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > 32-bit.
> > 
> 
> Can you explain what was wrong with the reordering?  Your patch certainly *looks* incorrect.
> 
> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.

I *think* you are right.

Anyway, that should be easy enough to verify.

Pavel, can you please check if the below change works too?

---
 arch/x86/power/cpu.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-pm/arch/x86/power/cpu.c
===================================================================
--- linux-pm.orig/arch/x86/power/cpu.c
+++ linux-pm/arch/x86/power/cpu.c
@@ -221,6 +221,8 @@ static void notrace __restore_processor_
 	 */
 #ifdef CONFIG_X86_32
 	load_idt(&ctxt->idt);
+
+	loadsegment(fs, ctxt->fs);
 #else
 /* CONFIG_X86_64 */
 	load_idt((const struct desc_ptr *)&ctxt->idt_limit);
@@ -243,7 +245,6 @@ static void notrace __restore_processor_
 	 */
 #ifdef CONFIG_X86_32
 	loadsegment(es, ctxt->es);
-	loadsegment(fs, ctxt->fs);
 	loadsegment(gs, ctxt->gs);
 	loadsegment(ss, ctxt->ss);
 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-11 14:22                               ` Rafael J. Wysocki
@ 2017-12-11 14:43                                 ` Rafael J. Wysocki
  2017-12-11 14:59                                 ` Jarkko Nikula
  2017-12-11 18:31                                 ` Linus Torvalds
  2 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-11 14:43 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Pavel Machek, Andy Lutomirski, Linus Torvalds, Zhang Rui,
	Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Monday, December 11, 2017 3:22:39 PM CET Rafael J. Wysocki wrote:
> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
> > 
> > > On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
> > > 
> > > 
> > > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > > (unintentionally?) reordered stuff with respect to
> > > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > > 32-bit.
> > > 
> > 
> > Can you explain what was wrong with the reordering?  Your patch certainly *looks* incorrect.
> > 
> > I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
> 
> I *think* you are right.

Hmm.  Don't we need to restore GS on 32-bit before doing the per_cpu() thing
in fix_processor_context()?

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-11 14:22                               ` Rafael J. Wysocki
  2017-12-11 14:43                                 ` Rafael J. Wysocki
@ 2017-12-11 14:59                                 ` Jarkko Nikula
  2017-12-11 18:31                                 ` Linus Torvalds
  2 siblings, 0 replies; 83+ messages in thread
From: Jarkko Nikula @ 2017-12-11 14:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Andrew Lutomirski
  Cc: Andy Lutomirski, Linus Torvalds, Zhang Rui, Thomas Gleixner,
	Linux Kernel Mailing List, the arch/x86 maintainers

On 12/11/2017 04:22 PM, Rafael J. Wysocki wrote:
> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>>
>>> On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
>>>
>>>
>>> After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
>>> (unintentionally?) reordered stuff with respect to
>>> fix_processor_context() on 32-bit and 64-bit. We undo that change on
>>> 32-bit.
>>>
>>
>> Can you explain what was wrong with the reordering?  Your patch certainly *looks* incorrect.
>>
>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
> 
> I *think* you are right.
> 
> Anyway, that should be easy enough to verify.
> 
> Pavel, can you please check if the below change works too?
> 
> ---
>   arch/x86/power/cpu.c |    3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
Works here on my test machine.

Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-10 21:58                             ` Andy Lutomirski
  2017-12-10 22:20                               ` Pavel Machek
  2017-12-11 14:22                               ` Rafael J. Wysocki
@ 2017-12-11 15:13                               ` Ingo Molnar
  2017-12-11 16:26                                 ` Andy Lutomirski
  2 siblings, 1 reply; 83+ messages in thread
From: Ingo Molnar @ 2017-12-11 15:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Pavel Machek, Linus Torvalds, Zhang Rui, Andrew Lutomirski,
	Thomas Gleixner, Jarkko Nikula, Rafael J. Wysocki,
	Linux Kernel Mailing List, the arch/x86 maintainers


* Andy Lutomirski <luto@amacapital.net> wrote:

> 
> 
> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
> > 
> > 
> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > (unintentionally?) reordered stuff with respect to
> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > 32-bit.
> > 
> 
> Can you explain what was wrong with the reordering?  Your patch certainly *looks* incorrect.
> 
> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.

Does some early percpu primitive need GS as well perhaps?

Might be safest to restore both FS and GS early.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-11 15:13                               ` Ingo Molnar
@ 2017-12-11 16:26                                 ` Andy Lutomirski
  0 siblings, 0 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-11 16:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Linus Torvalds, Zhang Rui, Andrew Lutomirski,
	Thomas Gleixner, Jarkko Nikula, Rafael J. Wysocki,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Mon, Dec 11, 2017 at 7:13 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Andy Lutomirski <luto@amacapital.net> wrote:
>
>>
>>
>> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
>> >
>> >
>> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
>> > (unintentionally?) reordered stuff with respect to
>> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
>> > 32-bit.
>> >
>>
>> Can you explain what was wrong with the reordering?  Your patch certainly *looks* incorrect.
>>
>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>
> Does some early percpu primitive need GS as well perhaps?
>
> Might be safest to restore both FS and GS early.
>

fs needs to be restored early for TLS.  gs needs to be restored early,
maybe, if !X86_32_LAZY_GS -- it's used for stack-protector.  If
X86_32_LAZY_GS, gs must *not* be restored early.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-11 14:09                           ` Linux 4.15-rc2: Regression in resume from ACPI S3 Zhang Rui
@ 2017-12-11 16:28                             ` Andy Lutomirski
  2017-12-12  8:00                             ` Pavel Machek
  1 sibling, 0 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-11 16:28 UTC (permalink / raw)
  To: Zhang Rui
  Cc: Linus Torvalds, Pavel Machek, Andrew Lutomirski, Thomas Gleixner,
	Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Mon, Dec 11, 2017 at 6:09 AM, Zhang Rui <rui.zhang@intel.com> wrote:
> On Sun, 2017-12-10 at 12:30 -0800, Linus Torvalds wrote:
>> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
>> >
>> >
>> > Confirmed, revert fixes it. You see how it moves
>> > fix_processor_context
>> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
>> > machines exist? Aha.
>> Yeah, people do.
>>
>> Andy?
>>
>> >
>> > Which brings me to .. various people do automated testing of
>> > kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit
>> > for
>> > boot and suspend would be very nice. The last item is not hard,
>> > either:
>> >
>> > sudo rtcwake -l -m mem -s 5
>> >
>> > ...should take 10 seconds or so.
>> I'm told 0day does *some* suspend/resume testing, but I think it's
>> pretty limited, partly because the kinds of machines it primarily
>> works on don't really support suspend/resume at all.
>
> currently, we're running suspend test on 1 platform only, with 64 bit
> kernel. suspend test will be enabled on more platforms (laptops) in
> next two weeks.
>
> I will check why it does not find the first regression introduced by
> ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to
> native_load_gs_index()").
>
>>  I'm also not sure
>> just how many of those machines are 32-bit at all..
>
> for this, I suppose it can be reproduced if we use 32-bit kernel and
> rootfs, right? Then it's easier to enable this in 0Day.
>

Yes.

The 64-bit problem should also be reproducible with rtcwake even in a vm.

Also, on this topic, could make run_tests in
tools/testing/selftests/x86 be added to the rotation as well?  The
testing dir should match the kernel being tested IMO.

> thanks,
> rui
>>
>> But I'm adding Zhang Rui to the cc, to see if my recollection is
>> right.
>>
>> Because you're right, more suspend/resume automated testing would be
>> good to have. And yes, people test mainly 64-bit these days.
>>
>> Also, I'm not even sure what the 0day rules are for just plain
>> mainline. I don't tend to see a lot of breakage reports, even though
>> I'd expect to. This came in from the x86 trees (and those do their
>> own
>> tests too, but probably not suspend/resume either), but it hit my
>> tree
>> fairly soon after going into the x86 -tip trees.
>>
>>             Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-11 14:22                               ` Rafael J. Wysocki
  2017-12-11 14:43                                 ` Rafael J. Wysocki
  2017-12-11 14:59                                 ` Jarkko Nikula
@ 2017-12-11 18:31                                 ` Linus Torvalds
  2017-12-11 18:41                                   ` Andy Lutomirski
  2 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-11 18:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Andrew Lutomirski, Andy Lutomirski, Zhang Rui,
	Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Mon, Dec 11, 2017 at 6:22 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>>
>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>
> I *think* you are right.
>
> Anyway, that should be easy enough to verify.
>
> Pavel, can you please check if the below change works too?

So Jarkko confirmed this works for him, but the more I look at this
crap, the less I like it.

Why do we save fs/ds/es/ss at all on x86-32? Don't they all have fixed
values in the kernel, with %fs being __KERNEL_PERCPU, and the others
being __USER_DS?

Nothing else can possibly be valid, as far as I can tell.

I think we actually leave the user-space percpu segment in %gs (or the
stack canary base), so that one we should actually save/restore, but
I'm getting the feeling that we should just reset the other segment
registers to known values on 32-bit.

Also, why does the 32-bit code do

        loadsegment(es, ctxt->es);

but the 64-bit code does

        asm volatile ("movw %0, %%es" :: "r" (ctxt->es));

And look at that confusion between MSR_GS_BASE and MSR_KERNEL_GS_BASE
all within the 64-bit case.

In particular, note how we reload the %gs segment in between the two -
wouldn't that mess with the currently active gs base if %gs can be
non-zero?

Christ, what a mess.

So I think that whole sequence is garbage. It has been written as some
kind of "save and restore registers", but that's not what it really
then does - or what it should do.

It should make sure to restore a sane kernel state, not some random
register state.

And the 32-bit and 64-bit code really should strive to be at least
_sanely_ different, not this randomly and insanely different mess.

But yes, Rafael's patch looks like the minimal one-liner. But I think
we should do the %gs load early too for the 32-bit stack canary case,
kind of like we need to do %fs for percpu base.

                  Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-11 18:31                                 ` Linus Torvalds
@ 2017-12-11 18:41                                   ` Andy Lutomirski
  2017-12-11 19:12                                     ` Linus Torvalds
  2017-12-14 20:38                                     ` Pavel Machek
  0 siblings, 2 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-11 18:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Pavel Machek, Andrew Lutomirski, Zhang Rui,
	Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Mon, Dec 11, 2017 at 10:31 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Dec 11, 2017 at 6:22 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>>>
>>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>>
>> I *think* you are right.
>>
>> Anyway, that should be easy enough to verify.
>>
>> Pavel, can you please check if the below change works too?
>
> So Jarkko confirmed this works for him, but the more I look at this
> crap, the less I like it.
>
> Why do we save fs/ds/es/ss at all on x86-32? Don't they all have fixed
> values in the kernel, with %fs being __KERNEL_PERCPU, and the others
> being __USER_DS?
>
> Nothing else can possibly be valid, as far as I can tell.
>
> I think we actually leave the user-space percpu segment in %gs (or the
> stack canary base), so that one we should actually save/restore, but
> I'm getting the feeling that we should just reset the other segment
> registers to known values on 32-bit.
>
> Also, why does the 32-bit code do
>
>         loadsegment(es, ctxt->es);
>
> but the 64-bit code does
>
>         asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
>
> And look at that confusion between MSR_GS_BASE and MSR_KERNEL_GS_BASE
> all within the 64-bit case.
>
> In particular, note how we reload the %gs segment in between the two -
> wouldn't that mess with the currently active gs base if %gs can be
> non-zero?
>
> Christ, what a mess.
>
> So I think that whole sequence is garbage. It has been written as some
> kind of "save and restore registers", but that's not what it really
> then does - or what it should do.
>
> It should make sure to restore a sane kernel state, not some random
> register state.
>
> And the 32-bit and 64-bit code really should strive to be at least
> _sanely_ different, not this randomly and insanely different mess.
>
> But yes, Rafael's patch looks like the minimal one-liner. But I think
> we should do the %gs load early too for the 32-bit stack canary case,
> kind of like we need to do %fs for percpu base.

I'll try to get to this in a day or so -- is that okay?  Or should we
do some trivial fix/revert and fix it for real next time around?

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-11 18:41                                   ` Andy Lutomirski
@ 2017-12-11 19:12                                     ` Linus Torvalds
  2017-12-14 20:38                                     ` Pavel Machek
  1 sibling, 0 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-11 19:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Rafael J. Wysocki, Pavel Machek, Andrew Lutomirski, Zhang Rui,
	Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Mon, Dec 11, 2017 at 10:41 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> I'll try to get to this in a day or so -- is that okay?  Or should we
> do some trivial fix/revert and fix it for real next time around?

I don't think we want some trivial fix/revert just to keep it working.
This code is too fragile as-is, and I think that "make it work" is
more than reverting. You did fix real issues on x86-64 with odd
segment use, for example.

                Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-11 14:09                           ` Linux 4.15-rc2: Regression in resume from ACPI S3 Zhang Rui
  2017-12-11 16:28                             ` Andy Lutomirski
@ 2017-12-12  8:00                             ` Pavel Machek
  1 sibling, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2017-12-12  8:00 UTC (permalink / raw)
  To: Zhang Rui
  Cc: Linus Torvalds, Andrew Lutomirski, Thomas Gleixner,
	Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

Hi!

> > > ...should take 10 seconds or so.
> > I'm told 0day does *some* suspend/resume testing, but I think it's
> > pretty limited, partly because the kinds of machines it primarily
> > works on don't really support suspend/resume at all.
> 
> currently, we're running suspend test on 1 platform only, with 64 bit
> kernel. suspend test will be enabled on more platforms (laptops) in
> next two weeks.

Thanks!

> >  I'm also not sure
> > just how many of those machines are 32-bit at all..
> 
> for this, I suppose it can be reproduced if we use 32-bit kernel and
> rootfs, right? Then it's easier to enable this in 0Day.

Yes, Intel cpus are pretty good at backwards compatibility, and most
problems are not subtle at all. So yes, 32-bit kernel / rootfs on
recent machine should be good for testing.

Best regards,
								Pavel
								
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-10 21:28                             ` Linus Torvalds
  2017-12-10 21:35                               ` Pavel Machek
@ 2017-12-12 17:27                               ` Linus Torvalds
  2017-12-12 18:05                                 ` Andy Lutomirski
  1 sibling, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-12 17:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Zhang Rui, Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Sun, Dec 10, 2017 at 1:28 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That said, this *all* smells wrong. Why is there a special
> fix_processor_context() function at all with different 32-bit and
> 64-bit behavior? This code is all written to be maximally confusing.

Hmm. Looking a bit more at this, I think it should be solved by:

 - load the original read-write GDT early, along with the IDT.

  We have already saved it off in __save_processor_state(), and it may
have already gotten loaded very early in at least some of the paths,
but it definitely hasn't gotten reloaded in all the paths (think
"suspend/resume testing" etc).

 - add the LDT descriptor to the save area too, exactly like we
already have IDT/GDT.

   Then, we can do "load_ldt()" early (along with IDT and GDT).

 - now we can just load all the segments early, and get the percpu and
stack canary stuff without any special cases

- do NOT use "load_gs_index()", which does that swapgs dance (twice!)
and plays with interrupt state. Just load the segment register, and
then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
need for the swapgs dance.

 - now that we have a fully working system, then the
"fix_processor_context()" code can do the "fancy" stuff like setting
up the RO fixmap GDT and re-initializing the TLB state. Those want
percpu stuff.

In other words, why not try to organize this so that we do a simple
and fairly mindless restore of the low-level state first? Avoid all
the "system is halfway up" crud.

Would that work for people? Andy?

               Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-12 17:27                               ` Linus Torvalds
@ 2017-12-12 18:05                                 ` Andy Lutomirski
  2017-12-12 18:36                                   ` Linus Torvalds
  0 siblings, 1 reply; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-12 18:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Machek, Zhang Rui, Andrew Lutomirski, Thomas Gleixner,
	Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 12, 2017 at 9:27 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Dec 10, 2017 at 1:28 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> That said, this *all* smells wrong. Why is there a special
>> fix_processor_context() function at all with different 32-bit and
>> 64-bit behavior? This code is all written to be maximally confusing.
>
> Hmm. Looking a bit more at this, I think it should be solved by:
>
>  - load the original read-write GDT early, along with the IDT.
>
>   We have already saved it off in __save_processor_state(), and it may
> have already gotten loaded very early in at least some of the paths,
> but it definitely hasn't gotten reloaded in all the paths (think
> "suspend/resume testing" etc).
>
>  - add the LDT descriptor to the save area too, exactly like we
> already have IDT/GDT.
>
>    Then, we can do "load_ldt()" early (along with IDT and GDT).
>
>  - now we can just load all the segments early, and get the percpu and
> stack canary stuff without any special cases
>
> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
> and plays with interrupt state. Just load the segment register, and
> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
> need for the swapgs dance.

Using what helper?  On x86_64, it can fault, and IIRC we explicitly
don't allow loadsegment(gs, ...).

>
>  - now that we have a fully working system, then the
> "fix_processor_context()" code can do the "fancy" stuff like setting
> up the RO fixmap GDT and re-initializing the TLB state. Those want
> percpu stuff.
>
> In other words, why not try to organize this so that we do a simple
> and fairly mindless restore of the low-level state first? Avoid all
> the "system is halfway up" crud.
>
> Would that work for people? Andy?

Other than the above, more or less.

But we should really do all the user segments last.  They're not at
all needed for normal kernel execution, so I think they should be the
very last part.

I'll try to get to this today.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-12 18:05                                 ` Andy Lutomirski
@ 2017-12-12 18:36                                   ` Linus Torvalds
  2017-12-12 22:10                                     ` Andy Lutomirski
  0 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-12 18:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Pavel Machek, Zhang Rui, Thomas Gleixner, Jarkko Nikula,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
>> and plays with interrupt state. Just load the segment register, and
>> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
>> need for the swapgs dance.
>
> Using what helper?  On x86_64, it can fault, and IIRC we explicitly
> don't allow loadsegment(gs, ...).

Just do the loadsegment() thing. The fact that we don't have a gs
version of it is legacy - to catch bad users. It shouldn't stop us
from having good users.

That said - can it really fault? Because if it can, then why can't %fs
fault? And on x86-64, we just do

        asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));

and don't actually use 'loadsegment()' for _any_ of the segments.  We
only do the fault protection on 32-bit.

In fact, we really should try to avoid taking faults here anyway,
shouldn't we? We haven't loaded enough of the context yet.

Hmm.

Maybe we should load only the fixed kernel segments at this point, and
then do all the loadsegment() of gs/fs in the later phase when we're
all set up.

THERE we can do the swapgs dance with interrupt tracing etc, because
*there* we actually are fully set up. I guess that means reloading the
FS/GS base MSR's,

              Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-12 18:36                                   ` Linus Torvalds
@ 2017-12-12 22:10                                     ` Andy Lutomirski
  2017-12-12 22:33                                       ` Linus Torvalds
  2017-12-13 11:16                                       ` Jarkko Nikula
  0 siblings, 2 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-12 22:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Pavel Machek, Zhang Rui, Thomas Gleixner,
	Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>>
>>> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
>>> and plays with interrupt state. Just load the segment register, and
>>> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
>>> need for the swapgs dance.
>>
>> Using what helper?  On x86_64, it can fault, and IIRC we explicitly
>> don't allow loadsegment(gs, ...).
>
> Just do the loadsegment() thing. The fact that we don't have a gs
> version of it is legacy - to catch bad users. It shouldn't stop us
> from having good users.
>
> That said - can it really fault? Because if it can, then why can't %fs
> fault? And on x86-64, we just do
>
>         asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
>
> and don't actually use 'loadsegment()' for _any_ of the segments.  We
> only do the fault protection on 32-bit.
>
> In fact, we really should try to avoid taking faults here anyway,
> shouldn't we? We haven't loaded enough of the context yet.
>
> Hmm.
>
> Maybe we should load only the fixed kernel segments at this point, and
> then do all the loadsegment() of gs/fs in the later phase when we're
> all set up.
>
> THERE we can do the swapgs dance with interrupt tracing etc, because
> *there* we actually are fully set up. I guess that means reloading the
> FS/GS base MSR's,

Like this?

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856

I've barely tested it.  It suspended and resumed once in a 64-bit VM.
It compiles on 32-bit.

(That link might not work for a little bit.  I'm not sure what's up.)

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-12 22:10                                     ` Andy Lutomirski
@ 2017-12-12 22:33                                       ` Linus Torvalds
  2017-12-12 23:10                                         ` Andy Lutomirski
  2017-12-13 11:16                                       ` Jarkko Nikula
  1 sibling, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-12 22:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Pavel Machek, Zhang Rui, Thomas Gleixner, Jarkko Nikula,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 12, 2017 at 2:10 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> (That link might not work for a little bit.  I'm not sure what's up.)

I think your link is just bogus.

  https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes

works.

Anyway, the code looks much better to me.

Whether it _works_ is another matter, of course.

            Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-12 22:33                                       ` Linus Torvalds
@ 2017-12-12 23:10                                         ` Andy Lutomirski
  0 siblings, 0 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-12 23:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Pavel Machek, Zhang Rui, Thomas Gleixner,
	Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 12, 2017 at 2:33 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Dec 12, 2017 at 2:10 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> (That link might not work for a little bit.  I'm not sure what's up.)
>
> I think your link is just bogus.
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
>
> works.

If I click "commit" on that page, I get my link, and it's still busted.

>
> Anyway, the code looks much better to me.
>
> Whether it _works_ is another matter, of course.
>
>             Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-12 22:10                                     ` Andy Lutomirski
  2017-12-12 22:33                                       ` Linus Torvalds
@ 2017-12-13 11:16                                       ` Jarkko Nikula
  2017-12-13 12:40                                         ` Ingo Molnar
  2017-12-13 18:50                                         ` Andy Lutomirski
  1 sibling, 2 replies; 83+ messages in thread
From: Jarkko Nikula @ 2017-12-13 11:16 UTC (permalink / raw)
  To: Andy Lutomirski, Linus Torvalds
  Cc: Pavel Machek, Zhang Rui, Thomas Gleixner, Rafael J. Wysocki,
	Linux Kernel Mailing List, the arch/x86 maintainers

On 12/13/2017 12:10 AM, Andy Lutomirski wrote:
> On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <luto@kernel.org> wrote:
> Like this?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856
> 
> I've barely tested it.  It suspended and resumed once in a 64-bit VM.
> It compiles on 32-bit.
> 
I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both 
64-bit and 32-bit work.

Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 11:16                                       ` Jarkko Nikula
@ 2017-12-13 12:40                                         ` Ingo Molnar
  2017-12-13 18:50                                         ` Andy Lutomirski
  1 sibling, 0 replies; 83+ messages in thread
From: Ingo Molnar @ 2017-12-13 12:40 UTC (permalink / raw)
  To: Jarkko Nikula
  Cc: Andy Lutomirski, Linus Torvalds, Pavel Machek, Zhang Rui,
	Thomas Gleixner, Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers


* Jarkko Nikula <jarkko.nikula@linux.intel.com> wrote:

> On 12/13/2017 12:10 AM, Andy Lutomirski wrote:
> > On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <luto@kernel.org> wrote:
> > Like this?
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856
> > 
> > I've barely tested it.  It suspended and resumed once in a 64-bit VM.
> > It compiles on 32-bit.
> > 
> I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both 64-bit
> and 32-bit work.
> 
> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>

Great, thanks for the testing!

Andy, mind sending the final fix with a changelog and the Reported-by/Tested-by 
tags, etc?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-08 10:30                 ` Thomas Gleixner
@ 2017-12-13 15:57                   ` Thomas Gleixner
  2017-12-13 16:23                     ` Bjorn Helgaas
  0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-13 15:57 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Daniel Vetter, Bjorn Helgaas, Rafael J. Wysocki

So I was finally able to figure out what the hell is going on:

Suspend:

 - The device suspend code puts the graphics card into a power
   state != PCI_D0.

 - Offline non boot CPUs

 - Break interrupt affinity. Allocate new vector on CPU 0, compose and
   write MSI message which ends up in:

   __pci_write_msi_msg(entry, msg)
   {
	if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
	   /* Don't touch the hardware now */
	} else {
	   ....
	}
	entry->msg = *msg;
   }
 
  So because the device is not in PCI_D0 the message is not written. It's
  written in the device resume path.

Resume:
[  139.670446] ACPI: Low-level resume complete
[  139.670541] PM: Restoring platform NVS memory
[  139.672462] do_IRQ: 0.55 No irq handler for vector
[  139.672475] Enabling non-boot CPUs ...

So the spurious interrupt happens early and way before the device resume
code writes the new MSI message.

I checked the behaviour on 4.14. The MSI write is delayed there in the same
way, but there is no spurious interrupt. There is no interrupt coming in at
all _BEFORE_ the device is put out of PCI_D0.

And this has certainly nothing to do with the vector management changes,
but I can't figure yet what makes that spurious interrupt to be sent.

Any ideas welcome.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 15:57                   ` Thomas Gleixner
@ 2017-12-13 16:23                     ` Bjorn Helgaas
  2017-12-13 16:41                       ` Thomas Gleixner
  0 siblings, 1 reply; 83+ messages in thread
From: Bjorn Helgaas @ 2017-12-13 16:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Maarten Lankhorst, Michal Hocko, Linus Torvalds,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

[+cc linux-pci, linux-pm]

On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> So I was finally able to figure out what the hell is going on:
> 
> Suspend:
> 
>  - The device suspend code puts the graphics card into a power
>    state != PCI_D0.
> 
>  - Offline non boot CPUs
> 
>  - Break interrupt affinity. Allocate new vector on CPU 0, compose and
>    write MSI message which ends up in:
> 
>    __pci_write_msi_msg(entry, msg)
>    {
> 	if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
> 	   /* Don't touch the hardware now */
> 	} else {
> 	   ....
> 	}
> 	entry->msg = *msg;
>    }
>  
>   So because the device is not in PCI_D0 the message is not written. It's
>   written in the device resume path.

I'm not a PM guru, but this ordering seems fragile.  If we offline
CPUs before re-targeting interrupts directed at those CPUs, aren't we
always going to be at risk of sending interrupts to an offline CPU?

Even if the device is now asleep and therefore should not generate an
interrupt, it seems like there's a window when the device returns to
PCI_D0 where it could generate an interrupt before we have a chance to
update the MSI message.

> Resume:
> [  139.670446] ACPI: Low-level resume complete
> [  139.670541] PM: Restoring platform NVS memory
> [  139.672462] do_IRQ: 0.55 No irq handler for vector
> [  139.672475] Enabling non-boot CPUs ...
> 
> So the spurious interrupt happens early and way before the device resume
> code writes the new MSI message.
> 
> I checked the behaviour on 4.14. The MSI write is delayed there in the same
> way, but there is no spurious interrupt. There is no interrupt coming in at
> all _BEFORE_ the device is put out of PCI_D0.
> 
> And this has certainly nothing to do with the vector management changes,
> but I can't figure yet what makes that spurious interrupt to be sent.
> 
> Any ideas welcome.
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 16:23                     ` Bjorn Helgaas
@ 2017-12-13 16:41                       ` Thomas Gleixner
  2017-12-13 17:45                         ` Linus Torvalds
  0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-13 16:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Maarten Lankhorst, Michal Hocko, Linus Torvalds,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Wed, 13 Dec 2017, Bjorn Helgaas wrote:
> [+cc linux-pci, linux-pm]
> 
> On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> > So I was finally able to figure out what the hell is going on:
> > 
> > Suspend:
> > 
> >  - The device suspend code puts the graphics card into a power
> >    state != PCI_D0.
> > 
> >  - Offline non boot CPUs
> > 
> >  - Break interrupt affinity. Allocate new vector on CPU 0, compose and
> >    write MSI message which ends up in:
> > 
> >    __pci_write_msi_msg(entry, msg)
> >    {
> > 	if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
> > 	   /* Don't touch the hardware now */
> > 	} else {
> > 	   ....
> > 	}
> > 	entry->msg = *msg;
> >    }
> >  
> >   So because the device is not in PCI_D0 the message is not written. It's
> >   written in the device resume path.
> 
> I'm not a PM guru, but this ordering seems fragile.  If we offline
> CPUs before re-targeting interrupts directed at those CPUs, aren't we
> always going to be at risk of sending interrupts to an offline CPU?
> 
> Even if the device is now asleep and therefore should not generate an
> interrupt, it seems like there's a window when the device returns to
> PCI_D0 where it could generate an interrupt before we have a chance to
> update the MSI message.

Definitely. That was fragile forever but puzzles me is that I can't figure
out what now causes that spurious interrupt to surface out of the blue.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 16:41                       ` Thomas Gleixner
@ 2017-12-13 17:45                         ` Linus Torvalds
  2017-12-13 18:19                           ` Thomas Gleixner
  0 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-13 17:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Definitely. That was fragile forever but puzzles me is that I can't figure
> out what now causes that spurious interrupt to surface out of the blue.

Perhaps just timing?

How hard would it be to change the ordering to just redirect irqs first?

                 Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 17:45                         ` Linus Torvalds
@ 2017-12-13 18:19                           ` Thomas Gleixner
  2017-12-13 20:52                             ` Thomas Gleixner
  2017-12-13 22:39                             ` Rafael J. Wysocki
  0 siblings, 2 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-13 18:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Wed, 13 Dec 2017, Linus Torvalds wrote:

> On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Definitely. That was fragile forever but puzzles me is that I can't figure
> > out what now causes that spurious interrupt to surface out of the blue.
> 
> Perhaps just timing?

That's what I'm trying to figure out right now, because that is the only
sensible explanation left. The whole machinery of suspend is exactly the
same with and without the vector changes. I instrumented all functions
involved and the picture is the same. I even do not see any fundamental
timing differences where one would say: That's it.

What puzzles me even more is that in the range of commits I'm fiddling with
there is no other change than the vector management stuff and the point
where it breaks makes no sense at all. The point Maarten bisected it to
works nicely here, so that might just point to a very subtle timing issue.

> How hard would it be to change the ordering to just redirect irqs first?

The whole interrupt redirection happens when the non boot CPUs are brought
down, which is the very last step before the actual suspend happens.

We could probably do that earlier, but that's something Rafael needs to
answer ultimately.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 11:16                                       ` Jarkko Nikula
  2017-12-13 12:40                                         ` Ingo Molnar
@ 2017-12-13 18:50                                         ` Andy Lutomirski
  1 sibling, 0 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-13 18:50 UTC (permalink / raw)
  To: Jarkko Nikula
  Cc: Andy Lutomirski, Linus Torvalds, Pavel Machek, Zhang Rui,
	Thomas Gleixner, Rafael J. Wysocki, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed, Dec 13, 2017 at 3:16 AM, Jarkko Nikula
<jarkko.nikula@linux.intel.com> wrote:
> On 12/13/2017 12:10 AM, Andy Lutomirski wrote:
>>
>> On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>>
>>> On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <luto@kernel.org>
>>> wrote:
>>
>> Like this?
>>
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856
>>
>> I've barely tested it.  It suspended and resumed once in a 64-bit VM.
>> It compiles on 32-bit.
>>
> I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both 64-bit
> and 32-bit work.
>
> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>

Thanks!

I've split the patch into three and put your Tested-By on all of them,
since the end state is exactly the same as what you tested.  I'll
email them out once I test them myself :)

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 18:19                           ` Thomas Gleixner
@ 2017-12-13 20:52                             ` Thomas Gleixner
  2017-12-13 21:06                               ` Thomas Gleixner
  2017-12-13 22:39                             ` Rafael J. Wysocki
  1 sibling, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-13 20:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Linus Torvalds wrote:
> 
> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > out what now causes that spurious interrupt to surface out of the blue.
> > 
> > Perhaps just timing?
> 
> That's what I'm trying to figure out right now, because that is the only
> sensible explanation left. The whole machinery of suspend is exactly the
> same with and without the vector changes. I instrumented all functions
> involved and the picture is the same. I even do not see any fundamental
> timing differences where one would say: That's it.
> 
> What puzzles me even more is that in the range of commits I'm fiddling with
> there is no other change than the vector management stuff and the point
> where it breaks makes no sense at all. The point Maarten bisected it to
> works nicely here, so that might just point to a very subtle timing issue.

After doing more debugging on this it turns out that this looks like a
legacy interrupt coming in. The vector number is always 55, which is legacy
IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
masked and vector 55 is completely unused.

More questions than answers. Still investigating.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 20:52                             ` Thomas Gleixner
@ 2017-12-13 21:06                               ` Thomas Gleixner
  2017-12-13 22:48                                 ` Rafael J. Wysocki
  2017-12-14 11:54                                 ` Thomas Gleixner
  0 siblings, 2 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-13 21:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > 
> > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > >
> > > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > > out what now causes that spurious interrupt to surface out of the blue.
> > > 
> > > Perhaps just timing?
> > 
> > That's what I'm trying to figure out right now, because that is the only
> > sensible explanation left. The whole machinery of suspend is exactly the
> > same with and without the vector changes. I instrumented all functions
> > involved and the picture is the same. I even do not see any fundamental
> > timing differences where one would say: That's it.
> > 
> > What puzzles me even more is that in the range of commits I'm fiddling with
> > there is no other change than the vector management stuff and the point
> > where it breaks makes no sense at all. The point Maarten bisected it to
> > works nicely here, so that might just point to a very subtle timing issue.
> 
> After doing more debugging on this it turns out that this looks like a
> legacy interrupt coming in. The vector number is always 55, which is legacy
> IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> masked and vector 55 is completely unused.
> 
> More questions than answers. Still investigating.

And it does not explain Maartens report which gets a spurious vector 33 on
CPU4 after the non boot cpus have been brought online again. And that's the
vector which was assigned before the affinity was moved by unplugging CPU4.

Hrmpf. Even more mystery to solve.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 18:19                           ` Thomas Gleixner
  2017-12-13 20:52                             ` Thomas Gleixner
@ 2017-12-13 22:39                             ` Rafael J. Wysocki
  2017-12-13 23:26                               ` Rafael J. Wysocki
  1 sibling, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-13 22:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Wednesday, December 13, 2017 7:19:17 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Linus Torvalds wrote:
> 
> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > out what now causes that spurious interrupt to surface out of the blue.
> > 
> > Perhaps just timing?
> 
> That's what I'm trying to figure out right now, because that is the only
> sensible explanation left. The whole machinery of suspend is exactly the
> same with and without the vector changes. I instrumented all functions
> involved and the picture is the same. I even do not see any fundamental
> timing differences where one would say: That's it.
> 
> What puzzles me even more is that in the range of commits I'm fiddling with
> there is no other change than the vector management stuff and the point
> where it breaks makes no sense at all. The point Maarten bisected it to
> works nicely here, so that might just point to a very subtle timing issue.
> 
> > How hard would it be to change the ordering to just redirect irqs first?
> 
> The whole interrupt redirection happens when the non boot CPUs are brought
> down, which is the very last step before the actual suspend happens.
> 
> We could probably do that earlier, but that's something Rafael needs to
> answer ultimately.

Well, that's both flattering and concerning. ;-)

Anyway, yes, we can do that earlier AFAICS.  Action handlers are not going to
run after we've called suspend_device_irqs() which happens before the final
stage of PCI devices suspend (suspend_noirq) and it doesn't matter which CPU
gets the interrupt from that point on (it is either wakeup or unwanted then).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 21:06                               ` Thomas Gleixner
@ 2017-12-13 22:48                                 ` Rafael J. Wysocki
  2017-12-14 11:54                                 ` Thomas Gleixner
  1 sibling, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-13 22:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Wednesday, December 13, 2017 10:06:40 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > > 
> > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > >
> > > > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > > > out what now causes that spurious interrupt to surface out of the blue.
> > > > 
> > > > Perhaps just timing?
> > > 
> > > That's what I'm trying to figure out right now, because that is the only
> > > sensible explanation left. The whole machinery of suspend is exactly the
> > > same with and without the vector changes. I instrumented all functions
> > > involved and the picture is the same. I even do not see any fundamental
> > > timing differences where one would say: That's it.
> > > 
> > > What puzzles me even more is that in the range of commits I'm fiddling with
> > > there is no other change than the vector management stuff and the point
> > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > works nicely here, so that might just point to a very subtle timing issue.
> > 
> > After doing more debugging on this it turns out that this looks like a
> > legacy interrupt coming in. The vector number is always 55, which is legacy
> > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > masked and vector 55 is completely unused.
> > 
> > More questions than answers. Still investigating.
> 
> And it does not explain Maartens report which gets a spurious vector 33 on
> CPU4 after the non boot cpus have been brought online again. And that's the
> vector which was assigned before the affinity was moved by unplugging CPU4.
> 
> Hrmpf. Even more mystery to solve.

Any chance to look at /proc/interrupts from a machine where that can be
reproduced?

I'm also curious if that can be reproduced by doing CPU offline/online
without suspending?

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 22:39                             ` Rafael J. Wysocki
@ 2017-12-13 23:26                               ` Rafael J. Wysocki
  0 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-13 23:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, Linux PCI, Linux PM

On Wed, Dec 13, 2017 at 11:39 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, December 13, 2017 7:19:17 PM CET Thomas Gleixner wrote:
>> On Wed, 13 Dec 2017, Linus Torvalds wrote:
>>
>> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > >
>> > > Definitely. That was fragile forever but puzzles me is that I can't figure
>> > > out what now causes that spurious interrupt to surface out of the blue.
>> >
>> > Perhaps just timing?
>>
>> That's what I'm trying to figure out right now, because that is the only
>> sensible explanation left. The whole machinery of suspend is exactly the
>> same with and without the vector changes. I instrumented all functions
>> involved and the picture is the same. I even do not see any fundamental
>> timing differences where one would say: That's it.
>>
>> What puzzles me even more is that in the range of commits I'm fiddling with
>> there is no other change than the vector management stuff and the point
>> where it breaks makes no sense at all. The point Maarten bisected it to
>> works nicely here, so that might just point to a very subtle timing issue.
>>
>> > How hard would it be to change the ordering to just redirect irqs first?
>>
>> The whole interrupt redirection happens when the non boot CPUs are brought
>> down, which is the very last step before the actual suspend happens.
>>
>> We could probably do that earlier, but that's something Rafael needs to
>> answer ultimately.
>
> Well, that's both flattering and concerning. ;-)
>
> Anyway, yes, we can do that earlier AFAICS.  Action handlers are not going to
> run after we've called suspend_device_irqs() which happens before the final
> stage of PCI devices suspend (suspend_noirq) and it doesn't matter which CPU
> gets the interrupt from that point on (it is either wakeup or unwanted then).

There is a catch that we don't and likely should not do that for
suspend-to-idle, but since we have pm_suspend_target_state now, that
case can be distinguished from the "full suspend" one readily.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-13 21:06                               ` Thomas Gleixner
  2017-12-13 22:48                                 ` Rafael J. Wysocki
@ 2017-12-14 11:54                                 ` Thomas Gleixner
  2017-12-14 12:12                                   ` Rafael J. Wysocki
                                                     ` (2 more replies)
  1 sibling, 3 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-14 11:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > > 
> > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > >
> > > > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > > > out what now causes that spurious interrupt to surface out of the blue.
> > > > 
> > > > Perhaps just timing?
> > > 
> > > That's what I'm trying to figure out right now, because that is the only
> > > sensible explanation left. The whole machinery of suspend is exactly the
> > > same with and without the vector changes. I instrumented all functions
> > > involved and the picture is the same. I even do not see any fundamental
> > > timing differences where one would say: That's it.
> > > 
> > > What puzzles me even more is that in the range of commits I'm fiddling with
> > > there is no other change than the vector management stuff and the point
> > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > works nicely here, so that might just point to a very subtle timing issue.
> > 
> > After doing more debugging on this it turns out that this looks like a
> > legacy interrupt coming in. The vector number is always 55, which is legacy
> > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > masked and vector 55 is completely unused.
> > 
> > More questions than answers. Still investigating.

At least that one could be explained by the changes. In the previous
management scheme the IOAPIC interrupts were always allocated even when the
interrupt was not in use. The new scheme does not longer do that because
people complained about the vector waste (16 vectors on each CPU) and it
got rid of all the special casing of IRQ0-15.

So the old scheme silently consumed the spurious vector. I added debug code
to that effect to 4.14 and on that machine IRQ7 is triggered at the same
point post resume and the core code drops it silently because the interrupt
is marked masked and no action assigned.

So the only difference to today is that the new code complains, while the
old one does an extra mask of the already masked IOAPIC pin and silently
returns.

After quite some investigation I found out that its independent of the
graphics thing. That's a genuine issue on that platform which seems to emit
random legacy vectors which were never ever used for unknown reasons. I
verified that both the IOAPIC and the PIC are masked, so they cannot send
crap. Though it turned out that the silly firmware unmasks the PIC and
leaves it that way when it returns from suspend. Now there is a race
whether the kernel resume path manages to mask the PIC again early enough
before something triggers IRQ7 or not. Adding/removing debug code makes the
problem come and go. So I really don't worry about that one and rather
prefer to have the spurious interrupt printed than silently consumed by
chance.

Now the graphics issue is a different story. That only happens on
hibernation after doing the snapshot. There all non boot cpus are onlined
again and after that the devices are 'thawed'. The following reenable of
interrupts fails because i915 is not in PCI_D0 state.

Suspend:

   irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
   __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
   __pci_write_msi_msg: Not written <- Device not in PCI_D0
   ....
   device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
   pci_pm_resume_noirq <-dpm_run_callback
   pci_pm_resume_noirq <-dpm_run_callback
   pci_pm_default_resume_early <-pci_pm_resume_noirq
   pci_pm_default_resume_early <-pci_pm_resume_noirq
   __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a  <-- Set the new affinity
   device_pm_callback_end: i915 0000:00:02.0, err=0

Hibernate:

   irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
   __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
   __pci_write_msi_msg: Not written <- Device not in PCI_D0
   ....
   device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
   pci_pm_thaw_noirq <-dpm_run_callback
   __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
   __pci_write_msi_msg: Not written  <--- Device is not in PCI_D0
   device_pm_callback_end: i915 0000:00:02.0, err=0

So that code path fails to set the new affinity because at the point where
the MSI msg should be written the device state is != PCI_D0.

Now, what's different vs. 4.14:

The 4.14 code accidentaly had the irq descriptor for this vector still
populated in the old CPU due to the convoluted way the vector allocation
worked. I have still to investigate if one of those cases is actually
leaking the descriptor, which would be a fatal bug.

But the new code does a proper cleanup and does not repopulate it on the
offline CPU. So that unearthes the issue. I'm handing that over to the PM
folks to look at. I got lost in that maze of callbacks.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 11:54                                 ` Thomas Gleixner
@ 2017-12-14 12:12                                   ` Rafael J. Wysocki
  2017-12-14 12:30                                     ` Thomas Gleixner
  2017-12-14 13:24                                   ` Linux 4.15-rc2: Regression in resume from ACPI S3 Thomas Gleixner
  2017-12-14 19:03                                   ` Linus Torvalds
  2 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-14 12:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > > > 
> > > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > > >
> > > > > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > > > > out what now causes that spurious interrupt to surface out of the blue.
> > > > > 
> > > > > Perhaps just timing?
> > > > 
> > > > That's what I'm trying to figure out right now, because that is the only
> > > > sensible explanation left. The whole machinery of suspend is exactly the
> > > > same with and without the vector changes. I instrumented all functions
> > > > involved and the picture is the same. I even do not see any fundamental
> > > > timing differences where one would say: That's it.
> > > > 
> > > > What puzzles me even more is that in the range of commits I'm fiddling with
> > > > there is no other change than the vector management stuff and the point
> > > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > > works nicely here, so that might just point to a very subtle timing issue.
> > > 
> > > After doing more debugging on this it turns out that this looks like a
> > > legacy interrupt coming in. The vector number is always 55, which is legacy
> > > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > > masked and vector 55 is completely unused.
> > > 
> > > More questions than answers. Still investigating.
> 
> At least that one could be explained by the changes. In the previous
> management scheme the IOAPIC interrupts were always allocated even when the
> interrupt was not in use. The new scheme does not longer do that because
> people complained about the vector waste (16 vectors on each CPU) and it
> got rid of all the special casing of IRQ0-15.
> 
> So the old scheme silently consumed the spurious vector. I added debug code
> to that effect to 4.14 and on that machine IRQ7 is triggered at the same
> point post resume and the core code drops it silently because the interrupt
> is marked masked and no action assigned.
> 
> So the only difference to today is that the new code complains, while the
> old one does an extra mask of the already masked IOAPIC pin and silently
> returns.
> 
> After quite some investigation I found out that its independent of the
> graphics thing. That's a genuine issue on that platform which seems to emit
> random legacy vectors which were never ever used for unknown reasons. I
> verified that both the IOAPIC and the PIC are masked, so they cannot send
> crap. Though it turned out that the silly firmware unmasks the PIC and
> leaves it that way when it returns from suspend. Now there is a race
> whether the kernel resume path manages to mask the PIC again early enough
> before something triggers IRQ7 or not. Adding/removing debug code makes the
> problem come and go. So I really don't worry about that one and rather
> prefer to have the spurious interrupt printed than silently consumed by
> chance.

OK

> Now the graphics issue is a different story. That only happens on
> hibernation after doing the snapshot. There all non boot cpus are onlined
> again and after that the devices are 'thawed'. The following reenable of
> interrupts fails because i915 is not in PCI_D0 state.
> 
> Suspend:
> 
>    irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
>    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
>    __pci_write_msi_msg: Not written <- Device not in PCI_D0
>    ....
>    device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
>    pci_pm_resume_noirq <-dpm_run_callback
>    pci_pm_resume_noirq <-dpm_run_callback
>    pci_pm_default_resume_early <-pci_pm_resume_noirq
>    pci_pm_default_resume_early <-pci_pm_resume_noirq
>    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a  <-- Set the new affinity
>    device_pm_callback_end: i915 0000:00:02.0, err=0

So this works, because we power up the device during resume even if it
had been suspended (via runtime PM) before the suspend started.

> Hibernate:
> 
>    irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
>    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
>    __pci_write_msi_msg: Not written <- Device not in PCI_D0
>    ....
>    device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
>    pci_pm_thaw_noirq <-dpm_run_callback
>    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
>    __pci_write_msi_msg: Not written  <--- Device is not in PCI_D0
>    device_pm_callback_end: i915 0000:00:02.0, err=0

And here we try to leave the device alone which is OK for devices in D0,
but not for suspended ones.

It looks like we need to power up them at the "thaw" time too or at least
I don't see how to address that differently.

> So that code path fails to set the new affinity because at the point where
> the MSI msg should be written the device state is != PCI_D0.
> 
> Now, what's different vs. 4.14:
> 
> The 4.14 code accidentaly had the irq descriptor for this vector still
> populated in the old CPU due to the convoluted way the vector allocation
> worked. I have still to investigate if one of those cases is actually
> leaking the descriptor, which would be a fatal bug.
> 
> But the new code does a proper cleanup and does not repopulate it on the
> offline CPU. So that unearthes the issue. I'm handing that over to the PM
> folks to look at. I got lost in that maze of callbacks.

OK, thanks so much for getting to the bottom of this!

Rafael

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 12:12                                   ` Rafael J. Wysocki
@ 2017-12-14 12:30                                     ` Thomas Gleixner
  2017-12-14 15:30                                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-14 12:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> > Now the graphics issue is a different story. That only happens on
> > hibernation after doing the snapshot. There all non boot cpus are onlined
> > again and after that the devices are 'thawed'. The following reenable of
> > interrupts fails because i915 is not in PCI_D0 state.
> > 
> > Suspend:
> > 
> >    irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> >    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> >    __pci_write_msi_msg: Not written <- Device not in PCI_D0
> >    ....
> >    device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
> >    pci_pm_resume_noirq <-dpm_run_callback
> >    pci_pm_resume_noirq <-dpm_run_callback
> >    pci_pm_default_resume_early <-pci_pm_resume_noirq
> >    pci_pm_default_resume_early <-pci_pm_resume_noirq
> >    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a  <-- Set the new affinity
> >    device_pm_callback_end: i915 0000:00:02.0, err=0
> 
> So this works, because we power up the device during resume even if it
> had been suspended (via runtime PM) before the suspend started.
> 
> > Hibernate:
> > 
> >    irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> >    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> >    __pci_write_msi_msg: Not written <- Device not in PCI_D0
> >    ....
> >    device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
> >    pci_pm_thaw_noirq <-dpm_run_callback
> >    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> >    __pci_write_msi_msg: Not written  <--- Device is not in PCI_D0
> >    device_pm_callback_end: i915 0000:00:02.0, err=0
> 
> And here we try to leave the device alone which is OK for devices in D0,
> but not for suspended ones.
> 
> It looks like we need to power up them at the "thaw" time too or at least
> I don't see how to address that differently.

The question is whether the code which brings the device out of D0 should
write the message unconditionally. That would be sufficient I think.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 11:54                                 ` Thomas Gleixner
  2017-12-14 12:12                                   ` Rafael J. Wysocki
@ 2017-12-14 13:24                                   ` Thomas Gleixner
  2017-12-14 19:03                                   ` Linus Torvalds
  2 siblings, 0 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-14 13:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Thu, 14 Dec 2017, Thomas Gleixner wrote:
> Now, what's different vs. 4.14:
> 
> The 4.14 code accidentaly had the irq descriptor for this vector still
> populated in the old CPU due to the convoluted way the vector allocation
> worked. I have still to investigate if one of those cases is actually
> leaking the descriptor, which would be a fatal bug.

It doesn't leak. It repopulates it at the same place out of sheer luck.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 12:30                                     ` Thomas Gleixner
@ 2017-12-14 15:30                                       ` Rafael J. Wysocki
  2017-12-14 15:52                                         ` Thomas Gleixner
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-14 15:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Thursday, December 14, 2017 1:30:37 PM CET Thomas Gleixner wrote:
> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> > On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> > > Now the graphics issue is a different story. That only happens on
> > > hibernation after doing the snapshot. There all non boot cpus are onlined
> > > again and after that the devices are 'thawed'. The following reenable of
> > > interrupts fails because i915 is not in PCI_D0 state.
> > > 
> > > Suspend:
> > > 
> > >    irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > >    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > >    __pci_write_msi_msg: Not written <- Device not in PCI_D0
> > >    ....
> > >    device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
> > >    pci_pm_resume_noirq <-dpm_run_callback
> > >    pci_pm_resume_noirq <-dpm_run_callback
> > >    pci_pm_default_resume_early <-pci_pm_resume_noirq
> > >    pci_pm_default_resume_early <-pci_pm_resume_noirq
> > >    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a  <-- Set the new affinity
> > >    device_pm_callback_end: i915 0000:00:02.0, err=0
> > 
> > So this works, because we power up the device during resume even if it
> > had been suspended (via runtime PM) before the suspend started.
> > 
> > > Hibernate:
> > > 
> > >    irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > >    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > >    __pci_write_msi_msg: Not written <- Device not in PCI_D0
> > >    ....
> > >    device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
> > >    pci_pm_thaw_noirq <-dpm_run_callback
> > >    __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > >    __pci_write_msi_msg: Not written  <--- Device is not in PCI_D0
> > >    device_pm_callback_end: i915 0000:00:02.0, err=0
> > 
> > And here we try to leave the device alone which is OK for devices in D0,
> > but not for suspended ones.
> > 
> > It looks like we need to power up them at the "thaw" time too or at least
> > I don't see how to address that differently.
> 
> The question is whether the code which brings the device out of D0 should
> write the message unconditionally. That would be sufficient I think.

It doesn't have to do that.

The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
in fact requires the device to be in D0, so the caller should put it into
D0 instead of trying to "update" its power state.

[Note that the PCI layer doesn't put devices into low-power states during the
hibernation's "freeze" transition, but drivers can legitimately do that in
their "freeze" callbacks which was overlooked in that code and that's what
i915 does.]

So IMO what we need is the change below.  I'm going to test it shortly,
but please give it a go too.

---
 drivers/pci/pci-driver.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume_early(dev);
 
-	pci_update_current_state(pci_dev, PCI_D0);
+	/*
+	 * pci_restore_state() requires the device to be in D0 (because of MSI
+	 * restoration among other things), so force it into D0 in case the
+	 * driver's "freeze" callbacks put it into a low-power state directly.
+	 */
+	pci_set_power_state(pci_dev, PCI_D0);
 	pci_restore_state(pci_dev);
 
 	if (drv && drv->pm && drv->pm->thaw_noirq)

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 15:30                                       ` Rafael J. Wysocki
@ 2017-12-14 15:52                                         ` Thomas Gleixner
  2017-12-14 15:54                                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-14 15:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
> in fact requires the device to be in D0, so the caller should put it into
> D0 instead of trying to "update" its power state.
> 
> [Note that the PCI layer doesn't put devices into low-power states during the
> hibernation's "freeze" transition, but drivers can legitimately do that in
> their "freeze" callbacks which was overlooked in that code and that's what
> i915 does.]
> 
> So IMO what we need is the change below.  I'm going to test it shortly,
> but please give it a go too.

So now this looks more reasonable:

  irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
  __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a 
  __pci_write_msi_msg: Not written
  ...
  device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
  pci_pm_thaw_noirq <-dpm_run_callback
  __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a 
  device_pm_callback_end: i915 0000:00:02.0, err=0
  ...
  resume_irqs: Resume 125
  ...
  irq_handler_entry: irq=125 name=i915

Thanks,

	tglx

> ---
>  drivers/pci/pci-driver.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_resume_early(dev);
>  
> -	pci_update_current_state(pci_dev, PCI_D0);
> +	/*
> +	 * pci_restore_state() requires the device to be in D0 (because of MSI
> +	 * restoration among other things), so force it into D0 in case the
> +	 * driver's "freeze" callbacks put it into a low-power state directly.
> +	 */
> +	pci_set_power_state(pci_dev, PCI_D0);
>  	pci_restore_state(pci_dev);
>  
>  	if (drv && drv->pm && drv->pm->thaw_noirq)
> 
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 15:52                                         ` Thomas Gleixner
@ 2017-12-14 15:54                                           ` Rafael J. Wysocki
  2017-12-14 16:17                                             ` Maarten Lankhorst
  2017-12-15  2:07                                             ` [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() Rafael J. Wysocki
  0 siblings, 2 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-14 15:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Thursday, December 14, 2017 4:52:22 PM CET Thomas Gleixner wrote:
> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> > The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
> > in fact requires the device to be in D0, so the caller should put it into
> > D0 instead of trying to "update" its power state.
> > 
> > [Note that the PCI layer doesn't put devices into low-power states during the
> > hibernation's "freeze" transition, but drivers can legitimately do that in
> > their "freeze" callbacks which was overlooked in that code and that's what
> > i915 does.]
> > 
> > So IMO what we need is the change below.  I'm going to test it shortly,
> > but please give it a go too.
> 
> So now this looks more reasonable:
> 
>   irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
>   __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a 
>   __pci_write_msi_msg: Not written
>   ...
>   device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
>   pci_pm_thaw_noirq <-dpm_run_callback
>   __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a 
>   device_pm_callback_end: i915 0000:00:02.0, err=0
>   ...
>   resume_irqs: Resume 125
>   ...
>   irq_handler_entry: irq=125 name=i915

Cool.

Let me respin it with a changelog etc then.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 15:54                                           ` Rafael J. Wysocki
@ 2017-12-14 16:17                                             ` Maarten Lankhorst
  2017-12-15  2:07                                             ` [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() Rafael J. Wysocki
  1 sibling, 0 replies; 83+ messages in thread
From: Maarten Lankhorst @ 2017-12-14 16:17 UTC (permalink / raw)
  To: Rafael J. Wysocki, Thomas Gleixner
  Cc: Linus Torvalds, Bjorn Helgaas, Michal Hocko, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Daniel Vetter, Bjorn Helgaas, Rafael J. Wysocki, linux-pci,
	linux-pm

Op 14-12-17 om 16:54 schreef Rafael J. Wysocki:
> On Thursday, December 14, 2017 4:52:22 PM CET Thomas Gleixner wrote:
>> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
>>> The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
>>> in fact requires the device to be in D0, so the caller should put it into
>>> D0 instead of trying to "update" its power state.
>>>
>>> [Note that the PCI layer doesn't put devices into low-power states during the
>>> hibernation's "freeze" transition, but drivers can legitimately do that in
>>> their "freeze" callbacks which was overlooked in that code and that's what
>>> i915 does.]
>>>
>>> So IMO what we need is the change below.  I'm going to test it shortly,
>>> but please give it a go too.
>> So now this looks more reasonable:
>>
>>   irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
>>   __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a 
>>   __pci_write_msi_msg: Not written
>>   ...
>>   device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
>>   pci_pm_thaw_noirq <-dpm_run_callback
>>   __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a 
>>   device_pm_callback_end: i915 0000:00:02.0, err=0
>>   ...
>>   resume_irqs: Resume 125
>>   ...
>>   irq_handler_entry: irq=125 name=i915
> Cool.
>
> Let me respin it with a changelog etc then.
>
> Thanks,
> Rafael
>
>
The machine I was using for reproducing the bug appears to be fixed with this patch, so I now sent
it to intel's trybot for results.

https://patchwork.freedesktop.org/series/35367/

Thanks for looking at the bug!

~Maarten

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 11:54                                 ` Thomas Gleixner
  2017-12-14 12:12                                   ` Rafael J. Wysocki
  2017-12-14 13:24                                   ` Linux 4.15-rc2: Regression in resume from ACPI S3 Thomas Gleixner
@ 2017-12-14 19:03                                   ` Linus Torvalds
  2017-12-14 22:36                                     ` Thomas Gleixner
  2 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-14 19:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> So the old scheme silently consumed the spurious vector. I added debug code
> to that effect to 4.14 and on that machine IRQ7 is triggered at the same
> point post resume and the core code drops it silently because the interrupt
> is marked masked and no action assigned.
>
> So the only difference to today is that the new code complains, while the
> old one does an extra mask of the already masked IOAPIC pin and silently
> returns.

Great debugging, and it looks like Rafael has a patch that already got
positive testing.

I just wanted to pipe up about that "irq7", because judging from your
email it seems like you think it's a real irq:

>        Now there is a race
> whether the kernel resume path manages to mask the PIC again early enough
> before something triggers IRQ7 or not.

..and that's not how the PIC works.

In fact, "legacy irq 7" is the _normal_ and very traditional spurious
interrupt, and it's documented. If the PIC gets an interrupt from
_any_ source, but the interrupt goes away before the PIC gets an
acknowledge from the CPU (and by "acknowledge", I'm not talking about
the explicit software IRQ ACK, I'm talking about the hardware
protocol, between the PIC and the CPU), the PIC will then report irq 7
as the interrupt - regardless of what the original was.

The reason is almost always something like

 - CPU interrupts are disabled or masked

 - driver does a write to the external hardware that causes an
interrupt to be raised

 - CPU doesn't react to the irq due to the disabled/masked nature

 - but the driver then does something that masks the interrupt again

 - interrupts are enabled/unmasked on the CPU

 - CPU now acks the interrupt, but the PIC no longer sees any
interrupt source, so the PIC (that has to reply with *something*)
replies with that documented spurious irq7.

To confuse things further, irq7 is not _exclusively_ the spurious
interrupt, You can definitely put real hardware and connect it to pin7
of the PIC, and get real irq7 reports.

And to confuse things even *more*, this "irq7" thing is per-PIC, and
the PC model obviously has the whole "nested PIC" thing where the
second PIC is connected to irq2 of the first PIC. So there are *two*
different "spurious interrupt" reports, one for each PIC.

Anyway, to avoid this issue, drivers should strive to

 (a) actually take the interrupt when doing things that can cause
them, and have the interrupt handler do whatever it is that causes the
interrupt to go away (ie: "normal operation")

 (b) if you play games with clearing the source of the interrupt
_without_ taking the interrupt, you should strive to basically mask
the interrupt first.

So to do (b) you can do something like

      mask_device_interrupt(dev);
      read_from_device_to_synchronize(dev);

instead of (or perhaps _before_) disabling interrupts at a CPU level.
Suspend/resume obviously does tend to play games with these kinds of
things where you are no longer in "normal operation" and you do setup
without having interrupts actually enabled.

Or you can just decide that spurious interrupts are ok, and ignore the
issue. But they *can* be very confusing, and obviously in this case
that confusion then seems to have caused actual problems.

            Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-11 18:41                                   ` Andy Lutomirski
  2017-12-11 19:12                                     ` Linus Torvalds
@ 2017-12-14 20:38                                     ` Pavel Machek
  2017-12-14 20:47                                       ` Linus Torvalds
  1 sibling, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-14 20:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Rafael J. Wysocki, Andrew Lutomirski, Zhang Rui,
	Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
	the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 738 bytes --]

Hi!

> > But yes, Rafael's patch looks like the minimal one-liner. But I think
> > we should do the %gs load early too for the 32-bit stack canary case,
> > kind of like we need to do %fs for percpu base.
> 
> I'll try to get to this in a day or so -- is that okay?  Or should we
> do some trivial fix/revert and fix it for real next time around?

I can test the patch....

But given this is already "regression fix for x86-64 caused regression
on x86-32", I really believe we should merge trivial fix now, and do
the cleanups / nicer fixes sometime later?

Thanks,
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-14 20:38                                     ` Pavel Machek
@ 2017-12-14 20:47                                       ` Linus Torvalds
  2017-12-14 21:20                                         ` Andy Lutomirski
  2017-12-14 22:22                                         ` Pavel Machek
  0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-14 20:47 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andy Lutomirski, Rafael J. Wysocki, Andrew Lutomirski, Zhang Rui,
	Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Thu, Dec 14, 2017 at 12:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
>
> But given this is already "regression fix for x86-64 caused regression
> on x86-32", I really believe we should merge trivial fix now, and do
> the cleanups / nicer fixes sometime later?

The fix patch was already posted, but in another thread (confusingly
with _almost_ the same subject: "Re: Linux 4.15-rc2: Regression in
resume from ACPI S3").

Here:

  https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes

but it hasn't actually gotten to me yet (I think Luto made a new
version where he split it up into three smaller patches, and I'm
assuming it's in one of the -tip trees heading for me asap).

All the x86 people have been insanely busy with the crazy page table
isolation patches, that probably is slowing down things. But I was
expecting to get the -tip tree pulls tomorrow (.. because.. Friday.
It's my busiest day of the week because everybody wants to get their
work out before the weekend).

              Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-14 20:47                                       ` Linus Torvalds
@ 2017-12-14 21:20                                         ` Andy Lutomirski
  2017-12-14 22:22                                         ` Pavel Machek
  1 sibling, 0 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-14 21:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Machek, Rafael J. Wysocki, Andrew Lutomirski, Zhang Rui,
	Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Thu, Dec 14, 2017 at 12:47 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Dec 14, 2017 at 12:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
>>
>> But given this is already "regression fix for x86-64 caused regression
>> on x86-32", I really believe we should merge trivial fix now, and do
>> the cleanups / nicer fixes sometime later?
>
> The fix patch was already posted, but in another thread (confusingly
> with _almost_ the same subject: "Re: Linux 4.15-rc2: Regression in
> resume from ACPI S3").
>
> Here:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
>
> but it hasn't actually gotten to me yet (I think Luto made a new
> version where he split it up into three smaller patches, and I'm
> assuming it's in one of the -tip trees heading for me asap).
>
> All the x86 people have been insanely busy with the crazy page table
> isolation patches, that probably is slowing down things. But I was
> expecting to get the -tip tree pulls tomorrow (.. because.. Friday.
> It's my busiest day of the week because everybody wants to get their
> work out before the weekend).

Nah, it's that I was busy with the stupid PTI stuff, and the original
version of the patch didn't compile on some configurations.  I've
tested the version I just sent for suspend-to-ram and suspend-to-disk
on 32-bit and 64-bit in a couple of configurations.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] Fix resume on x86-32 machines
  2017-12-14 20:47                                       ` Linus Torvalds
  2017-12-14 21:20                                         ` Andy Lutomirski
@ 2017-12-14 22:22                                         ` Pavel Machek
  1 sibling, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2017-12-14 22:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Rafael J. Wysocki, Andrew Lutomirski, Zhang Rui,
	Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
	the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 1074 bytes --]

On Thu 2017-12-14 12:47:56, Linus Torvalds wrote:
> On Thu, Dec 14, 2017 at 12:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> > But given this is already "regression fix for x86-64 caused regression
> > on x86-32", I really believe we should merge trivial fix now, and do
> > the cleanups / nicer fixes sometime later?
> 
> The fix patch was already posted, but in another thread (confusingly
> with _almost_ the same subject: "Re: Linux 4.15-rc2: Regression in
> resume from ACPI S3").
> 
> Here:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
> 
> but it hasn't actually gotten to me yet (I think Luto made a new
> version where he split it up into three smaller patches, and I'm
> assuming it's in one of the -tip trees heading for me asap).

Ok.. It did not apply cleanly. ... but I see fresh patches in my
inbox, let me test those.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 19:03                                   ` Linus Torvalds
@ 2017-12-14 22:36                                     ` Thomas Gleixner
  2017-12-14 22:47                                       ` Linus Torvalds
  2017-12-15  0:34                                       ` Rafael J. Wysocki
  0 siblings, 2 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-14 22:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Thu, 14 Dec 2017, Linus Torvalds wrote:
> On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> I just wanted to pipe up about that "irq7", because judging from your
> email it seems like you think it's a real irq:
> 
> >        Now there is a race
> > whether the kernel resume path manages to mask the PIC again early enough
> > before something triggers IRQ7 or not.
> 
> ..and that's not how the PIC works.
> 
> In fact, "legacy irq 7" is the _normal_ and very traditional spurious
> interrupt, and it's documented. If the PIC gets an interrupt from
> _any_ source, but the interrupt goes away before the PIC gets an
> acknowledge from the CPU (and by "acknowledge", I'm not talking about
> the explicit software IRQ ACK, I'm talking about the hardware
> protocol, between the PIC and the CPU), the PIC will then report irq 7
> as the interrupt - regardless of what the original was.
> 
> The reason is almost always something like
> 
>  - CPU interrupts are disabled or masked
> 
>  - driver does a write to the external hardware that causes an
> interrupt to be raised

Which should be a non issue because _ALL_ PIC irq lines are masked at the
PIC itself. All interrupts are routed through IOAPIC. So unless the IOAPIC
sports similar behaviour the PIC should not ever observe that scenario.

But, because the silly firmware comes out of suspend with all PIC lines
unmasked for whatever reason, the PIC can observe that IRQ being raised and
the CPU not handling it. So yes, I forgot about 7 being magic, but I still
think it's the firmware which causes it by unmasking the PIC irqs.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 22:36                                     ` Thomas Gleixner
@ 2017-12-14 22:47                                       ` Linus Torvalds
  2017-12-15  9:05                                         ` Thomas Gleixner
  2017-12-15  0:34                                       ` Rafael J. Wysocki
  1 sibling, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-14 22:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Thu, Dec 14, 2017 at 2:36 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> But, because the silly firmware comes out of suspend with all PIC lines
> unmasked for whatever reason, the PIC can observe that IRQ being raised and
> the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> think it's the firmware which causes it by unmasking the PIC irqs.

Yes, that sounds quite likely.

            Linus

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 22:36                                     ` Thomas Gleixner
  2017-12-14 22:47                                       ` Linus Torvalds
@ 2017-12-15  0:34                                       ` Rafael J. Wysocki
  1 sibling, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-15  0:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, Linux PCI, Linux PM

On Thu, Dec 14, 2017 at 11:36 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 14 Dec 2017, Linus Torvalds wrote:
>> On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> I just wanted to pipe up about that "irq7", because judging from your
>> email it seems like you think it's a real irq:
>>
>> >        Now there is a race
>> > whether the kernel resume path manages to mask the PIC again early enough
>> > before something triggers IRQ7 or not.
>>
>> ..and that's not how the PIC works.
>>
>> In fact, "legacy irq 7" is the _normal_ and very traditional spurious
>> interrupt, and it's documented. If the PIC gets an interrupt from
>> _any_ source, but the interrupt goes away before the PIC gets an
>> acknowledge from the CPU (and by "acknowledge", I'm not talking about
>> the explicit software IRQ ACK, I'm talking about the hardware
>> protocol, between the PIC and the CPU), the PIC will then report irq 7
>> as the interrupt - regardless of what the original was.
>>
>> The reason is almost always something like
>>
>>  - CPU interrupts are disabled or masked
>>
>>  - driver does a write to the external hardware that causes an
>> interrupt to be raised
>
> Which should be a non issue because _ALL_ PIC irq lines are masked at the
> PIC itself. All interrupts are routed through IOAPIC. So unless the IOAPIC
> sports similar behaviour the PIC should not ever observe that scenario.
>
> But, because the silly firmware comes out of suspend with all PIC lines
> unmasked for whatever reason, the PIC can observe that IRQ being raised and
> the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> think it's the firmware which causes it by unmasking the PIC irqs.

That's my understanding too.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
  2017-12-14 15:54                                           ` Rafael J. Wysocki
  2017-12-14 16:17                                             ` Maarten Lankhorst
@ 2017-12-15  2:07                                             ` Rafael J. Wysocki
  2017-12-15 14:28                                               ` Rafael J. Wysocki
  2017-12-15 18:30                                               ` Bjorn Helgaas
  1 sibling, 2 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-15  2:07 UTC (permalink / raw)
  To: Thomas Gleixner, linux-pci
  Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pm, Chen Yu

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

It is incorrect to call pci_restore_state() for devices in low-power
states (D1-D3), as that involves the restoration of MSI setup which
requires MMIO to be operational and that is only the case in D0.

However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
callbacks put the device into a low-power state, so fix it by making
it force devices into D0 via pci_set_power_state() instead of trying
to "update" their power state which is pointless.

Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Maarten Lankhorst <dev@mblankhorst.nl>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Maarten Lankhorst <dev@mblankhorst.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

The bug is not as old as I thought, actually.

Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
forever, but it started to be problematic in 4.13, when we started
to call pci_restore_state() in addition to it to fix another issue.

---
 drivers/pci/pci-driver.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume_early(dev);
 
-	pci_update_current_state(pci_dev, PCI_D0);
+	/*
+	 * pci_restore_state() requires the device to be in D0 (because of MSI
+	 * restoration among other things), so force it into D0 in case the
+	 * driver's "freeze" callbacks put it into a low-power state directly.
+	 */
+	pci_set_power_state(pci_dev, PCI_D0);
 	pci_restore_state(pci_dev);
 
 	if (drv && drv->pm && drv->pm->thaw_noirq)

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
  2017-12-14 22:47                                       ` Linus Torvalds
@ 2017-12-15  9:05                                         ` Thomas Gleixner
  0 siblings, 0 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-15  9:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
	Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pci, linux-pm

On Thu, 14 Dec 2017, Linus Torvalds wrote:

> On Thu, Dec 14, 2017 at 2:36 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > But, because the silly firmware comes out of suspend with all PIC lines
> > unmasked for whatever reason, the PIC can observe that IRQ being raised and
> > the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> > think it's the firmware which causes it by unmasking the PIC irqs.
> 
> Yes, that sounds quite likely.

And just for the record I was able to figure out which interrupt comes in
and goes away again. It's the only level triggered interrupt, which is the
ACPI interrupt.....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
  2017-12-15  2:07                                             ` [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() Rafael J. Wysocki
@ 2017-12-15 14:28                                               ` Rafael J. Wysocki
  2017-12-15 18:30                                               ` Bjorn Helgaas
  1 sibling, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-15 14:28 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Thomas Gleixner, Linux PCI, Linus Torvalds, Maarten Lankhorst,
	Michal Hocko, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, Linux PM, Chen Yu

On Fri, Dec 15, 2017 at 3:07 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It is incorrect to call pci_restore_state() for devices in low-power
> states (D1-D3), as that involves the restoration of MSI setup which
> requires MMIO to be operational and that is only the case in D0.
>
> However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
> callbacks put the device into a low-power state, so fix it by making
> it force devices into D0 via pci_set_power_state() instead of trying
> to "update" their power state which is pointless.
>
> Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
> Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
> Reported-by: Thomas Gleixner <tglx@linutronix.de>
> Reported-by: Maarten Lankhorst <dev@mblankhorst.nl>
> Tested-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Maarten Lankhorst <dev@mblankhorst.nl>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> The bug is not as old as I thought, actually.
>
> Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
> forever, but it started to be problematic in 4.13, when we started
> to call pci_restore_state() in addition to it to fix another issue.

Bjorn, any concerns about this one?

> ---
>  drivers/pci/pci-driver.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
>         if (pci_has_legacy_pm_support(pci_dev))
>                 return pci_legacy_resume_early(dev);
>
> -       pci_update_current_state(pci_dev, PCI_D0);
> +       /*
> +        * pci_restore_state() requires the device to be in D0 (because of MSI
> +        * restoration among other things), so force it into D0 in case the
> +        * driver's "freeze" callbacks put it into a low-power state directly.
> +        */
> +       pci_set_power_state(pci_dev, PCI_D0);
>         pci_restore_state(pci_dev);
>
>         if (drv && drv->pm && drv->pm->thaw_noirq)
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
  2017-12-15  2:07                                             ` [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() Rafael J. Wysocki
  2017-12-15 14:28                                               ` Rafael J. Wysocki
@ 2017-12-15 18:30                                               ` Bjorn Helgaas
  2017-12-15 23:44                                                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 83+ messages in thread
From: Bjorn Helgaas @ 2017-12-15 18:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, linux-pci, Linus Torvalds, Maarten Lankhorst,
	Michal Hocko, Andy Lutomirski, Linux Kernel Mailing List,
	the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
	Rafael J. Wysocki, linux-pm, Chen Yu

On Fri, Dec 15, 2017 at 03:07:18AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> It is incorrect to call pci_restore_state() for devices in low-power
> states (D1-D3), as that involves the restoration of MSI setup which
> requires MMIO to be operational and that is only the case in D0.
> 
> However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
> callbacks put the device into a low-power state, so fix it by making
> it force devices into D0 via pci_set_power_state() instead of trying
> to "update" their power state which is pointless.
> 
> Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
> Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
> Reported-by: Thomas Gleixner <tglx@linutronix.de>
> Reported-by: Maarten Lankhorst <dev@mblankhorst.nl>
> Tested-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Maarten Lankhorst <dev@mblankhorst.nl>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Let me know if you want me to take this.  I don't have anything
currently queued up that touches pci-driver.c, so I'm happy if you
take it yourself.

> ---
> 
> The bug is not as old as I thought, actually.
> 
> Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
> forever, but it started to be problematic in 4.13, when we started
> to call pci_restore_state() in addition to it to fix another issue.
> 
> ---
>  drivers/pci/pci-driver.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_resume_early(dev);
>  
> -	pci_update_current_state(pci_dev, PCI_D0);
> +	/*
> +	 * pci_restore_state() requires the device to be in D0 (because of MSI
> +	 * restoration among other things), so force it into D0 in case the
> +	 * driver's "freeze" callbacks put it into a low-power state directly.
> +	 */
> +	pci_set_power_state(pci_dev, PCI_D0);
>  	pci_restore_state(pci_dev);
>  
>  	if (drv && drv->pm && drv->pm->thaw_noirq)
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
  2017-12-15 18:30                                               ` Bjorn Helgaas
@ 2017-12-15 23:44                                                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-15 23:44 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Thomas Gleixner, Linux PCI, Linus Torvalds,
	Maarten Lankhorst, Michal Hocko, Andy Lutomirski,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Daniel Vetter, Bjorn Helgaas, Rafael J. Wysocki, Linux PM,
	Chen Yu

On Fri, Dec 15, 2017 at 7:30 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Dec 15, 2017 at 03:07:18AM +0100, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> It is incorrect to call pci_restore_state() for devices in low-power
>> states (D1-D3), as that involves the restoration of MSI setup which
>> requires MMIO to be operational and that is only the case in D0.
>>
>> However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
>> callbacks put the device into a low-power state, so fix it by making
>> it force devices into D0 via pci_set_power_state() instead of trying
>> to "update" their power state which is pointless.
>>
>> Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
>> Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
>> Reported-by: Thomas Gleixner <tglx@linutronix.de>
>> Reported-by: Maarten Lankhorst <dev@mblankhorst.nl>
>> Tested-by: Thomas Gleixner <tglx@linutronix.de>
>> Tested-by: Maarten Lankhorst <dev@mblankhorst.nl>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
> Let me know if you want me to take this.  I don't have anything
> currently queued up that touches pci-driver.c, so I'm happy if you
> take it yourself.

I will take it.

Depending of what Yu finds, we may need an additional fix to make the
Purley system work.

>> ---
>>
>> The bug is not as old as I thought, actually.
>>
>> Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
>> forever, but it started to be problematic in 4.13, when we started
>> to call pci_restore_state() in addition to it to fix another issue.
>>
>> ---
>>  drivers/pci/pci-driver.c |    7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> Index: linux-pm/drivers/pci/pci-driver.c
>> ===================================================================
>> --- linux-pm.orig/drivers/pci/pci-driver.c
>> +++ linux-pm/drivers/pci/pci-driver.c
>> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
>>       if (pci_has_legacy_pm_support(pci_dev))
>>               return pci_legacy_resume_early(dev);
>>
>> -     pci_update_current_state(pci_dev, PCI_D0);
>> +     /*
>> +      * pci_restore_state() requires the device to be in D0 (because of MSI
>> +      * restoration among other things), so force it into D0 in case the
>> +      * driver's "freeze" callbacks put it into a low-power state directly.
>> +      */
>> +     pci_set_power_state(pci_dev, PCI_D0);
>>       pci_restore_state(pci_dev);
>>
>>       if (drv && drv->pm && drv->pm->thaw_noirq)
>>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: Linux 4.15-rc2
  2017-12-03 16:22 Linux 4.15-rc2 Linus Torvalds
  2017-12-04 22:25 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Rafael J. Wysocki
@ 2018-02-21 18:36 ` Eugene Syromiatnikov
  1 sibling, 0 replies; 83+ messages in thread
From: Eugene Syromiatnikov @ 2018-02-21 18:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Al Viro

On Sun, Dec 03, 2017 at 11:22:56AM -0500, Linus Torvalds wrote:
> 
> Linus Torvalds (6):
>       Rename superblock flags (MS_xyz -> SB_xyz)

This commit, while claims that it changes internal flags, also touches
an UAPI header (include/uapi/linux/bfs_fs.h), specifically, the macro
BFS_UNCLEAN.  I expect that either this macro should be in a private
header, or (if this check is expected to be available to the userspace)
at least SB_RDONLY should be also defined there.

^ permalink raw reply	[flat|nested] 83+ messages in thread

end of thread, other threads:[~2018-02-21 18:36 UTC | newest]

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-03 16:22 Linux 4.15-rc2 Linus Torvalds
2017-12-04 22:25 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Rafael J. Wysocki
2017-12-04 22:36   ` Linus Torvalds
2017-12-04 22:38     ` Thomas Gleixner
2017-12-04 22:41       ` Rafael J. Wysocki
2017-12-05  0:25         ` Rafael J. Wysocki
2017-12-09 10:33           ` Pavel Machek
2017-12-09 11:41             ` Pavel Machek
     [not found]             ` <CA+55aFw8tuoJ2gcXx3K2sKFf2Y9hXX4naMVQNqGOUivnjwhjkg@mail.gmail.com>
2017-12-09 22:01               ` Pavel Machek
     [not found]                 ` <CA+55aFySAdiBZhZ0PSDjH5PuvPPcMsBRXbxCkObfm1eY7gHDbQ@mail.gmail.com>
2017-12-10 16:23                   ` Pavel Machek
2017-12-10 16:37                     ` Linus Torvalds
2017-12-10 18:56                       ` Pavel Machek
2017-12-10 20:30                         ` Linus Torvalds
2017-12-10 20:43                           ` Pavel Machek
2017-12-10 21:28                             ` Linus Torvalds
2017-12-10 21:35                               ` Pavel Machek
2017-12-12 17:27                               ` Linus Torvalds
2017-12-12 18:05                                 ` Andy Lutomirski
2017-12-12 18:36                                   ` Linus Torvalds
2017-12-12 22:10                                     ` Andy Lutomirski
2017-12-12 22:33                                       ` Linus Torvalds
2017-12-12 23:10                                         ` Andy Lutomirski
2017-12-13 11:16                                       ` Jarkko Nikula
2017-12-13 12:40                                         ` Ingo Molnar
2017-12-13 18:50                                         ` Andy Lutomirski
2017-12-10 21:38                           ` [PATCH] Fix resume on x86-32 machines Pavel Machek
2017-12-10 21:58                             ` Andy Lutomirski
2017-12-10 22:20                               ` Pavel Machek
2017-12-11  9:25                                 ` Jarkko Nikula
2017-12-11 14:22                               ` Rafael J. Wysocki
2017-12-11 14:43                                 ` Rafael J. Wysocki
2017-12-11 14:59                                 ` Jarkko Nikula
2017-12-11 18:31                                 ` Linus Torvalds
2017-12-11 18:41                                   ` Andy Lutomirski
2017-12-11 19:12                                     ` Linus Torvalds
2017-12-14 20:38                                     ` Pavel Machek
2017-12-14 20:47                                       ` Linus Torvalds
2017-12-14 21:20                                         ` Andy Lutomirski
2017-12-14 22:22                                         ` Pavel Machek
2017-12-11 15:13                               ` Ingo Molnar
2017-12-11 16:26                                 ` Andy Lutomirski
2017-12-11 14:09                           ` Linux 4.15-rc2: Regression in resume from ACPI S3 Zhang Rui
2017-12-11 16:28                             ` Andy Lutomirski
2017-12-12  8:00                             ` Pavel Machek
2017-12-06 12:15     ` Michal Hocko
2017-12-06 12:23       ` Thomas Gleixner
2017-12-06 14:04         ` Rafael J. Wysocki
2017-12-06 12:31       ` Maarten Lankhorst
2017-12-06 12:46         ` Thomas Gleixner
2017-12-06 13:09           ` Maarten Lankhorst
2017-12-06 14:15             ` Thomas Gleixner
2017-12-07 13:33               ` Maarten Lankhorst
2017-12-08 10:30                 ` Thomas Gleixner
2017-12-13 15:57                   ` Thomas Gleixner
2017-12-13 16:23                     ` Bjorn Helgaas
2017-12-13 16:41                       ` Thomas Gleixner
2017-12-13 17:45                         ` Linus Torvalds
2017-12-13 18:19                           ` Thomas Gleixner
2017-12-13 20:52                             ` Thomas Gleixner
2017-12-13 21:06                               ` Thomas Gleixner
2017-12-13 22:48                                 ` Rafael J. Wysocki
2017-12-14 11:54                                 ` Thomas Gleixner
2017-12-14 12:12                                   ` Rafael J. Wysocki
2017-12-14 12:30                                     ` Thomas Gleixner
2017-12-14 15:30                                       ` Rafael J. Wysocki
2017-12-14 15:52                                         ` Thomas Gleixner
2017-12-14 15:54                                           ` Rafael J. Wysocki
2017-12-14 16:17                                             ` Maarten Lankhorst
2017-12-15  2:07                                             ` [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() Rafael J. Wysocki
2017-12-15 14:28                                               ` Rafael J. Wysocki
2017-12-15 18:30                                               ` Bjorn Helgaas
2017-12-15 23:44                                                 ` Rafael J. Wysocki
2017-12-14 13:24                                   ` Linux 4.15-rc2: Regression in resume from ACPI S3 Thomas Gleixner
2017-12-14 19:03                                   ` Linus Torvalds
2017-12-14 22:36                                     ` Thomas Gleixner
2017-12-14 22:47                                       ` Linus Torvalds
2017-12-15  9:05                                         ` Thomas Gleixner
2017-12-15  0:34                                       ` Rafael J. Wysocki
2017-12-13 22:39                             ` Rafael J. Wysocki
2017-12-13 23:26                               ` Rafael J. Wysocki
2017-12-07  7:55       ` Michal Hocko
2017-12-10 20:30         ` Michal Hocko
2018-02-21 18:36 ` Linux 4.15-rc2 Eugene Syromiatnikov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.