* Linux 4.15-rc2
@ 2017-12-03 16:22 Linus Torvalds
2017-12-04 22:25 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Rafael J. Wysocki
2018-02-21 18:36 ` Linux 4.15-rc2 Eugene Syromiatnikov
0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-03 16:22 UTC (permalink / raw)
To: Linux Kernel Mailing List
It's Sunday, but a few hours earlier than usual, since I'm on the east
coast, three hours ahead of my normal release schedule.
It's a slightly bigger rc2 than I would have wished for, but this
early in the release process I don't worry about it. The appended
shortlog gives the details, it's fixes all over the place -
architectures, drivers, filesystems, networking, core kernel.
One thing I'll point out is that I'm trying to get some kernel ASLR
leaks plugged, and as part of that we now hash any pointers printed by
"%p' by default. That won't affect a lot of people, but where it is a
debugging problem (rather than leaking interesting kernel pointers),
we will have to fix things up.
It can be a small annoyance, but the alternatives (trying to actually
find all the cases where we might be leaking) were worse. But let's
see if anybody even notices - a lot of the pointer printouts are stale
debug information from when some driver was originally written, and
aren't actually really interesting.
There will probably be some more leak fixes during this rc process,
we'll see how that all sorts out.
Linus
---
Abhishek Goel (2):
cpupowerutils: bench - Fix cpu online check
cpupower : Fix cpupower working when cpu0 is offline
Adrian Hunter (4):
mmc: block: Fix missing blk_put_request()
mmc: block: Check return value of blk_get_request()
mmc: core: Do not leave the block driver in a suspended state
mmc: block: Ensure that debugfs files are removed
Ahmad Fatoum (1):
e1000: Fix off-by-one in debug message
Alex Deucher (4):
drm/amdgpu/gfx7: cache raster_config values
drm/amdgpu: used cached gca values for cik_read_register
Revert "drm/amdgpu: fix rmmod KCQ disable failed error"
drm/amdgpu: drop experimental flag for raven
Amritha Nambiar (1):
i40e: Fix reporting incorrect error codes
Andrew Elble (2):
nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
nfsd: check for use of the closed special stateid
Andrew Jiang (1):
drm/amd/display: Don't reject 3D timings
Andrew Waterman (3):
RISC-V: Add VDSO entries for clock_get/gettimeofday/getcpu
RISC-V: Flush I$ when making a dirty page executable
RISC-V: Allow userspace to flush the instruction cache
Andrey Grodzovsky (1):
drm/amd/display: Switch to drm_atomic_helper_wait_for_flip_done
Andrey Gusakov (6):
drm/bridge: tc358767: do no fail on hi-res displays
drm/bridge: tc358767: filter out too high modes
drm/bridge: tc358767: fix DP0_MISC register set
drm/bridge: tc358767: fix timing calculations
drm/bridge: tc358767: fix AUXDATAn registers access
drm/bridge: tc358767: fix 1-lane behavior
Andy Shevchenko (1):
scripts/bloat-o-meter: don't fail with division by 0
Antoine Tenart (4):
net: mvpp2: fix the txq_init error path
net: mvpp2: cleanup probed ports in the probe error path
net: mvpp2: check ethtool sets the Tx ring size is to a valid min value
net: phy: marvell10g: fix the PHY id mask
Ard Biesheuvel (2):
arm64: module-plts: factor out PLT generation code for ftrace
arm64: ftrace: emit ftrace-mod.o contents through code
Arnd Bergmann (2):
drm/i915: fix intel_backlight_device_register declaration
drm/omap: displays: panel-dpi: add backlight dependency
Bartosz Golaszewski (1):
eeprom: at24: correctly set the size for at24mac402
Bastian Stender (2):
mmc: core: prepend 0x to pre_eol_info entry in sysfs
mmc: core: prepend 0x to OCR entry in sysfs
Benjamin Gaignard (1):
ethernet: dwmac-stm32: Fix copyright
Bhawanpreet Lakha (1):
drm/amd/display: Add null check for 24BPP (xfm and dpp)
Bhumika Goyal (3):
sunrpc: make the function arg as const
NFSD: make cache_detail structures const
SUNRPC: make cache_detail structures const
Bjorn Andersson (1):
mmc: sdhci-msm: Optionally wait for signal level changes
Carlos Maiolino (1):
xfs: Properly retry failed dquot items in case of error during
buffer writeback
Changbin Du (1):
drm/i915/gvt: Fix unsafe locking caused by spin_unlock_bh
Chao Yu (1):
quota: propagate error from __dquot_initialize
Charlene Liu (2):
drm/amd/display: fix seq issue: turn on clock before programming afmt.
drm/amd/display: try to find matching audio inst for enc inst first
Chris Wilson (3):
drm/i915: Clear breadcrumb node when cancelling signaling
drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM
drm/i915/fbdev: Serialise early hotplug events with async fbdev config
Christian Borntraeger (1):
s390/debug: use pK for kernel pointers
Christian König (2):
drm/amdgpu: don't try to move pinned BOs
drm/ttm: fix populate_and_map() functions once more
Christoph Hellwig (1):
move libgcc.h to include/linux
Christophe JAILLET (2):
bnxt_en: Fix an error handling path in 'bnxt_get_module_eeprom()'
drm/omap: Fix error handling path in 'omap_dmm_probe()'
Chun-Yeow Yeoh (1):
mac80211: fix the update of path metric for RANN frame
Cihangir Akturk (1):
drm: mali-dp: switch to drm_*_get(), drm_*_put() helpers
Colin Ian King (9):
nvme: fix spelling mistake: "requeing" -> "requeuing"
drm/i915/gvt: ensure -ve return value is handled correctly
i2c: i2c-boardinfo: fix memory leaks on devinfo
ambassador: fix incorrect indentation of assignment statement
atm: fore200e: use %pK to format kernel addresses instead of %x
atm: lanai: use %p to format kernel addresses instead of %x
atm: suni: remove extraneous space to fix indentation
drm/amd/display: fix memory leaks on error exit return
net: via: via-rhine: use %p to format void * address instead of %x
Dan Carpenter (1):
omapdrm: hdmi4_cec: signedness bug in hdmi4_cec_init()
Dan Williams (11):
mm: fix device-dax pud write-faults triggered by get_user_pages()
mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE
mm: replace pud_write with pud_access_permitted in fault + gup paths
mm: replace pmd_write with pmd_access_permitted in fault + gup paths
mm: replace pte_write with pte_access_permitted in fault + gup paths
mm, hugetlbfs: introduce ->split() to vm_operations_struct
device-dax: implement ->split() to catch invalid munmap attempts
mm: introduce get_user_pages_longterm
mm: fail get_vaddr_frames() for filesystem-dax mappings
v4l2: disable filesystem-dax mapping support
IB/core: disable memory registration of filesystem-dax vmas
Darrick J. Wong (5):
xfs: always free inline data before resetting inode fork during ifree
xfs: log recovery should replay deferred ops in order
xfs: ubsan fixes
xfs: remove unused parameter from xfs_writepage_map
xfs: scrub inode mode properly
Dave Airlie (1):
drm/ttm: don't attempt to use hugepages if dma32 requested (v2)
Dave Martin (1):
arm64: fpsimd: Fix failure to restore FPSIMD state after signals
David Disseldorp (1):
null_blk: fix dev->badblocks leak
David Hildenbrand (1):
KVM: x86: fix em_fxstor() sleeping while in atomic
David Howells (14):
rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing
rxrpc: Don't set upgrade by default in sendmsg()
rxrpc: Provide a different lockdep key for call->user_mutex for
kernel calls
rxrpc: Delay terminal ACK transmission on a client call
rxrpc: Split the call params from the operation params
rxrpc: Fix call timeouts
rxrpc: Don't transmit DELAY ACKs immediately on proposal
rxrpc: Express protocol timeouts in terms of RTT
rxrpc: Add a timeout for detecting lost ACKs/lost DATA
rxrpc: Add keepalive for a call
rxrpc: Fix service endpoint expiry
rxrpc: Fix conn expiry timers
afs: Fix permit refcounting
afs: Properly reset afs_vnode (inode) fields
David S. Miller (1):
sparc64: Fix boot on T4 and later.
David Sterba (2):
btrfs: add missing device::flush_bio puts
btrfs: dev_alloc_list is not protected by RCU, use normal list_del
Dmitry V. Levin (1):
uapi: fix linux/kfd_ioctl.h userspace compilation errors
Dmytro Laktyushkin (3):
drm/amd/display: fix split recout calculation
drm/amd/display: fix split recout offset
drm/amd/display: fix split viewport rounding error
Dr. David Alan Gilbert (2):
KVM: lapic: Split out x2apic ldr calculation
KVM: lapic: Fixup LDR on load in x2apic
Eduardo Otubo (1):
xen-netfront: remove warning when unloading module
Eric Anholt (1):
drm/bridge: Fix lvds-encoder since the panel_bridge rework.
Eric Dumazet (1):
net/packet: fix a race in packet_bind() and packet_notifier()
Eric Sandeen (3):
xfs: fix leaks on corruption errors in xfs_bmap.c
xfs: fix uninitialized variable in xfs_scrub_quota
xfs: calculate correct offset in xfs_scrub_quota_item
Eric Yang (1):
drm/amd/display: Add timing validation against dongle cap
Eyal Moscovici (1):
KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs
Felix Kuehling (2):
drm/amdgpu: Fix SDMA load/unload sequence on HWS disabled mode
drm/amdkfd: Fix SDMA oversubsription handling
Filipe Manana (3):
Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
Btrfs: fix reported number of inode blocks after buffered append writes
Btrfs: incremental send, fix wrong unlink path after renaming file
Geert Uytterhoeven (1):
net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
Greg Kroah-Hartman (19):
s390: block: add SPDX identifiers to the remaining files
s390: crypto: add SPDX identifiers to the remaining files
s390: cio: add SPDX identifiers to the remaining files
s390: char: add SPDX identifiers to the remaining files
s390: net: add SPDX identifiers to the remaining files
s390: scsi: zfcp_aux: add SPDX identifier
s390: virtio: add SPDX identifiers to the remaining files
s390: crypto: Remove redundant license text
s390: drivers: Remove redundant license text
s390: kernel: add SPDX identifiers to the remaining files
s390: crypto: add SPDX identifiers to the remaining files
s390: mm: add SPDX identifiers to the remaining files
s390: pci: add SPDX identifiers to the remaining files
s390: appldata: add SPDX identifiers to the remaining files
s390: add SPDX identifiers to the remaining files
s390: kernel: Remove redundant license text
s390: include: Remove redundant license text
s390: crypto: Remove redundant license text
s390: Remove redundant license text
Gustavo A R Silva (1):
i40e/virtchnl: fix application of sizeof to pointer
Gustavo A. R. Silva (1):
net: openvswitch: datapath: fix data type in queue_gso_packets
Hans Verkuil (1):
drm/bridge: adv7511/33: Fix adv7511_cec_init() failure handling
Hans de Goede (4):
ACPI / bus: Leave modalias empty for devices which are not present
drm/i915: Fix false-positive assert_rpm_wakelock_held in
i915_pmic_bus_access_notifier v2
drm/i915: Re-register PMIC bus access notifier on runtime resume
i2c: i801: Fix Failed to allocate irq -2147483648 error
Harald Freudenberger (1):
s390/zcrypt: Fix wrong comparison leading to strange load balancing
Harry Wentland (6):
drm/amd/display: Fix amdgpu_dm bugs found by smatch
drm/amd/display: Bunch of smatch error and warning fixes in DC
drm/amd/display: Fix use before NULL check in validate_timing
drm/amd/display: Fix hubp check in set_cursor_position
drm/amd/display: Fix potential NULL and mem leak in create_links
drm/amd/display: Fix couple more inconsistent NULL checks in dc_resource
Heiko Carstens (2):
s390: rework __switch_to() to allow larger task_struct offsets
s390/disassembler: remove confusing code
Heiner Kallweit (2):
eeprom: at24: fix reading from 24MAC402/24MAC602
eeprom: at24: check at24_read/write arguments
Hersen Wu (2):
drm/amd/display: Handle as MST first and then DP dongle if sink
support both
drm/amd/display: USB-C / thunderbolt dock specific workaround
Huacai Chen (1):
bcache: Fix building error on MIPS
Hyong-Youb Kim (1):
myri10ge: Update MAINTAINERS
Ian Kent (2):
autofs: revert "autofs: take more care to not update last_used
on path walk"
autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
Israel Rukshin (1):
nvme-rdma: Use mr pool
Jakub Kicinski (1):
cls_bpf: don't decrement net's refcount when offload fails
James Hogan (1):
cpufreq: Add Loongson machine dependencies
James Smart (1):
nvmet-fc: correct ref counting error when deferred rcv used
Jan H. Schönherr (1):
KVM: Let KVM_SET_SIGNAL_MASK work as advertised
Janakarajan Natarajan (1):
KVM: x86: Fix CPUID function for word 6 (80000001_ECX)
Jean Delvare (1):
hwmon: Drop reference to Jean's tree
Jeff Layton (1):
reiserfs: remove unneeded i_version bump
Jeff Lien (1):
nvme-pci: add quirk for delay before CHK RDY for WDC SN200
Jens Axboe (2):
nvme-fc: don't use bit masks for set/test_bit() numbers
blktrace: fix trace mutex deadlock
Jerry (Fangzhi) Zuo (1):
drm/amd/display: Check aux channel before MST resume
Jesse Chan (1):
cpufreq: mediatek: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
Jiang Biao (1):
fs/mbcache.c: make count_objects() more robust
Jinbum Park (1):
arm64: pgd: Mark pgd_cache as __ro_after_init
Jiri Pirko (1):
net: sched: cbq: create block for q->link.block
Johannes Berg (2):
cfg80211: select CRYPTO_SHA256 if needed
mac80211: use QoS NDP for AP probing
John Johansen (1):
apparmor: fix oops in audit_signal_cb hook
Jon Maloy (1):
tipc: eliminate access after delete in group_filter_msg()
Joonas Lahtinen (1):
drm/i915: Disable THP until we have a GPU read BW W/A
Jordan Lazare (1):
drm/amd/display: Revert noisy assert messages
Jorgen Hansen (2):
VSOCK: Don't call vsock_stream_has_data in atomic context
VSOCK: Don't set sk_state to TCP_CLOSE before testing it
Josef Bacik (2):
btrfs: clear space cache inode generation always
btrfs: fix deadlock when writing out space cache
Kai-Heng Feng (1):
nvme-pci: disable APST on Samsung SSD 960 EVO + ASUS PRIME B350M-A
Kees Cook (1):
exec: avoid RLIMIT_STACK races with prlimit()
Keith Busch (2):
nvme: Fix NULL dereference on reservation request
nvme: Suppress static analyis warning
Kirill A. Shutemov (3):
mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
mm, thp: Do not make pmd/pud dirty without a reason
mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
Laurent Pinchart (1):
drm: omapdrm: Fix DPI on platforms using the DSI VDDS
Leo (Sunpeng) Li (3):
drm/amd/display: Should disable when new stream is null
drm/amd/display: Do DC mode-change check when adding CRTCs
drm/amd/display: Do not put drm_atomic_state on resume
Leo Liu (1):
drm/amdgpu: move UVD/VCE and VCN structure out from union
Linus Torvalds (6):
Rename superblock flags (MS_xyz -> SB_xyz)
proc: don't report kernel addresses in /proc/<pid>/stack
Revert "mm, thp: Do not make pmd/pud dirty without a reason"
kallsyms: take advantage of the new '%px' format
vsprintf: don't use 'restricted_pointer()' when not restricting
Linux 4.15-rc2
Liran Alon (6):
KVM: x86: pvclock: Handle first-time write to pvclock-page
contains random junk
KVM: nVMX/nSVM: Don't intercept #UD when running L2
KVM: x86: Exit to user-mode on #UD intercept when emulator requires
KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure
KVM: x86: Don't re-execute instruction when not passing CR2 value
KVM: nVMX: Fix vmx_check_nested_events() return value in case an
event was reinjected to L2
Liu Bo (3):
Btrfs: add write_flags for compression bio
Btrfs: bail out gracefully rather than BUG_ON
Btrfs: fix list_add corruption and soft lockups in fsync
Liu, Changcheng (1):
scripts/faddr2line: extend usage on generic arch
Liviu Dudau (3):
drm: hdlcd: Update PM code to save/restore console.
drm: mali-dp: Separate static internal data into a read-only structure.
drm: mali-dp: Disable planes when their CRTC gets disabled.
Lucas Stach (2):
drm/atomic: make drm_atomic_helper_wait_for_vblanks more agressive
drm/imx: always call wait_for_flip_done in commit_tail
Lv Zheng (1):
ACPI / EC: Fix regression related to PM ops support in ECDT device
Maarten Lankhorst (2):
drm/vblank: Pass crtc_id to page_flip_ioctl.
drm/fb_helper: Disable all crtc's when initial setup fails.
Mahesh Salgaonkar (1):
powerpc/powernv: Fix kexec crashes caused by tlbie tracing
Marcos Paulo de Souza (1):
blktrace: Use blk_trace_bio_get_cgid inside blk_add_trace_bio
Mark Rutland (1):
arm64: mm: cleanup stale AIVIVT references
Martin Schwidefsky (4):
s390: fix alloc_pgste check in init_new_context again
s390: sthyi: add SPDX identifiers to the remaining files
s390: revert ELF_ET_DYN_BASE base changes
s390/gs: add compat regset for the guarded storage broadcast control block
Max Gurtovoy (1):
nvme-rdma: fix memory leak during queue allocation
Michael Ellerman (1):
powerpc/kexec: Fix kexec/kdump in P9 guest kernels
Michael Lyle (1):
bcache: check return value of register_shrinker
Michal Hocko (3):
xfs: fortify xfs_alloc_buftarg error handling
mm, memory_hotplug: do not back off draining pcp free pages from
kworker context
Revert "mm/page-writeback.c: print a warning if the vm dirtiness
settings are illogical"
Michel Dänzer (2):
drm/amdgpu: Set adev->vcn.irq.num_types for VCN
drm/amdgpu: Use unsigned ring indices in amdgpu_queue_mgr_map
Mika Westerberg (1):
net: thunderbolt: Stop using zero to mean no valid DMA mapping
Mike Kravetz (1):
mm/cma: fix alloc_contig_range ret code/potential leak
Mike Maloney (1):
packet: fix crash in fanout_demux_rollover()
Mikulas Patocka (1):
block: remove useless assignment in bio_split
Minwoo Im (2):
nvme-pci: avoid hmb desc array idx out-of-bound when hmmaxd set.
nvme-pci: fix NULL pointer dereference in nvme_free_host_mem()
Mirza Krak (1):
drm/rockchip: dw-mipi-dsi: fix possible un-balanced runtime PM enable
Nadav Amit (1):
fs/hugetlbfs/inode.c: change put_page/unlock_page order in
hugetlbfs_fallocate()
Naofumi Honda (1):
nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
Nikita Leshenko (5):
KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
KVM: x86: ioapic: Don't fire level irq when Remote IRR set
KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
KVM: x86: ioapic: Clear Remote IRR when entry is switched to
edge-triggered
KVM: x86: ioapic: Preserve read-only values in the redirection table
Nikolay Borisov (1):
btrfs: Fix transaction abort during failure in btrfs_rm_dev_item
OGAWA Hirofumi (1):
fs/fat/inode.c: fix sb_rdonly() change
Oded Gabbay (2):
microblaze: add missing include to mmu_context_mm.h
drm/radeon: remove init of CIK VMIDs 8-16 for amdkfd
Olof Johansson (8):
RISC-V: use generic serial.h
RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros
RISC-V: io.h: type fixes for warnings
RISC-V: move empty_zero_page definition to C and export it
RISC-V: Export some expected symbols for modules
RISC-V: Provide stub of setup_profiling_timer()
RISC-V: Use define for get_cycles like other architectures
RISC-V: Add missing include
Ondrej Mosnáček (1):
crypto: skcipher - Fix skcipher_walk_aead_common
Palmer Dabbelt (11):
RISC-V: Remove __vdso_cmpxchg{32,64} symbol versions
RISC-V: Remove unused arguments from ATOMIC_OP
RISC-V: Comment on why {,cmp}xchg is ordered how it is
RISC-V: Remove __smp_bp__{before,after}_atomic
RISC-V: Remove smb_mb__{before,after}_spinlock()
RISC-V: __test_and_op_bit_ord should be strongly ordered
RISC-V: Add READ_ONCE in arch_spin_is_locked()
RISC-V: `sfence.vma` orderes the instruction cache
RISC-V: remove spin_unlock_wait()
RISC-V: Clean up an unused include
RISC-V: __io_writes should respect the length argument
Paolo Abeni (1):
sch_sfq: fix null pointer dereference at timer expiration
Paolo Bonzini (2):
KVM: x86: inject exceptions produced by x86_decode_insn
KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP
Paul Mackerras (1):
KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT
guests on radix hosts
Peter Rosin (1):
hwmon: (jc42) optionally try to disable the SMBUS timeout
Peter Ujfalusi (1):
omapdrm: hdmi4: Correct the SoC revision matching
Petr Machata (4):
mlxsw: spectrum_router: Offload decap only for up tunnels
mlxsw: spectrum_router: Demote tunnels on VRF migration
mlxsw: spectrum_router: Handle encap to demoted tunnels
mlxsw: spectrum_router: Update nexthop RIF on update
Pierre-Hugues Husson (1):
drm/bridge: synopsys/dw-hdmi: Enable cec clock
Qu Wenruo (2):
btrfs: Fix wild memory access in compression level parser
btrfs: tree-checker: Fix false panic for sanity test
Randy Dunlap (2):
block: genhd.c: fix message typo
drm/amdkfd: fix amdkfd use-after-free GP fault
Robert Lippert (1):
hwmon: (pmbus) Use 64bit math for DIRECT format values
Roman Kapl (1):
net: sched: crash on blocks with goto chain action
Roman Li (2):
drm/amd/display: Fix S3 topology change
drm/amd/display: fix gamma setting
Rui Hua (1):
bcache: recover data from backing when data is clean
Russell King (1):
ARM: avoid faulting on qemu
Sagi Grimberg (7):
nvme-fabrics: introduce init command check for a queue that is not alive
nvme-fc: check if queue is ready in queue_rq
nvme-loop: check if queue is ready in queue_rq
nvme-rdma: don't suppress send completions
nvme-rdma: don't complete requests before a send work request
has completed
nvme-rdma: wait for local invalidation before completing a request
nvme-rdma: Check remotely invalidated rkey matches our expected rkey
Sara Sharon (1):
mac80211: tear down RX aggregations first
Sasha Neftin (1):
e1000e: fix the use of magic numbers for buffer overrun issue
Shakeel Butt (1):
mm, memcg: fix mem_cgroup_swapout() for THPs
Shirish S (1):
drm/amd/display: check plane state before validating fbc
Srishti Sharma (2):
drm/arm: Replace instances of drm_dev_unref with drm_dev_put.
drm/arm: Replace instances of drm_dev_unref with drm_dev_put.
Stefan Schake (1):
drm/vc4: Account for interrupts in flight
Stephan Mueller (2):
crypto: algif_aead - skip SGL entries with NULL page
crypto: af_alg - remove locking in async callback
Stephen Hemminger (1):
uapi: add SPDX identifier to vm_sockets_diag.h
Sunil Goutham (1):
net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts
Takashi Iwai (1):
Revert "ALSA: usb-audio: Fix potential zero-division at parsing FU"
Tang Junhui (1):
bcache: add a comment in journal bucket reading
Tetsuo Handa (1):
quota: Check for register_shrinker() failure.
Thomas Meyer (1):
auxdisplay: img-ascii-lcd: Only build on archs that have IOMEM
Thomas Richter (1):
s390/topology: fix compile error in file arch/s390/kernel/smp.c
Tobin C. Harding (5):
docs: correct documentation for %pK
vsprintf: refactor %pK code out of pointer()
printk: hash addresses printed with %p
vsprintf: add printk specifier %px
kasan: use %px to print addresses instead of %p
Trond Myklebust (11):
nfsd: Fix stateid races between OPEN and CLOSE
nfsd: Fix another OPEN stateid race
nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)
nfsd: Ensure we don't recognise lock stateids after freeing them
nfsd4: move find_lock_stateid
nfsd: Fix race in lock stateid creation
nfsd: Ensure we check stateid validity in the seqid operation checks
nfsd: Fix races with check_stateid_generation()
NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"
SUNRPC: Allow connect to return EHOSTUNREACH
SUNRPC: Handle ENETDOWN errors
Ulf Hansson (1):
mmc: sdhci: Avoid swiotlb buffer being full
Vaibhav Jain (3):
cxl: Check if vphb exists before iterating over AFU devices
powerpc: Avoid signed to unsigned conversion in set_thread_tidr()
powerpc: Do not assign thread.tidr if already assigned
Vasily Averin (9):
nfsd: remove net pointer from debug messages
lockd: remove net pointer from messages
grace: replace BUG_ON by WARN_ONCE in exit_net hook
lockd: added cleanup checks in exit_net hook
lockd: lost rollback of set_grace_period() in lockd_down_net()
race of lockd inetaddr notifiers vs nlmsvc_rqst change
race of nfsd inetaddr notifiers vs nn->nfsd_serv change
nlm_shutdown_hosts_net() cleanup
lockd: fix "list_add double add" caused by legacy signal interface
Vasily Gorbik (1):
s390/disassembler: correct disassembly lines alignment
Vasyl Gomonovych (1):
lmc: Use memdup_user() as a cleanup
Ville Syrjälä (4):
drm/edid: Don't send non-zero YQ in AVI infoframe for HDMI 1.x sinks
drm/i915: Fix init_clock_gating for resume
drm/i915: Don't try indexed reads to alternate slave addresses
drm/i915: Prevent zero length "index" write
Vitor Massaru Iha (1):
drm: Fix checkpatch issue: "WARNING: braces {} are not necessary
for single statement blocks."
Vivien Didelot (1):
net: dsa: fix 'increment on 0' warning
Wang Nan (1):
mm, oom_reaper: gather each vma to prevent leaking TLB entry
Wanpeng Li (6):
KVM: X86: Fix operand/address-size during instruction decoding
KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
KVM: X86: Fix softlockup when get the current kvmclock
KVM: VMX: Fix rflags cache during vCPU reset
KVM: VMX: Fix vmx->nested freeing when no SMI handler
Weinan Li (1):
drm/i915/gvt: remove skl_misc_ctl_write handler
Will Deacon (1):
arm64: context: Fix comments and remove pointless smp_wmb()
Xiaolin Zhang (1):
drm/i915/gvt: enabled pipe A default on creating vgpu
Xin Long (11):
sctp: use sizeof(__u16) for each stream number length instead of
magic number
sctp: only allow the out stream reset when the stream outq is empty
sctp: only allow the asoc reset when the asoc outq is empty
sctp: avoid flushing unsent queue when doing asoc reset
sctp: set sender next_tsn for the old result with ctsn_ack_point plus 1
sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail
sctp: force the params with right types for sctp csum apis
sctp: remove extern from stream sched
sctp: use right member as the param of list_for_each_entry
bonding: use nla_get_u64 to extract the value for
IFLA_BOND_AD_ACTOR_SYSTEM
vxlan: use __be32 type for the param vni in __vxlan_fdb_delete
Xiong Zhang (1):
drm/i915/gvt: Correct ADDR_4K/2M/1G_MASK definition
Xu YiPing (1):
arm64: perf: remove unsupported events for Cortex-A73
Yan Markman (1):
net: mvpp2: do not disable GMAC padding
Yisheng Xie (1):
kmemleak: add scheduling point to kmemleak_scan()
Yury Norov (1):
arm64: cpu_ops: Add missing 'const' qualifiers
Zhu Yanjun (1):
forcedeth: replace pci_unmap_page with dma_unmap_page
Zi Yan (1):
mm: migrate: fix an incorrect call of prep_transhuge_page()
chenjie (1):
mm/madvise.c: fix madvise() infinite loop under special circumstances
fred gao (1):
drm/i915/gvt: Move request alloc to dispatch_workload path only
shaoyunl (1):
drm/amdkfd: Fix SDMA ring buffer size calculation
weiping zhang (7):
bdi: convert bdi_debug_register to int
bdi: add error handle for bdi_debug_register
block: add WARN_ON if bdi register fail
blk-wbt: remove duplicated setting in wbt_init
blk-sysfs: remove NULL pointer checking in queue_wb_lat_store
blk-wbt: move wbt_clear_stat to common place in wbt_done
blk-wbt: fix comments typo
zhangliping (1):
openvswitch: fix the incorrect flow action alloc size
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-03 16:22 Linux 4.15-rc2 Linus Torvalds
@ 2017-12-04 22:25 ` Rafael J. Wysocki
2017-12-04 22:36 ` Linus Torvalds
2018-02-21 18:36 ` Linux 4.15-rc2 Eugene Syromiatnikov
1 sibling, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-04 22:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Linux Kernel Mailing List, x86
On Sunday, December 3, 2017 5:22:56 PM CET Linus Torvalds wrote:
> It's Sunday, but a few hours earlier than usual, since I'm on the east
> coast, three hours ahead of my normal release schedule.
>
> It's a slightly bigger rc2 than I would have wished for, but this
> early in the release process I don't worry about it. The appended
> shortlog gives the details, it's fixes all over the place -
> architectures, drivers, filesystems, networking, core kernel.
>
> One thing I'll point out is that I'm trying to get some kernel ASLR
> leaks plugged, and as part of that we now hash any pointers printed by
> "%p' by default. That won't affect a lot of people, but where it is a
> debugging problem (rather than leaking interesting kernel pointers),
> we will have to fix things up.
>
> It can be a small annoyance, but the alternatives (trying to actually
> find all the cases where we might be leaking) were worse. But let's
> see if anybody even notices - a lot of the pointer printouts are stale
> debug information from when some driver was originally written, and
> aren't actually really interesting.
>
> There will probably be some more leak fixes during this rc process,
> we'll see how that all sorts out.
So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
systems I have tested, so it is probably safe to assume it to be
broken everywhere.
I'm quite confident that this is not something that went in through the
PM tree, because I was running those changes on the systems that turn
out to be broken now.
It looks like the the ACPI waking vector mechanism stopped working, so
I'm suspecting some x86 changes having to do with virtual-to-physical
address mapping.
I've just started bisection.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-04 22:25 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Rafael J. Wysocki
@ 2017-12-04 22:36 ` Linus Torvalds
2017-12-04 22:38 ` Thomas Gleixner
2017-12-06 12:15 ` Michal Hocko
0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-04 22:36 UTC (permalink / raw)
To: Rafael J. Wysocki, Andy Lutomirski
Cc: Linux Kernel Mailing List, the arch/x86 maintainers
On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> systems I have tested, so it is probably safe to assume it to be
> broken everywhere.
Oh, it's definitely not broken everywhere, because I use it myself,
and was traveling last week due to my mom's bday.
HOWEVER.
Some of the x86 work seems to have broken it for some configurations.
In particular, do you have a big "everything enabled" kernel config -
particularly lockdep and irqflags tracing enabled?
Andy has a patch, but it hasn't made it to me yet (probably because
the x86 people are very busy with the kaiser work):
https://lkml.org/lkml/2017/11/30/546
(also note his follow-up "fix the commit message" note, but that one
doesn't actually affect the code itself).
Does that patch fix it for you?
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-04 22:36 ` Linus Torvalds
@ 2017-12-04 22:38 ` Thomas Gleixner
2017-12-04 22:41 ` Rafael J. Wysocki
2017-12-06 12:15 ` Michal Hocko
1 sibling, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-04 22:38 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers
On Mon, 4 Dec 2017, Linus Torvalds wrote:
> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > systems I have tested, so it is probably safe to assume it to be
> > broken everywhere.
>
> Oh, it's definitely not broken everywhere, because I use it myself,
> and was traveling last week due to my mom's bday.
>
> HOWEVER.
>
> Some of the x86 work seems to have broken it for some configurations.
> In particular, do you have a big "everything enabled" kernel config -
> particularly lockdep and irqflags tracing enabled?
>
> Andy has a patch, but it hasn't made it to me yet (probably because
> the x86 people are very busy with the kaiser work):
Picking it up right now.
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-04 22:38 ` Thomas Gleixner
@ 2017-12-04 22:41 ` Rafael J. Wysocki
2017-12-05 0:25 ` Rafael J. Wysocki
0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-04 22:41 UTC (permalink / raw)
To: Thomas Gleixner, Linus Torvalds
Cc: Andy Lutomirski, Linux Kernel Mailing List, the arch/x86 maintainers
On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> On Mon, 4 Dec 2017, Linus Torvalds wrote:
>
> > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > systems I have tested, so it is probably safe to assume it to be
> > > broken everywhere.
> >
> > Oh, it's definitely not broken everywhere, because I use it myself,
> > and was traveling last week due to my mom's bday.
> >
> > HOWEVER.
> >
> > Some of the x86 work seems to have broken it for some configurations.
> > In particular, do you have a big "everything enabled" kernel config -
> > particularly lockdep and irqflags tracing enabled?
> >
> > Andy has a patch, but it hasn't made it to me yet (probably because
> > the x86 people are very busy with the kaiser work):
This definitely fixes the problem at least on one of the affected machines.
> Picking it up right now.
Cool, thanks!
Rafael
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-04 22:41 ` Rafael J. Wysocki
@ 2017-12-05 0:25 ` Rafael J. Wysocki
2017-12-09 10:33 ` Pavel Machek
0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-05 0:25 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linus Torvalds, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers
On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> >
> > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >
> > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > systems I have tested, so it is probably safe to assume it to be
> > > > broken everywhere.
> > >
> > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > and was traveling last week due to my mom's bday.
> > >
> > > HOWEVER.
> > >
> > > Some of the x86 work seems to have broken it for some configurations.
> > > In particular, do you have a big "everything enabled" kernel config -
> > > particularly lockdep and irqflags tracing enabled?
> > >
> > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > the x86 people are very busy with the kaiser work):
>
> This definitely fixes the problem at least on one of the affected machines.
I can confirm that the Andy's patch fixes it on all systems that had this
issue here.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-04 22:36 ` Linus Torvalds
2017-12-04 22:38 ` Thomas Gleixner
@ 2017-12-06 12:15 ` Michal Hocko
2017-12-06 12:23 ` Thomas Gleixner
` (2 more replies)
1 sibling, 3 replies; 83+ messages in thread
From: Michal Hocko @ 2017-12-06 12:15 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers
On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > systems I have tested, so it is probably safe to assume it to be
> > broken everywhere.
>
> Oh, it's definitely not broken everywhere, because I use it myself,
> and was traveling last week due to my mom's bday.
>
> HOWEVER.
>
> Some of the x86 work seems to have broken it for some configurations.
> In particular, do you have a big "everything enabled" kernel config -
> particularly lockdep and irqflags tracing enabled?
>
> Andy has a patch, but it hasn't made it to me yet (probably because
> the x86 people are very busy with the kaiser work):
>
> https://lkml.org/lkml/2017/11/30/546
>
> (also note his follow-up "fix the commit message" note, but that one
> doesn't actually affect the code itself).
merging tip/x86/urgent on top of your tree fixed this problem for me,
but I am seeing something else
[ 131.711412] ACPI: Preparing to enter system sleep state S3
[ 131.755328] ACPI: EC: event blocked
[ 131.755328] ACPI: EC: EC stopped
[ 131.755328] PM: Saving platform NVS memory
[ 131.755344] Disabling non-boot CPUs ...
[ 131.779330] IRQ 124: no longer affine to CPU1
[ 131.780334] smpboot: CPU 1 is now offline
[ 131.804465] smpboot: CPU 2 is now offline
[ 131.827291] IRQ 122: no longer affine to CPU3
[ 131.827292] IRQ 123: no longer affine to CPU3
[ 131.828293] smpboot: CPU 3 is now offline
[ 131.830991] ACPI: Low-level resume complete
[ 131.831092] ACPI: EC: EC started
[ 131.831093] PM: Restoring platform NVS memory
[ 131.831864] do_IRQ: 0.55 No irq handler for vector
[ 131.831884] Enabling non-boot CPUs ...
[ 131.831909] x86: Booting SMP configuration:
[ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 131.832913] cache: parent cpu1 should not be sleeping
[ 131.833058] CPU1 is up
[ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
[ 131.833864] cache: parent cpu2 should not be sleeping
[ 131.833983] CPU2 is up
[ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 131.834776] cache: parent cpu3 should not be sleeping
[ 131.834923] CPU3 is up
"No irq handler" part looks a bit scary (maybe related to lost affinity
messages?) but the following messages look quite as well. Is this
something known? The system seems to be up and running without any
visible issues.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-06 12:15 ` Michal Hocko
@ 2017-12-06 12:23 ` Thomas Gleixner
2017-12-06 14:04 ` Rafael J. Wysocki
2017-12-06 12:31 ` Maarten Lankhorst
2017-12-07 7:55 ` Michal Hocko
2 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-06 12:23 UTC (permalink / raw)
To: Michal Hocko
Cc: Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
On Wed, 6 Dec 2017, Michal Hocko wrote:
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [ 131.711412] ACPI: Preparing to enter system sleep state S3
> [ 131.755328] ACPI: EC: event blocked
> [ 131.755328] ACPI: EC: EC stopped
> [ 131.755328] PM: Saving platform NVS memory
> [ 131.755344] Disabling non-boot CPUs ...
> [ 131.779330] IRQ 124: no longer affine to CPU1
> [ 131.780334] smpboot: CPU 1 is now offline
> [ 131.804465] smpboot: CPU 2 is now offline
> [ 131.827291] IRQ 122: no longer affine to CPU3
> [ 131.827292] IRQ 123: no longer affine to CPU3
> [ 131.828293] smpboot: CPU 3 is now offline
> [ 131.830991] ACPI: Low-level resume complete
> [ 131.831092] ACPI: EC: EC started
> [ 131.831093] PM: Restoring platform NVS memory
> [ 131.831864] do_IRQ: 0.55 No irq handler for vector
Hmm, that's really odd.
> [ 131.831884] Enabling non-boot CPUs ...
> [ 131.831909] x86: Booting SMP configuration:
> [ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 131.832913] cache: parent cpu1 should not be sleeping
This is an old one.
> [ 131.833058] CPU1 is up
> [ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [ 131.833864] cache: parent cpu2 should not be sleeping
> [ 131.833983] CPU2 is up
> [ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [ 131.834776] cache: parent cpu3 should not be sleeping
> [ 131.834923] CPU3 is up
>
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.
I assume it's due to the affinity break, just that we don't know right now
on which CPU that do_IRQ() message triggered. I assume it's CPU0 because
the others are offline already, but ....
I'll think about it how we can figure out what's going on.
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-06 12:15 ` Michal Hocko
2017-12-06 12:23 ` Thomas Gleixner
@ 2017-12-06 12:31 ` Maarten Lankhorst
2017-12-06 12:46 ` Thomas Gleixner
2017-12-07 7:55 ` Michal Hocko
2 siblings, 1 reply; 83+ messages in thread
From: Maarten Lankhorst @ 2017-12-06 12:31 UTC (permalink / raw)
To: Michal Hocko, Linus Torvalds
Cc: Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers
Op 06-12-17 om 13:15 schreef Michal Hocko:
> On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
>> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
>>> systems I have tested, so it is probably safe to assume it to be
>>> broken everywhere.
>> Oh, it's definitely not broken everywhere, because I use it myself,
>> and was traveling last week due to my mom's bday.
>>
>> HOWEVER.
>>
>> Some of the x86 work seems to have broken it for some configurations.
>> In particular, do you have a big "everything enabled" kernel config -
>> particularly lockdep and irqflags tracing enabled?
>>
>> Andy has a patch, but it hasn't made it to me yet (probably because
>> the x86 people are very busy with the kaiser work):
>>
>> https://lkml.org/lkml/2017/11/30/546
>>
>> (also note his follow-up "fix the commit message" note, but that one
>> doesn't actually affect the code itself).
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [ 131.711412] ACPI: Preparing to enter system sleep state S3
> [ 131.755328] ACPI: EC: event blocked
> [ 131.755328] ACPI: EC: EC stopped
> [ 131.755328] PM: Saving platform NVS memory
> [ 131.755344] Disabling non-boot CPUs ...
> [ 131.779330] IRQ 124: no longer affine to CPU1
> [ 131.780334] smpboot: CPU 1 is now offline
> [ 131.804465] smpboot: CPU 2 is now offline
> [ 131.827291] IRQ 122: no longer affine to CPU3
> [ 131.827292] IRQ 123: no longer affine to CPU3
> [ 131.828293] smpboot: CPU 3 is now offline
> [ 131.830991] ACPI: Low-level resume complete
> [ 131.831092] ACPI: EC: EC started
> [ 131.831093] PM: Restoring platform NVS memory
> [ 131.831864] do_IRQ: 0.55 No irq handler for vector
> [ 131.831884] Enabling non-boot CPUs ...
> [ 131.831909] x86: Booting SMP configuration:
> [ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 131.832913] cache: parent cpu1 should not be sleeping
> [ 131.833058] CPU1 is up
> [ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [ 131.833864] cache: parent cpu2 should not be sleeping
> [ 131.833983] CPU2 is up
> [ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [ 131.834776] cache: parent cpu3 should not be sleeping
> [ 131.834923] CPU3 is up
>
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.
Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
Symptoms are similar..
~Maarten
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-06 12:31 ` Maarten Lankhorst
@ 2017-12-06 12:46 ` Thomas Gleixner
2017-12-06 13:09 ` Maarten Lankhorst
0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-06 12:46 UTC (permalink / raw)
To: Maarten Lankhorst
Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 13:15 schreef Michal Hocko:
> >
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
>
> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
> Symptoms are similar..
Well, the spurious interrupt is one thing, but you obviously lose
interrupts for some reason.
Did you ever manage to get the data out which I asked for?
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-06 12:46 ` Thomas Gleixner
@ 2017-12-06 13:09 ` Maarten Lankhorst
2017-12-06 14:15 ` Thomas Gleixner
0 siblings, 1 reply; 83+ messages in thread
From: Maarten Lankhorst @ 2017-12-06 13:09 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>> Op 06-12-17 om 13:15 schreef Michal Hocko:
>>> "No irq handler" part looks a bit scary (maybe related to lost affinity
>>> messages?) but the following messages look quite as well. Is this
>>> something known? The system seems to be up and running without any
>>> visible issues.
>> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
>> Symptoms are similar..
> Well, the spurious interrupt is one thing, but you obviously lose
> interrupts for some reason.
>
> Did you ever manage to get the data out which I asked for?
>
> Thanks,
>
> tglx
>
Yes, sent this out about an hour ago
https://lkml.org/lkml/2017/12/6/215
Cheers,
Maarten
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-06 12:23 ` Thomas Gleixner
@ 2017-12-06 14:04 ` Rafael J. Wysocki
0 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-06 14:04 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Michal Hocko, Linus Torvalds, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
On Wednesday, December 6, 2017 1:23:34 PM CET Thomas Gleixner wrote:
> On Wed, 6 Dec 2017, Michal Hocko wrote:
> > merging tip/x86/urgent on top of your tree fixed this problem for me,
> > but I am seeing something else
> > [ 131.711412] ACPI: Preparing to enter system sleep state S3
> > [ 131.755328] ACPI: EC: event blocked
> > [ 131.755328] ACPI: EC: EC stopped
> > [ 131.755328] PM: Saving platform NVS memory
> > [ 131.755344] Disabling non-boot CPUs ...
> > [ 131.779330] IRQ 124: no longer affine to CPU1
> > [ 131.780334] smpboot: CPU 1 is now offline
> > [ 131.804465] smpboot: CPU 2 is now offline
> > [ 131.827291] IRQ 122: no longer affine to CPU3
> > [ 131.827292] IRQ 123: no longer affine to CPU3
> > [ 131.828293] smpboot: CPU 3 is now offline
> > [ 131.830991] ACPI: Low-level resume complete
> > [ 131.831092] ACPI: EC: EC started
> > [ 131.831093] PM: Restoring platform NVS memory
> > [ 131.831864] do_IRQ: 0.55 No irq handler for vector
>
> Hmm, that's really odd.
>
> > [ 131.831884] Enabling non-boot CPUs ...
> > [ 131.831909] x86: Booting SMP configuration:
> > [ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [ 131.832913] cache: parent cpu1 should not be sleeping
>
> This is an old one.
>
> > [ 131.833058] CPU1 is up
> > [ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> > [ 131.833864] cache: parent cpu2 should not be sleeping
> > [ 131.833983] CPU2 is up
> > [ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > [ 131.834776] cache: parent cpu3 should not be sleeping
> > [ 131.834923] CPU3 is up
> >
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
>
> I assume it's due to the affinity break, just that we don't know right now
> on which CPU that do_IRQ() message triggered. I assume it's CPU0 because
> the others are offline already, but ....
This is resume from S3, so the firmware might do something odd to the other
CPUs, but in case it didn't (which is quite likely or we would have seen more
of these messages), they are offline and in mwait_play_dead(), so IMO it is
safe to assume that this was CPU0.
And this appears to have happened at the atch_suspend_enable_irqs() time,
which is just local_irq_enable() on x86 running on CPU0.
> I'll think about it how we can figure out what's going on.
It looks like an interrupt that have triggered right after we've enabled
interrupts on the boot CPU.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-06 13:09 ` Maarten Lankhorst
@ 2017-12-06 14:15 ` Thomas Gleixner
2017-12-07 13:33 ` Maarten Lankhorst
0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-06 14:15 UTC (permalink / raw)
To: Maarten Lankhorst
Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> > On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> >> Op 06-12-17 om 13:15 schreef Michal Hocko:
> >>> "No irq handler" part looks a bit scary (maybe related to lost affinity
> >>> messages?) but the following messages look quite as well. Is this
> >>> something known? The system seems to be up and running without any
> >>> visible issues.
> >> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
> >> Symptoms are similar..
> > Well, the spurious interrupt is one thing, but you obviously lose
> > interrupts for some reason.
> >
> > Did you ever manage to get the data out which I asked for?
> >
> > Thanks,
> >
> > tglx
> >
> Yes, sent this out about an hour ago
>
> https://lkml.org/lkml/2017/12/6/215
Weird. Did not reach me
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-06 12:15 ` Michal Hocko
2017-12-06 12:23 ` Thomas Gleixner
2017-12-06 12:31 ` Maarten Lankhorst
@ 2017-12-07 7:55 ` Michal Hocko
2017-12-10 20:30 ` Michal Hocko
2 siblings, 1 reply; 83+ messages in thread
From: Michal Hocko @ 2017-12-07 7:55 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers
On Wed 06-12-17 13:14:52, Michal Hocko wrote:
> On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > systems I have tested, so it is probably safe to assume it to be
> > > broken everywhere.
> >
> > Oh, it's definitely not broken everywhere, because I use it myself,
> > and was traveling last week due to my mom's bday.
> >
> > HOWEVER.
> >
> > Some of the x86 work seems to have broken it for some configurations.
> > In particular, do you have a big "everything enabled" kernel config -
> > particularly lockdep and irqflags tracing enabled?
> >
> > Andy has a patch, but it hasn't made it to me yet (probably because
> > the x86 people are very busy with the kaiser work):
> >
> > https://lkml.org/lkml/2017/11/30/546
> >
> > (also note his follow-up "fix the commit message" note, but that one
> > doesn't actually affect the code itself).
>
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [ 131.711412] ACPI: Preparing to enter system sleep state S3
> [ 131.755328] ACPI: EC: event blocked
> [ 131.755328] ACPI: EC: EC stopped
> [ 131.755328] PM: Saving platform NVS memory
> [ 131.755344] Disabling non-boot CPUs ...
> [ 131.779330] IRQ 124: no longer affine to CPU1
> [ 131.780334] smpboot: CPU 1 is now offline
> [ 131.804465] smpboot: CPU 2 is now offline
> [ 131.827291] IRQ 122: no longer affine to CPU3
> [ 131.827292] IRQ 123: no longer affine to CPU3
> [ 131.828293] smpboot: CPU 3 is now offline
> [ 131.830991] ACPI: Low-level resume complete
> [ 131.831092] ACPI: EC: EC started
> [ 131.831093] PM: Restoring platform NVS memory
> [ 131.831864] do_IRQ: 0.55 No irq handler for vector
> [ 131.831884] Enabling non-boot CPUs ...
> [ 131.831909] x86: Booting SMP configuration:
> [ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 131.832913] cache: parent cpu1 should not be sleeping
> [ 131.833058] CPU1 is up
> [ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [ 131.833864] cache: parent cpu2 should not be sleeping
> [ 131.833983] CPU2 is up
> [ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [ 131.834776] cache: parent cpu3 should not be sleeping
> [ 131.834923] CPU3 is up
>
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.
Hmm, there is still something bad going on during resume. My laptop
haven't woken up from s2ram this morning. The screen was powered on
but the system hasn't come up.
The last thing that made it into the kernel log on fs is this
Dec 6 19:32:29 tiehlicka kernel: [21898.084685] PM: suspend entry (deep)
which won't tell us much I suspect. I've tried dozen s2ram cycles and it
hasn't reproduced so it smells like a timing issue.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-06 14:15 ` Thomas Gleixner
@ 2017-12-07 13:33 ` Maarten Lankhorst
2017-12-08 10:30 ` Thomas Gleixner
0 siblings, 1 reply; 83+ messages in thread
From: Maarten Lankhorst @ 2017-12-07 13:33 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
Op 06-12-17 om 15:15 schreef Thomas Gleixner:
> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
>>> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>>>> Op 06-12-17 om 13:15 schreef Michal Hocko:
>>>>> "No irq handler" part looks a bit scary (maybe related to lost affinity
>>>>> messages?) but the following messages look quite as well. Is this
>>>>> something known? The system seems to be up and running without any
>>>>> visible issues.
>>>> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
>>>> Symptoms are similar..
>>> Well, the spurious interrupt is one thing, but you obviously lose
>>> interrupts for some reason.
>>>
>>> Did you ever manage to get the data out which I asked for?
>>>
>>> Thanks,
>>>
>>> tglx
>>>
>> Yes, sent this out about an hour ago
>>
>> https://lkml.org/lkml/2017/12/6/215
> Weird. Did not reach me
>
But do you have any idea?
~Maarten
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-07 13:33 ` Maarten Lankhorst
@ 2017-12-08 10:30 ` Thomas Gleixner
2017-12-13 15:57 ` Thomas Gleixner
0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-08 10:30 UTC (permalink / raw)
To: Maarten Lankhorst
Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
On Thu, 7 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 15:15 schreef Thomas Gleixner:
> > On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> >> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> >>> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> >>>> Op 06-12-17 om 13:15 schreef Michal Hocko:
> >>>>> "No irq handler" part looks a bit scary (maybe related to lost affinity
> >>>>> messages?) but the following messages look quite as well. Is this
> >>>>> something known? The system seems to be up and running without any
> >>>>> visible issues.
> >>>> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
> >>>> Symptoms are similar..
> >>> Well, the spurious interrupt is one thing, but you obviously lose
> >>> interrupts for some reason.
> >>>
> >>> Did you ever manage to get the data out which I asked for?
> >>>
> >>> Thanks,
> >>>
> >>> tglx
> >>>
> >> Yes, sent this out about an hour ago
> >>
> >> https://lkml.org/lkml/2017/12/6/215
> > Weird. Did not reach me
> >
> But do you have any idea?
Can you please provide the full trace, dmesg and the full output of
.../debug/irq/... ?
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-05 0:25 ` Rafael J. Wysocki
@ 2017-12-09 10:33 ` Pavel Machek
2017-12-09 11:41 ` Pavel Machek
[not found] ` <CA+55aFw8tuoJ2gcXx3K2sKFf2Y9hXX4naMVQNqGOUivnjwhjkg@mail.gmail.com>
0 siblings, 2 replies; 83+ messages in thread
From: Pavel Machek @ 2017-12-09 10:33 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Thomas Gleixner, Linus Torvalds, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 1556 bytes --]
On Tue 2017-12-05 01:25:55, Rafael J. Wysocki wrote:
> On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> > On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> > >
> > > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > >
> > > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > > systems I have tested, so it is probably safe to assume it to be
> > > > > broken everywhere.
> > > >
> > > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > > and was traveling last week due to my mom's bday.
> > > >
> > > > HOWEVER.
> > > >
> > > > Some of the x86 work seems to have broken it for some configurations.
> > > > In particular, do you have a big "everything enabled" kernel config -
> > > > particularly lockdep and irqflags tracing enabled?
> > > >
> > > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > > the x86 people are very busy with the kaiser work):
> >
> > This definitely fixes the problem at least on one of the affected machines.
>
> I can confirm that the Andy's patch fixes it on all systems that had this
> issue here.
I believe I have the issue here, too (-next on thinkpad x60). Which
patch is expected to fix it? Let me try recent -next...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-09 10:33 ` Pavel Machek
@ 2017-12-09 11:41 ` Pavel Machek
[not found] ` <CA+55aFw8tuoJ2gcXx3K2sKFf2Y9hXX4naMVQNqGOUivnjwhjkg@mail.gmail.com>
1 sibling, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2017-12-09 11:41 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Thomas Gleixner, Linus Torvalds, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 1690 bytes --]
On Sat 2017-12-09 11:33:25, Pavel Machek wrote:
> On Tue 2017-12-05 01:25:55, Rafael J. Wysocki wrote:
> > On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> > > On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > > > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> > > >
> > > > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > > >
> > > > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > > > systems I have tested, so it is probably safe to assume it to be
> > > > > > broken everywhere.
> > > > >
> > > > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > > > and was traveling last week due to my mom's bday.
> > > > >
> > > > > HOWEVER.
> > > > >
> > > > > Some of the x86 work seems to have broken it for some configurations.
> > > > > In particular, do you have a big "everything enabled" kernel config -
> > > > > particularly lockdep and irqflags tracing enabled?
> > > > >
> > > > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > > > the x86 people are very busy with the kaiser work):
> > >
> > > This definitely fixes the problem at least on one of the affected machines.
> >
> > I can confirm that the Andy's patch fixes it on all systems that had this
> > issue here.
>
> I believe I have the issue here, too (-next on thinkpad x60). Which
> patch is expected to fix it? Let me try recent -next...
Still there AFAICT.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
[not found] ` <CA+55aFw8tuoJ2gcXx3K2sKFf2Y9hXX4naMVQNqGOUivnjwhjkg@mail.gmail.com>
@ 2017-12-09 22:01 ` Pavel Machek
[not found] ` <CA+55aFySAdiBZhZ0PSDjH5PuvPPcMsBRXbxCkObfm1eY7gHDbQ@mail.gmail.com>
0 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-09 22:01 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Thomas Gleixner, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 820 bytes --]
Hi!
On Sat 2017-12-09 10:55:14, Linus Torvalds wrote:
> On Dec 9, 2017 02:33, "Pavel Machek" <pavel@ucw.cz> wrote:
>
> > I believe I have the issue here, too (-next on thinkpad x60). Which
> > patch is expected to fix it? Let me try recent -next...
>
>
> It should be fixed in mainline. I don't know if next has picked it
> up yet.
Strange. I was at 4.15-rc1, and suspend worked there (thinkpad x60,
32-bit). It was broken in -next. I updated to current mainline (
4ded3bec65a07343258ed8fd9d46483f032d866f ) and suspend is broken.
It suspends ok, I press Fn button to make it resume, fans spin up but
moon LED is still lit.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
[not found] ` <CA+55aFySAdiBZhZ0PSDjH5PuvPPcMsBRXbxCkObfm1eY7gHDbQ@mail.gmail.com>
@ 2017-12-10 16:23 ` Pavel Machek
2017-12-10 16:37 ` Linus Torvalds
0 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 16:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Thomas Gleixner, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 964 bytes --]
On Sat 2017-12-09 14:47:41, Linus Torvalds wrote:
> On Dec 9, 2017 14:01, "Pavel Machek" <pavel@ucw.cz> wrote:
>
>
> Strange. I was at 4.15-rc1, and suspend worked there (thinkpad x60,
> 32-bit). It was broken in -next. I updated to current mainline (
> 4ded3bec65a07343258ed8fd9d46483f032d866f ) and suspend is broken.
>
Can you do something about html emails? Quoting them doesn't work too well.
> Any chance to bisect it? This doesn't sound like the other problem
> we had.
Let me try...
v4.15-rc2: suspends/resumes ok.
4ded3be: hangs on resume.
Given that between those, there was supposed "fix" for suspend, I
believe I should try reverting that one first. .. if someone can tell
me commit id, that would help.
And yes, if everything else fails, I can probably bisect.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-10 16:23 ` Pavel Machek
@ 2017-12-10 16:37 ` Linus Torvalds
2017-12-10 18:56 ` Pavel Machek
0 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-10 16:37 UTC (permalink / raw)
To: Pavel Machek
Cc: Rafael J. Wysocki, Thomas Gleixner, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
On Sun, Dec 10, 2017 at 8:23 AM, Pavel Machek <pavel@ucw.cz> wrote:
>
> Can you do something about html emails? Quoting them doesn't work too well.
Yeah, and they don't show up onlkml either because of rules. I try to
avoid them, but have been more on mobile for various reasons lately
than usual. That should be over and done with after today, though.
>> Any chance to bisect it? This doesn't sound like the other problem
>> we had.
>
> Let me try...
>
> v4.15-rc2: suspends/resumes ok.
> 4ded3be: hangs on resume.
>
> Given that between those, there was supposed "fix" for suspend, I
> believe I should try reverting that one first. .. if someone can tell
> me commit id, that would help.
The fix in there should be 5b06bbcfc2c6 ("x86/power: Fix some ordering
bugs in __restore_processor_context()") so you can certainly see if it
works before that (or just reverting it).
But there are also a few other x86 low-level things there, and that
fix really looks very safe, so I'd almost expect something else to
have triggered your problem. There's less than 500 commits in that
range you have, so a few bisections should narrow it down a lot.
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-10 16:37 ` Linus Torvalds
@ 2017-12-10 18:56 ` Pavel Machek
2017-12-10 20:30 ` Linus Torvalds
0 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 18:56 UTC (permalink / raw)
To: Linus Torvalds, luto, tglx, jarkko.nikula
Cc: Rafael J. Wysocki, Thomas Gleixner, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 1950 bytes --]
On Sun 2017-12-10 08:37:56, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 8:23 AM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> > Can you do something about html emails? Quoting them doesn't work too well.
>
> Yeah, and they don't show up onlkml either because of rules. I try to
> avoid them, but have been more on mobile for various reasons lately
> than usual. That should be over and done with after today, though.
>
> >> Any chance to bisect it? This doesn't sound like the other problem
> >> we had.
> >
> > Let me try...
> >
> > v4.15-rc2: suspends/resumes ok.
> > 4ded3be: hangs on resume.
> >
> > Given that between those, there was supposed "fix" for suspend, I
> > believe I should try reverting that one first. .. if someone can tell
> > me commit id, that would help.
>
> The fix in there should be 5b06bbcfc2c6 ("x86/power: Fix some ordering
> bugs in __restore_processor_context()") so you can certainly see if it
> works before that (or just reverting it).
Revert is easier.
> But there are also a few other x86 low-level things there, and that
> fix really looks very safe, so I'd almost expect something else to
> have triggered your problem. There's less than 500 commits in that
> range you have, so a few bisections should narrow it down a lot.
No, that commit does _look_ pretty suspect to me...
Confirmed, revert fixes it. You see how it moves fix_processor_context
around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
machines exist? Aha.
Which brings me to .. various people do automated testing of
kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit for
boot and suspend would be very nice. The last item is not hard, either:
sudo rtcwake -l -m mem -s 5
...should take 10 seconds or so.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-07 7:55 ` Michal Hocko
@ 2017-12-10 20:30 ` Michal Hocko
0 siblings, 0 replies; 83+ messages in thread
From: Michal Hocko @ 2017-12-10 20:30 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers
On Thu 07-12-17 08:55:08, Michal Hocko wrote:
> On Wed 06-12-17 13:14:52, Michal Hocko wrote:
> > On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >
> > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > systems I have tested, so it is probably safe to assume it to be
> > > > broken everywhere.
> > >
> > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > and was traveling last week due to my mom's bday.
> > >
> > > HOWEVER.
> > >
> > > Some of the x86 work seems to have broken it for some configurations.
> > > In particular, do you have a big "everything enabled" kernel config -
> > > particularly lockdep and irqflags tracing enabled?
> > >
> > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > the x86 people are very busy with the kaiser work):
> > >
> > > https://lkml.org/lkml/2017/11/30/546
> > >
> > > (also note his follow-up "fix the commit message" note, but that one
> > > doesn't actually affect the code itself).
> >
> > merging tip/x86/urgent on top of your tree fixed this problem for me,
> > but I am seeing something else
> > [ 131.711412] ACPI: Preparing to enter system sleep state S3
> > [ 131.755328] ACPI: EC: event blocked
> > [ 131.755328] ACPI: EC: EC stopped
> > [ 131.755328] PM: Saving platform NVS memory
> > [ 131.755344] Disabling non-boot CPUs ...
> > [ 131.779330] IRQ 124: no longer affine to CPU1
> > [ 131.780334] smpboot: CPU 1 is now offline
> > [ 131.804465] smpboot: CPU 2 is now offline
> > [ 131.827291] IRQ 122: no longer affine to CPU3
> > [ 131.827292] IRQ 123: no longer affine to CPU3
> > [ 131.828293] smpboot: CPU 3 is now offline
> > [ 131.830991] ACPI: Low-level resume complete
> > [ 131.831092] ACPI: EC: EC started
> > [ 131.831093] PM: Restoring platform NVS memory
> > [ 131.831864] do_IRQ: 0.55 No irq handler for vector
> > [ 131.831884] Enabling non-boot CPUs ...
> > [ 131.831909] x86: Booting SMP configuration:
> > [ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [ 131.832913] cache: parent cpu1 should not be sleeping
> > [ 131.833058] CPU1 is up
> > [ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> > [ 131.833864] cache: parent cpu2 should not be sleeping
> > [ 131.833983] CPU2 is up
> > [ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > [ 131.834776] cache: parent cpu3 should not be sleeping
> > [ 131.834923] CPU3 is up
> >
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
>
> Hmm, there is still something bad going on during resume. My laptop
> haven't woken up from s2ram this morning. The screen was powered on
> but the system hasn't come up.
It's been few days and I haven't seen this problem again. And I am doing
s2ram all the time...
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-10 18:56 ` Pavel Machek
@ 2017-12-10 20:30 ` Linus Torvalds
2017-12-10 20:43 ` Pavel Machek
` (2 more replies)
0 siblings, 3 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-10 20:30 UTC (permalink / raw)
To: Pavel Machek, Zhang Rui
Cc: Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
>
> Confirmed, revert fixes it. You see how it moves fix_processor_context
> around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> machines exist? Aha.
Yeah, people do.
Andy?
> Which brings me to .. various people do automated testing of
> kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit for
> boot and suspend would be very nice. The last item is not hard, either:
>
> sudo rtcwake -l -m mem -s 5
>
> ...should take 10 seconds or so.
I'm told 0day does *some* suspend/resume testing, but I think it's
pretty limited, partly because the kinds of machines it primarily
works on don't really support suspend/resume at all. I'm also not sure
just how many of those machines are 32-bit at all..
But I'm adding Zhang Rui to the cc, to see if my recollection is right.
Because you're right, more suspend/resume automated testing would be
good to have. And yes, people test mainly 64-bit these days.
Also, I'm not even sure what the 0day rules are for just plain
mainline. I don't tend to see a lot of breakage reports, even though
I'd expect to. This came in from the x86 trees (and those do their own
tests too, but probably not suspend/resume either), but it hit my tree
fairly soon after going into the x86 -tip trees.
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-10 20:30 ` Linus Torvalds
@ 2017-12-10 20:43 ` Pavel Machek
2017-12-10 21:28 ` Linus Torvalds
2017-12-10 21:38 ` [PATCH] Fix resume on x86-32 machines Pavel Machek
2017-12-11 14:09 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Zhang Rui
2 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 20:43 UTC (permalink / raw)
To: Linus Torvalds
Cc: Zhang Rui, Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 1875 bytes --]
On Sun 2017-12-10 12:30:52, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> > Confirmed, revert fixes it. You see how it moves fix_processor_context
> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> > machines exist? Aha.
>
> Yeah, people do.
>
> Andy?
For the record... this should fix it. Tested on x60. More tests pending.
Signed-off-by: Pavel Machek <pavel@ucw.cz>
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 5191de1..d59f05f 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
write_cr0(ctxt->cr0);
/*
- * now restore the descriptor tables to their proper values
- * ltr is done i fix_processor_context().
+ * Now restore the descriptor tables to their proper values
+ * ltr is done in fix_processor_context().
*/
#ifdef CONFIG_X86_32
load_idt(&ctxt->idt);
@@ -235,8 +235,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
wrmsrl(MSR_GS_BASE, ctxt->gs_base);
#endif
- fix_processor_context();
-
/*
* Restore segment registers. This happens after restoring the GDT
* and LDT, which happen in fix_processor_context().
@@ -252,8 +250,12 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
*/
if (boot_cpu_has(X86_FEATURE_SEP))
enable_sep_cpu();
+
+ fix_processor_context();
#else
/* CONFIG_X86_64 */
+ fix_processor_context();
+
asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply related [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-10 20:43 ` Pavel Machek
@ 2017-12-10 21:28 ` Linus Torvalds
2017-12-10 21:35 ` Pavel Machek
2017-12-12 17:27 ` Linus Torvalds
0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-10 21:28 UTC (permalink / raw)
To: Pavel Machek
Cc: Zhang Rui, Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On Sun, Dec 10, 2017 at 12:43 PM, Pavel Machek <pavel@ucw.cz> wrote:
>
> For the record... this should fix it. Tested on x60. More tests pending.
This can't be right.
At the very least, now the comment is wrong. And the comment does seem
relevant for 32-bit too:
> - fix_processor_context();
> -
> /*
> * Restore segment registers. This happens after restoring the GDT
> * and LDT, which happen in fix_processor_context().
Notice? You've moved down the 32-bit fix_processor_context() call to
after the loadsegment() calls, which smells wrong.
That said, this *all* smells wrong. Why is there a special
fix_processor_context() function at all with different 32-bit and
64-bit behavior? This code is all written to be maximally confusing.
I think this could do with some re-org to make it more logical. That
"some random things done in fix_processor_context(), other random
things done directly in __restore_processor_state()" makes no sense at
all to me. There's no logic to what is done where.
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-10 21:28 ` Linus Torvalds
@ 2017-12-10 21:35 ` Pavel Machek
2017-12-12 17:27 ` Linus Torvalds
1 sibling, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 21:35 UTC (permalink / raw)
To: Linus Torvalds
Cc: Zhang Rui, Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 1760 bytes --]
On Sun 2017-12-10 13:28:50, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 12:43 PM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> > For the record... this should fix it. Tested on x60. More tests pending.
>
> This can't be right.
>
> At the very least, now the comment is wrong. And the comment does seem
> relevant for 32-bit too:
Well, take a look at orignal patch. I'm reverting 32-bit code to
v4.15-rc1 version, while keeping 64-bit code at v4.15-rc3
version. Yes, my brain hurts from looking at the code :-(.
In the meantime, I did short test on 64-bit machine. No ill effect observed.
Hmm. Aha. Yes, the comment is wrong... as it was in wrong in -rc1.
> > - fix_processor_context();
> > -
> > /*
> > * Restore segment registers. This happens after restoring the GDT
> > * and LDT, which happen in fix_processor_context().
>
> Notice? You've moved down the 32-bit fix_processor_context() call to
> after the loadsegment() calls, which smells wrong.
Yeah, I did. There's where it was in v4.15-rc1, and that's what ws
working for me.
> That said, this *all* smells wrong. Why is there a special
> fix_processor_context() function at all with different 32-bit and
> 64-bit behavior? This code is all written to be maximally confusing.
>
> I think this could do with some re-org to make it more logical. That
> "some random things done in fix_processor_context(), other random
> things done directly in __restore_processor_state()" makes no sense at
> all to me. There's no logic to what is done where.
I have to agree.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 83+ messages in thread
* [PATCH] Fix resume on x86-32 machines
2017-12-10 20:30 ` Linus Torvalds
2017-12-10 20:43 ` Pavel Machek
@ 2017-12-10 21:38 ` Pavel Machek
2017-12-10 21:58 ` Andy Lutomirski
2017-12-11 14:09 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Zhang Rui
2 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 21:38 UTC (permalink / raw)
To: Linus Torvalds
Cc: Zhang Rui, Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 2072 bytes --]
After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
(unintentionally?) reordered stuff with respect to
fix_processor_context() on 32-bit and 64-bit. We undo that change on
32-bit.
While we are at it, fix a comment.
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Fixes: 5b06bbcfc2c621da3009da8decb7511500c293ed
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 5191de1..af7b613 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
write_cr0(ctxt->cr0);
/*
- * now restore the descriptor tables to their proper values
- * ltr is done i fix_processor_context().
+ * Now restore the descriptor tables to their proper values
+ * ltr is done in fix_processor_context().
*/
#ifdef CONFIG_X86_32
load_idt(&ctxt->idt);
@@ -235,13 +235,11 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
wrmsrl(MSR_GS_BASE, ctxt->gs_base);
#endif
- fix_processor_context();
-
+#ifdef CONFIG_X86_32
/*
- * Restore segment registers. This happens after restoring the GDT
- * and LDT, which happen in fix_processor_context().
+ * Restore segment registers.
*/
-#ifdef CONFIG_X86_32
+
loadsegment(es, ctxt->es);
loadsegment(fs, ctxt->fs);
loadsegment(gs, ctxt->gs);
@@ -252,8 +250,17 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
*/
if (boot_cpu_has(X86_FEATURE_SEP))
enable_sep_cpu();
+
+ fix_processor_context();
#else
/* CONFIG_X86_64 */
+ /*
+ * Restore segment registers. This happens after restoring the GDT
+ * and LDT, which happen in fix_processor_context().
+ */
+
+ fix_processor_context();
+
asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply related [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-10 21:38 ` [PATCH] Fix resume on x86-32 machines Pavel Machek
@ 2017-12-10 21:58 ` Andy Lutomirski
2017-12-10 22:20 ` Pavel Machek
` (2 more replies)
0 siblings, 3 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-10 21:58 UTC (permalink / raw)
To: Pavel Machek
Cc: Linus Torvalds, Zhang Rui, Andrew Lutomirski, Thomas Gleixner,
Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
> On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
>
>
> After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> (unintentionally?) reordered stuff with respect to
> fix_processor_context() on 32-bit and 64-bit. We undo that change on
> 32-bit.
>
Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
> While we are at it, fix a comment.
>
> Signed-off-by: Pavel Machek <pavel@ucw.cz>
> Fixes: 5b06bbcfc2c621da3009da8decb7511500c293ed
>
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index 5191de1..af7b613 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> write_cr0(ctxt->cr0);
>
> /*
> - * now restore the descriptor tables to their proper values
> - * ltr is done i fix_processor_context().
> + * Now restore the descriptor tables to their proper values
> + * ltr is done in fix_processor_context().
> */
> #ifdef CONFIG_X86_32
> load_idt(&ctxt->idt);
> @@ -235,13 +235,11 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> wrmsrl(MSR_GS_BASE, ctxt->gs_base);
> #endif
>
> - fix_processor_context();
> -
> +#ifdef CONFIG_X86_32
> /*
> - * Restore segment registers. This happens after restoring the GDT
> - * and LDT, which happen in fix_processor_context().
> + * Restore segment registers.
> */
> -#ifdef CONFIG_X86_32
> +
> loadsegment(es, ctxt->es);
> loadsegment(fs, ctxt->fs);
> loadsegment(gs, ctxt->gs);
> @@ -252,8 +250,17 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> */
> if (boot_cpu_has(X86_FEATURE_SEP))
> enable_sep_cpu();
> +
> + fix_processor_context();
> #else
> /* CONFIG_X86_64 */
> + /*
> + * Restore segment registers. This happens after restoring the GDT
> + * and LDT, which happen in fix_processor_context().
> + */
> +
> + fix_processor_context();
> +
> asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
> asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
> asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
>
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-10 21:58 ` Andy Lutomirski
@ 2017-12-10 22:20 ` Pavel Machek
2017-12-11 9:25 ` Jarkko Nikula
2017-12-11 14:22 ` Rafael J. Wysocki
2017-12-11 15:13 ` Ingo Molnar
2 siblings, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-10 22:20 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, Zhang Rui, Andrew Lutomirski, Thomas Gleixner,
Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 3062 bytes --]
On Sun 2017-12-10 13:58:23, Andy Lutomirski wrote:
> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> >
> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > (unintentionally?) reordered stuff with respect to
> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > 32-bit.
> >
>
> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>
No, I can't, sorry.
> I'm guessing that the real issue is that 32-bit needs %fs restored
> early for TLS.
Maybe. I can test patches...
I don't think it would be good idea to revert
5b06bbcfc2c621da3009da8decb7511500c293ed, but since it introduced
regression in -rc2, I believe we should fix the regression now, and
then we can try to provide cleaner solution.
As Linus noted, the code is quite "interesting".
Pavel
>
> > While we are at it, fix a comment.
> >
> > Signed-off-by: Pavel Machek <pavel@ucw.cz>
> > Fixes: 5b06bbcfc2c621da3009da8decb7511500c293ed
> >
> > diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> > index 5191de1..af7b613 100644
> > --- a/arch/x86/power/cpu.c
> > +++ b/arch/x86/power/cpu.c
> > @@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> > write_cr0(ctxt->cr0);
> >
> > /*
> > - * now restore the descriptor tables to their proper values
> > - * ltr is done i fix_processor_context().
> > + * Now restore the descriptor tables to their proper values
> > + * ltr is done in fix_processor_context().
> > */
> > #ifdef CONFIG_X86_32
> > load_idt(&ctxt->idt);
> > @@ -235,13 +235,11 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> > wrmsrl(MSR_GS_BASE, ctxt->gs_base);
> > #endif
> >
> > - fix_processor_context();
> > -
> > +#ifdef CONFIG_X86_32
> > /*
> > - * Restore segment registers. This happens after restoring the GDT
> > - * and LDT, which happen in fix_processor_context().
> > + * Restore segment registers.
> > */
> > -#ifdef CONFIG_X86_32
> > +
> > loadsegment(es, ctxt->es);
> > loadsegment(fs, ctxt->fs);
> > loadsegment(gs, ctxt->gs);
> > @@ -252,8 +250,17 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> > */
> > if (boot_cpu_has(X86_FEATURE_SEP))
> > enable_sep_cpu();
> > +
> > + fix_processor_context();
> > #else
> > /* CONFIG_X86_64 */
> > + /*
> > + * Restore segment registers. This happens after restoring the GDT
> > + * and LDT, which happen in fix_processor_context().
> > + */
> > +
> > + fix_processor_context();
> > +
> > asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
> > asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
> > asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
> >
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-10 22:20 ` Pavel Machek
@ 2017-12-11 9:25 ` Jarkko Nikula
0 siblings, 0 replies; 83+ messages in thread
From: Jarkko Nikula @ 2017-12-11 9:25 UTC (permalink / raw)
To: Pavel Machek, Andy Lutomirski
Cc: Linus Torvalds, Zhang Rui, Andrew Lutomirski, Thomas Gleixner,
Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On 12/11/2017 12:20 AM, Pavel Machek wrote:
> On Sun 2017-12-10 13:58:23, Andy Lutomirski wrote:
>>> On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
>>>
>>>
>>> After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
>>> (unintentionally?) reordered stuff with respect to
>>> fix_processor_context() on 32-bit and 64-bit. We undo that change on
>>> 32-bit.
>>>
>>
>> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>>
>
> No, I can't, sorry.
>
>> I'm guessing that the real issue is that 32-bit needs %fs restored
>> early for TLS.
>
> Maybe. I can test patches...
>
> I don't think it would be good idea to revert
> 5b06bbcfc2c621da3009da8decb7511500c293ed, but since it introduced
> regression in -rc2, I believe we should fix the regression now, and
> then we can try to provide cleaner solution.
>
I can confirm Pavel's findings. Commit 5b06bbcfc2c6 ("x86/power: Fix
some ordering bugs in __restore_processor_context()") broke the
suspend/resume on 32-bit kernel.
v4.15-rc3 works either by reverting the commit or by Pavel's patch.
Fortunately Pavel's patch still keeps the 64-bit suspend/resume ok.
--
Jarkko
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-10 20:30 ` Linus Torvalds
2017-12-10 20:43 ` Pavel Machek
2017-12-10 21:38 ` [PATCH] Fix resume on x86-32 machines Pavel Machek
@ 2017-12-11 14:09 ` Zhang Rui
2017-12-11 16:28 ` Andy Lutomirski
2017-12-12 8:00 ` Pavel Machek
2 siblings, 2 replies; 83+ messages in thread
From: Zhang Rui @ 2017-12-11 14:09 UTC (permalink / raw)
To: Linus Torvalds, Pavel Machek
Cc: Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On Sun, 2017-12-10 at 12:30 -0800, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> >
> > Confirmed, revert fixes it. You see how it moves
> > fix_processor_context
> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> > machines exist? Aha.
> Yeah, people do.
>
> Andy?
>
> >
> > Which brings me to .. various people do automated testing of
> > kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit
> > for
> > boot and suspend would be very nice. The last item is not hard,
> > either:
> >
> > sudo rtcwake -l -m mem -s 5
> >
> > ...should take 10 seconds or so.
> I'm told 0day does *some* suspend/resume testing, but I think it's
> pretty limited, partly because the kinds of machines it primarily
> works on don't really support suspend/resume at all.
currently, we're running suspend test on 1 platform only, with 64 bit
kernel. suspend test will be enabled on more platforms (laptops) in
next two weeks.
I will check why it does not find the first regression introduced by
ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to
native_load_gs_index()").
> I'm also not sure
> just how many of those machines are 32-bit at all..
for this, I suppose it can be reproduced if we use 32-bit kernel and
rootfs, right? Then it's easier to enable this in 0Day.
thanks,
rui
>
> But I'm adding Zhang Rui to the cc, to see if my recollection is
> right.
>
> Because you're right, more suspend/resume automated testing would be
> good to have. And yes, people test mainly 64-bit these days.
>
> Also, I'm not even sure what the 0day rules are for just plain
> mainline. I don't tend to see a lot of breakage reports, even though
> I'd expect to. This came in from the x86 trees (and those do their
> own
> tests too, but probably not suspend/resume either), but it hit my
> tree
> fairly soon after going into the x86 -tip trees.
>
> Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-10 21:58 ` Andy Lutomirski
2017-12-10 22:20 ` Pavel Machek
@ 2017-12-11 14:22 ` Rafael J. Wysocki
2017-12-11 14:43 ` Rafael J. Wysocki
` (2 more replies)
2017-12-11 15:13 ` Ingo Molnar
2 siblings, 3 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-11 14:22 UTC (permalink / raw)
To: Pavel Machek, Andrew Lutomirski
Cc: Andy Lutomirski, Linus Torvalds, Zhang Rui, Thomas Gleixner,
Jarkko Nikula, Linux Kernel Mailing List,
the arch/x86 maintainers
On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>
> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> >
> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > (unintentionally?) reordered stuff with respect to
> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > 32-bit.
> >
>
> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>
> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
I *think* you are right.
Anyway, that should be easy enough to verify.
Pavel, can you please check if the below change works too?
---
arch/x86/power/cpu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux-pm/arch/x86/power/cpu.c
===================================================================
--- linux-pm.orig/arch/x86/power/cpu.c
+++ linux-pm/arch/x86/power/cpu.c
@@ -221,6 +221,8 @@ static void notrace __restore_processor_
*/
#ifdef CONFIG_X86_32
load_idt(&ctxt->idt);
+
+ loadsegment(fs, ctxt->fs);
#else
/* CONFIG_X86_64 */
load_idt((const struct desc_ptr *)&ctxt->idt_limit);
@@ -243,7 +245,6 @@ static void notrace __restore_processor_
*/
#ifdef CONFIG_X86_32
loadsegment(es, ctxt->es);
- loadsegment(fs, ctxt->fs);
loadsegment(gs, ctxt->gs);
loadsegment(ss, ctxt->ss);
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-11 14:22 ` Rafael J. Wysocki
@ 2017-12-11 14:43 ` Rafael J. Wysocki
2017-12-11 14:59 ` Jarkko Nikula
2017-12-11 18:31 ` Linus Torvalds
2 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-11 14:43 UTC (permalink / raw)
To: Andrew Lutomirski
Cc: Pavel Machek, Andy Lutomirski, Linus Torvalds, Zhang Rui,
Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
the arch/x86 maintainers
On Monday, December 11, 2017 3:22:39 PM CET Rafael J. Wysocki wrote:
> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
> >
> > > On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
> > >
> > >
> > > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > > (unintentionally?) reordered stuff with respect to
> > > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > > 32-bit.
> > >
> >
> > Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
> >
> > I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>
> I *think* you are right.
Hmm. Don't we need to restore GS on 32-bit before doing the per_cpu() thing
in fix_processor_context()?
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-11 14:22 ` Rafael J. Wysocki
2017-12-11 14:43 ` Rafael J. Wysocki
@ 2017-12-11 14:59 ` Jarkko Nikula
2017-12-11 18:31 ` Linus Torvalds
2 siblings, 0 replies; 83+ messages in thread
From: Jarkko Nikula @ 2017-12-11 14:59 UTC (permalink / raw)
To: Rafael J. Wysocki, Pavel Machek, Andrew Lutomirski
Cc: Andy Lutomirski, Linus Torvalds, Zhang Rui, Thomas Gleixner,
Linux Kernel Mailing List, the arch/x86 maintainers
On 12/11/2017 04:22 PM, Rafael J. Wysocki wrote:
> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>>
>>> On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
>>>
>>>
>>> After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
>>> (unintentionally?) reordered stuff with respect to
>>> fix_processor_context() on 32-bit and 64-bit. We undo that change on
>>> 32-bit.
>>>
>>
>> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>>
>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>
> I *think* you are right.
>
> Anyway, that should be easy enough to verify.
>
> Pavel, can you please check if the below change works too?
>
> ---
> arch/x86/power/cpu.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
Works here on my test machine.
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-10 21:58 ` Andy Lutomirski
2017-12-10 22:20 ` Pavel Machek
2017-12-11 14:22 ` Rafael J. Wysocki
@ 2017-12-11 15:13 ` Ingo Molnar
2017-12-11 16:26 ` Andy Lutomirski
2 siblings, 1 reply; 83+ messages in thread
From: Ingo Molnar @ 2017-12-11 15:13 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Pavel Machek, Linus Torvalds, Zhang Rui, Andrew Lutomirski,
Thomas Gleixner, Jarkko Nikula, Rafael J. Wysocki,
Linux Kernel Mailing List, the arch/x86 maintainers
* Andy Lutomirski <luto@amacapital.net> wrote:
>
>
> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> >
> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > (unintentionally?) reordered stuff with respect to
> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > 32-bit.
> >
>
> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>
> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
Does some early percpu primitive need GS as well perhaps?
Might be safest to restore both FS and GS early.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-11 15:13 ` Ingo Molnar
@ 2017-12-11 16:26 ` Andy Lutomirski
0 siblings, 0 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-11 16:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Pavel Machek, Linus Torvalds, Zhang Rui, Andrew Lutomirski,
Thomas Gleixner, Jarkko Nikula, Rafael J. Wysocki,
Linux Kernel Mailing List, the arch/x86 maintainers
On Mon, Dec 11, 2017 at 7:13 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Andy Lutomirski <luto@amacapital.net> wrote:
>
>>
>>
>> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
>> >
>> >
>> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
>> > (unintentionally?) reordered stuff with respect to
>> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
>> > 32-bit.
>> >
>>
>> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>>
>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>
> Does some early percpu primitive need GS as well perhaps?
>
> Might be safest to restore both FS and GS early.
>
fs needs to be restored early for TLS. gs needs to be restored early,
maybe, if !X86_32_LAZY_GS -- it's used for stack-protector. If
X86_32_LAZY_GS, gs must *not* be restored early.
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-11 14:09 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Zhang Rui
@ 2017-12-11 16:28 ` Andy Lutomirski
2017-12-12 8:00 ` Pavel Machek
1 sibling, 0 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-11 16:28 UTC (permalink / raw)
To: Zhang Rui
Cc: Linus Torvalds, Pavel Machek, Andrew Lutomirski, Thomas Gleixner,
Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On Mon, Dec 11, 2017 at 6:09 AM, Zhang Rui <rui.zhang@intel.com> wrote:
> On Sun, 2017-12-10 at 12:30 -0800, Linus Torvalds wrote:
>> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
>> >
>> >
>> > Confirmed, revert fixes it. You see how it moves
>> > fix_processor_context
>> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
>> > machines exist? Aha.
>> Yeah, people do.
>>
>> Andy?
>>
>> >
>> > Which brings me to .. various people do automated testing of
>> > kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit
>> > for
>> > boot and suspend would be very nice. The last item is not hard,
>> > either:
>> >
>> > sudo rtcwake -l -m mem -s 5
>> >
>> > ...should take 10 seconds or so.
>> I'm told 0day does *some* suspend/resume testing, but I think it's
>> pretty limited, partly because the kinds of machines it primarily
>> works on don't really support suspend/resume at all.
>
> currently, we're running suspend test on 1 platform only, with 64 bit
> kernel. suspend test will be enabled on more platforms (laptops) in
> next two weeks.
>
> I will check why it does not find the first regression introduced by
> ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to
> native_load_gs_index()").
>
>> I'm also not sure
>> just how many of those machines are 32-bit at all..
>
> for this, I suppose it can be reproduced if we use 32-bit kernel and
> rootfs, right? Then it's easier to enable this in 0Day.
>
Yes.
The 64-bit problem should also be reproducible with rtcwake even in a vm.
Also, on this topic, could make run_tests in
tools/testing/selftests/x86 be added to the rotation as well? The
testing dir should match the kernel being tested IMO.
> thanks,
> rui
>>
>> But I'm adding Zhang Rui to the cc, to see if my recollection is
>> right.
>>
>> Because you're right, more suspend/resume automated testing would be
>> good to have. And yes, people test mainly 64-bit these days.
>>
>> Also, I'm not even sure what the 0day rules are for just plain
>> mainline. I don't tend to see a lot of breakage reports, even though
>> I'd expect to. This came in from the x86 trees (and those do their
>> own
>> tests too, but probably not suspend/resume either), but it hit my
>> tree
>> fairly soon after going into the x86 -tip trees.
>>
>> Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-11 14:22 ` Rafael J. Wysocki
2017-12-11 14:43 ` Rafael J. Wysocki
2017-12-11 14:59 ` Jarkko Nikula
@ 2017-12-11 18:31 ` Linus Torvalds
2017-12-11 18:41 ` Andy Lutomirski
2 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-11 18:31 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Pavel Machek, Andrew Lutomirski, Andy Lutomirski, Zhang Rui,
Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
the arch/x86 maintainers
On Mon, Dec 11, 2017 at 6:22 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>>
>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>
> I *think* you are right.
>
> Anyway, that should be easy enough to verify.
>
> Pavel, can you please check if the below change works too?
So Jarkko confirmed this works for him, but the more I look at this
crap, the less I like it.
Why do we save fs/ds/es/ss at all on x86-32? Don't they all have fixed
values in the kernel, with %fs being __KERNEL_PERCPU, and the others
being __USER_DS?
Nothing else can possibly be valid, as far as I can tell.
I think we actually leave the user-space percpu segment in %gs (or the
stack canary base), so that one we should actually save/restore, but
I'm getting the feeling that we should just reset the other segment
registers to known values on 32-bit.
Also, why does the 32-bit code do
loadsegment(es, ctxt->es);
but the 64-bit code does
asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
And look at that confusion between MSR_GS_BASE and MSR_KERNEL_GS_BASE
all within the 64-bit case.
In particular, note how we reload the %gs segment in between the two -
wouldn't that mess with the currently active gs base if %gs can be
non-zero?
Christ, what a mess.
So I think that whole sequence is garbage. It has been written as some
kind of "save and restore registers", but that's not what it really
then does - or what it should do.
It should make sure to restore a sane kernel state, not some random
register state.
And the 32-bit and 64-bit code really should strive to be at least
_sanely_ different, not this randomly and insanely different mess.
But yes, Rafael's patch looks like the minimal one-liner. But I think
we should do the %gs load early too for the 32-bit stack canary case,
kind of like we need to do %fs for percpu base.
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-11 18:31 ` Linus Torvalds
@ 2017-12-11 18:41 ` Andy Lutomirski
2017-12-11 19:12 ` Linus Torvalds
2017-12-14 20:38 ` Pavel Machek
0 siblings, 2 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-11 18:41 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Pavel Machek, Andrew Lutomirski, Zhang Rui,
Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
the arch/x86 maintainers
On Mon, Dec 11, 2017 at 10:31 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Dec 11, 2017 at 6:22 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>>>
>>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>>
>> I *think* you are right.
>>
>> Anyway, that should be easy enough to verify.
>>
>> Pavel, can you please check if the below change works too?
>
> So Jarkko confirmed this works for him, but the more I look at this
> crap, the less I like it.
>
> Why do we save fs/ds/es/ss at all on x86-32? Don't they all have fixed
> values in the kernel, with %fs being __KERNEL_PERCPU, and the others
> being __USER_DS?
>
> Nothing else can possibly be valid, as far as I can tell.
>
> I think we actually leave the user-space percpu segment in %gs (or the
> stack canary base), so that one we should actually save/restore, but
> I'm getting the feeling that we should just reset the other segment
> registers to known values on 32-bit.
>
> Also, why does the 32-bit code do
>
> loadsegment(es, ctxt->es);
>
> but the 64-bit code does
>
> asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
>
> And look at that confusion between MSR_GS_BASE and MSR_KERNEL_GS_BASE
> all within the 64-bit case.
>
> In particular, note how we reload the %gs segment in between the two -
> wouldn't that mess with the currently active gs base if %gs can be
> non-zero?
>
> Christ, what a mess.
>
> So I think that whole sequence is garbage. It has been written as some
> kind of "save and restore registers", but that's not what it really
> then does - or what it should do.
>
> It should make sure to restore a sane kernel state, not some random
> register state.
>
> And the 32-bit and 64-bit code really should strive to be at least
> _sanely_ different, not this randomly and insanely different mess.
>
> But yes, Rafael's patch looks like the minimal one-liner. But I think
> we should do the %gs load early too for the 32-bit stack canary case,
> kind of like we need to do %fs for percpu base.
I'll try to get to this in a day or so -- is that okay? Or should we
do some trivial fix/revert and fix it for real next time around?
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-11 18:41 ` Andy Lutomirski
@ 2017-12-11 19:12 ` Linus Torvalds
2017-12-14 20:38 ` Pavel Machek
1 sibling, 0 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-11 19:12 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Rafael J. Wysocki, Pavel Machek, Andrew Lutomirski, Zhang Rui,
Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
the arch/x86 maintainers
On Mon, Dec 11, 2017 at 10:41 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> I'll try to get to this in a day or so -- is that okay? Or should we
> do some trivial fix/revert and fix it for real next time around?
I don't think we want some trivial fix/revert just to keep it working.
This code is too fragile as-is, and I think that "make it work" is
more than reverting. You did fix real issues on x86-64 with odd
segment use, for example.
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-11 14:09 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Zhang Rui
2017-12-11 16:28 ` Andy Lutomirski
@ 2017-12-12 8:00 ` Pavel Machek
1 sibling, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2017-12-12 8:00 UTC (permalink / raw)
To: Zhang Rui
Cc: Linus Torvalds, Andrew Lutomirski, Thomas Gleixner,
Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]
Hi!
> > > ...should take 10 seconds or so.
> > I'm told 0day does *some* suspend/resume testing, but I think it's
> > pretty limited, partly because the kinds of machines it primarily
> > works on don't really support suspend/resume at all.
>
> currently, we're running suspend test on 1 platform only, with 64 bit
> kernel. suspend test will be enabled on more platforms (laptops) in
> next two weeks.
Thanks!
> > I'm also not sure
> > just how many of those machines are 32-bit at all..
>
> for this, I suppose it can be reproduced if we use 32-bit kernel and
> rootfs, right? Then it's easier to enable this in 0Day.
Yes, Intel cpus are pretty good at backwards compatibility, and most
problems are not subtle at all. So yes, 32-bit kernel / rootfs on
recent machine should be good for testing.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-10 21:28 ` Linus Torvalds
2017-12-10 21:35 ` Pavel Machek
@ 2017-12-12 17:27 ` Linus Torvalds
2017-12-12 18:05 ` Andy Lutomirski
1 sibling, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-12 17:27 UTC (permalink / raw)
To: Pavel Machek
Cc: Zhang Rui, Andrew Lutomirski, Thomas Gleixner, Jarkko Nikula,
Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On Sun, Dec 10, 2017 at 1:28 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That said, this *all* smells wrong. Why is there a special
> fix_processor_context() function at all with different 32-bit and
> 64-bit behavior? This code is all written to be maximally confusing.
Hmm. Looking a bit more at this, I think it should be solved by:
- load the original read-write GDT early, along with the IDT.
We have already saved it off in __save_processor_state(), and it may
have already gotten loaded very early in at least some of the paths,
but it definitely hasn't gotten reloaded in all the paths (think
"suspend/resume testing" etc).
- add the LDT descriptor to the save area too, exactly like we
already have IDT/GDT.
Then, we can do "load_ldt()" early (along with IDT and GDT).
- now we can just load all the segments early, and get the percpu and
stack canary stuff without any special cases
- do NOT use "load_gs_index()", which does that swapgs dance (twice!)
and plays with interrupt state. Just load the segment register, and
then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
need for the swapgs dance.
- now that we have a fully working system, then the
"fix_processor_context()" code can do the "fancy" stuff like setting
up the RO fixmap GDT and re-initializing the TLB state. Those want
percpu stuff.
In other words, why not try to organize this so that we do a simple
and fairly mindless restore of the low-level state first? Avoid all
the "system is halfway up" crud.
Would that work for people? Andy?
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-12 17:27 ` Linus Torvalds
@ 2017-12-12 18:05 ` Andy Lutomirski
2017-12-12 18:36 ` Linus Torvalds
0 siblings, 1 reply; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-12 18:05 UTC (permalink / raw)
To: Linus Torvalds
Cc: Pavel Machek, Zhang Rui, Andrew Lutomirski, Thomas Gleixner,
Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On Tue, Dec 12, 2017 at 9:27 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Dec 10, 2017 at 1:28 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> That said, this *all* smells wrong. Why is there a special
>> fix_processor_context() function at all with different 32-bit and
>> 64-bit behavior? This code is all written to be maximally confusing.
>
> Hmm. Looking a bit more at this, I think it should be solved by:
>
> - load the original read-write GDT early, along with the IDT.
>
> We have already saved it off in __save_processor_state(), and it may
> have already gotten loaded very early in at least some of the paths,
> but it definitely hasn't gotten reloaded in all the paths (think
> "suspend/resume testing" etc).
>
> - add the LDT descriptor to the save area too, exactly like we
> already have IDT/GDT.
>
> Then, we can do "load_ldt()" early (along with IDT and GDT).
>
> - now we can just load all the segments early, and get the percpu and
> stack canary stuff without any special cases
>
> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
> and plays with interrupt state. Just load the segment register, and
> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
> need for the swapgs dance.
Using what helper? On x86_64, it can fault, and IIRC we explicitly
don't allow loadsegment(gs, ...).
>
> - now that we have a fully working system, then the
> "fix_processor_context()" code can do the "fancy" stuff like setting
> up the RO fixmap GDT and re-initializing the TLB state. Those want
> percpu stuff.
>
> In other words, why not try to organize this so that we do a simple
> and fairly mindless restore of the low-level state first? Avoid all
> the "system is halfway up" crud.
>
> Would that work for people? Andy?
Other than the above, more or less.
But we should really do all the user segments last. They're not at
all needed for normal kernel execution, so I think they should be the
very last part.
I'll try to get to this today.
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-12 18:05 ` Andy Lutomirski
@ 2017-12-12 18:36 ` Linus Torvalds
2017-12-12 22:10 ` Andy Lutomirski
0 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-12 18:36 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Pavel Machek, Zhang Rui, Thomas Gleixner, Jarkko Nikula,
Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
>> and plays with interrupt state. Just load the segment register, and
>> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
>> need for the swapgs dance.
>
> Using what helper? On x86_64, it can fault, and IIRC we explicitly
> don't allow loadsegment(gs, ...).
Just do the loadsegment() thing. The fact that we don't have a gs
version of it is legacy - to catch bad users. It shouldn't stop us
from having good users.
That said - can it really fault? Because if it can, then why can't %fs
fault? And on x86-64, we just do
asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
and don't actually use 'loadsegment()' for _any_ of the segments. We
only do the fault protection on 32-bit.
In fact, we really should try to avoid taking faults here anyway,
shouldn't we? We haven't loaded enough of the context yet.
Hmm.
Maybe we should load only the fixed kernel segments at this point, and
then do all the loadsegment() of gs/fs in the later phase when we're
all set up.
THERE we can do the swapgs dance with interrupt tracing etc, because
*there* we actually are fully set up. I guess that means reloading the
FS/GS base MSR's,
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-12 18:36 ` Linus Torvalds
@ 2017-12-12 22:10 ` Andy Lutomirski
2017-12-12 22:33 ` Linus Torvalds
2017-12-13 11:16 ` Jarkko Nikula
0 siblings, 2 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-12 22:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, Pavel Machek, Zhang Rui, Thomas Gleixner,
Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>>
>>> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
>>> and plays with interrupt state. Just load the segment register, and
>>> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
>>> need for the swapgs dance.
>>
>> Using what helper? On x86_64, it can fault, and IIRC we explicitly
>> don't allow loadsegment(gs, ...).
>
> Just do the loadsegment() thing. The fact that we don't have a gs
> version of it is legacy - to catch bad users. It shouldn't stop us
> from having good users.
>
> That said - can it really fault? Because if it can, then why can't %fs
> fault? And on x86-64, we just do
>
> asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
>
> and don't actually use 'loadsegment()' for _any_ of the segments. We
> only do the fault protection on 32-bit.
>
> In fact, we really should try to avoid taking faults here anyway,
> shouldn't we? We haven't loaded enough of the context yet.
>
> Hmm.
>
> Maybe we should load only the fixed kernel segments at this point, and
> then do all the loadsegment() of gs/fs in the later phase when we're
> all set up.
>
> THERE we can do the swapgs dance with interrupt tracing etc, because
> *there* we actually are fully set up. I guess that means reloading the
> FS/GS base MSR's,
Like this?
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856
I've barely tested it. It suspended and resumed once in a 64-bit VM.
It compiles on 32-bit.
(That link might not work for a little bit. I'm not sure what's up.)
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-12 22:10 ` Andy Lutomirski
@ 2017-12-12 22:33 ` Linus Torvalds
2017-12-12 23:10 ` Andy Lutomirski
2017-12-13 11:16 ` Jarkko Nikula
1 sibling, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-12 22:33 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Pavel Machek, Zhang Rui, Thomas Gleixner, Jarkko Nikula,
Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On Tue, Dec 12, 2017 at 2:10 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> (That link might not work for a little bit. I'm not sure what's up.)
I think your link is just bogus.
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
works.
Anyway, the code looks much better to me.
Whether it _works_ is another matter, of course.
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-12 22:33 ` Linus Torvalds
@ 2017-12-12 23:10 ` Andy Lutomirski
0 siblings, 0 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-12 23:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, Pavel Machek, Zhang Rui, Thomas Gleixner,
Jarkko Nikula, Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On Tue, Dec 12, 2017 at 2:33 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Dec 12, 2017 at 2:10 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> (That link might not work for a little bit. I'm not sure what's up.)
>
> I think your link is just bogus.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
>
> works.
If I click "commit" on that page, I get my link, and it's still busted.
>
> Anyway, the code looks much better to me.
>
> Whether it _works_ is another matter, of course.
>
> Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-12 22:10 ` Andy Lutomirski
2017-12-12 22:33 ` Linus Torvalds
@ 2017-12-13 11:16 ` Jarkko Nikula
2017-12-13 12:40 ` Ingo Molnar
2017-12-13 18:50 ` Andy Lutomirski
1 sibling, 2 replies; 83+ messages in thread
From: Jarkko Nikula @ 2017-12-13 11:16 UTC (permalink / raw)
To: Andy Lutomirski, Linus Torvalds
Cc: Pavel Machek, Zhang Rui, Thomas Gleixner, Rafael J. Wysocki,
Linux Kernel Mailing List, the arch/x86 maintainers
On 12/13/2017 12:10 AM, Andy Lutomirski wrote:
> On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <luto@kernel.org> wrote:
> Like this?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856
>
> I've barely tested it. It suspended and resumed once in a 64-bit VM.
> It compiles on 32-bit.
>
I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both
64-bit and 32-bit work.
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 11:16 ` Jarkko Nikula
@ 2017-12-13 12:40 ` Ingo Molnar
2017-12-13 18:50 ` Andy Lutomirski
1 sibling, 0 replies; 83+ messages in thread
From: Ingo Molnar @ 2017-12-13 12:40 UTC (permalink / raw)
To: Jarkko Nikula
Cc: Andy Lutomirski, Linus Torvalds, Pavel Machek, Zhang Rui,
Thomas Gleixner, Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
* Jarkko Nikula <jarkko.nikula@linux.intel.com> wrote:
> On 12/13/2017 12:10 AM, Andy Lutomirski wrote:
> > On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <luto@kernel.org> wrote:
> > Like this?
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856
> >
> > I've barely tested it. It suspended and resumed once in a 64-bit VM.
> > It compiles on 32-bit.
> >
> I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both 64-bit
> and 32-bit work.
>
> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Great, thanks for the testing!
Andy, mind sending the final fix with a changelog and the Reported-by/Tested-by
tags, etc?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-08 10:30 ` Thomas Gleixner
@ 2017-12-13 15:57 ` Thomas Gleixner
2017-12-13 16:23 ` Bjorn Helgaas
0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-13 15:57 UTC (permalink / raw)
To: Maarten Lankhorst
Cc: Michal Hocko, Linus Torvalds, Rafael J. Wysocki, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers,
Daniel Vetter, Bjorn Helgaas, Rafael J. Wysocki
So I was finally able to figure out what the hell is going on:
Suspend:
- The device suspend code puts the graphics card into a power
state != PCI_D0.
- Offline non boot CPUs
- Break interrupt affinity. Allocate new vector on CPU 0, compose and
write MSI message which ends up in:
__pci_write_msi_msg(entry, msg)
{
if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
/* Don't touch the hardware now */
} else {
....
}
entry->msg = *msg;
}
So because the device is not in PCI_D0 the message is not written. It's
written in the device resume path.
Resume:
[ 139.670446] ACPI: Low-level resume complete
[ 139.670541] PM: Restoring platform NVS memory
[ 139.672462] do_IRQ: 0.55 No irq handler for vector
[ 139.672475] Enabling non-boot CPUs ...
So the spurious interrupt happens early and way before the device resume
code writes the new MSI message.
I checked the behaviour on 4.14. The MSI write is delayed there in the same
way, but there is no spurious interrupt. There is no interrupt coming in at
all _BEFORE_ the device is put out of PCI_D0.
And this has certainly nothing to do with the vector management changes,
but I can't figure yet what makes that spurious interrupt to be sent.
Any ideas welcome.
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 15:57 ` Thomas Gleixner
@ 2017-12-13 16:23 ` Bjorn Helgaas
2017-12-13 16:41 ` Thomas Gleixner
0 siblings, 1 reply; 83+ messages in thread
From: Bjorn Helgaas @ 2017-12-13 16:23 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Maarten Lankhorst, Michal Hocko, Linus Torvalds,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
[+cc linux-pci, linux-pm]
On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> So I was finally able to figure out what the hell is going on:
>
> Suspend:
>
> - The device suspend code puts the graphics card into a power
> state != PCI_D0.
>
> - Offline non boot CPUs
>
> - Break interrupt affinity. Allocate new vector on CPU 0, compose and
> write MSI message which ends up in:
>
> __pci_write_msi_msg(entry, msg)
> {
> if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
> /* Don't touch the hardware now */
> } else {
> ....
> }
> entry->msg = *msg;
> }
>
> So because the device is not in PCI_D0 the message is not written. It's
> written in the device resume path.
I'm not a PM guru, but this ordering seems fragile. If we offline
CPUs before re-targeting interrupts directed at those CPUs, aren't we
always going to be at risk of sending interrupts to an offline CPU?
Even if the device is now asleep and therefore should not generate an
interrupt, it seems like there's a window when the device returns to
PCI_D0 where it could generate an interrupt before we have a chance to
update the MSI message.
> Resume:
> [ 139.670446] ACPI: Low-level resume complete
> [ 139.670541] PM: Restoring platform NVS memory
> [ 139.672462] do_IRQ: 0.55 No irq handler for vector
> [ 139.672475] Enabling non-boot CPUs ...
>
> So the spurious interrupt happens early and way before the device resume
> code writes the new MSI message.
>
> I checked the behaviour on 4.14. The MSI write is delayed there in the same
> way, but there is no spurious interrupt. There is no interrupt coming in at
> all _BEFORE_ the device is put out of PCI_D0.
>
> And this has certainly nothing to do with the vector management changes,
> but I can't figure yet what makes that spurious interrupt to be sent.
>
> Any ideas welcome.
>
> Thanks,
>
> tglx
>
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 16:23 ` Bjorn Helgaas
@ 2017-12-13 16:41 ` Thomas Gleixner
2017-12-13 17:45 ` Linus Torvalds
0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-13 16:41 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Maarten Lankhorst, Michal Hocko, Linus Torvalds,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Wed, 13 Dec 2017, Bjorn Helgaas wrote:
> [+cc linux-pci, linux-pm]
>
> On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> > So I was finally able to figure out what the hell is going on:
> >
> > Suspend:
> >
> > - The device suspend code puts the graphics card into a power
> > state != PCI_D0.
> >
> > - Offline non boot CPUs
> >
> > - Break interrupt affinity. Allocate new vector on CPU 0, compose and
> > write MSI message which ends up in:
> >
> > __pci_write_msi_msg(entry, msg)
> > {
> > if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
> > /* Don't touch the hardware now */
> > } else {
> > ....
> > }
> > entry->msg = *msg;
> > }
> >
> > So because the device is not in PCI_D0 the message is not written. It's
> > written in the device resume path.
>
> I'm not a PM guru, but this ordering seems fragile. If we offline
> CPUs before re-targeting interrupts directed at those CPUs, aren't we
> always going to be at risk of sending interrupts to an offline CPU?
>
> Even if the device is now asleep and therefore should not generate an
> interrupt, it seems like there's a window when the device returns to
> PCI_D0 where it could generate an interrupt before we have a chance to
> update the MSI message.
Definitely. That was fragile forever but puzzles me is that I can't figure
out what now causes that spurious interrupt to surface out of the blue.
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 16:41 ` Thomas Gleixner
@ 2017-12-13 17:45 ` Linus Torvalds
2017-12-13 18:19 ` Thomas Gleixner
0 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-13 17:45 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Definitely. That was fragile forever but puzzles me is that I can't figure
> out what now causes that spurious interrupt to surface out of the blue.
Perhaps just timing?
How hard would it be to change the ordering to just redirect irqs first?
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 17:45 ` Linus Torvalds
@ 2017-12-13 18:19 ` Thomas Gleixner
2017-12-13 20:52 ` Thomas Gleixner
2017-12-13 22:39 ` Rafael J. Wysocki
0 siblings, 2 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-13 18:19 UTC (permalink / raw)
To: Linus Torvalds
Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Wed, 13 Dec 2017, Linus Torvalds wrote:
> On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Definitely. That was fragile forever but puzzles me is that I can't figure
> > out what now causes that spurious interrupt to surface out of the blue.
>
> Perhaps just timing?
That's what I'm trying to figure out right now, because that is the only
sensible explanation left. The whole machinery of suspend is exactly the
same with and without the vector changes. I instrumented all functions
involved and the picture is the same. I even do not see any fundamental
timing differences where one would say: That's it.
What puzzles me even more is that in the range of commits I'm fiddling with
there is no other change than the vector management stuff and the point
where it breaks makes no sense at all. The point Maarten bisected it to
works nicely here, so that might just point to a very subtle timing issue.
> How hard would it be to change the ordering to just redirect irqs first?
The whole interrupt redirection happens when the non boot CPUs are brought
down, which is the very last step before the actual suspend happens.
We could probably do that earlier, but that's something Rafael needs to
answer ultimately.
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 11:16 ` Jarkko Nikula
2017-12-13 12:40 ` Ingo Molnar
@ 2017-12-13 18:50 ` Andy Lutomirski
1 sibling, 0 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-13 18:50 UTC (permalink / raw)
To: Jarkko Nikula
Cc: Andy Lutomirski, Linus Torvalds, Pavel Machek, Zhang Rui,
Thomas Gleixner, Rafael J. Wysocki, Linux Kernel Mailing List,
the arch/x86 maintainers
On Wed, Dec 13, 2017 at 3:16 AM, Jarkko Nikula
<jarkko.nikula@linux.intel.com> wrote:
> On 12/13/2017 12:10 AM, Andy Lutomirski wrote:
>>
>> On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>>
>>> On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <luto@kernel.org>
>>> wrote:
>>
>> Like this?
>>
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856
>>
>> I've barely tested it. It suspended and resumed once in a 64-bit VM.
>> It compiles on 32-bit.
>>
> I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both 64-bit
> and 32-bit work.
>
> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Thanks!
I've split the patch into three and put your Tested-By on all of them,
since the end state is exactly the same as what you tested. I'll
email them out once I test them myself :)
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 18:19 ` Thomas Gleixner
@ 2017-12-13 20:52 ` Thomas Gleixner
2017-12-13 21:06 ` Thomas Gleixner
2017-12-13 22:39 ` Rafael J. Wysocki
1 sibling, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-13 20:52 UTC (permalink / raw)
To: Linus Torvalds
Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Linus Torvalds wrote:
>
> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > out what now causes that spurious interrupt to surface out of the blue.
> >
> > Perhaps just timing?
>
> That's what I'm trying to figure out right now, because that is the only
> sensible explanation left. The whole machinery of suspend is exactly the
> same with and without the vector changes. I instrumented all functions
> involved and the picture is the same. I even do not see any fundamental
> timing differences where one would say: That's it.
>
> What puzzles me even more is that in the range of commits I'm fiddling with
> there is no other change than the vector management stuff and the point
> where it breaks makes no sense at all. The point Maarten bisected it to
> works nicely here, so that might just point to a very subtle timing issue.
After doing more debugging on this it turns out that this looks like a
legacy interrupt coming in. The vector number is always 55, which is legacy
IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
masked and vector 55 is completely unused.
More questions than answers. Still investigating.
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 20:52 ` Thomas Gleixner
@ 2017-12-13 21:06 ` Thomas Gleixner
2017-12-13 22:48 ` Rafael J. Wysocki
2017-12-14 11:54 ` Thomas Gleixner
0 siblings, 2 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-13 21:06 UTC (permalink / raw)
To: Linus Torvalds
Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> >
> > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > >
> > > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > > out what now causes that spurious interrupt to surface out of the blue.
> > >
> > > Perhaps just timing?
> >
> > That's what I'm trying to figure out right now, because that is the only
> > sensible explanation left. The whole machinery of suspend is exactly the
> > same with and without the vector changes. I instrumented all functions
> > involved and the picture is the same. I even do not see any fundamental
> > timing differences where one would say: That's it.
> >
> > What puzzles me even more is that in the range of commits I'm fiddling with
> > there is no other change than the vector management stuff and the point
> > where it breaks makes no sense at all. The point Maarten bisected it to
> > works nicely here, so that might just point to a very subtle timing issue.
>
> After doing more debugging on this it turns out that this looks like a
> legacy interrupt coming in. The vector number is always 55, which is legacy
> IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> masked and vector 55 is completely unused.
>
> More questions than answers. Still investigating.
And it does not explain Maartens report which gets a spurious vector 33 on
CPU4 after the non boot cpus have been brought online again. And that's the
vector which was assigned before the affinity was moved by unplugging CPU4.
Hrmpf. Even more mystery to solve.
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 18:19 ` Thomas Gleixner
2017-12-13 20:52 ` Thomas Gleixner
@ 2017-12-13 22:39 ` Rafael J. Wysocki
2017-12-13 23:26 ` Rafael J. Wysocki
1 sibling, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-13 22:39 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Wednesday, December 13, 2017 7:19:17 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Linus Torvalds wrote:
>
> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > out what now causes that spurious interrupt to surface out of the blue.
> >
> > Perhaps just timing?
>
> That's what I'm trying to figure out right now, because that is the only
> sensible explanation left. The whole machinery of suspend is exactly the
> same with and without the vector changes. I instrumented all functions
> involved and the picture is the same. I even do not see any fundamental
> timing differences where one would say: That's it.
>
> What puzzles me even more is that in the range of commits I'm fiddling with
> there is no other change than the vector management stuff and the point
> where it breaks makes no sense at all. The point Maarten bisected it to
> works nicely here, so that might just point to a very subtle timing issue.
>
> > How hard would it be to change the ordering to just redirect irqs first?
>
> The whole interrupt redirection happens when the non boot CPUs are brought
> down, which is the very last step before the actual suspend happens.
>
> We could probably do that earlier, but that's something Rafael needs to
> answer ultimately.
Well, that's both flattering and concerning. ;-)
Anyway, yes, we can do that earlier AFAICS. Action handlers are not going to
run after we've called suspend_device_irqs() which happens before the final
stage of PCI devices suspend (suspend_noirq) and it doesn't matter which CPU
gets the interrupt from that point on (it is either wakeup or unwanted then).
Thanks,
Rafael
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 21:06 ` Thomas Gleixner
@ 2017-12-13 22:48 ` Rafael J. Wysocki
2017-12-14 11:54 ` Thomas Gleixner
1 sibling, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-13 22:48 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Wednesday, December 13, 2017 10:06:40 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > >
> > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > >
> > > > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > > > out what now causes that spurious interrupt to surface out of the blue.
> > > >
> > > > Perhaps just timing?
> > >
> > > That's what I'm trying to figure out right now, because that is the only
> > > sensible explanation left. The whole machinery of suspend is exactly the
> > > same with and without the vector changes. I instrumented all functions
> > > involved and the picture is the same. I even do not see any fundamental
> > > timing differences where one would say: That's it.
> > >
> > > What puzzles me even more is that in the range of commits I'm fiddling with
> > > there is no other change than the vector management stuff and the point
> > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > works nicely here, so that might just point to a very subtle timing issue.
> >
> > After doing more debugging on this it turns out that this looks like a
> > legacy interrupt coming in. The vector number is always 55, which is legacy
> > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > masked and vector 55 is completely unused.
> >
> > More questions than answers. Still investigating.
>
> And it does not explain Maartens report which gets a spurious vector 33 on
> CPU4 after the non boot cpus have been brought online again. And that's the
> vector which was assigned before the affinity was moved by unplugging CPU4.
>
> Hrmpf. Even more mystery to solve.
Any chance to look at /proc/interrupts from a machine where that can be
reproduced?
I'm also curious if that can be reproduced by doing CPU offline/online
without suspending?
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 22:39 ` Rafael J. Wysocki
@ 2017-12-13 23:26 ` Rafael J. Wysocki
0 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-13 23:26 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, Linux PCI, Linux PM
On Wed, Dec 13, 2017 at 11:39 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, December 13, 2017 7:19:17 PM CET Thomas Gleixner wrote:
>> On Wed, 13 Dec 2017, Linus Torvalds wrote:
>>
>> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > >
>> > > Definitely. That was fragile forever but puzzles me is that I can't figure
>> > > out what now causes that spurious interrupt to surface out of the blue.
>> >
>> > Perhaps just timing?
>>
>> That's what I'm trying to figure out right now, because that is the only
>> sensible explanation left. The whole machinery of suspend is exactly the
>> same with and without the vector changes. I instrumented all functions
>> involved and the picture is the same. I even do not see any fundamental
>> timing differences where one would say: That's it.
>>
>> What puzzles me even more is that in the range of commits I'm fiddling with
>> there is no other change than the vector management stuff and the point
>> where it breaks makes no sense at all. The point Maarten bisected it to
>> works nicely here, so that might just point to a very subtle timing issue.
>>
>> > How hard would it be to change the ordering to just redirect irqs first?
>>
>> The whole interrupt redirection happens when the non boot CPUs are brought
>> down, which is the very last step before the actual suspend happens.
>>
>> We could probably do that earlier, but that's something Rafael needs to
>> answer ultimately.
>
> Well, that's both flattering and concerning. ;-)
>
> Anyway, yes, we can do that earlier AFAICS. Action handlers are not going to
> run after we've called suspend_device_irqs() which happens before the final
> stage of PCI devices suspend (suspend_noirq) and it doesn't matter which CPU
> gets the interrupt from that point on (it is either wakeup or unwanted then).
There is a catch that we don't and likely should not do that for
suspend-to-idle, but since we have pm_suspend_target_state now, that
case can be distinguished from the "full suspend" one readily.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-13 21:06 ` Thomas Gleixner
2017-12-13 22:48 ` Rafael J. Wysocki
@ 2017-12-14 11:54 ` Thomas Gleixner
2017-12-14 12:12 ` Rafael J. Wysocki
` (2 more replies)
1 sibling, 3 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-14 11:54 UTC (permalink / raw)
To: Linus Torvalds
Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > >
> > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > >
> > > > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > > > out what now causes that spurious interrupt to surface out of the blue.
> > > >
> > > > Perhaps just timing?
> > >
> > > That's what I'm trying to figure out right now, because that is the only
> > > sensible explanation left. The whole machinery of suspend is exactly the
> > > same with and without the vector changes. I instrumented all functions
> > > involved and the picture is the same. I even do not see any fundamental
> > > timing differences where one would say: That's it.
> > >
> > > What puzzles me even more is that in the range of commits I'm fiddling with
> > > there is no other change than the vector management stuff and the point
> > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > works nicely here, so that might just point to a very subtle timing issue.
> >
> > After doing more debugging on this it turns out that this looks like a
> > legacy interrupt coming in. The vector number is always 55, which is legacy
> > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > masked and vector 55 is completely unused.
> >
> > More questions than answers. Still investigating.
At least that one could be explained by the changes. In the previous
management scheme the IOAPIC interrupts were always allocated even when the
interrupt was not in use. The new scheme does not longer do that because
people complained about the vector waste (16 vectors on each CPU) and it
got rid of all the special casing of IRQ0-15.
So the old scheme silently consumed the spurious vector. I added debug code
to that effect to 4.14 and on that machine IRQ7 is triggered at the same
point post resume and the core code drops it silently because the interrupt
is marked masked and no action assigned.
So the only difference to today is that the new code complains, while the
old one does an extra mask of the already masked IOAPIC pin and silently
returns.
After quite some investigation I found out that its independent of the
graphics thing. That's a genuine issue on that platform which seems to emit
random legacy vectors which were never ever used for unknown reasons. I
verified that both the IOAPIC and the PIC are masked, so they cannot send
crap. Though it turned out that the silly firmware unmasks the PIC and
leaves it that way when it returns from suspend. Now there is a race
whether the kernel resume path manages to mask the PIC again early enough
before something triggers IRQ7 or not. Adding/removing debug code makes the
problem come and go. So I really don't worry about that one and rather
prefer to have the spurious interrupt printed than silently consumed by
chance.
Now the graphics issue is a different story. That only happens on
hibernation after doing the snapshot. There all non boot cpus are onlined
again and after that the devices are 'thawed'. The following reenable of
interrupts fails because i915 is not in PCI_D0 state.
Suspend:
irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
__pci_write_msi_msg: Not written <- Device not in PCI_D0
....
device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
pci_pm_resume_noirq <-dpm_run_callback
pci_pm_resume_noirq <-dpm_run_callback
pci_pm_default_resume_early <-pci_pm_resume_noirq
pci_pm_default_resume_early <-pci_pm_resume_noirq
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a <-- Set the new affinity
device_pm_callback_end: i915 0000:00:02.0, err=0
Hibernate:
irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
__pci_write_msi_msg: Not written <- Device not in PCI_D0
....
device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
pci_pm_thaw_noirq <-dpm_run_callback
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
__pci_write_msi_msg: Not written <--- Device is not in PCI_D0
device_pm_callback_end: i915 0000:00:02.0, err=0
So that code path fails to set the new affinity because at the point where
the MSI msg should be written the device state is != PCI_D0.
Now, what's different vs. 4.14:
The 4.14 code accidentaly had the irq descriptor for this vector still
populated in the old CPU due to the convoluted way the vector allocation
worked. I have still to investigate if one of those cases is actually
leaking the descriptor, which would be a fatal bug.
But the new code does a proper cleanup and does not repopulate it on the
offline CPU. So that unearthes the issue. I'm handing that over to the PM
folks to look at. I got lost in that maze of callbacks.
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 11:54 ` Thomas Gleixner
@ 2017-12-14 12:12 ` Rafael J. Wysocki
2017-12-14 12:30 ` Thomas Gleixner
2017-12-14 13:24 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Thomas Gleixner
2017-12-14 19:03 ` Linus Torvalds
2 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-14 12:12 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > > >
> > > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > > >
> > > > > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > > > > out what now causes that spurious interrupt to surface out of the blue.
> > > > >
> > > > > Perhaps just timing?
> > > >
> > > > That's what I'm trying to figure out right now, because that is the only
> > > > sensible explanation left. The whole machinery of suspend is exactly the
> > > > same with and without the vector changes. I instrumented all functions
> > > > involved and the picture is the same. I even do not see any fundamental
> > > > timing differences where one would say: That's it.
> > > >
> > > > What puzzles me even more is that in the range of commits I'm fiddling with
> > > > there is no other change than the vector management stuff and the point
> > > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > > works nicely here, so that might just point to a very subtle timing issue.
> > >
> > > After doing more debugging on this it turns out that this looks like a
> > > legacy interrupt coming in. The vector number is always 55, which is legacy
> > > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > > masked and vector 55 is completely unused.
> > >
> > > More questions than answers. Still investigating.
>
> At least that one could be explained by the changes. In the previous
> management scheme the IOAPIC interrupts were always allocated even when the
> interrupt was not in use. The new scheme does not longer do that because
> people complained about the vector waste (16 vectors on each CPU) and it
> got rid of all the special casing of IRQ0-15.
>
> So the old scheme silently consumed the spurious vector. I added debug code
> to that effect to 4.14 and on that machine IRQ7 is triggered at the same
> point post resume and the core code drops it silently because the interrupt
> is marked masked and no action assigned.
>
> So the only difference to today is that the new code complains, while the
> old one does an extra mask of the already masked IOAPIC pin and silently
> returns.
>
> After quite some investigation I found out that its independent of the
> graphics thing. That's a genuine issue on that platform which seems to emit
> random legacy vectors which were never ever used for unknown reasons. I
> verified that both the IOAPIC and the PIC are masked, so they cannot send
> crap. Though it turned out that the silly firmware unmasks the PIC and
> leaves it that way when it returns from suspend. Now there is a race
> whether the kernel resume path manages to mask the PIC again early enough
> before something triggers IRQ7 or not. Adding/removing debug code makes the
> problem come and go. So I really don't worry about that one and rather
> prefer to have the spurious interrupt printed than silently consumed by
> chance.
OK
> Now the graphics issue is a different story. That only happens on
> hibernation after doing the snapshot. There all non boot cpus are onlined
> again and after that the devices are 'thawed'. The following reenable of
> interrupts fails because i915 is not in PCI_D0 state.
>
> Suspend:
>
> irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> __pci_write_msi_msg: Not written <- Device not in PCI_D0
> ....
> device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
> pci_pm_resume_noirq <-dpm_run_callback
> pci_pm_resume_noirq <-dpm_run_callback
> pci_pm_default_resume_early <-pci_pm_resume_noirq
> pci_pm_default_resume_early <-pci_pm_resume_noirq
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a <-- Set the new affinity
> device_pm_callback_end: i915 0000:00:02.0, err=0
So this works, because we power up the device during resume even if it
had been suspended (via runtime PM) before the suspend started.
> Hibernate:
>
> irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> __pci_write_msi_msg: Not written <- Device not in PCI_D0
> ....
> device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
> pci_pm_thaw_noirq <-dpm_run_callback
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> __pci_write_msi_msg: Not written <--- Device is not in PCI_D0
> device_pm_callback_end: i915 0000:00:02.0, err=0
And here we try to leave the device alone which is OK for devices in D0,
but not for suspended ones.
It looks like we need to power up them at the "thaw" time too or at least
I don't see how to address that differently.
> So that code path fails to set the new affinity because at the point where
> the MSI msg should be written the device state is != PCI_D0.
>
> Now, what's different vs. 4.14:
>
> The 4.14 code accidentaly had the irq descriptor for this vector still
> populated in the old CPU due to the convoluted way the vector allocation
> worked. I have still to investigate if one of those cases is actually
> leaking the descriptor, which would be a fatal bug.
>
> But the new code does a proper cleanup and does not repopulate it on the
> offline CPU. So that unearthes the issue. I'm handing that over to the PM
> folks to look at. I got lost in that maze of callbacks.
OK, thanks so much for getting to the bottom of this!
Rafael
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 12:12 ` Rafael J. Wysocki
@ 2017-12-14 12:30 ` Thomas Gleixner
2017-12-14 15:30 ` Rafael J. Wysocki
0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-14 12:30 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> > Now the graphics issue is a different story. That only happens on
> > hibernation after doing the snapshot. There all non boot cpus are onlined
> > again and after that the devices are 'thawed'. The following reenable of
> > interrupts fails because i915 is not in PCI_D0 state.
> >
> > Suspend:
> >
> > irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > __pci_write_msi_msg: Not written <- Device not in PCI_D0
> > ....
> > device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
> > pci_pm_resume_noirq <-dpm_run_callback
> > pci_pm_resume_noirq <-dpm_run_callback
> > pci_pm_default_resume_early <-pci_pm_resume_noirq
> > pci_pm_default_resume_early <-pci_pm_resume_noirq
> > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a <-- Set the new affinity
> > device_pm_callback_end: i915 0000:00:02.0, err=0
>
> So this works, because we power up the device during resume even if it
> had been suspended (via runtime PM) before the suspend started.
>
> > Hibernate:
> >
> > irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > __pci_write_msi_msg: Not written <- Device not in PCI_D0
> > ....
> > device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
> > pci_pm_thaw_noirq <-dpm_run_callback
> > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > __pci_write_msi_msg: Not written <--- Device is not in PCI_D0
> > device_pm_callback_end: i915 0000:00:02.0, err=0
>
> And here we try to leave the device alone which is OK for devices in D0,
> but not for suspended ones.
>
> It looks like we need to power up them at the "thaw" time too or at least
> I don't see how to address that differently.
The question is whether the code which brings the device out of D0 should
write the message unconditionally. That would be sufficient I think.
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 11:54 ` Thomas Gleixner
2017-12-14 12:12 ` Rafael J. Wysocki
@ 2017-12-14 13:24 ` Thomas Gleixner
2017-12-14 19:03 ` Linus Torvalds
2 siblings, 0 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-14 13:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Thu, 14 Dec 2017, Thomas Gleixner wrote:
> Now, what's different vs. 4.14:
>
> The 4.14 code accidentaly had the irq descriptor for this vector still
> populated in the old CPU due to the convoluted way the vector allocation
> worked. I have still to investigate if one of those cases is actually
> leaking the descriptor, which would be a fatal bug.
It doesn't leak. It repopulates it at the same place out of sheer luck.
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 12:30 ` Thomas Gleixner
@ 2017-12-14 15:30 ` Rafael J. Wysocki
2017-12-14 15:52 ` Thomas Gleixner
0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-14 15:30 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Thursday, December 14, 2017 1:30:37 PM CET Thomas Gleixner wrote:
> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> > On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> > > Now the graphics issue is a different story. That only happens on
> > > hibernation after doing the snapshot. There all non boot cpus are onlined
> > > again and after that the devices are 'thawed'. The following reenable of
> > > interrupts fails because i915 is not in PCI_D0 state.
> > >
> > > Suspend:
> > >
> > > irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > > __pci_write_msi_msg: Not written <- Device not in PCI_D0
> > > ....
> > > device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
> > > pci_pm_resume_noirq <-dpm_run_callback
> > > pci_pm_resume_noirq <-dpm_run_callback
> > > pci_pm_default_resume_early <-pci_pm_resume_noirq
> > > pci_pm_default_resume_early <-pci_pm_resume_noirq
> > > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a <-- Set the new affinity
> > > device_pm_callback_end: i915 0000:00:02.0, err=0
> >
> > So this works, because we power up the device during resume even if it
> > had been suspended (via runtime PM) before the suspend started.
> >
> > > Hibernate:
> > >
> > > irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > > __pci_write_msi_msg: Not written <- Device not in PCI_D0
> > > ....
> > > device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
> > > pci_pm_thaw_noirq <-dpm_run_callback
> > > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > > __pci_write_msi_msg: Not written <--- Device is not in PCI_D0
> > > device_pm_callback_end: i915 0000:00:02.0, err=0
> >
> > And here we try to leave the device alone which is OK for devices in D0,
> > but not for suspended ones.
> >
> > It looks like we need to power up them at the "thaw" time too or at least
> > I don't see how to address that differently.
>
> The question is whether the code which brings the device out of D0 should
> write the message unconditionally. That would be sufficient I think.
It doesn't have to do that.
The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
in fact requires the device to be in D0, so the caller should put it into
D0 instead of trying to "update" its power state.
[Note that the PCI layer doesn't put devices into low-power states during the
hibernation's "freeze" transition, but drivers can legitimately do that in
their "freeze" callbacks which was overlooked in that code and that's what
i915 does.]
So IMO what we need is the change below. I'm going to test it shortly,
but please give it a go too.
---
drivers/pci/pci-driver.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_resume_early(dev);
- pci_update_current_state(pci_dev, PCI_D0);
+ /*
+ * pci_restore_state() requires the device to be in D0 (because of MSI
+ * restoration among other things), so force it into D0 in case the
+ * driver's "freeze" callbacks put it into a low-power state directly.
+ */
+ pci_set_power_state(pci_dev, PCI_D0);
pci_restore_state(pci_dev);
if (drv && drv->pm && drv->pm->thaw_noirq)
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 15:30 ` Rafael J. Wysocki
@ 2017-12-14 15:52 ` Thomas Gleixner
2017-12-14 15:54 ` Rafael J. Wysocki
0 siblings, 1 reply; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-14 15:52 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
> in fact requires the device to be in D0, so the caller should put it into
> D0 instead of trying to "update" its power state.
>
> [Note that the PCI layer doesn't put devices into low-power states during the
> hibernation's "freeze" transition, but drivers can legitimately do that in
> their "freeze" callbacks which was overlooked in that code and that's what
> i915 does.]
>
> So IMO what we need is the change below. I'm going to test it shortly,
> but please give it a go too.
So now this looks more reasonable:
irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
__pci_write_msi_msg: Not written
...
device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
pci_pm_thaw_noirq <-dpm_run_callback
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
device_pm_callback_end: i915 0000:00:02.0, err=0
...
resume_irqs: Resume 125
...
irq_handler_entry: irq=125 name=i915
Thanks,
tglx
> ---
> drivers/pci/pci-driver.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_resume_early(dev);
>
> - pci_update_current_state(pci_dev, PCI_D0);
> + /*
> + * pci_restore_state() requires the device to be in D0 (because of MSI
> + * restoration among other things), so force it into D0 in case the
> + * driver's "freeze" callbacks put it into a low-power state directly.
> + */
> + pci_set_power_state(pci_dev, PCI_D0);
> pci_restore_state(pci_dev);
>
> if (drv && drv->pm && drv->pm->thaw_noirq)
>
>
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 15:52 ` Thomas Gleixner
@ 2017-12-14 15:54 ` Rafael J. Wysocki
2017-12-14 16:17 ` Maarten Lankhorst
2017-12-15 2:07 ` [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() Rafael J. Wysocki
0 siblings, 2 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-14 15:54 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Thursday, December 14, 2017 4:52:22 PM CET Thomas Gleixner wrote:
> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> > The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
> > in fact requires the device to be in D0, so the caller should put it into
> > D0 instead of trying to "update" its power state.
> >
> > [Note that the PCI layer doesn't put devices into low-power states during the
> > hibernation's "freeze" transition, but drivers can legitimately do that in
> > their "freeze" callbacks which was overlooked in that code and that's what
> > i915 does.]
> >
> > So IMO what we need is the change below. I'm going to test it shortly,
> > but please give it a go too.
>
> So now this looks more reasonable:
>
> irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> __pci_write_msi_msg: Not written
> ...
> device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
> pci_pm_thaw_noirq <-dpm_run_callback
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> device_pm_callback_end: i915 0000:00:02.0, err=0
> ...
> resume_irqs: Resume 125
> ...
> irq_handler_entry: irq=125 name=i915
Cool.
Let me respin it with a changelog etc then.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 15:54 ` Rafael J. Wysocki
@ 2017-12-14 16:17 ` Maarten Lankhorst
2017-12-15 2:07 ` [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() Rafael J. Wysocki
1 sibling, 0 replies; 83+ messages in thread
From: Maarten Lankhorst @ 2017-12-14 16:17 UTC (permalink / raw)
To: Rafael J. Wysocki, Thomas Gleixner
Cc: Linus Torvalds, Bjorn Helgaas, Michal Hocko, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers,
Daniel Vetter, Bjorn Helgaas, Rafael J. Wysocki, linux-pci,
linux-pm
Op 14-12-17 om 16:54 schreef Rafael J. Wysocki:
> On Thursday, December 14, 2017 4:52:22 PM CET Thomas Gleixner wrote:
>> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
>>> The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
>>> in fact requires the device to be in D0, so the caller should put it into
>>> D0 instead of trying to "update" its power state.
>>>
>>> [Note that the PCI layer doesn't put devices into low-power states during the
>>> hibernation's "freeze" transition, but drivers can legitimately do that in
>>> their "freeze" callbacks which was overlooked in that code and that's what
>>> i915 does.]
>>>
>>> So IMO what we need is the change below. I'm going to test it shortly,
>>> but please give it a go too.
>> So now this looks more reasonable:
>>
>> irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
>> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
>> __pci_write_msi_msg: Not written
>> ...
>> device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
>> pci_pm_thaw_noirq <-dpm_run_callback
>> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
>> device_pm_callback_end: i915 0000:00:02.0, err=0
>> ...
>> resume_irqs: Resume 125
>> ...
>> irq_handler_entry: irq=125 name=i915
> Cool.
>
> Let me respin it with a changelog etc then.
>
> Thanks,
> Rafael
>
>
The machine I was using for reproducing the bug appears to be fixed with this patch, so I now sent
it to intel's trybot for results.
https://patchwork.freedesktop.org/series/35367/
Thanks for looking at the bug!
~Maarten
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 11:54 ` Thomas Gleixner
2017-12-14 12:12 ` Rafael J. Wysocki
2017-12-14 13:24 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Thomas Gleixner
@ 2017-12-14 19:03 ` Linus Torvalds
2017-12-14 22:36 ` Thomas Gleixner
2 siblings, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-14 19:03 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> So the old scheme silently consumed the spurious vector. I added debug code
> to that effect to 4.14 and on that machine IRQ7 is triggered at the same
> point post resume and the core code drops it silently because the interrupt
> is marked masked and no action assigned.
>
> So the only difference to today is that the new code complains, while the
> old one does an extra mask of the already masked IOAPIC pin and silently
> returns.
Great debugging, and it looks like Rafael has a patch that already got
positive testing.
I just wanted to pipe up about that "irq7", because judging from your
email it seems like you think it's a real irq:
> Now there is a race
> whether the kernel resume path manages to mask the PIC again early enough
> before something triggers IRQ7 or not.
..and that's not how the PIC works.
In fact, "legacy irq 7" is the _normal_ and very traditional spurious
interrupt, and it's documented. If the PIC gets an interrupt from
_any_ source, but the interrupt goes away before the PIC gets an
acknowledge from the CPU (and by "acknowledge", I'm not talking about
the explicit software IRQ ACK, I'm talking about the hardware
protocol, between the PIC and the CPU), the PIC will then report irq 7
as the interrupt - regardless of what the original was.
The reason is almost always something like
- CPU interrupts are disabled or masked
- driver does a write to the external hardware that causes an
interrupt to be raised
- CPU doesn't react to the irq due to the disabled/masked nature
- but the driver then does something that masks the interrupt again
- interrupts are enabled/unmasked on the CPU
- CPU now acks the interrupt, but the PIC no longer sees any
interrupt source, so the PIC (that has to reply with *something*)
replies with that documented spurious irq7.
To confuse things further, irq7 is not _exclusively_ the spurious
interrupt, You can definitely put real hardware and connect it to pin7
of the PIC, and get real irq7 reports.
And to confuse things even *more*, this "irq7" thing is per-PIC, and
the PC model obviously has the whole "nested PIC" thing where the
second PIC is connected to irq2 of the first PIC. So there are *two*
different "spurious interrupt" reports, one for each PIC.
Anyway, to avoid this issue, drivers should strive to
(a) actually take the interrupt when doing things that can cause
them, and have the interrupt handler do whatever it is that causes the
interrupt to go away (ie: "normal operation")
(b) if you play games with clearing the source of the interrupt
_without_ taking the interrupt, you should strive to basically mask
the interrupt first.
So to do (b) you can do something like
mask_device_interrupt(dev);
read_from_device_to_synchronize(dev);
instead of (or perhaps _before_) disabling interrupts at a CPU level.
Suspend/resume obviously does tend to play games with these kinds of
things where you are no longer in "normal operation" and you do setup
without having interrupts actually enabled.
Or you can just decide that spurious interrupts are ok, and ignore the
issue. But they *can* be very confusing, and obviously in this case
that confusion then seems to have caused actual problems.
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-11 18:41 ` Andy Lutomirski
2017-12-11 19:12 ` Linus Torvalds
@ 2017-12-14 20:38 ` Pavel Machek
2017-12-14 20:47 ` Linus Torvalds
1 sibling, 1 reply; 83+ messages in thread
From: Pavel Machek @ 2017-12-14 20:38 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, Rafael J. Wysocki, Andrew Lutomirski, Zhang Rui,
Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 738 bytes --]
Hi!
> > But yes, Rafael's patch looks like the minimal one-liner. But I think
> > we should do the %gs load early too for the 32-bit stack canary case,
> > kind of like we need to do %fs for percpu base.
>
> I'll try to get to this in a day or so -- is that okay? Or should we
> do some trivial fix/revert and fix it for real next time around?
I can test the patch....
But given this is already "regression fix for x86-64 caused regression
on x86-32", I really believe we should merge trivial fix now, and do
the cleanups / nicer fixes sometime later?
Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-14 20:38 ` Pavel Machek
@ 2017-12-14 20:47 ` Linus Torvalds
2017-12-14 21:20 ` Andy Lutomirski
2017-12-14 22:22 ` Pavel Machek
0 siblings, 2 replies; 83+ messages in thread
From: Linus Torvalds @ 2017-12-14 20:47 UTC (permalink / raw)
To: Pavel Machek
Cc: Andy Lutomirski, Rafael J. Wysocki, Andrew Lutomirski, Zhang Rui,
Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
the arch/x86 maintainers
On Thu, Dec 14, 2017 at 12:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
>
> But given this is already "regression fix for x86-64 caused regression
> on x86-32", I really believe we should merge trivial fix now, and do
> the cleanups / nicer fixes sometime later?
The fix patch was already posted, but in another thread (confusingly
with _almost_ the same subject: "Re: Linux 4.15-rc2: Regression in
resume from ACPI S3").
Here:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
but it hasn't actually gotten to me yet (I think Luto made a new
version where he split it up into three smaller patches, and I'm
assuming it's in one of the -tip trees heading for me asap).
All the x86 people have been insanely busy with the crazy page table
isolation patches, that probably is slowing down things. But I was
expecting to get the -tip tree pulls tomorrow (.. because.. Friday.
It's my busiest day of the week because everybody wants to get their
work out before the weekend).
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-14 20:47 ` Linus Torvalds
@ 2017-12-14 21:20 ` Andy Lutomirski
2017-12-14 22:22 ` Pavel Machek
1 sibling, 0 replies; 83+ messages in thread
From: Andy Lutomirski @ 2017-12-14 21:20 UTC (permalink / raw)
To: Linus Torvalds
Cc: Pavel Machek, Rafael J. Wysocki, Andrew Lutomirski, Zhang Rui,
Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
the arch/x86 maintainers
On Thu, Dec 14, 2017 at 12:47 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Dec 14, 2017 at 12:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
>>
>> But given this is already "regression fix for x86-64 caused regression
>> on x86-32", I really believe we should merge trivial fix now, and do
>> the cleanups / nicer fixes sometime later?
>
> The fix patch was already posted, but in another thread (confusingly
> with _almost_ the same subject: "Re: Linux 4.15-rc2: Regression in
> resume from ACPI S3").
>
> Here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
>
> but it hasn't actually gotten to me yet (I think Luto made a new
> version where he split it up into three smaller patches, and I'm
> assuming it's in one of the -tip trees heading for me asap).
>
> All the x86 people have been insanely busy with the crazy page table
> isolation patches, that probably is slowing down things. But I was
> expecting to get the -tip tree pulls tomorrow (.. because.. Friday.
> It's my busiest day of the week because everybody wants to get their
> work out before the weekend).
Nah, it's that I was busy with the stupid PTI stuff, and the original
version of the patch didn't compile on some configurations. I've
tested the version I just sent for suspend-to-ram and suspend-to-disk
on 32-bit and 64-bit in a couple of configurations.
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] Fix resume on x86-32 machines
2017-12-14 20:47 ` Linus Torvalds
2017-12-14 21:20 ` Andy Lutomirski
@ 2017-12-14 22:22 ` Pavel Machek
1 sibling, 0 replies; 83+ messages in thread
From: Pavel Machek @ 2017-12-14 22:22 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, Rafael J. Wysocki, Andrew Lutomirski, Zhang Rui,
Thomas Gleixner, Jarkko Nikula, Linux Kernel Mailing List,
the arch/x86 maintainers
[-- Attachment #1: Type: text/plain, Size: 1074 bytes --]
On Thu 2017-12-14 12:47:56, Linus Torvalds wrote:
> On Thu, Dec 14, 2017 at 12:38 PM, Pavel Machek <pavel@ucw.cz> wrote:
> >
> > But given this is already "regression fix for x86-64 caused regression
> > on x86-32", I really believe we should merge trivial fix now, and do
> > the cleanups / nicer fixes sometime later?
>
> The fix patch was already posted, but in another thread (confusingly
> with _almost_ the same subject: "Re: Linux 4.15-rc2: Regression in
> resume from ACPI S3").
>
> Here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
>
> but it hasn't actually gotten to me yet (I think Luto made a new
> version where he split it up into three smaller patches, and I'm
> assuming it's in one of the -tip trees heading for me asap).
Ok.. It did not apply cleanly. ... but I see fresh patches in my
inbox, let me test those.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 19:03 ` Linus Torvalds
@ 2017-12-14 22:36 ` Thomas Gleixner
2017-12-14 22:47 ` Linus Torvalds
2017-12-15 0:34 ` Rafael J. Wysocki
0 siblings, 2 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-14 22:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Thu, 14 Dec 2017, Linus Torvalds wrote:
> On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> I just wanted to pipe up about that "irq7", because judging from your
> email it seems like you think it's a real irq:
>
> > Now there is a race
> > whether the kernel resume path manages to mask the PIC again early enough
> > before something triggers IRQ7 or not.
>
> ..and that's not how the PIC works.
>
> In fact, "legacy irq 7" is the _normal_ and very traditional spurious
> interrupt, and it's documented. If the PIC gets an interrupt from
> _any_ source, but the interrupt goes away before the PIC gets an
> acknowledge from the CPU (and by "acknowledge", I'm not talking about
> the explicit software IRQ ACK, I'm talking about the hardware
> protocol, between the PIC and the CPU), the PIC will then report irq 7
> as the interrupt - regardless of what the original was.
>
> The reason is almost always something like
>
> - CPU interrupts are disabled or masked
>
> - driver does a write to the external hardware that causes an
> interrupt to be raised
Which should be a non issue because _ALL_ PIC irq lines are masked at the
PIC itself. All interrupts are routed through IOAPIC. So unless the IOAPIC
sports similar behaviour the PIC should not ever observe that scenario.
But, because the silly firmware comes out of suspend with all PIC lines
unmasked for whatever reason, the PIC can observe that IRQ being raised and
the CPU not handling it. So yes, I forgot about 7 being magic, but I still
think it's the firmware which causes it by unmasking the PIC irqs.
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 22:36 ` Thomas Gleixner
@ 2017-12-14 22:47 ` Linus Torvalds
2017-12-15 9:05 ` Thomas Gleixner
2017-12-15 0:34 ` Rafael J. Wysocki
1 sibling, 1 reply; 83+ messages in thread
From: Linus Torvalds @ 2017-12-14 22:47 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Thu, Dec 14, 2017 at 2:36 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> But, because the silly firmware comes out of suspend with all PIC lines
> unmasked for whatever reason, the PIC can observe that IRQ being raised and
> the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> think it's the firmware which causes it by unmasking the PIC irqs.
Yes, that sounds quite likely.
Linus
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 22:36 ` Thomas Gleixner
2017-12-14 22:47 ` Linus Torvalds
@ 2017-12-15 0:34 ` Rafael J. Wysocki
1 sibling, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-15 0:34 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, Linux PCI, Linux PM
On Thu, Dec 14, 2017 at 11:36 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 14 Dec 2017, Linus Torvalds wrote:
>> On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> I just wanted to pipe up about that "irq7", because judging from your
>> email it seems like you think it's a real irq:
>>
>> > Now there is a race
>> > whether the kernel resume path manages to mask the PIC again early enough
>> > before something triggers IRQ7 or not.
>>
>> ..and that's not how the PIC works.
>>
>> In fact, "legacy irq 7" is the _normal_ and very traditional spurious
>> interrupt, and it's documented. If the PIC gets an interrupt from
>> _any_ source, but the interrupt goes away before the PIC gets an
>> acknowledge from the CPU (and by "acknowledge", I'm not talking about
>> the explicit software IRQ ACK, I'm talking about the hardware
>> protocol, between the PIC and the CPU), the PIC will then report irq 7
>> as the interrupt - regardless of what the original was.
>>
>> The reason is almost always something like
>>
>> - CPU interrupts are disabled or masked
>>
>> - driver does a write to the external hardware that causes an
>> interrupt to be raised
>
> Which should be a non issue because _ALL_ PIC irq lines are masked at the
> PIC itself. All interrupts are routed through IOAPIC. So unless the IOAPIC
> sports similar behaviour the PIC should not ever observe that scenario.
>
> But, because the silly firmware comes out of suspend with all PIC lines
> unmasked for whatever reason, the PIC can observe that IRQ being raised and
> the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> think it's the firmware which causes it by unmasking the PIC irqs.
That's my understanding too.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 83+ messages in thread
* [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
2017-12-14 15:54 ` Rafael J. Wysocki
2017-12-14 16:17 ` Maarten Lankhorst
@ 2017-12-15 2:07 ` Rafael J. Wysocki
2017-12-15 14:28 ` Rafael J. Wysocki
2017-12-15 18:30 ` Bjorn Helgaas
1 sibling, 2 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-15 2:07 UTC (permalink / raw)
To: Thomas Gleixner, linux-pci
Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pm, Chen Yu
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is incorrect to call pci_restore_state() for devices in low-power
states (D1-D3), as that involves the restoration of MSI setup which
requires MMIO to be operational and that is only the case in D0.
However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
callbacks put the device into a low-power state, so fix it by making
it force devices into D0 via pci_set_power_state() instead of trying
to "update" their power state which is pointless.
Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Maarten Lankhorst <dev@mblankhorst.nl>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Maarten Lankhorst <dev@mblankhorst.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
The bug is not as old as I thought, actually.
Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
forever, but it started to be problematic in 4.13, when we started
to call pci_restore_state() in addition to it to fix another issue.
---
drivers/pci/pci-driver.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_resume_early(dev);
- pci_update_current_state(pci_dev, PCI_D0);
+ /*
+ * pci_restore_state() requires the device to be in D0 (because of MSI
+ * restoration among other things), so force it into D0 in case the
+ * driver's "freeze" callbacks put it into a low-power state directly.
+ */
+ pci_set_power_state(pci_dev, PCI_D0);
pci_restore_state(pci_dev);
if (drv && drv->pm && drv->pm->thaw_noirq)
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2: Regression in resume from ACPI S3
2017-12-14 22:47 ` Linus Torvalds
@ 2017-12-15 9:05 ` Thomas Gleixner
0 siblings, 0 replies; 83+ messages in thread
From: Thomas Gleixner @ 2017-12-15 9:05 UTC (permalink / raw)
To: Linus Torvalds
Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
Rafael J. Wysocki, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pci, linux-pm
On Thu, 14 Dec 2017, Linus Torvalds wrote:
> On Thu, Dec 14, 2017 at 2:36 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > But, because the silly firmware comes out of suspend with all PIC lines
> > unmasked for whatever reason, the PIC can observe that IRQ being raised and
> > the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> > think it's the firmware which causes it by unmasking the PIC irqs.
>
> Yes, that sounds quite likely.
And just for the record I was able to figure out which interrupt comes in
and goes away again. It's the only level triggered interrupt, which is the
ACPI interrupt.....
Thanks,
tglx
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
2017-12-15 2:07 ` [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() Rafael J. Wysocki
@ 2017-12-15 14:28 ` Rafael J. Wysocki
2017-12-15 18:30 ` Bjorn Helgaas
1 sibling, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-15 14:28 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Thomas Gleixner, Linux PCI, Linus Torvalds, Maarten Lankhorst,
Michal Hocko, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, Linux PM, Chen Yu
On Fri, Dec 15, 2017 at 3:07 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It is incorrect to call pci_restore_state() for devices in low-power
> states (D1-D3), as that involves the restoration of MSI setup which
> requires MMIO to be operational and that is only the case in D0.
>
> However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
> callbacks put the device into a low-power state, so fix it by making
> it force devices into D0 via pci_set_power_state() instead of trying
> to "update" their power state which is pointless.
>
> Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
> Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
> Reported-by: Thomas Gleixner <tglx@linutronix.de>
> Reported-by: Maarten Lankhorst <dev@mblankhorst.nl>
> Tested-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Maarten Lankhorst <dev@mblankhorst.nl>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> The bug is not as old as I thought, actually.
>
> Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
> forever, but it started to be problematic in 4.13, when we started
> to call pci_restore_state() in addition to it to fix another issue.
Bjorn, any concerns about this one?
> ---
> drivers/pci/pci-driver.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_resume_early(dev);
>
> - pci_update_current_state(pci_dev, PCI_D0);
> + /*
> + * pci_restore_state() requires the device to be in D0 (because of MSI
> + * restoration among other things), so force it into D0 in case the
> + * driver's "freeze" callbacks put it into a low-power state directly.
> + */
> + pci_set_power_state(pci_dev, PCI_D0);
> pci_restore_state(pci_dev);
>
> if (drv && drv->pm && drv->pm->thaw_noirq)
>
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
2017-12-15 2:07 ` [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() Rafael J. Wysocki
2017-12-15 14:28 ` Rafael J. Wysocki
@ 2017-12-15 18:30 ` Bjorn Helgaas
2017-12-15 23:44 ` Rafael J. Wysocki
1 sibling, 1 reply; 83+ messages in thread
From: Bjorn Helgaas @ 2017-12-15 18:30 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Thomas Gleixner, linux-pci, Linus Torvalds, Maarten Lankhorst,
Michal Hocko, Andy Lutomirski, Linux Kernel Mailing List,
the arch/x86 maintainers, Daniel Vetter, Bjorn Helgaas,
Rafael J. Wysocki, linux-pm, Chen Yu
On Fri, Dec 15, 2017 at 03:07:18AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It is incorrect to call pci_restore_state() for devices in low-power
> states (D1-D3), as that involves the restoration of MSI setup which
> requires MMIO to be operational and that is only the case in D0.
>
> However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
> callbacks put the device into a low-power state, so fix it by making
> it force devices into D0 via pci_set_power_state() instead of trying
> to "update" their power state which is pointless.
>
> Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
> Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
> Reported-by: Thomas Gleixner <tglx@linutronix.de>
> Reported-by: Maarten Lankhorst <dev@mblankhorst.nl>
> Tested-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Maarten Lankhorst <dev@mblankhorst.nl>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Let me know if you want me to take this. I don't have anything
currently queued up that touches pci-driver.c, so I'm happy if you
take it yourself.
> ---
>
> The bug is not as old as I thought, actually.
>
> Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
> forever, but it started to be problematic in 4.13, when we started
> to call pci_restore_state() in addition to it to fix another issue.
>
> ---
> drivers/pci/pci-driver.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_resume_early(dev);
>
> - pci_update_current_state(pci_dev, PCI_D0);
> + /*
> + * pci_restore_state() requires the device to be in D0 (because of MSI
> + * restoration among other things), so force it into D0 in case the
> + * driver's "freeze" callbacks put it into a low-power state directly.
> + */
> + pci_set_power_state(pci_dev, PCI_D0);
> pci_restore_state(pci_dev);
>
> if (drv && drv->pm && drv->pm->thaw_noirq)
>
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
2017-12-15 18:30 ` Bjorn Helgaas
@ 2017-12-15 23:44 ` Rafael J. Wysocki
0 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2017-12-15 23:44 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Rafael J. Wysocki, Thomas Gleixner, Linux PCI, Linus Torvalds,
Maarten Lankhorst, Michal Hocko, Andy Lutomirski,
Linux Kernel Mailing List, the arch/x86 maintainers,
Daniel Vetter, Bjorn Helgaas, Rafael J. Wysocki, Linux PM,
Chen Yu
On Fri, Dec 15, 2017 at 7:30 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Dec 15, 2017 at 03:07:18AM +0100, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> It is incorrect to call pci_restore_state() for devices in low-power
>> states (D1-D3), as that involves the restoration of MSI setup which
>> requires MMIO to be operational and that is only the case in D0.
>>
>> However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
>> callbacks put the device into a low-power state, so fix it by making
>> it force devices into D0 via pci_set_power_state() instead of trying
>> to "update" their power state which is pointless.
>>
>> Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
>> Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
>> Reported-by: Thomas Gleixner <tglx@linutronix.de>
>> Reported-by: Maarten Lankhorst <dev@mblankhorst.nl>
>> Tested-by: Thomas Gleixner <tglx@linutronix.de>
>> Tested-by: Maarten Lankhorst <dev@mblankhorst.nl>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
> Let me know if you want me to take this. I don't have anything
> currently queued up that touches pci-driver.c, so I'm happy if you
> take it yourself.
I will take it.
Depending of what Yu finds, we may need an additional fix to make the
Purley system work.
>> ---
>>
>> The bug is not as old as I thought, actually.
>>
>> Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
>> forever, but it started to be problematic in 4.13, when we started
>> to call pci_restore_state() in addition to it to fix another issue.
>>
>> ---
>> drivers/pci/pci-driver.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> Index: linux-pm/drivers/pci/pci-driver.c
>> ===================================================================
>> --- linux-pm.orig/drivers/pci/pci-driver.c
>> +++ linux-pm/drivers/pci/pci-driver.c
>> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
>> if (pci_has_legacy_pm_support(pci_dev))
>> return pci_legacy_resume_early(dev);
>>
>> - pci_update_current_state(pci_dev, PCI_D0);
>> + /*
>> + * pci_restore_state() requires the device to be in D0 (because of MSI
>> + * restoration among other things), so force it into D0 in case the
>> + * driver's "freeze" callbacks put it into a low-power state directly.
>> + */
>> + pci_set_power_state(pci_dev, PCI_D0);
>> pci_restore_state(pci_dev);
>>
>> if (drv && drv->pm && drv->pm->thaw_noirq)
>>
^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Linux 4.15-rc2
2017-12-03 16:22 Linux 4.15-rc2 Linus Torvalds
2017-12-04 22:25 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Rafael J. Wysocki
@ 2018-02-21 18:36 ` Eugene Syromiatnikov
1 sibling, 0 replies; 83+ messages in thread
From: Eugene Syromiatnikov @ 2018-02-21 18:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Al Viro
On Sun, Dec 03, 2017 at 11:22:56AM -0500, Linus Torvalds wrote:
>
> Linus Torvalds (6):
> Rename superblock flags (MS_xyz -> SB_xyz)
This commit, while claims that it changes internal flags, also touches
an UAPI header (include/uapi/linux/bfs_fs.h), specifically, the macro
BFS_UNCLEAN. I expect that either this macro should be in a private
header, or (if this check is expected to be available to the userspace)
at least SB_RDONLY should be also defined there.
^ permalink raw reply [flat|nested] 83+ messages in thread
end of thread, other threads:[~2018-02-21 18:36 UTC | newest]
Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-03 16:22 Linux 4.15-rc2 Linus Torvalds
2017-12-04 22:25 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Rafael J. Wysocki
2017-12-04 22:36 ` Linus Torvalds
2017-12-04 22:38 ` Thomas Gleixner
2017-12-04 22:41 ` Rafael J. Wysocki
2017-12-05 0:25 ` Rafael J. Wysocki
2017-12-09 10:33 ` Pavel Machek
2017-12-09 11:41 ` Pavel Machek
[not found] ` <CA+55aFw8tuoJ2gcXx3K2sKFf2Y9hXX4naMVQNqGOUivnjwhjkg@mail.gmail.com>
2017-12-09 22:01 ` Pavel Machek
[not found] ` <CA+55aFySAdiBZhZ0PSDjH5PuvPPcMsBRXbxCkObfm1eY7gHDbQ@mail.gmail.com>
2017-12-10 16:23 ` Pavel Machek
2017-12-10 16:37 ` Linus Torvalds
2017-12-10 18:56 ` Pavel Machek
2017-12-10 20:30 ` Linus Torvalds
2017-12-10 20:43 ` Pavel Machek
2017-12-10 21:28 ` Linus Torvalds
2017-12-10 21:35 ` Pavel Machek
2017-12-12 17:27 ` Linus Torvalds
2017-12-12 18:05 ` Andy Lutomirski
2017-12-12 18:36 ` Linus Torvalds
2017-12-12 22:10 ` Andy Lutomirski
2017-12-12 22:33 ` Linus Torvalds
2017-12-12 23:10 ` Andy Lutomirski
2017-12-13 11:16 ` Jarkko Nikula
2017-12-13 12:40 ` Ingo Molnar
2017-12-13 18:50 ` Andy Lutomirski
2017-12-10 21:38 ` [PATCH] Fix resume on x86-32 machines Pavel Machek
2017-12-10 21:58 ` Andy Lutomirski
2017-12-10 22:20 ` Pavel Machek
2017-12-11 9:25 ` Jarkko Nikula
2017-12-11 14:22 ` Rafael J. Wysocki
2017-12-11 14:43 ` Rafael J. Wysocki
2017-12-11 14:59 ` Jarkko Nikula
2017-12-11 18:31 ` Linus Torvalds
2017-12-11 18:41 ` Andy Lutomirski
2017-12-11 19:12 ` Linus Torvalds
2017-12-14 20:38 ` Pavel Machek
2017-12-14 20:47 ` Linus Torvalds
2017-12-14 21:20 ` Andy Lutomirski
2017-12-14 22:22 ` Pavel Machek
2017-12-11 15:13 ` Ingo Molnar
2017-12-11 16:26 ` Andy Lutomirski
2017-12-11 14:09 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Zhang Rui
2017-12-11 16:28 ` Andy Lutomirski
2017-12-12 8:00 ` Pavel Machek
2017-12-06 12:15 ` Michal Hocko
2017-12-06 12:23 ` Thomas Gleixner
2017-12-06 14:04 ` Rafael J. Wysocki
2017-12-06 12:31 ` Maarten Lankhorst
2017-12-06 12:46 ` Thomas Gleixner
2017-12-06 13:09 ` Maarten Lankhorst
2017-12-06 14:15 ` Thomas Gleixner
2017-12-07 13:33 ` Maarten Lankhorst
2017-12-08 10:30 ` Thomas Gleixner
2017-12-13 15:57 ` Thomas Gleixner
2017-12-13 16:23 ` Bjorn Helgaas
2017-12-13 16:41 ` Thomas Gleixner
2017-12-13 17:45 ` Linus Torvalds
2017-12-13 18:19 ` Thomas Gleixner
2017-12-13 20:52 ` Thomas Gleixner
2017-12-13 21:06 ` Thomas Gleixner
2017-12-13 22:48 ` Rafael J. Wysocki
2017-12-14 11:54 ` Thomas Gleixner
2017-12-14 12:12 ` Rafael J. Wysocki
2017-12-14 12:30 ` Thomas Gleixner
2017-12-14 15:30 ` Rafael J. Wysocki
2017-12-14 15:52 ` Thomas Gleixner
2017-12-14 15:54 ` Rafael J. Wysocki
2017-12-14 16:17 ` Maarten Lankhorst
2017-12-15 2:07 ` [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() Rafael J. Wysocki
2017-12-15 14:28 ` Rafael J. Wysocki
2017-12-15 18:30 ` Bjorn Helgaas
2017-12-15 23:44 ` Rafael J. Wysocki
2017-12-14 13:24 ` Linux 4.15-rc2: Regression in resume from ACPI S3 Thomas Gleixner
2017-12-14 19:03 ` Linus Torvalds
2017-12-14 22:36 ` Thomas Gleixner
2017-12-14 22:47 ` Linus Torvalds
2017-12-15 9:05 ` Thomas Gleixner
2017-12-15 0:34 ` Rafael J. Wysocki
2017-12-13 22:39 ` Rafael J. Wysocki
2017-12-13 23:26 ` Rafael J. Wysocki
2017-12-07 7:55 ` Michal Hocko
2017-12-10 20:30 ` Michal Hocko
2018-02-21 18:36 ` Linux 4.15-rc2 Eugene Syromiatnikov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).