* Linux 4.10-rc5
From: Linus Torvalds @ 2017-01-22 21:32 UTC
To: Linux Kernel Mailing List
Things seem to be calming down a bit, and everything looks nominal.
There have only been about 250 changes (not counting merges) in the last
week, and the diffstat touches fewer than 300 files (with drivers and
architecture updates being the bulk, but there's tooling, networking
and filesystems in there too).
So keep testing, and I think we'll have a regular release schedule.
Linus
---
Adam Ford (2):
ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate
ARM: dts: omap3: Fix Card Detect and Write Protect on Logic PD SOM-LV
Alexander Graf (1):
arm64: Fix swiotlb fallback allocation
Alexandre Belloni (1):
usb: gadget: udc: atmel: remove memory leak
Amelie Delaunay (1):
usb: dwc2: gadget: Fix GUSBCFG.USBTRDTIM value
Amir Goldstein (7):
xfs: make the ASSERT() condition likely
xfs: sanity check directory inode di_size
xfs: add missing include dependencies to xfs_dir2.h
xfs: replace xfs_mode_to_ftype table with switch statement
xfs: sanity check inode mode when creating new dentry
xfs: sanity check inode di_mode
ovl: fix possible use after free on redirect dir lookup
Andrey Smirnov (1):
at86rf230: Allow slow GPIO pins for "rstn"
Andy Shevchenko (2):
spi: dw-mid: switch to new dmaengine_terminate_* API (part 2)
spi: pxa2xx: add missed break
Aneesh Kumar K.V (2):
powerpc/mm/hugetlb: Don't panic when we don't find the default huge page size
powerpc/mm: Fix little-endian 4K hugetlb
Anton Blanchard (1):
powerpc: Ignore reserved field in DCSR and PVR reads and writes
Arkadi Sharshevsky (2):
mlxsw: spectrum: Fix memory leak at skb reallocation
mlxsw: switchx2: Fix memory leak at skb reallocation
Arnd Bergmann (5):
ARM: ux500: fix prcmu_is_cpu_in_wfi() calculation
cpmac: remove hopeless #warning
net/mlx5e: Fix a -Wmaybe-uninitialized warning
ubifs: add CONFIG_BLOCK dependency for encryption
xfs: fix xfs_mode_to_ftype() prototype
Bart Van Assche (4):
qla2xxx: Fix indentation
qla2xxx: Declare an array with file scope static
qla2xxx: Move two arrays from header files to .c files
qla2xxx: Avoid that building with W=1 triggers complaints about set-but-not-used variables
Basil Gunn (1):
ax25: Fix segfault after sock connection timeout
Beni Lev (1):
cfg80211: consider VHT opmode on station update
Benjamin Coddington (1):
nfs: Don't take a reference on fl->fl_file for LOCK operation
Benjamin Herrenschmidt (1):
powerpc/icp-opal: Fix missing KVM case and harden replay
Bhumika Goyal (2):
vhost: scsi: constify target_core_fabric_ops structures
virtio/s390: virtio: constify virtio_config_ops structures
Bjorn Helgaas (2):
x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F
PCI: Enumerate switches below PCI-to-PCIe bridges
Brian Norris (2):
thermal: rockchip: improve conversion error messages
thermal: rockchip: don't pass table structs by value
Bryant G. Ly (2):
ibmvscsis: Fix max transfer length
ibmvscsis: Fix sleeping in interrupt context
Caesar Wang (4):
thermal: rockchip: fixes invalid temperature case
thermal: rockchip: optimize the conversion table
thermal: rockchip: handle set_trips without the trip points
thermal: rockchip: fixes the conversion table
Cedric Izoard (1):
mac80211: Fix headroom allocation when forwarding mesh pkt
Chen-Yu Tsai (2):
ARM: dts: sun6i: Disable display pipeline by default
ARM: dts: sun6i: hummingbird: Enable display engine again
Christian Borntraeger (1):
KVM: s390: do not expose random data via facility bitmap
Christoffer Dall (1):
KVM: arm/arm64: Fix occasional warning from the timer work function
Christoph Hellwig (2):
scsi: qla2xxx: fix MSI-X vector affinity
scsi: qla2xxx: remove irq_affinity_notifier
Christophe JAILLET (2):
spi: spi-axi: Free resources on error path
usb: gadget: composite: Fix function used to free memory
Colin Ian King (3):
spi: armada-3700: fix unsigned compare than zero on irq
ubifs: ensure zero err is returned on successful return
virtio/s390: add missing \n to end of dev_err message
Damien Le Moal (2):
scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
scsi: sd: Ignore zoned field for host-managed devices
Dan Carpenter (2):
spi: armada-3700: Set mode bits correctly
vhost/scsi: silence uninitialized variable warning
Dan Williams (1):
libnvdimm, namespace: fix pmem namespace leak, delete when size set to zero
Daniel Borkmann (1):
bpf: rework prog_digest into prog_tag
Dave Jones (1):
scsi: qla2xxx: Fix apparent cut-n-paste error.
Dave Martin (7):
arm64/ptrace: Preserve previous registers for short regset write
arm64/ptrace: Preserve previous registers for short regset write
arm64/ptrace: Preserve previous registers for short regset write
arm64/ptrace: Avoid uninitialised struct padding in fpr_set()
arm64/ptrace: Reject attempts to set incomplete hardware breakpoint fields
powerpc/ptrace: Preserve previous fprs/vsrs on short regset write
powerpc/ptrace: Preserve previous TM fprs/vsrs on short regset write
David Ahern (2):
net: lwtunnel: Handle lwtunnel_fill_encap failure
net: ipv4: fix table id in getroute response
David Lebrun (1):
ipv6: sr: fix several BUGs when preemption is enabled
David Sheets (1):
fuse: fix time_to_jiffies nsec sanity check
Dmitry Vyukov (1):
KVM: x86: fix fixing of hypercalls
Elad Raz (1):
mlxsw: pci: Fix EQE structure definition
Emmanuel Grumbach (1):
mac80211: fix the TID on NDPs sent as EOSP carrier
Emmanuel Vadot (1):
ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmc
Eric Biggers (2):
ubifs: allow encryption ioctls in compat mode
ubifs: remove redundant checks for encryption key
Eric Dumazet (1):
mlx4: do not call napi_schedule() without care
Eric Sandeen (1):
xfs: don't wrap ID in xfs_dq_get_next_id
Ewan D. Milne (1):
scsi: ses: Fix SAS device detection in enclosure
Fabien Parent (1):
ARM: dts: da850-evm: fix read access to SPI flash
Fabio Estevam (1):
thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()
Fam Zheng (1):
scsi: libfc: Fix variable name in fc_set_wwpn
Felix Fietkau (1):
mac80211: initialize SMPS field in HT capabilities
Florian Fainelli (1):
net: systemport: Decouple flow control from __bcm_sysport_tx_reclaim
G. Campana (1):
virtio_console: fix a crash in config_work_handler
Gary Bisson (2):
ARM: dts: imx6qdl-nitrogen6_max: fix sgtl5000 pinctrl init
ARM: dts: imx6qdl-nitrogen6_som2: fix sgtl5000 pinctrl init
Gavin Shan (1):
powerpc/eeh: Enable IO path on permanent error
Geert Uytterhoeven (1):
spi: SPI_FSL_DSPI should depend on HAS_DMA
Halil Pasic (2):
tools/virtio/ringtest: fix run-on-all.sh for offline cpus
tools/virtio/ringtest: tweaks for s390
Hangbin Liu (1):
mld: do not remove mld souce list info when set link down
Hans de Goede (1):
mmc: sdhci-acpi: Only powered up enabled acpi child devices
Hauke Mehrtens (2):
mtd: nand: xway: disable module support
mtd: nand: xway: fix build because of module functions
Heiko Carstens (2):
s390/ctl_reg: make __ctl_load a full memory barrier
s390: update defconfigs
Heiner Kallweit (1):
net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered
Heinrich Schuchardt (1):
MMC: meson: avoid possible NULL dereference
Himanshu Madhani (3):
qla2xxx: Include ATIO queue in firmware dump when in target mode
qla2xxx: Set tcm_qla2xxx version to automatically track qla2xxx version
qla2xxx: Reset reserved field in firmware options to 0
Ilya Dryomov (1):
libceph: make sure ceph_aes_crypt() IV is aligned
Ivan Vecera (3):
be2net: fix status check in be_cmd_pmac_add()
be2net: don't delete MAC on close on unprivileged BE3 VFs
be2net: fix MAC addr setting on privileged BE3 VFs
J. Bruce Fields (2):
nfsd: fix supported attributes for acl & labels
svcrpc: don't leak contexts on PROC_DESTROY
Jack Morgenstein (3):
net/mlx4_core: Fix racy CQ (Completion Queue) free
net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
Jacob von Chorus (1):
thermal: core: move tz->device.groups cleanup to thermal_release
Jakub Sitnicki (1):
ip6_tunnel: Account for tunnel header in tunnel MTU
Jamal Hadi Salim (1):
net sched actions: fix refcnt when GETing of action after bind
James Bottomley (1):
scsi: mpt3sas: fix hang on ata passthrough commands
Jason Gerecke (1):
HID: wacom: Fix sibling detection regression
Jean-Jacques Hiblot (1):
ARM: dts: OMAP5 / DRA7: indicate that SATA port 0 is available.
Jeff Layton (3):
ceph: fix endianness of getattr mask in ceph_d_revalidate
ceph: fix endianness bug in frag_tree_split_cmp
ceph: fix bad endianness handling in parse_reply_info_extra
Jintack Lim (1):
KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems
Johan Hovold (2):
HID: corsair: fix DMA buffers on stack
HID: corsair: fix control-transfer error handling
Johannes Berg (3):
mac80211: implement multicast forwarding on fast-RX path
mac80211: calculate min channel width correctly
mac80211: recalculate min channel width on VHT opmode changes
Johannes Thumshirn (2):
scsi: bfa: fix wrongly initialized variable in bfad_im_bsg_els_ct_request()
scsi: lpfc: Set elsiocb contexts to NULL after freeing it
John Stultz (1):
usb: dwc2: Avoid suspending if we're in gadget mode
Jon Mason (1):
ARM: dts: NSP: Fix DT ranges error
Joonyoung Shim (1):
clocksource/exynos_mct: Clear interrupt when cpu is shut down
Josef Bacik (1):
nbd: only set MSG_MORE when we have more to send
Karicheri, Muralidharan (1):
net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types
Kazuya Mizuguchi (1):
ravb: Remove Rx overflow log messages
Keith Busch (1):
blk-mq: Remove unused variable
Kevin Hilman (1):
spi: davinci: use dma_mapping_error()
Krzysztof Kozlowski (2):
MAINTAINERS: Add Patchwork URL to Samsung Exynos entry
ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*
Lance Richardson (1):
openvswitch: maintain correct checksum state in conntrack actions
Larry Finger (1):
taint/module: Fix problems when out-of-kernel driver defines true or false
Leo Yan (1):
usb: dwc2: use u32 for DT binding parameters
Linus Torvalds (1):
Linux 4.10-rc5
Linus Walleij (1):
ARM: 8613/1: Fix the uaccess crash on PB11MPCore
Lokesh Vutla (1):
ARM: dts: am335x-icev2: Remove the duplicated pinmux setting
Madhavan Srinivasan (3):
powerpc/perf: Fix PM_BRU_CMPL event code for power9
selftest/powerpc: Wrong PMC initialized in pmc56_overflow test
powerpc/perf: Use MSR to report privilege level on P9 DD1
Marc Gonzalez (2):
mtd: nand: tango: Update DT binding description
mtd: nand: tango: Reset pbus to raw mode in probe
Marc Zyngier (2):
KVM: arm/arm64: vgic: Fix deadlock on error handling
PCI/MSI: pci-xgene-msi: Fix CPU hotplug registration handling
Marek Szyprowski (1):
clk/samsung: exynos542x: mark some clocks as critical
Mark Rutland (2):
ARM: 8634/1: hw_breakpoint: blacklist Scorpion CPUs
arm64: avoid returning from bad_mode
Martynas Pumputis (1):
vxlan: Set ports in flow key when doing route lookups
Masahiro Yamada (1):
ARM, ARM64: dts: drop "arm,amba-bus" in favor of "simple-bus" part 3
Masami Hiramatsu (3):
perf probe: Fix to show correct locations for events on modules
perf probe: Add error checks to offline probe post-processing
perf probe: Fix to probe on gcc generated functions in modules
Masaru Nagai (1):
ravb: do not use zero-length alignment DMA descriptor
Mathias Nyman (1):
xhci: remove WARN_ON if dma mask is not set for platform devices
Michal Kazior (1):
mac80211: prevent skb/txq mismatch
Michal Simek (1):
ARM64: zynqmp: Fix W=1 dtc 1.4 warnings
Milan P. Gandhi (1):
scsi: qla2xxx: Get mutex lock before checking optrom_state
Milo Kim (1):
ARM: dts: sun8i: Support DTB build for NanoPi M1
Moritz Fischer (1):
ARM64: zynqmp: Fix i2c node's compatible string
Murali Karicheri (1):
PCI: designware: Check for iATU unroll only on platforms that use ATU
Neil Armstrong (1):
ARM64: dts: meson-gxbb-odroidc2: Disable SCPI DVFS
Nicholas Mc Guire (1):
usb: dwc2: host: fix Wmaybe-uninitialized warning
Nicholas Piggin (1):
powerpc: Fix pgtable pmd cache init
Nicolas Dichtel (1):
ARM: put types.h in uapi
Nikita Yushchenko (1):
swiotlb: ensure that page-sized mappings are page-aligned
Oleksandr Andrushchenko (1):
arm64: mm: avoid name clash in __page_to_voff()
Parthasarathy Bhuvaragan (1):
tipc: allocate user memory with GFP_KERNEL flag
Paul E. McKenney (2):
rcu: Remove cond_resched() from Tiny synchronize_sched()
rcu: Narrow early boot window of illegal synchronous grace periods
Peter Rosin (1):
ubifs: fix unencrypted journal write
Peter Ujfalusi (1):
ARM: OMAP1: DMA: Correct the number of logical channels
Phil Reid (1):
spi: dw: Make debugfs name unique between instances
Pierre Morel (1):
virtio/s390: support READ_STATUS command for virtio-ccw
Quinn Tran (7):
qla2xxx: Fix wrong IOCB type assumption
qla2xxx: Collect additional information to debug fw dump
qla2xxx: Fix crash due to null pointer access
qla2xxx: Terminate exchange if corrupted
qla2xxx: Reduce exess wait during chip reset
qla2xxx: Fix erroneous invalid handle message
qla2xxx: Disable out-of-order processing by default in firmware
Rabin Vincent (1):
ARM: 8632/1: ftrace: fix syscall name matching
Randy Dunlap (1):
mtd: nand: oxnas_nand: fix build errors on arch/um, require HAS_IOMEM
Reza Arbab (1):
powerpc/mm: Fix memory hotplug BUG() on radix
Richard Weinberger (1):
ubifs: Fix journal replay wrt. xattr nodes
Roberto Sassu (1):
scsi: lpfc: avoid double free of resource identifiers
Ruslan Ruslichenko (1):
x86/ioapic: Restore IO-APIC irq_chip retrigger callback
Russell King (1):
MAINTAINERS: update rmk's entries
Scott Mayhew (1):
sunrpc: don't call sleeping functions from the notifier block callbacks
Sedat Dilek (1):
perf/x86/amd/ibs: Fix typo after cleanup state names in cpu/hotplug
Sekhar Nori (1):
ARM: dts: dra72-evm-revc: fix typo in ethernet-phy node
Shannon Nelson (1):
tcp: fix tcp_fastopen unaligned access complaints on sparc
Shuah Khan (1):
usb: dwc3: exynos fix axius clock error path to do cleanup
Simon Horman (2):
spi: sh-msiof: Add R-Car Gen 2 and 3 fallback bindings
spi: sh-msiof: Do not use C++ style comment
Sriharsha Basavapatna (1):
svcrdma: avoid duplicate dma unmapping during error recovery
Stefan Hajnoczi (1):
pmem: return EIO on read_pmem() failure
Stefan Schmidt (4):
ieee802154: atusb: do not use the stack for buffers to make them DMA able
ieee802154: atusb: make sure we set a randaom extended address if fetching fails
ieee802154: atusb: do not use the stack for address fetching to make it DMA able
ieee802154: atusb: fix driver to work with older firmware versions
Stefan Wahren (1):
mmc: mxs-mmc: Fix additional cycles after transmission stop
Stefano Stabellini (1):
partially revert "xen: Remove event channel notification through Xen PCI platform device"
Tahsin Erdogan (1):
fuse: clear FR_PENDING flag when moving requests out of pending queue
Thomas Gleixner (1):
cpu/hotplug: Provide dynamic range for prepare stage
Timur Tabi (1):
net: qcom/emac: grab a reference to the phydev on ACPI systems
Tobias Klauser (1):
cpu/hotplug: Remove unused but set variable in _cpu_down()
Trond Myklebust (5):
NFSv4: Call update_changeattr() from _nfs4_proc_open only if a file was created
NFSv4: Don't apply change_info4 twice on rename within a directory
NFSv4: Don't call update_changeattr() unless the unlink is successful
NFSv4: update_changeattr should update the attribute timestamp
NFSv4: Fix client recovery when server reboots multiple times
Ulf Hansson (1):
mmc: core: Restore parts of the polling policy when switch to HS/HS DDR
Vadim Lomovtsev (1):
net: thunderx: acpi: fix LMAC initialization
Valentin Rothberg (2):
ARM: multi_v7_defconfig: fix config typo
ARM: multi_v7_defconfig: set bcm47xx watchdog
Vardan Mikayelyan (1):
usb: dwc2: gadget: Fix DMA memory freeing
Vincent Pelletier (1):
usb: gadget: f_fs: Fix iterations on endpoints.
Vineet Gupta (8):
ARC: mmu: clarify the MMUv3 programming model
ARCv2: save r30 on kernel entry as gcc uses it for code-gen
ARC: module: Fix !CONFIG_ARC_DW2_UNWIND builds
ARCv2: IOC: refactor the IOC and SLC operations into own functions
ARCv2: IOC: Adhere to progamming model guidelines to avoid DMA corruption
ARCv2: IOC: Use actual memory size to setup aperture size
ARC: mm: split arc_cache_init to allow __init reaping of bulk
ARC: Revert "ARC: mm: IOC: Don't enable IOC by default"
Vladimir Zapolskiy (1):
mtd: nand: lpc32xx: fix invalid error handling of a requested irq
Wei Yongjun (1):
soc: ti: wkup_m3_ipc: Fix error return code in wkup_m3_ipc_probe()
Yan, Zheng (1):
ceph: fix ceph_get_caps() interruption
Yuriy Kolerov (2):
ARC: IRQ: Use hwirq instead of virq in mask/unmask
ARCv2: IRQ: Call entry/exit functions for chained handlers in MCIP
Zhou Chengming (1):
perf/x86/intel: Handle exclusive threadid correctly on CPU hotplug
hayeswang (1):
r8152: fix the sw rx checksum is unavailable
stephen hemminger (1):
netvsc: add rcu_read locking to netvsc callback
* [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5)
From: Martin Steigerwald @ 2017-01-25 12:10 UTC
To: Linux Kernel Mailing List
Cc: Linus Torvalds, Intel Gfx Mailing List, Jani Nikula
On Sunday, 22 January 2017, 13:32:08 CET, Linus Torvalds wrote:
> [...]
>
> So keep testing, and I think we'll have a regular release schedule.
Testing this is no fun:
Bug 99533 - black screen after switching session
https://bugs.freedesktop.org/99533
This comes after GPU hangs/lockups with kernel 4.9, reported for example as:
Bug 98922 - [snb] GPU hang on PlaneShift
https://bugs.freedesktop.org/98922
It may be a duplicate of #98747, #98794, #98860, #98891, #98288.
I am back on kernel 4.8.15, as I need this machine for production work.
Sometimes I wish for a microkernel that might be able to reincarnate drivers
that hang or do weird things like that. That might at least give a way to
actually do some debugging, or even to get the desktop session back without
losing its state. This applies especially to graphics drivers and to
hibernating and resuming from hibernation, which also occasionally fails,
again without leaving a way to interact with the machine for further
debugging. The Linux kernel usually just crashes completely, with not even
ping or SSH possible, or at least it gets stuck with a black display and no
way to restart the graphics driver, because it seems to be in some undefined
state. Combined with bugs that only happen occasionally, this makes triaging
time-consuming and risky. I do like to help with testing, but maybe it's time
to just switch to distro kernels and be done with it, as I regularly come
across bugs that are too expensive for me to triage.
Please understand that I am not willing to bisect these occasional bugs,
which have the potential to cause data loss because I have to switch off the
machine forcefully. Fortunately, KMail at least saves the mail I am writing
from time to time, and Kate keeps swap files.
I am also a bit unwilling to do further debugging of this one, as I usually use
two sessions when I am at work and I risk losing data I am working on. But… at
least with this issue it seems I would have a way to SSH into the machine
before kicking it.
I am dissatisfied with the state of the Intel graphics driver on this ThinkPad
T520 with Sandybridge since kernel 4.9 and wonder whether you guys at Intel
really test things with older hardware versions.
Thanks,
--
Martin
* Re: [Intel-gfx] [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5)
From: David Weinehall @ 2017-02-01 13:11 UTC
To: Martin Steigerwald
Cc: Linux Kernel Mailing List, Intel Gfx Mailing List, Linus Torvalds
On Wed, Jan 25, 2017 at 01:10:26PM +0100, Martin Steigerwald wrote:
> [...]
>
> I am dissatisfied with the state of the Intel graphics driver on this ThinkPad
> T520 with Sandybridge since kernel 4.9 and wonder whether you guys at Intel
> really test things with older hardware versions.
Yes, we do. But for practical reasons we can only do testing for things
that we actually have testcases for, and obviously we don't have the
manpower to actually do *manual* testing on every platform, so issues
for older platforms that are only triggered by manual interaction tend
to slip under the radar.
We have a testfarm that tests every nightly build on all platforms we
have test machines for. The testcases are publicly available here:
https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/
Obviously most of our manpower is spent on development and testing for current
and future platforms, so for issues that involve older platforms,
especially something as old as Sandybridge (which is, by now, 6 years old)
we are happy for help with testing and bisection.
If the issues are specific to certain subsets of a platform it obviously
gets even more complex; it'd be a combinatorial nightmare to build a
testfarm that could test every variation of every platform.
If I got the count right, the i915 driver supports around a hundred
different varieties of Intel graphics; combine that with the number of
different displays people connect, the number of eDP displays that the
vendors connect, the different BIOSes that vendors use, etc., and I
think you'll begin to see what we're combating. To make things even
more complex, you can connect several displays to each graphics card
(possibly via adapters), and displays don't always meet the standards
that they claim to meet. Due to limited space, we are also somewhat limited
when it comes to testing with multi-monitor setups.
This is why any help is welcome and sometimes even necessary. If you're
afraid of data loss, be aware that it's possible to boot your system with
file systems mounted read-only; you could also boot from a USB stick or
similar.
If you can find a testcase in i-g-t that easily reproduces the issue,
that'd also be very helpful. Do note that not all testcases in i-g-t
are run as part of our nightly tests, since some of them are *extremely*
time-consuming; the full combinatorial testcase, for instance, can
take weeks or months to complete (I haven't done a full run recently).
I hope this helps you understand why bugs can slip under the radar,
and why a bisect is so important.
Kind regards, David Weinehall
* Re: [Intel-gfx] [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5)
From: Martin Steigerwald @ 2017-02-11 14:55 UTC
To: David Weinehall
Cc: Linux Kernel Mailing List, Intel Gfx Mailing List, Linus Torvalds
On Wednesday, 1 February 2017, 14:11:22 CET, David Weinehall wrote:
> Yes, we do. But for practical reasons we can only do testing for things
> that we actually have testcases for, and obviously we don't have the
> manpower to actually do *manual* testing on every platform, so issues
> for older platforms that are only triggered by manual interaction tend
> to slip under the radar.
>
> We have a testfarm that tests every nightly build on all platforms we
> have test machines for. The testcases are publicly available here:
>
> https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/
>
> Obviously most of our manpower is spent on development and testing for
> current and future platforms, so for issues that involve older platforms,
> especially something as old as Sandybridge (which is, by now, 6 years old)
> we are happy for help with testing and bisection.
>
> If the issues are specific to certain subsets of a platform it obviously
> gets even more complex; it'd be a combinatorial nightmare to build a
> testfarm that could test every variation of every platform.
>
> If I got the count right, the i915 driver supports around a hundred
> different varieties of Intel graphics; combine that with the number of
> different displays people connect, the number of eDP displays that the
> vendors connect, the different BIOSes that vendors use, etc., and I
> think you'll begin to see what we're combating. To make things even
> more complex, you can connect several displays to each graphics card
> (possibly via adapters), and displays don't always meet the standards
> that they claim to meet. Due to limited space, we are also somewhat limited
> when it comes to testing with multi-monitor setups.
>
> This is why any help is welcome and sometimes even necessary. If you're
> afraid of data loss, be aware that it's possible to boot your system with
> file systems mounted read-only; you could also boot from a USB stick or
> similar.
>
> If you can find a testcase in i-g-t that easily reproduces the issue,
> that'd also be very helpful. Do note that not all testcases in i-g-t
> are run as part of our nightly tests, since some of them are *extremely*
> time-consuming; the full combinatorial testcase, for instance, can
> take weeks or months to complete (I haven't done a full run recently).
>
> I hope this helps you understand why bugs can slip under the radar,
> and why a bisect is so important.
Wow, David. Does that mean that even Intel cannot really test the driver for
the hardware it supports?
A bisect of a machine-hanging bug that only shows up after using the computer
and switching between sessions for a certain amount of time is simply too
expensive for me. That's the whole point I tried to make. The *cost* of
providing a *useful* bug report is too high.
You say you can't cover this with a test case (I think switching between
sessions *could* be automated), and then you ask for help. Yet providing this
help requires an effort that is beyond what I am willing to invest, and which
IMHO is beyond what many users are willing to invest. See:
It would easily take 10-15 iterations, as far as I remember from my last
bisect. And I'd either risk data loss *or* use a live Linux system, which means
that during that time I can't use the machine for productive work. *Each time*,
it may take anything between 10 minutes and several hours for the issue to
appear. I'd need to reboot, compile the next kernel, either copy it to a USB
drive and boot from there or install it, and then repeat the testing steps.
I bet it would take me one or two complete days to eventually find the
offending commit. I would not feel comfortable asking my employer for these
one to two days to do that work, and my leisure time is also too valuable to me
and too full of other things to reduce it by that amount for every
bug like this.
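As a rough illustration of the arithmetic behind this estimate: `git bisect` roughly halves the range of candidate commits with each good/bad verdict, so N candidates need about ceil(log2 N) test kernels. The helpers below are hypothetical and not part of git; the commit counts (2,000 and 20,000) and the per-step costs (1 h and 3 h, covering build, install, reboot and waiting for the hang) are assumptions picked only to bracket the 10-15 iterations mentioned above.

```python
import math


def estimate_bisect_steps(num_commits: int) -> int:
    """Hypothetical helper: approximate number of test kernels git bisect needs.

    Each good/bad verdict roughly halves the remaining candidates,
    so about ceil(log2(num_commits)) steps are needed.
    """
    if num_commits <= 1:
        return 0
    return math.ceil(math.log2(num_commits))


def estimate_total_hours(steps: int, hours_per_step: float) -> float:
    """Hypothetical helper: wall-clock cost if every step (build, install,
    reboot, wait for the hang) takes the same amount of time."""
    return steps * hours_per_step


# Assumed figures, not measurements: candidate commits between a good
# and a bad kernel, and a cheap (1 h) versus slow (3 h) step.
for commits in (2000, 20000):
    steps = estimate_bisect_steps(commits)
    print(commits, steps,
          estimate_total_hours(steps, 1.0),
          estimate_total_hours(steps, 3.0))
# prints:
# 2000 11 11.0 33.0
# 20000 15 15.0 45.0
```

The number of steps grows only logarithmically with the size of the range, while the total grows linearly with the per-step cost, so under these assumptions it is the slow, intermittent reproduction rather than the number of commits that dominates the effort.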
Next week I am teaching a training course. Since in this particular case with
4.10 it appears (I didn't verify it) that "just" the gfx driver is broken, I
might be able to log into the machine from a training workstation, so… I could
at least try to obtain some kind of GPU state dump, while I do most of my work
on the training workstation anyway. I remember Jani saying that backtraces
are mostly useless (then why bother producing them at all, instead of just
logging "gfx driver crashed, use tool xyz to obtain debug info"?) and that
there is a new way to dump the state of the GPU when it is hung.
But a bisect of an issue like this requires more effort than I am
willing to put into it. And I think I am not alone in that.
With other issues, like hangs during resume and on wakeup that happen
occasionally, I have already given up. I don't even remember in which kernels
they started, and they are even more costly to bisect. I actually don't even
remember whether it worked okay at all since I gave up on compiling TuxOnIce
into every kernel.
I am giving up here as well now, unless there is a way to provide you with
sufficient debug information without doing a bisect, e.g. via a GPU state
dump or something like that.
Up to now, there does not seem to be a gfx driver on Linux that either
*never* *ever* hangs, or at least manages to output enough debug information
(if need be, even onto a plain block device) to create a useful bug
report.
Add to that a development pace of one new kernel every three months, and I see
no realistic chance of keeping the driver in a fully working state for the
hardware it supports. The effort to thoroughly bisect every nasty bug like this
would just be too high. If invested for every hang bug in current kernels
(I have seen 4 different issues in 4.9 and 4.10 *just* on this ThinkPad
T520), it may even exceed the development time.
So until the effort needed to provide a *useful* bug report can
be reduced, I am out. I am willing to spend some hours on it, but not
days for every single intermittent hang issue.
If you ask me, instabilities like this (like the instabilities within
Plasma/KWin that were related to Intel driver bugs) are one reason why
Linux is still not ready for the desktop.
Sorry,
--
Martin