* Linux 4.10-rc5 @ 2017-01-22 21:32 Linus Torvalds 2017-01-25 12:10 ` [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5) Martin Steigerwald 0 siblings, 1 reply; 4+ messages in thread From: Linus Torvalds @ 2017-01-22 21:32 UTC (permalink / raw) To: Linux Kernel Mailing List Things seem to be calming down a bit, and everything looks nominal. There's only been about 250 changes (not counting merges) in the last week, and the diffstat touches less than 300 files (with drivers and architecture updates being the bulk, but there's tooling, networking and filesystems in there too). So keep testing, and I think we'll have a regular release schedule. Linus --- Adam Ford (2): ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate ARM: dts: omap3: Fix Card Detect and Write Protect on Logic PD SOM-LV Alexander Graf (1): arm64: Fix swiotlb fallback allocation Alexandre Belloni (1): usb: gadget: udc: atmel: remove memory leak Amelie Delaunay (1): usb: dwc2: gadget: Fix GUSBCFG.USBTRDTIM value Amir Goldstein (7): xfs: make the ASSERT() condition likely xfs: sanity check directory inode di_size xfs: add missing include dependencies to xfs_dir2.h xfs: replace xfs_mode_to_ftype table with switch statement xfs: sanity check inode mode when creating new dentry xfs: sanity check inode di_mode ovl: fix possible use after free on redirect dir lookup Andrey Smirnov (1): at86rf230: Allow slow GPIO pins for "rstn" Andy Shevchenko (2): spi: dw-mid: switch to new dmaengine_terminate_* API (part 2) spi: pxa2xx: add missed break Aneesh Kumar K.V (2): powerpc/mm/hugetlb: Don't panic when we don't find the default huge page size powerpc/mm: Fix little-endian 4K hugetlb Anton Blanchard (1): powerpc: Ignore reserved field in DCSR and PVR reads and writes Arkadi Sharshevsky (2): mlxsw: spectrum: Fix memory leak at skb reallocation mlxsw: switchx2: Fix memory leak at skb reallocation Arnd Bergmann (5): ARM: ux500: fix prcmu_is_cpu_in_wfi() calculation cpmac: remove hopeless 
#warning net/mlx5e: Fix a -Wmaybe-uninitialized warning ubifs: add CONFIG_BLOCK dependency for encryption xfs: fix xfs_mode_to_ftype() prototype Bart Van Assche (4): qla2xxx: Fix indentation qla2xxx: Declare an array with file scope static qla2xxx: Move two arrays from header files to .c files qla2xxx: Avoid that building with W=1 triggers complaints about set-but-not-used variables Basil Gunn (1): ax25: Fix segfault after sock connection timeout Beni Lev (1): cfg80211: consider VHT opmode on station update Benjamin Coddington (1): nfs: Don't take a reference on fl->fl_file for LOCK operation Benjamin Herrenschmidt (1): powerpc/icp-opal: Fix missing KVM case and harden replay Bhumika Goyal (2): vhost: scsi: constify target_core_fabric_ops structures virtio/s390: virtio: constify virtio_config_ops structures Bjorn Helgaas (2): x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F PCI: Enumerate switches below PCI-to-PCIe bridges Brian Norris (2): thermal: rockchip: improve conversion error messages thermal: rockchip: don't pass table structs by value Bryant G. 
Ly (2): ibmvscsis: Fix max transfer length ibmvscsis: Fix sleeping in interrupt context Caesar Wang (4): thermal: rockchip: fixes invalid temperature case thermal: rockchip: optimize the conversion table thermal: rockchip: handle set_trips without the trip points thermal: rockchip: fixes the conversion table Cedric Izoard (1): mac80211: Fix headroom allocation when forwarding mesh pkt Chen-Yu Tsai (2): ARM: dts: sun6i: Disable display pipeline by default ARM: dts: sun6i: hummingbird: Enable display engine again Christian Borntraeger (1): KVM: s390: do not expose random data via facility bitmap Christoffer Dall (1): KVM: arm/arm64: Fix occasional warning from the timer work function Christoph Hellwig (2): scsi: qla2xxx: fix MSI-X vector affinity scsi: qla2xxx: remove irq_affinity_notifier Christophe JAILLET (2): spi: spi-axi: Free resources on error path usb: gadget: composite: Fix function used to free memory Colin Ian King (3): spi: armada-3700: fix unsigned compare than zero on irq ubifs: ensure zero err is returned on successful return virtio/s390: add missing \n to end of dev_err message Damien Le Moal (2): scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type scsi: sd: Ignore zoned field for host-managed devices Dan Carpenter (2): spi: armada-3700: Set mode bits correctly vhost/scsi: silence uninitialized variable warning Dan Williams (1): libnvdimm, namespace: fix pmem namespace leak, delete when size set to zero Daniel Borkmann (1): bpf: rework prog_digest into prog_tag Dave Jones (1): scsi: qla2xxx: Fix apparent cut-n-paste error. 
Dave Martin (7): arm64/ptrace: Preserve previous registers for short regset write arm64/ptrace: Preserve previous registers for short regset write arm64/ptrace: Preserve previous registers for short regset write arm64/ptrace: Avoid uninitialised struct padding in fpr_set() arm64/ptrace: Reject attempts to set incomplete hardware breakpoint fields powerpc/ptrace: Preserve previous fprs/vsrs on short regset write powerpc/ptrace: Preserve previous TM fprs/vsrs on short regset write David Ahern (2): net: lwtunnel: Handle lwtunnel_fill_encap failure net: ipv4: fix table id in getroute response David Lebrun (1): ipv6: sr: fix several BUGs when preemption is enabled David Sheets (1): fuse: fix time_to_jiffies nsec sanity check Dmitry Vyukov (1): KVM: x86: fix fixing of hypercalls Elad Raz (1): mlxsw: pci: Fix EQE structure definition Emmanuel Grumbach (1): mac80211: fix the TID on NDPs sent as EOSP carrier Emmanuel Vadot (1): ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmc Eric Biggers (2): ubifs: allow encryption ioctls in compat mode ubifs: remove redundant checks for encryption key Eric Dumazet (1): mlx4: do not call napi_schedule() without care Eric Sandeen (1): xfs: don't wrap ID in xfs_dq_get_next_id Ewan D. Milne (1): scsi: ses: Fix SAS device detection in enclosure Fabien Parent (1): ARM: dts: da850-evm: fix read access to SPI flash Fabio Estevam (1): thermal: thermal_hwmon: Convert to hwmon_device_register_with_info() Fam Zheng (1): scsi: libfc: Fix variable name in fc_set_wwpn Felix Fietkau (1): mac80211: initialize SMPS field in HT capabilities Florian Fainelli (1): net: systemport: Decouple flow control from __bcm_sysport_tx_reclaim G. 
Campana (1): virtio_console: fix a crash in config_work_handler Gary Bisson (2): ARM: dts: imx6qdl-nitrogen6_max: fix sgtl5000 pinctrl init ARM: dts: imx6qdl-nitrogen6_som2: fix sgtl5000 pinctrl init Gavin Shan (1): powerpc/eeh: Enable IO path on permanent error Geert Uytterhoeven (1): spi: SPI_FSL_DSPI should depend on HAS_DMA Halil Pasic (2): tools/virtio/ringtest: fix run-on-all.sh for offline cpus tools/virtio/ringtest: tweaks for s390 Hangbin Liu (1): mld: do not remove mld souce list info when set link down Hans de Goede (1): mmc: sdhci-acpi: Only powered up enabled acpi child devices Hauke Mehrtens (2): mtd: nand: xway: disable module support mtd: nand: xway: fix build because of module functions Heiko Carstens (2): s390/ctl_reg: make __ctl_load a full memory barrier s390: update defconfigs Heiner Kallweit (1): net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered Heinrich Schuchardt (1): MMC: meson: avoid possible NULL dereference Himanshu Madhani (3): qla2xxx: Include ATIO queue in firmware dump when in target mode qla2xxx: Set tcm_qla2xxx version to automatically track qla2xxx version qla2xxx: Reset reserved field in firmware options to 0 Ilya Dryomov (1): libceph: make sure ceph_aes_crypt() IV is aligned Ivan Vecera (3): be2net: fix status check in be_cmd_pmac_add() be2net: don't delete MAC on close on unprivileged BE3 VFs be2net: fix MAC addr setting on privileged BE3 VFs J. 
Bruce Fields (2): nfsd: fix supported attributes for acl & labels svcrpc: don't leak contexts on PROC_DESTROY Jack Morgenstein (3): net/mlx4_core: Fix racy CQ (Completion Queue) free net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV Jacob von Chorus (1): thermal: core: move tz->device.groups cleanup to thermal_release Jakub Sitnicki (1): ip6_tunnel: Account for tunnel header in tunnel MTU Jamal Hadi Salim (1): net sched actions: fix refcnt when GETing of action after bind James Bottomley (1): scsi: mpt3sas: fix hang on ata passthrough commands Jason Gerecke (1): HID: wacom: Fix sibling detection regression Jean-Jacques Hiblot (1): ARM: dts: OMAP5 / DRA7: indicate that SATA port 0 is available. Jeff Layton (3): ceph: fix endianness of getattr mask in ceph_d_revalidate ceph: fix endianness bug in frag_tree_split_cmp ceph: fix bad endianness handling in parse_reply_info_extra Jintack Lim (1): KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems Johan Hovold (2): HID: corsair: fix DMA buffers on stack HID: corsair: fix control-transfer error handling Johannes Berg (3): mac80211: implement multicast forwarding on fast-RX path mac80211: calculate min channel width correctly mac80211: recalculate min channel width on VHT opmode changes Johannes Thumshirn (2): scsi: bfa: fix wrongly initialized variable in bfad_im_bsg_els_ct_request() scsi: lpfc: Set elsiocb contexts to NULL after freeing it John Stultz (1): usb: dwc2: Avoid suspending if we're in gadget mode Jon Mason (1): ARM: dts: NSP: Fix DT ranges error Joonyoung Shim (1): clocksource/exynos_mct: Clear interrupt when cpu is shut down Josef Bacik (1): nbd: only set MSG_MORE when we have more to send Karicheri, Muralidharan (1): net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types Kazuya Mizuguchi (1): ravb: Remove Rx overflow log messages Keith Busch (1): blk-mq: Remove unused variable 
Kevin Hilman (1): spi: davinci: use dma_mapping_error() Krzysztof Kozlowski (2): MAINTAINERS: Add Patchwork URL to Samsung Exynos entry ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_* Lance Richardson (1): openvswitch: maintain correct checksum state in conntrack actions Larry Finger (1): taint/module: Fix problems when out-of-kernel driver defines true or false Leo Yan (1): usb: dwc2: use u32 for DT binding parameters Linus Torvalds (1): Linux 4.10-rc5 Linus Walleij (1): ARM: 8613/1: Fix the uaccess crash on PB11MPCore Lokesh Vutla (1): ARM: dts: am335x-icev2: Remove the duplicated pinmux setting Madhavan Srinivasan (3): powerpc/perf: Fix PM_BRU_CMPL event code for power9 selftest/powerpc: Wrong PMC initialized in pmc56_overflow test powerpc/perf: Use MSR to report privilege level on P9 DD1 Marc Gonzalez (2): mtd: nand: tango: Update DT binding description mtd: nand: tango: Reset pbus to raw mode in probe Marc Zyngier (2): KVM: arm/arm64: vgic: Fix deadlock on error handling PCI/MSI: pci-xgene-msi: Fix CPU hotplug registration handling Marek Szyprowski (1): clk/samsung: exynos542x: mark some clocks as critical Mark Rutland (2): ARM: 8634/1: hw_breakpoint: blacklist Scorpion CPUs arm64: avoid returning from bad_mode Martynas Pumputis (1): vxlan: Set ports in flow key when doing route lookups Masahiro Yamada (1): ARM, ARM64: dts: drop "arm,amba-bus" in favor of "simple-bus" part 3 Masami Hiramatsu (3): perf probe: Fix to show correct locations for events on modules perf probe: Add error checks to offline probe post-processing perf probe: Fix to probe on gcc generated functions in modules Masaru Nagai (1): ravb: do not use zero-length alignment DMA descriptor Mathias Nyman (1): xhci: remove WARN_ON if dma mask is not set for platform devices Michal Kazior (1): mac80211: prevent skb/txq mismatch Michal Simek (1): ARM64: zynqmp: Fix W=1 dtc 1.4 warnings Milan P. 
Gandhi (1): scsi: qla2xxx: Get mutex lock before checking optrom_state Milo Kim (1): ARM: dts: sun8i: Support DTB build for NanoPi M1 Moritz Fischer (1): ARM64: zynqmp: Fix i2c node's compatible string Murali Karicheri (1): PCI: designware: Check for iATU unroll only on platforms that use ATU Neil Armstrong (1): ARM64: dts: meson-gxbb-odroidc2: Disable SCPI DVFS Nicholas Mc Guire (1): usb: dwc2: host: fix Wmaybe-uninitialized warning Nicholas Piggin (1): powerpc: Fix pgtable pmd cache init Nicolas Dichtel (1): ARM: put types.h in uapi Nikita Yushchenko (1): swiotlb: ensure that page-sized mappings are page-aligned Oleksandr Andrushchenko (1): arm64: mm: avoid name clash in __page_to_voff() Parthasarathy Bhuvaragan (1): tipc: allocate user memory with GFP_KERNEL flag Paul E. McKenney (2): rcu: Remove cond_resched() from Tiny synchronize_sched() rcu: Narrow early boot window of illegal synchronous grace periods Peter Rosin (1): ubifs: fix unencrypted journal write Peter Ujfalusi (1): ARM: OMAP1: DMA: Correct the number of logical channels Phil Reid (1): spi: dw: Make debugfs name unique between instances Pierre Morel (1): virtio/s390: support READ_STATUS command for virtio-ccw Quinn Tran (7): qla2xxx: Fix wrong IOCB type assumption qla2xxx: Collect additional information to debug fw dump qla2xxx: Fix crash due to null pointer access qla2xxx: Terminate exchange if corrupted qla2xxx: Reduce exess wait during chip reset qla2xxx: Fix erroneous invalid handle message qla2xxx: Disable out-of-order processing by default in firmware Rabin Vincent (1): ARM: 8632/1: ftrace: fix syscall name matching Randy Dunlap (1): mtd: nand: oxnas_nand: fix build errors on arch/um, require HAS_IOMEM Reza Arbab (1): powerpc/mm: Fix memory hotplug BUG() on radix Richard Weinberger (1): ubifs: Fix journal replay wrt. 
xattr nodes Roberto Sassu (1): scsi: lpfc: avoid double free of resource identifiers Ruslan Ruslichenko (1): x86/ioapic: Restore IO-APIC irq_chip retrigger callback Russell King (1): MAINTAINERS: update rmk's entries Scott Mayhew (1): sunrpc: don't call sleeping functions from the notifier block callbacks Sedat Dilek (1): perf/x86/amd/ibs: Fix typo after cleanup state names in cpu/hotplug Sekhar Nori (1): ARM: dts: dra72-evm-revc: fix typo in ethernet-phy node Shannon Nelson (1): tcp: fix tcp_fastopen unaligned access complaints on sparc Shuah Khan (1): usb: dwc3: exynos fix axius clock error path to do cleanup Simon Horman (2): spi: sh-msiof: Add R-Car Gen 2 and 3 fallback bindings spi: sh-msiof: Do not use C++ style comment Sriharsha Basavapatna (1): svcrdma: avoid duplicate dma unmapping during error recovery Stefan Hajnoczi (1): pmem: return EIO on read_pmem() failure Stefan Schmidt (4): ieee802154: atusb: do not use the stack for buffers to make them DMA able ieee802154: atusb: make sure we set a randaom extended address if fetching fails ieee802154: atusb: do not use the stack for address fetching to make it DMA able ieee802154: atusb: fix driver to work with older firmware versions Stefan Wahren (1): mmc: mxs-mmc: Fix additional cycles after transmission stop Stefano Stabellini (1): partially revert "xen: Remove event channel notification through Xen PCI platform device" Tahsin Erdogan (1): fuse: clear FR_PENDING flag when moving requests out of pending queue Thomas Gleixner (1): cpu/hotplug: Provide dynamic range for prepare stage Timur Tabi (1): net: qcom/emac: grab a reference to the phydev on ACPI systems Tobias Klauser (1): cpu/hotplug: Remove unused but set variable in _cpu_down() Trond Myklebust (5): NFSv4: Call update_changeattr() from _nfs4_proc_open only if a file was created NFSv4: Don't apply change_info4 twice on rename within a directory NFSv4: Don't call update_changeattr() unless the unlink is successful NFSv4: update_changeattr should update 
the attribute timestamp NFSv4: Fix client recovery when server reboots multiple times Ulf Hansson (1): mmc: core: Restore parts of the polling policy when switch to HS/HS DDR Vadim Lomovtsev (1): net: thunderx: acpi: fix LMAC initialization Valentin Rothberg (2): ARM: multi_v7_defconfig: fix config typo ARM: multi_v7_defconfig: set bcm47xx watchdog Vardan Mikayelyan (1): usb: dwc2: gadget: Fix DMA memory freeing Vincent Pelletier (1): usb: gadget: f_fs: Fix iterations on endpoints. Vineet Gupta (8): ARC: mmu: clarify the MMUv3 programming model ARCv2: save r30 on kernel entry as gcc uses it for code-gen ARC: module: Fix !CONFIG_ARC_DW2_UNWIND builds ARCv2: IOC: refactor the IOC and SLC operations into own functions ARCv2: IOC: Adhere to progamming model guidelines to avoid DMA corruption ARCv2: IOC: Use actual memory size to setup aperture size ARC: mm: split arc_cache_init to allow __init reaping of bulk ARC: Revert "ARC: mm: IOC: Don't enable IOC by default" Vladimir Zapolskiy (1): mtd: nand: lpc32xx: fix invalid error handling of a requested irq Wei Yongjun (1): soc: ti: wkup_m3_ipc: Fix error return code in wkup_m3_ipc_probe() Yan, Zheng (1): ceph: fix ceph_get_caps() interruption Yuriy Kolerov (2): ARC: IRQ: Use hwirq instead of virq in mask/unmask ARCv2: IRQ: Call entry/exit functions for chained handlers in MCIP Zhou Chengming (1): perf/x86/intel: Handle exclusive threadid correctly on CPU hotplug hayeswang (1): r8152: fix the sw rx checksum is unavailable stephen hemminger (1): netvsc: add rcu_read locking to netvsc callback ^ permalink raw reply [flat|nested] 4+ messages in thread
* [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5) 2017-01-22 21:32 Linux 4.10-rc5 Linus Torvalds @ 2017-01-25 12:10 ` Martin Steigerwald 2017-02-01 13:11 ` [Intel-gfx] " David Weinehall 0 siblings, 1 reply; 4+ messages in thread From: Martin Steigerwald @ 2017-01-25 12:10 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Linus Torvalds, Intel Gfx Mailing List, Jani Nikula On Sunday, 22 January 2017, 13:32:08 CET, Linus Torvalds wrote: > Things seem to be calming down a bit, and everything looks nominal. > > There's only been about 250 changes (not counting merges) in the last > week, and the diffstat touches less than 300 files (with drivers and > architecture updates being the bulk, but there's tooling, networking > and filesystems in there too). > > So keep testing, and I think we'll have a regular release schedule. Testing this is no fun: Bug 99533 - black screen after switching session https://bugs.freedesktop.org/99533 This comes after GPU hangs/lockups with kernel 4.9, reported for example as: Bug 98922 - [snb] GPU hang on PlaneShift https://bugs.freedesktop.org/98922 which may be a duplicate of #98747, #98794, #98860, #98891, #98288. I am back at kernel 4.8.15 as I need this machine for production work. Sometimes I wish for a microkernel that might be able to reincarnate drivers that hang or do weird things like that. That might at least give a way to actually do some debugging or even get the desktop session back without losing its state. This applies especially to graphics drivers and to hibernating/resuming from hibernation, which also occasionally fails – again without leaving a way to interact with the machine to do further debugging. The Linux kernel usually just crashes completely, with not even a ping or ssh possible, or it is at least stuck with a black display without any way to restart the graphics driver because it seems to be in some undefined state.
Combined with bugs that occur only occasionally, this makes triaging time-consuming and risky. I do like to help with testing, but maybe it's time to just switch to distro kernels and be done with it, as I regularly come across bugs that are too expensive for me to triage. Please understand that I am not willing to bisect these occasionally occurring bugs, which have the potential to cause data loss because I have to switch off the machine forcefully. Fortunately, at least KMail saves a mail I am writing from time to time, and Kate keeps swap files. I am also a bit unwilling to do further debugging of this one, as I usually use two sessions when I am at work and I risk losing data I am working on. But… at least with this issue it seems I would have a way to SSH into the machine before kicking it. I am dissatisfied with the state of the Intel graphics driver on this ThinkPad T520 with Sandybridge since kernel 4.9 and wonder whether you guys at Intel really test things with older hardware versions. Thanks, -- Martin
* Re: [Intel-gfx] [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5) 2017-01-25 12:10 ` [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5) Martin Steigerwald @ 2017-02-01 13:11 ` David Weinehall 2017-02-11 14:55 ` Martin Steigerwald 0 siblings, 1 reply; 4+ messages in thread From: David Weinehall @ 2017-02-01 13:11 UTC (permalink / raw) To: Martin Steigerwald Cc: Linux Kernel Mailing List, Intel Gfx Mailing List, Linus Torvalds On Wed, Jan 25, 2017 at 01:10:26PM +0100, Martin Steigerwald wrote: > On Sunday, 22 January 2017, 13:32:08 CET, Linus Torvalds wrote: > > Things seem to be calming down a bit, and everything looks nominal. > > > > There's only been about 250 changes (not counting merges) in the last > > week, and the diffstat touches less than 300 files (with drivers and > > architecture updates being the bulk, but there's tooling, networking > > and filesystems in there too). > > > > So keep testing, and I think we'll have a regular release schedule. > > Testing this is no fun: > > Bug 99533 - black screen after switching session > https://bugs.freedesktop.org/99533 > > > This after GPU hang/lockups with Kernel 4.9 reported as for example: > > Bug 98922 - [snb] GPU hang on PlaneShift > https://bugs.freedesktop.org/98922 > > Which may be a duplicate of #98747, #98794, #98860, #98891, #98288. > > > I am back at kernel 4.8.15 as I need this machine for production work. > > Sometimes I wish for a microkernel that might be able to reincarnate drivers > that hang or do wierd things like that. That may at least give a way to > actually do some debugging or even get the desktop session back without > loosing its state. Especially for graphics drivers and hibernating/resuming > from hibernations which also occasionally fails – again without leaving a way > to interact with the machine to do further debugging.
Linux kernel usually > just crashes completely, not even a ping or ssh possible, or it at least stuck > with a black display without any way to restart the graphics driver cause it > seems to be in some undefined state. Combined with occasionally happening bugs > this makes triaging bugs time consuming and risky. I do like to help testing, > but maybe its time to just switch to distro kernels and be done about it, as I > regularily come across bugs that are too expensive for me to triage. > > Please understand that I am not willing to bisect these occasionally happening > bugs with have the potential to cause data loss due to having to switch off > the machine forcefully. Fortunately at least KMail saves a mail I write from > time to time and also Kate does swap files. > > I am also a bit unwilling to do further debugging of this one as I usually use > two sessions when I am at work and I risk loosing data I work on. But… at > least with this issue it seems I would have a way to SSH into the machine > before kicking it. > > > I am dissatisfied with the state of the Intel graphics driver on this ThinkPad > T520 with Sandybridge since kernel 4.9 and wonder whether you guys at Intel > really test things with older hardware versions. Yes, we do. But for practical reasons we can only do testing for things that we actually have testcases for, and obviously we don't have the manpower to actually do *manual* testing on every platform, so issues for older platforms that are only triggered by manual interaction tend to slip under the radar. We have a testfarm that tests every nightly build on all platforms we have test machines for. 
The testcases are publicly available here: https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/ Obviously most of our manpower is spent on development and testing for current and future platforms, so for issues that involve older platforms, especially something as old as Sandybridge (which is, by now, 6 years old), we are happy for help with testing and bisection. If the issues are specific to certain subsets of a platform it obviously gets even more complex; it'd be a combinatorial nightmare to build a testfarm that could test every variation of every platform. If I got the count right, the i915 driver supports around a hundred different varieties of Intel graphics; combine that with the number of different displays people connect, the number of eDP displays that the vendors connect, the different BIOSes that vendors use, etc., and I think you'll begin to see what we're combating. To make things even more complex, you can connect several displays to each graphics card (possibly via adapters), and those displays don't always meet the standards that they claim to meet. Due to limited room we are also a bit limited when it comes to testing with multi-monitor setups. This is why any help is welcome and sometimes even necessary. If you're afraid of data loss, be aware that it's possible to boot your system with file systems mounted read-only; you could also boot from a USB stick or similar. If you can find a testcase in i-g-t that easily reproduces the issue, that'd also be very helpful. Do note that not all testcases in i-g-t are run as part of our nightly tests, since some of them are *extremely* time consuming; the full combinatorial testcase, for instance, can take weeks or months--I haven't done a full run recently--to complete. I hope this helps you understand why bugs can slip under the radar, and why a bisect is so important. Kind regards, David Weinehall
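The combinatorial argument in the message above can be made concrete with a small sketch. Only the figure of roughly a hundred graphics variants comes from the message; every other number below (display models, BIOS versions, display counts, minutes per configuration, number of test machines) is a hypothetical placeholder chosen purely for illustration, not a description of Intel's actual test farm.

```python
from math import prod
from typing import Mapping


def matrix_size(dimensions: Mapping[str, int]) -> int:
    """Number of configurations if every combination of every dimension is tested."""
    return prod(dimensions.values())


def full_run_days(dimensions: Mapping[str, int],
                  minutes_per_config: float,
                  machines: int) -> float:
    """Days for one exhaustive pass, spread evenly over `machines` test machines."""
    total_minutes = matrix_size(dimensions) * minutes_per_config
    return total_minutes / machines / (60 * 24)


if __name__ == "__main__":
    dims = {
        "gpu_variants": 100,    # "around a hundred" per the message above
        "display_models": 50,   # hypothetical
        "bios_versions": 20,    # hypothetical
        "display_counts": 3,    # hypothetical: one, two or three attached displays
    }
    print("configurations:", matrix_size(dims))
    # Hypothetical cost: 10 minutes per configuration on 50 machines.
    print("days per full pass:", round(full_run_days(dims, 10, 50), 1))
```

Even with these modest made-up numbers the product reaches hundreds of thousands of configurations, which is the kind of growth the message describes.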
* Re: [Intel-gfx] [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5) 2017-02-01 13:11 ` [Intel-gfx] " David Weinehall @ 2017-02-11 14:55 ` Martin Steigerwald 0 siblings, 0 replies; 4+ messages in thread From: Martin Steigerwald @ 2017-02-11 14:55 UTC (permalink / raw) To: David Weinehall Cc: Linux Kernel Mailing List, Intel Gfx Mailing List, Linus Torvalds On Wednesday, 1 February 2017, 14:11:22 CET, David Weinehall wrote: > On Wed, Jan 25, 2017 at 01:10:26PM +0100, Martin Steigerwald wrote: > > On Sunday, 22 January 2017, 13:32:08 CET, Linus Torvalds wrote: > > > Things seem to be calming down a bit, and everything looks nominal. > > > > > > There's only been about 250 changes (not counting merges) in the last > > > week, and the diffstat touches less than 300 files (with drivers and > > > architecture updates being the bulk, but there's tooling, networking > > > and filesystems in there too). > > > > > > So keep testing, and I think we'll have a regular release schedule. > > > > Testing this is no fun: > > > > Bug 99533 - black screen after switching session > > https://bugs.freedesktop.org/99533 > > > > > > This after GPU hang/lockups with Kernel 4.9 reported as for example: > > > > Bug 98922 - [snb] GPU hang on PlaneShift > > https://bugs.freedesktop.org/98922 > > > > Which may be a duplicate of #98747, #98794, #98860, #98891, #98288. > > > > > > I am back at kernel 4.8.15 as I need this machine for production work. > > > > Sometimes I wish for a microkernel that might be able to reincarnate > > drivers that hang or do wierd things like that. That may at least give a > > way to actually do some debugging or even get the desktop session back > > without loosing its state. Especially for graphics drivers and > > hibernating/resuming from hibernations which also occasionally fails – > > again without leaving a way to interact with the machine to do further > > debugging.
Linux kernel usually just crashes completely, not even a ping > > or ssh possible, or it at least stuck with a black display without any > > way to restart the graphics driver cause it seems to be in some undefined > > state. Combined with occasionally happening bugs this makes triaging bugs > > time consuming and risky. I do like to help testing, but maybe its time > > to just switch to distro kernels and be done about it, as I regularily > > come across bugs that are too expensive for me to triage. > > > > Please understand that I am not willing to bisect these occasionally > > happening bugs with have the potential to cause data loss due to having > > to switch off the machine forcefully. Fortunately at least KMail saves a > > mail I write from time to time and also Kate does swap files. > > > > I am also a bit unwilling to do further debugging of this one as I usually > > use two sessions when I am at work and I risk loosing data I work on. > > But… at least with this issue it seems I would have a way to SSH into the > > machine before kicking it. > > > > > > I am dissatisfied with the state of the Intel graphics driver on this > > ThinkPad T520 with Sandybridge since kernel 4.9 and wonder whether you > > guys at Intel really test things with older hardware versions. > > Yes, we do. But for practical reasons we can only do testing for things > that we actually have testcases for, and obviously we don't have the > manpower to actually do *manual* testing on every platform, so issues > for older platforms that are only triggered by manual interaction tend > to slip under the radar. > > We have a testfarm that tests every nightly build on all platforms we > have test machines for. 
The testcases are publicly available here: > > https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/ > > Obviously most of our manpower is spent on development and testing for > current and future platforms, so for issues that involve older platforms, > especially something as old as Sandybridge (which is, by now, 6 years old) > we are happy for help with testing and bisection. > > If the issues are specific to certain subsets of a platform it obviously > gets even more complex; it'd be a combinatorial nightmare to build a > testfarm that could test every variation of every platform. > > If I got the count right the i915 driver supports around a hundred > different varieties of Intel graphics; combine that with the number of > different displays people connect, the number of eDP display that the > vendors connect, the different BIOSes that vendors use, etc., and I > think you'll begin to see what we're combating) -- to make things even > more complex you can connect several displays to each graphics card > (possibly via adapters), displays that don't always meet the standards > that they claim to meet. Due to limited room we are also a bit limited > when it comes to testing with multi-monitor setups. > > This is why any help is welcome and sometimes even necessary. If you're > afraid of dataloss, be aware that it's possible to boot your system with > file systems mounted read-only; you could also boot from a USB-stick or > similar. > > If you can find a testcase in i-g-t that easily reproduces the issue > that'd also be very helpful. Do note that not all testcases in i-g-t > are run as part of our nightly tests, since some of them are *extremely* > time consuming; the full combinatorial testcase, for instance, can > take weeks or months--I haven't done a full run recently--to complete. > > I hope this helps you understand why bugs can slip under the radar, > and why a bisect is so important. Wow, David. 
Does that mean that even Intel cannot really test the driver for the hardware it supports? Bisecting a hang-the-machine bug that only happens after a certain time of using the computer and switching between sessions is simply too expensive for me. That's the whole point I tried to make: the *cost* of providing a *useful* bug report is too high. You say you can't cover this with a test case – I think switching between sessions *could* be automated – and then you ask for help, yet providing this help requires an effort that is beyond what I am willing to invest and which IMHO is beyond what many users are willing to invest. See: it would easily take 10-15 iterations, as far as I remember from my last bisect. And I'd either risk data loss *or* I'd use a live Linux, which means that during that time I can't use the machine for productive work. *Each time* it may take anything between 10 minutes and several hours for the issue to appear. I'd need to reboot, compile the next kernel, either copy it to a USB drive and boot from there or install it, and then repeat the testing steps. I bet that would take me about one or two complete days to eventually find the offending commit. I would not feel comfortable asking my employer for these one to two days to do that work, and my leisure time is also too valuable to me and too full with other things to reduce it by that amount of time for every bug like this. Next week I hold a training. Since in this particular case with 4.10 it appears – I didn't verify it – that "just" the gfx driver is broken, I might be able to log into the machine from a training workstation, so… I could at least try to obtain some kind of GPU state dump, while I do most of my work on the training workstation anyway. I remember Jani having said that backtraces are mostly useless (then why bother to do them at all instead of just logging "gfx driver crashed, use tool xyz to obtain debug info"?) and that there is a new way to dump the state of the GPU when it is hung. But a bisect of an issue like this is an effort that exceeds what I am willing to put into it. And I think I am not alone in that. With other issues, like hangs during resume and on waking up that happen occasionally, I have given up already. I don't even remember in which kernel they started, and they are even more costly to bisect. I actually don't even remember whether it worked okay at all since I gave up on compiling TuxOnIce into every kernel. I am giving up here as well now, unless there is a way to provide you with sufficient debug information without doing a bisect, for example by a GPU state dump or something like that. Up to now there does not seem to be a gfx driver on Linux that either *never* *ever* hangs or at least manages to put out enough debug information, if need be even onto a plain block device, to create a useful bug report. Add to that the development speed of one new kernel every three months, and I see no realistic chance to keep the driver in a fully working state for the hardware it supports. The effort to thoroughly bisect every nasty bug like this would just be too high. If invested for every hang bug the current kernels have… – I have seen 4 different issues in 4.9 and 4.10 *just* on this ThinkPad T520 – it may even exceed the development time. So until the effort needed to provide a *useful* bug report can be reduced, I am out. I am willing to spend some hours on it, but not some days for every single hangs-sometimes issue. If you ask me, instabilities like this… like also the instabilities within Plasma / KWin which were related to Intel driver bugs, are one reason why Linux still is not yet ready for the desktop. Sorry, -- Martin
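The bisect cost estimated in the message above can be checked with a rough back-of-the-envelope sketch. It assumes that a bisect roughly halves the set of candidate commits on each round, so about log2(N) build-and-test rounds are needed; the commit count and build time below are hypothetical placeholders, and only the "10 minutes to several hours" range per test comes from the message.

```python
def bisect_steps(num_commits: int) -> int:
    """Approximate worst-case rounds to isolate one bad commit by repeated halving."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) without float rounding.
    return (num_commits - 1).bit_length()


def bisect_hours(num_commits: int, build_hours: float, test_hours: float) -> float:
    """Total hours if every round costs one kernel build plus one test session."""
    return bisect_steps(num_commits) * (build_hours + test_hours)


if __name__ == "__main__":
    commits = 16_000   # hypothetical: commits between the last good and first bad kernel
    build = 0.5        # hypothetical: hours to build, install and reboot
    for label, test in (("quick repro (10 min)", 10 / 60), ("slow repro (3 h)", 3.0)):
        rounds = bisect_steps(commits)
        hours = bisect_hours(commits, build, test)
        print(f"{label}: {rounds} rounds, about {hours:.1f} hours")
```

Under these assumptions the sketch gives 14 rounds, in line with the 10-15 iterations remembered in the message, and the total lands somewhere between roughly one working day and more than two full days depending on how long each reproduction takes.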