linux-kernel.vger.kernel.org archive mirror
* Linux 4.10-rc5
@ 2017-01-22 21:32 Linus Torvalds
  2017-01-25 12:10 ` [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5) Martin Steigerwald
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2017-01-22 21:32 UTC
  To: Linux Kernel Mailing List

Things seem to be calming down a bit, and everything looks nominal.

There's only been about 250 changes (not counting merges) in the last
week, and the diffstat touches less than 300 files (with drivers and
architecture updates being the bulk, but there's tooling, networking
and filesystems in there too).

So keep testing, and I think we'll have a regular release schedule.

                   Linus

---

Adam Ford (2):
      ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate
      ARM: dts: omap3: Fix Card Detect and Write Protect on Logic PD SOM-LV

Alexander Graf (1):
      arm64: Fix swiotlb fallback allocation

Alexandre Belloni (1):
      usb: gadget: udc: atmel: remove memory leak

Amelie Delaunay (1):
      usb: dwc2: gadget: Fix GUSBCFG.USBTRDTIM value

Amir Goldstein (7):
      xfs: make the ASSERT() condition likely
      xfs: sanity check directory inode di_size
      xfs: add missing include dependencies to xfs_dir2.h
      xfs: replace xfs_mode_to_ftype table with switch statement
      xfs: sanity check inode mode when creating new dentry
      xfs: sanity check inode di_mode
      ovl: fix possible use after free on redirect dir lookup

Andrey Smirnov (1):
      at86rf230: Allow slow GPIO pins for "rstn"

Andy Shevchenko (2):
      spi: dw-mid: switch to new dmaengine_terminate_* API (part 2)
      spi: pxa2xx: add missed break

Aneesh Kumar K.V (2):
      powerpc/mm/hugetlb: Don't panic when we don't find the default huge page size
      powerpc/mm: Fix little-endian 4K hugetlb

Anton Blanchard (1):
      powerpc: Ignore reserved field in DCSR and PVR reads and writes

Arkadi Sharshevsky (2):
      mlxsw: spectrum: Fix memory leak at skb reallocation
      mlxsw: switchx2: Fix memory leak at skb reallocation

Arnd Bergmann (5):
      ARM: ux500: fix prcmu_is_cpu_in_wfi() calculation
      cpmac: remove hopeless #warning
      net/mlx5e: Fix a -Wmaybe-uninitialized warning
      ubifs: add CONFIG_BLOCK dependency for encryption
      xfs: fix xfs_mode_to_ftype() prototype

Bart Van Assche (4):
      qla2xxx: Fix indentation
      qla2xxx: Declare an array with file scope static
      qla2xxx: Move two arrays from header files to .c files
      qla2xxx: Avoid that building with W=1 triggers complaints about set-but-not-used variables

Basil Gunn (1):
      ax25: Fix segfault after sock connection timeout

Beni Lev (1):
      cfg80211: consider VHT opmode on station update

Benjamin Coddington (1):
      nfs: Don't take a reference on fl->fl_file for LOCK operation

Benjamin Herrenschmidt (1):
      powerpc/icp-opal: Fix missing KVM case and harden replay

Bhumika Goyal (2):
      vhost: scsi: constify target_core_fabric_ops structures
      virtio/s390: virtio: constify virtio_config_ops structures

Bjorn Helgaas (2):
      x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F
      PCI: Enumerate switches below PCI-to-PCIe bridges

Brian Norris (2):
      thermal: rockchip: improve conversion error messages
      thermal: rockchip: don't pass table structs by value

Bryant G. Ly (2):
      ibmvscsis: Fix max transfer length
      ibmvscsis: Fix sleeping in interrupt context

Caesar Wang (4):
      thermal: rockchip: fixes invalid temperature case
      thermal: rockchip: optimize the conversion table
      thermal: rockchip: handle set_trips without the trip points
      thermal: rockchip: fixes the conversion table

Cedric Izoard (1):
      mac80211: Fix headroom allocation when forwarding mesh pkt

Chen-Yu Tsai (2):
      ARM: dts: sun6i: Disable display pipeline by default
      ARM: dts: sun6i: hummingbird: Enable display engine again

Christian Borntraeger (1):
      KVM: s390: do not expose random data via facility bitmap

Christoffer Dall (1):
      KVM: arm/arm64: Fix occasional warning from the timer work function

Christoph Hellwig (2):
      scsi: qla2xxx: fix MSI-X vector affinity
      scsi: qla2xxx: remove irq_affinity_notifier

Christophe JAILLET (2):
      spi: spi-axi: Free resources on error path
      usb: gadget: composite: Fix function used to free memory

Colin Ian King (3):
      spi: armada-3700: fix unsigned compare than zero on irq
      ubifs: ensure zero err is returned on successful return
      virtio/s390: add missing \n to end of dev_err message

Damien Le Moal (2):
      scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
      scsi: sd: Ignore zoned field for host-managed devices

Dan Carpenter (2):
      spi: armada-3700: Set mode bits correctly
      vhost/scsi: silence uninitialized variable warning

Dan Williams (1):
      libnvdimm, namespace: fix pmem namespace leak, delete when size set to zero

Daniel Borkmann (1):
      bpf: rework prog_digest into prog_tag

Dave Jones (1):
      scsi: qla2xxx: Fix apparent cut-n-paste error.

Dave Martin (7):
      arm64/ptrace: Preserve previous registers for short regset write
      arm64/ptrace: Preserve previous registers for short regset write
      arm64/ptrace: Preserve previous registers for short regset write
      arm64/ptrace: Avoid uninitialised struct padding in fpr_set()
      arm64/ptrace: Reject attempts to set incomplete hardware breakpoint fields
      powerpc/ptrace: Preserve previous fprs/vsrs on short regset write
      powerpc/ptrace: Preserve previous TM fprs/vsrs on short regset write

David Ahern (2):
      net: lwtunnel: Handle lwtunnel_fill_encap failure
      net: ipv4: fix table id in getroute response

David Lebrun (1):
      ipv6: sr: fix several BUGs when preemption is enabled

David Sheets (1):
      fuse: fix time_to_jiffies nsec sanity check

Dmitry Vyukov (1):
      KVM: x86: fix fixing of hypercalls

Elad Raz (1):
      mlxsw: pci: Fix EQE structure definition

Emmanuel Grumbach (1):
      mac80211: fix the TID on NDPs sent as EOSP carrier

Emmanuel Vadot (1):
      ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmc

Eric Biggers (2):
      ubifs: allow encryption ioctls in compat mode
      ubifs: remove redundant checks for encryption key

Eric Dumazet (1):
      mlx4: do not call napi_schedule() without care

Eric Sandeen (1):
      xfs: don't wrap ID in xfs_dq_get_next_id

Ewan D. Milne (1):
      scsi: ses: Fix SAS device detection in enclosure

Fabien Parent (1):
      ARM: dts: da850-evm: fix read access to SPI flash

Fabio Estevam (1):
      thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()

Fam Zheng (1):
      scsi: libfc: Fix variable name in fc_set_wwpn

Felix Fietkau (1):
      mac80211: initialize SMPS field in HT capabilities

Florian Fainelli (1):
      net: systemport: Decouple flow control from __bcm_sysport_tx_reclaim

G. Campana (1):
      virtio_console: fix a crash in config_work_handler

Gary Bisson (2):
      ARM: dts: imx6qdl-nitrogen6_max: fix sgtl5000 pinctrl init
      ARM: dts: imx6qdl-nitrogen6_som2: fix sgtl5000 pinctrl init

Gavin Shan (1):
      powerpc/eeh: Enable IO path on permanent error

Geert Uytterhoeven (1):
      spi: SPI_FSL_DSPI should depend on HAS_DMA

Halil Pasic (2):
      tools/virtio/ringtest: fix run-on-all.sh for offline cpus
      tools/virtio/ringtest: tweaks for s390

Hangbin Liu (1):
      mld: do not remove mld souce list info when set link down

Hans de Goede (1):
      mmc: sdhci-acpi: Only powered up enabled acpi child devices

Hauke Mehrtens (2):
      mtd: nand: xway: disable module support
      mtd: nand: xway: fix build because of module functions

Heiko Carstens (2):
      s390/ctl_reg: make __ctl_load a full memory barrier
      s390: update defconfigs

Heiner Kallweit (1):
      net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered

Heinrich Schuchardt (1):
      MMC: meson: avoid possible NULL dereference

Himanshu Madhani (3):
      qla2xxx: Include ATIO queue in firmware dump when in target mode
      qla2xxx: Set tcm_qla2xxx version to automatically track qla2xxx version
      qla2xxx: Reset reserved field in firmware options to 0

Ilya Dryomov (1):
      libceph: make sure ceph_aes_crypt() IV is aligned

Ivan Vecera (3):
      be2net: fix status check in be_cmd_pmac_add()
      be2net: don't delete MAC on close on unprivileged BE3 VFs
      be2net: fix MAC addr setting on privileged BE3 VFs

J. Bruce Fields (2):
      nfsd: fix supported attributes for acl & labels
      svcrpc: don't leak contexts on PROC_DESTROY

Jack Morgenstein (3):
      net/mlx4_core: Fix racy CQ (Completion Queue) free
      net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
      net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV

Jacob von Chorus (1):
      thermal: core: move tz->device.groups cleanup to thermal_release

Jakub Sitnicki (1):
      ip6_tunnel: Account for tunnel header in tunnel MTU

Jamal Hadi Salim (1):
      net sched actions: fix refcnt when GETing of action after bind

James Bottomley (1):
      scsi: mpt3sas: fix hang on ata passthrough commands

Jason Gerecke (1):
      HID: wacom: Fix sibling detection regression

Jean-Jacques Hiblot (1):
      ARM: dts: OMAP5 / DRA7: indicate that SATA port 0 is available.

Jeff Layton (3):
      ceph: fix endianness of getattr mask in ceph_d_revalidate
      ceph: fix endianness bug in frag_tree_split_cmp
      ceph: fix bad endianness handling in parse_reply_info_extra

Jintack Lim (1):
      KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems

Johan Hovold (2):
      HID: corsair: fix DMA buffers on stack
      HID: corsair: fix control-transfer error handling

Johannes Berg (3):
      mac80211: implement multicast forwarding on fast-RX path
      mac80211: calculate min channel width correctly
      mac80211: recalculate min channel width on VHT opmode changes

Johannes Thumshirn (2):
      scsi: bfa: fix wrongly initialized variable in bfad_im_bsg_els_ct_request()
      scsi: lpfc: Set elsiocb contexts to NULL after freeing it

John Stultz (1):
      usb: dwc2: Avoid suspending if we're in gadget mode

Jon Mason (1):
      ARM: dts: NSP: Fix DT ranges error

Joonyoung Shim (1):
      clocksource/exynos_mct: Clear interrupt when cpu is shut down

Josef Bacik (1):
      nbd: only set MSG_MORE when we have more to send

Karicheri, Muralidharan (1):
      net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types

Kazuya Mizuguchi (1):
      ravb: Remove Rx overflow log messages

Keith Busch (1):
      blk-mq: Remove unused variable

Kevin Hilman (1):
      spi: davinci: use dma_mapping_error()

Krzysztof Kozlowski (2):
      MAINTAINERS: Add Patchwork URL to Samsung Exynos entry
      ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*

Lance Richardson (1):
      openvswitch: maintain correct checksum state in conntrack actions

Larry Finger (1):
      taint/module: Fix problems when out-of-kernel driver defines true or false

Leo Yan (1):
      usb: dwc2: use u32 for DT binding parameters

Linus Torvalds (1):
      Linux 4.10-rc5

Linus Walleij (1):
      ARM: 8613/1: Fix the uaccess crash on PB11MPCore

Lokesh Vutla (1):
      ARM: dts: am335x-icev2: Remove the duplicated pinmux setting

Madhavan Srinivasan (3):
      powerpc/perf: Fix PM_BRU_CMPL event code for power9
      selftest/powerpc: Wrong PMC initialized in pmc56_overflow test
      powerpc/perf: Use MSR to report privilege level on P9 DD1

Marc Gonzalez (2):
      mtd: nand: tango: Update DT binding description
      mtd: nand: tango: Reset pbus to raw mode in probe

Marc Zyngier (2):
      KVM: arm/arm64: vgic: Fix deadlock on error handling
      PCI/MSI: pci-xgene-msi: Fix CPU hotplug registration handling

Marek Szyprowski (1):
      clk/samsung: exynos542x: mark some clocks as critical

Mark Rutland (2):
      ARM: 8634/1: hw_breakpoint: blacklist Scorpion CPUs
      arm64: avoid returning from bad_mode

Martynas Pumputis (1):
      vxlan: Set ports in flow key when doing route lookups

Masahiro Yamada (1):
      ARM, ARM64: dts: drop "arm,amba-bus" in favor of "simple-bus" part 3

Masami Hiramatsu (3):
      perf probe: Fix to show correct locations for events on modules
      perf probe: Add error checks to offline probe post-processing
      perf probe: Fix to probe on gcc generated functions in modules

Masaru Nagai (1):
      ravb: do not use zero-length alignment DMA descriptor

Mathias Nyman (1):
      xhci: remove WARN_ON if dma mask is not set for platform devices

Michal Kazior (1):
      mac80211: prevent skb/txq mismatch

Michal Simek (1):
      ARM64: zynqmp: Fix W=1 dtc 1.4 warnings

Milan P. Gandhi (1):
      scsi: qla2xxx: Get mutex lock before checking optrom_state

Milo Kim (1):
      ARM: dts: sun8i: Support DTB build for NanoPi M1

Moritz Fischer (1):
      ARM64: zynqmp: Fix i2c node's compatible string

Murali Karicheri (1):
      PCI: designware: Check for iATU unroll only on platforms that use ATU

Neil Armstrong (1):
      ARM64: dts: meson-gxbb-odroidc2: Disable SCPI DVFS

Nicholas Mc Guire (1):
      usb: dwc2: host: fix Wmaybe-uninitialized warning

Nicholas Piggin (1):
      powerpc: Fix pgtable pmd cache init

Nicolas Dichtel (1):
      ARM: put types.h in uapi

Nikita Yushchenko (1):
      swiotlb: ensure that page-sized mappings are page-aligned

Oleksandr Andrushchenko (1):
      arm64: mm: avoid name clash in __page_to_voff()

Parthasarathy Bhuvaragan (1):
      tipc: allocate user memory with GFP_KERNEL flag

Paul E. McKenney (2):
      rcu: Remove cond_resched() from Tiny synchronize_sched()
      rcu: Narrow early boot window of illegal synchronous grace periods

Peter Rosin (1):
      ubifs: fix unencrypted journal write

Peter Ujfalusi (1):
      ARM: OMAP1: DMA: Correct the number of logical channels

Phil Reid (1):
      spi: dw: Make debugfs name unique between instances

Pierre Morel (1):
      virtio/s390: support READ_STATUS command for virtio-ccw

Quinn Tran (7):
      qla2xxx: Fix wrong IOCB type assumption
      qla2xxx: Collect additional information to debug fw dump
      qla2xxx: Fix crash due to null pointer access
      qla2xxx: Terminate exchange if corrupted
      qla2xxx: Reduce exess wait during chip reset
      qla2xxx: Fix erroneous invalid handle message
      qla2xxx: Disable out-of-order processing by default in firmware

Rabin Vincent (1):
      ARM: 8632/1: ftrace: fix syscall name matching

Randy Dunlap (1):
      mtd: nand: oxnas_nand: fix build errors on arch/um, require HAS_IOMEM

Reza Arbab (1):
      powerpc/mm: Fix memory hotplug BUG() on radix

Richard Weinberger (1):
      ubifs: Fix journal replay wrt. xattr nodes

Roberto Sassu (1):
      scsi: lpfc: avoid double free of resource identifiers

Ruslan Ruslichenko (1):
      x86/ioapic: Restore IO-APIC irq_chip retrigger callback

Russell King (1):
      MAINTAINERS: update rmk's entries

Scott Mayhew (1):
      sunrpc: don't call sleeping functions from the notifier block callbacks

Sedat Dilek (1):
      perf/x86/amd/ibs: Fix typo after cleanup state names in cpu/hotplug

Sekhar Nori (1):
      ARM: dts: dra72-evm-revc: fix typo in ethernet-phy node

Shannon Nelson (1):
      tcp: fix tcp_fastopen unaligned access complaints on sparc

Shuah Khan (1):
      usb: dwc3: exynos fix axius clock error path to do cleanup

Simon Horman (2):
      spi: sh-msiof: Add R-Car Gen 2 and 3 fallback bindings
      spi: sh-msiof: Do not use C++ style comment

Sriharsha Basavapatna (1):
      svcrdma: avoid duplicate dma unmapping during error recovery

Stefan Hajnoczi (1):
      pmem: return EIO on read_pmem() failure

Stefan Schmidt (4):
      ieee802154: atusb: do not use the stack for buffers to make them DMA able
      ieee802154: atusb: make sure we set a randaom extended address if fetching fails
      ieee802154: atusb: do not use the stack for address fetching to make it DMA able
      ieee802154: atusb: fix driver to work with older firmware versions

Stefan Wahren (1):
      mmc: mxs-mmc: Fix additional cycles after transmission stop

Stefano Stabellini (1):
      partially revert "xen: Remove event channel notification through Xen PCI platform device"

Tahsin Erdogan (1):
      fuse: clear FR_PENDING flag when moving requests out of pending queue

Thomas Gleixner (1):
      cpu/hotplug: Provide dynamic range for prepare stage

Timur Tabi (1):
      net: qcom/emac: grab a reference to the phydev on ACPI systems

Tobias Klauser (1):
      cpu/hotplug: Remove unused but set variable in _cpu_down()

Trond Myklebust (5):
      NFSv4: Call update_changeattr() from _nfs4_proc_open only if a file was created
      NFSv4: Don't apply change_info4 twice on rename within a directory
      NFSv4: Don't call update_changeattr() unless the unlink is successful
      NFSv4: update_changeattr should update the attribute timestamp
      NFSv4: Fix client recovery when server reboots multiple times

Ulf Hansson (1):
      mmc: core: Restore parts of the polling policy when switch to HS/HS DDR

Vadim Lomovtsev (1):
      net: thunderx: acpi: fix LMAC initialization

Valentin Rothberg (2):
      ARM: multi_v7_defconfig: fix config typo
      ARM: multi_v7_defconfig: set bcm47xx watchdog

Vardan Mikayelyan (1):
      usb: dwc2: gadget: Fix DMA memory freeing

Vincent Pelletier (1):
      usb: gadget: f_fs: Fix iterations on endpoints.

Vineet Gupta (8):
      ARC: mmu: clarify the MMUv3 programming model
      ARCv2: save r30 on kernel entry as gcc uses it for code-gen
      ARC: module: Fix !CONFIG_ARC_DW2_UNWIND builds
      ARCv2: IOC: refactor the IOC and SLC operations into own functions
      ARCv2: IOC: Adhere to progamming model guidelines to avoid DMA corruption
      ARCv2: IOC: Use actual memory size to setup aperture size
      ARC: mm: split arc_cache_init to allow __init reaping of bulk
      ARC: Revert "ARC: mm: IOC: Don't enable IOC by default"

Vladimir Zapolskiy (1):
      mtd: nand: lpc32xx: fix invalid error handling of a requested irq

Wei Yongjun (1):
      soc: ti: wkup_m3_ipc: Fix error return code in wkup_m3_ipc_probe()

Yan, Zheng (1):
      ceph: fix ceph_get_caps() interruption

Yuriy Kolerov (2):
      ARC: IRQ: Use hwirq instead of virq in mask/unmask
      ARCv2: IRQ: Call entry/exit functions for chained handlers in MCIP

Zhou Chengming (1):
      perf/x86/intel: Handle exclusive threadid correctly on CPU hotplug

hayeswang (1):
      r8152: fix the sw rx checksum is unavailable

stephen hemminger (1):
      netvsc: add rcu_read locking to netvsc callback


* [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5)
  2017-01-22 21:32 Linux 4.10-rc5 Linus Torvalds
@ 2017-01-25 12:10 ` Martin Steigerwald
  2017-02-01 13:11   ` [Intel-gfx] " David Weinehall
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Steigerwald @ 2017-01-25 12:10 UTC
  To: Linux Kernel Mailing List
  Cc: Linus Torvalds, Intel Gfx Mailing List, Jani Nikula

On Sunday, 22 January 2017, 13:32:08 CET, Linus Torvalds wrote:
> Things seem to be calming down a bit, and everything looks nominal.
> 
> There's only been about 250 changes (not counting merges) in the last
> week, and the diffstat touches less than 300 files (with drivers and
> architecture updates being the bulk, but there's tooling, networking
> and filesystems in there too).
> 
> So keep testing, and I think we'll have a regular release schedule.

Testing this is no fun:

Bug 99533 - black screen after switching session
https://bugs.freedesktop.org/99533


This comes after GPU hangs/lockups with kernel 4.9, reported for example as:

Bug 98922 - [snb] GPU hang on PlaneShift
https://bugs.freedesktop.org/98922

Which may be a duplicate of #98747, #98794, #98860, #98891, #98288.


I am back at kernel 4.8.15 as I need this machine for production work.

Sometimes I wish for a microkernel that might be able to reincarnate drivers
that hang or do weird things like that. That may at least give a way to
actually do some debugging or even get the desktop session back without
losing its state. This matters especially for graphics drivers and for
hibernating/resuming from hibernation, which also occasionally fails, again
without leaving a way to interact with the machine to do further debugging.
The Linux kernel usually just crashes completely, with not even a ping or ssh
possible, or it is at least stuck with a black display without any way to
restart the graphics driver because it seems to be in some undefined state.
Combined with bugs that happen only occasionally, this makes triaging bugs
time-consuming and risky. I do like to help with testing, but maybe it's time
to just switch to distro kernels and be done with it, as I regularly come
across bugs that are too expensive for me to triage.

Please understand that I am not willing to bisect these occasionally happening
bugs, which have the potential to cause data loss due to having to switch off
the machine forcefully. Fortunately at least KMail saves a mail I write from
time to time and also Kate does swap files.

I am also a bit unwilling to do further debugging of this one as I usually use
two sessions when I am at work and I risk losing data I work on. But… at
least with this issue it seems I would have a way to SSH into the machine
before kicking it.


I am dissatisfied with the state of the Intel graphics driver on this ThinkPad 
T520 with Sandybridge since kernel 4.9 and wonder whether you guys at Intel 
really test things with older hardware versions.

Thanks,
-- 
Martin


* Re: [Intel-gfx] [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5)
  2017-01-25 12:10 ` [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5) Martin Steigerwald
@ 2017-02-01 13:11   ` David Weinehall
  2017-02-11 14:55     ` Martin Steigerwald
  0 siblings, 1 reply; 4+ messages in thread
From: David Weinehall @ 2017-02-01 13:11 UTC
  To: Martin Steigerwald
  Cc: Linux Kernel Mailing List, Intel Gfx Mailing List, Linus Torvalds

On Wed, Jan 25, 2017 at 01:10:26PM +0100, Martin Steigerwald wrote:
> On Sunday, 22 January 2017, 13:32:08 CET, Linus Torvalds wrote:
> > Things seem to be calming down a bit, and everything looks nominal.
> > 
> > There's only been about 250 changes (not counting merges) in the last
> > week, and the diffstat touches less than 300 files (with drivers and
> > architecture updates being the bulk, but there's tooling, networking
> > and filesystems in there too).
> > 
> > So keep testing, and I think we'll have a regular release schedule.
> 
> Testing this is no fun:
> 
> Bug 99533 - black screen after switching session
> https://bugs.freedesktop.org/99533
> 
> 
> This comes after GPU hangs/lockups with kernel 4.9, reported for example as:
> 
> Bug 98922 - [snb] GPU hang on PlaneShift
> https://bugs.freedesktop.org/98922
> 
> Which may be a duplicate of #98747, #98794, #98860, #98891, #98288.
> 
> 
> I am back at kernel 4.8.15 as I need this machine for production work.
> 
> Sometimes I wish for a microkernel that might be able to reincarnate drivers
> that hang or do weird things like that. That may at least give a way to
> actually do some debugging or even get the desktop session back without
> losing its state. This matters especially for graphics drivers and for
> hibernating/resuming from hibernation, which also occasionally fails, again
> without leaving a way to interact with the machine to do further debugging.
> The Linux kernel usually just crashes completely, with not even a ping or ssh
> possible, or it is at least stuck with a black display without any way to
> restart the graphics driver because it seems to be in some undefined state.
> Combined with bugs that happen only occasionally, this makes triaging bugs
> time-consuming and risky. I do like to help with testing, but maybe it's time
> to just switch to distro kernels and be done with it, as I regularly come
> across bugs that are too expensive for me to triage.
>
> Please understand that I am not willing to bisect these occasionally happening
> bugs, which have the potential to cause data loss due to having to switch off
> the machine forcefully. Fortunately at least KMail saves a mail I write from
> time to time and also Kate does swap files.
>
> I am also a bit unwilling to do further debugging of this one as I usually use
> two sessions when I am at work and I risk losing data I work on. But… at
> least with this issue it seems I would have a way to SSH into the machine
> before kicking it.
> 
> 
> I am dissatisfied with the state of the Intel graphics driver on this ThinkPad 
> T520 with Sandybridge since kernel 4.9 and wonder whether you guys at Intel 
> really test things with older hardware versions.

Yes, we do. But for practical reasons we can only do testing for things
that we actually have testcases for, and obviously we don't have the
manpower to actually do *manual* testing on every platform, so issues
for older platforms that are only triggered by manual interaction tend
to slip under the radar.

We have a testfarm that tests every nightly build on all platforms we
have test machines for. The testcases are publicly available here:

https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/

Obviously most of our manpower is spent on development and testing for current
and future platforms, so for issues that involve older platforms,
especially something as old as Sandybridge (which is, by now, 6 years old)
we are happy for help with testing and bisection.

If the issues are specific to certain subsets of a platform it obviously
gets even more complex; it'd be a combinatorial nightmare to build a
testfarm that could test every variation of every platform.

If I got the count right, the i915 driver supports around a hundred
different varieties of Intel graphics; combine that with the number of
different displays people connect, the number of eDP displays that the
vendors connect, the different BIOSes that vendors use, etc., and I
think you'll begin to see what we're combating. To make things even
more complex, you can connect several displays to each graphics card
(possibly via adapters), displays that don't always meet the standards
that they claim to meet.  Due to limited room we are also a bit limited
when it comes to testing with multi-monitor setups.

This is why any help is welcome and sometimes even necessary. If you're
afraid of data loss, be aware that it's possible to boot your system with
file systems mounted read-only; you could also boot from a USB-stick or
similar.

If you can find a testcase in i-g-t that easily reproduces the issue
that'd also be very helpful. Do note that not all testcases in i-g-t
are run as part of our nightly tests, since some of them are *extremely*
time consuming; the full combinatorial testcase, for instance, can
take weeks or months--I haven't done a full run recently--to complete.

I hope this helps you understand why bugs can slip under the radar,
and why a bisect is so important.


Kind regards, David Weinehall


* Re: [Intel-gfx] [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5)
  2017-02-01 13:11   ` [Intel-gfx] " David Weinehall
@ 2017-02-11 14:55     ` Martin Steigerwald
  0 siblings, 0 replies; 4+ messages in thread
From: Martin Steigerwald @ 2017-02-11 14:55 UTC
  To: David Weinehall
  Cc: Linux Kernel Mailing List, Intel Gfx Mailing List, Linus Torvalds

On Wednesday, 1 February 2017, 14:11:22 CET, David Weinehall wrote:
> On Wed, Jan 25, 2017 at 01:10:26PM +0100, Martin Steigerwald wrote:
> > On Sunday, 22 January 2017, 13:32:08 CET, Linus Torvalds wrote:
> > > Things seem to be calming down a bit, and everything looks nominal.
> > > 
> > > There's only been about 250 changes (not counting merges) in the last
> > > week, and the diffstat touches less than 300 files (with drivers and
> > > architecture updates being the bulk, but there's tooling, networking
> > > and filesystems in there too).
> > > 
> > > So keep testing, and I think we'll have a regular release schedule.
> > 
> > Testing this is no fun:
> > 
> > Bug 99533 - black screen after switching session
> > https://bugs.freedesktop.org/99533
> > 
> > 
> > This comes after GPU hangs/lockups with kernel 4.9, reported for example as:
> > 
> > Bug 98922 - [snb] GPU hang on PlaneShift
> > https://bugs.freedesktop.org/98922
> > 
> > Which may be a duplicate of #98747, #98794, #98860, #98891, #98288.
> > 
> > 
> > I am back at kernel 4.8.15 as I need this machine for production work.
> > 
> > Sometimes I wish for a microkernel that might be able to reincarnate
> > drivers that hang or do weird things like that. That may at least give a
> > way to actually do some debugging or even get the desktop session back
> > without losing its state. This matters especially for graphics drivers
> > and for hibernating/resuming from hibernation, which also occasionally
> > fails, again without leaving a way to interact with the machine to do
> > further debugging. The Linux kernel usually just crashes completely, with
> > not even a ping or ssh possible, or it is at least stuck with a black
> > display without any way to restart the graphics driver because it seems
> > to be in some undefined state. Combined with bugs that happen only
> > occasionally, this makes triaging bugs time-consuming and risky. I do
> > like to help with testing, but maybe it's time to just switch to distro
> > kernels and be done with it, as I regularly come across bugs that are
> > too expensive for me to triage.
> >
> > Please understand that I am not willing to bisect these occasionally
> > happening bugs, which have the potential to cause data loss due to having
> > to switch off the machine forcefully. Fortunately at least KMail saves a
> > mail I write from time to time and also Kate does swap files.
> >
> > I am also a bit unwilling to do further debugging of this one as I usually
> > use two sessions when I am at work and I risk losing data I work on.
> > But… at least with this issue it seems I would have a way to SSH into the
> > machine before kicking it.
> > 
> > 
> > I am dissatisfied with the state of the Intel graphics driver on this
> > ThinkPad T520 with Sandybridge since kernel 4.9 and wonder whether you
> > guys at Intel really test things with older hardware versions.
> 
> Yes, we do. But for practical reasons we can only do testing for things
> that we actually have testcases for, and obviously we don't have the
> manpower to actually do *manual* testing on every platform, so issues
> for older platforms that are only triggered by manual interaction tend
> to slip under the radar.
> 
> We have a testfarm that tests every nightly build on all platforms we
> have test machines for. The testcases are publicly available here:
> 
> https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/
> 
> Obviously most of our manpower is spent on development and testing for
> current and future platforms, so for issues that involve older platforms,
> especially something as old as Sandybridge (which is, by now, 6 years old)
> we are happy for help with testing and bisection.
> 
> If the issues are specific to certain subsets of a platform it obviously
> gets even more complex; it'd be a combinatorial nightmare to build a
> testfarm that could test every variation of every platform.
> 
> If I got the count right, the i915 driver supports around a hundred
> different varieties of Intel graphics; combine that with the number of
> different displays people connect, the number of eDP displays that the
> vendors connect, the different BIOSes that vendors use, etc., and I
> think you'll begin to see what we're combating. To make things even
> more complex, you can connect several displays to each graphics card
> (possibly via adapters), displays that don't always meet the standards
> that they claim to meet.  Due to limited room we are also a bit limited
> when it comes to testing with multi-monitor setups.
> 
> This is why any help is welcome and sometimes even necessary. If you're
> afraid of data loss, be aware that it's possible to boot your system with
> file systems mounted read-only; you could also boot from a USB-stick or
> similar.
> 
> If you can find a testcase in i-g-t that easily reproduces the issue
> that'd also be very helpful. Do note that not all testcases in i-g-t
> are run as part of our nightly tests, since some of them are *extremely*
> time consuming; the full combinatorial testcase, for instance, can
> take weeks or months--I haven't done a full run recently--to complete.
> 
> I hope this helps you understand why bugs can slip under the radar,
> and why a bisect is so important.

Wow, David. Does that mean that even Intel cannot really test the driver for 
the hardware it supports?

A bisect of a hang-the-machine bug that only happens after a certain time of
using the computer and switching between sessions is too expensive for
me. That's the whole point I tried to make. The *cost* to provide a *useful*
bug report is too high.

You say you can't cover this with a test case – I think switching between
sessions *could* be automated – and then you ask for help, yet to provide this 
help an effort is needed that is beyond what I am willing to invest and which 
IMHO is beyond what many users are willing to invest. See:

It would easily take 10-15 iterations as far as I remember from my last
bisect. And I'd either risk data loss *or* I'd use a live Linux, which means
that during that time I can't use the machine for productive work. *Each time*
it may take anything between 10 minutes and several hours for the issue to
appear. I'd need to reboot, compile the next kernel, either copy it to a USB
drive and boot from there or install it, and then repeat the testing steps.

I bet that would take me about one or two complete days to eventually find the 
offending commit. I would not feel comfortable asking my employer for these 
one to two days to do that work and my leisure time is also too valuable to me 
and too full of other things to reduce it by that amount of time for every
bug like this.
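
As a rough sanity check of that estimate, here is a minimal Python sketch.
It only uses the numbers from above (10-15 iterations, 10 minutes to
several hours of waiting per iteration). The 30 minutes of build and reboot
overhead per step, the 3 hour "slow" case and both function names are
assumptions made up for illustration, not measurements.

```python
def bisect_steps(commit_count: int) -> int:
    """Worst-case number of kernels to test when halving a range of commits."""
    if commit_count < 1:
        raise ValueError("commit_count must be at least 1")
    # ceil(log2(n)) computed with integers to avoid floating point rounding
    return (commit_count - 1).bit_length()


def bisect_hours(steps: int, wait_minutes: float, overhead_minutes: float) -> float:
    """Total wall-clock hours: each step costs waiting time plus overhead."""
    return steps * (wait_minutes + overhead_minutes) / 60


# Assumed overhead for compiling, installing and rebooting (hypothetical).
OVERHEAD_MINUTES = 30

# Best case: 10 steps, the hang shows up after 10 minutes each time.
best_case = bisect_hours(10, 10, OVERHEAD_MINUTES)

# Bad case: 15 steps, the hang needs 3 hours to show up each time.
bad_case = bisect_hours(15, 180, OVERHEAD_MINUTES)
```

Under these assumptions the best case is already most of a working day and
the bad case is several of them, and a "good" verdict is never certain for
a bug that only hangs sometimes.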

Next week I hold a training. Since in this particular case with 4.10 it
appears – I didn't verify it – that "just" the gfx driver is broken, I might
be able to log in to the machine from a training workstation, so… I could at
least try to obtain some kind of gpu state dump, while I do most of my work on
the training workstation anyway. I remember Jani having said that backtraces
are mostly useless (then why bother to do them at all instead of just logging
"gfx driver crashed, use tool xyz to obtain debug info"?) and that there is a
new way to dump the state of the gpu when it is hung.

But a bisect of an issue like this is an effort that is exceeding what I am 
willing to put into it. And I think I am not alone in that.

With other issues like hangs during resume and on waking up that happen
occasionally I have given up already. I don't even remember in which kernels
it started and it is even more costly to bisect. I actually don't even
remember whether it worked okay at all since I gave up on compiling TuxOnIce
into every kernel.

I am giving up here as well now, unless there is a way to provide you with 
sufficient debug information without doing a bisect here, i.e. by a gpu state 
dump or something like that.
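
For what it is worth, a minimal sketch of how such a dump could be saved
over SSH. It assumes the i915 error state is exposed as a sysfs file at
/sys/class/drm/card0/error; the card index, the destination file name and
the helper name are assumptions for illustration and may differ on a given
machine or kernel.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the i915 error state; the card index may differ.
DEFAULT_ERROR_STATE = "/sys/class/drm/card0/error"


def save_gpu_error_state(source: str, dest_dir: str) -> Optional[Path]:
    """Copy the error state file into dest_dir.

    Returns the path of the copy, or None if the source file does not exist
    (for example because the driver does not provide it).
    """
    src = Path(source)
    if not src.is_file():
        return None
    dest = Path(dest_dir) / "i915-error-state.txt"
    # Read the whole file at once; sysfs files do not report a useful size.
    dest.write_bytes(src.read_bytes())
    return dest


# Usage over SSH while the screen is black, probably as root:
#   save_gpu_error_state(DEFAULT_ERROR_STATE, "/root")
```

The resulting file could then be attached to a bug report instead of a
bisect result, if that turns out to be sufficient for the developers.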

Up to now on Linux there does not seem to be a gfx driver that either
*never* *ever* hangs, or at least manages to put out enough debug information, 
if need be even onto a plain block device, in order to create a useful bug 
report.

Added to that, with the development speed of one new kernel every three months,
I see no realistic chance to keep the driver in a fully working state for the
hardware it supports. The effort to thoroughly bisect every nasty bug like this
would just be too high. If invested in every hang bug the current kernels
have… – I have seen 4 different issues in 4.9 and 4.10 *just* on this ThinkPad
T520 – it may even exceed the development time.

So until at some time the effort needed to provide a *useful* bug report can 
be reduced, I am out. I am willing to put some hours into it, but not some
days for every single hangs-sometimes issue.

If you ask me, instabilities like this… like the instabilities within
Plasma / KWin which were related to Intel driver bugs, are one reason why
Linux still is not yet ready for the desktop.

Sorry,
-- 
Martin

end of thread, other threads:[~2017-02-11 14:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
2017-01-22 21:32 Linux 4.10-rc5 Linus Torvalds
2017-01-25 12:10 ` [REGRESSION] Black screen after switching desktop session (was: Re: Linux 4.10-rc5) Martin Steigerwald
2017-02-01 13:11   ` [Intel-gfx] " David Weinehall
2017-02-11 14:55     ` Martin Steigerwald