All of lore.kernel.org
 help / color / mirror / Atom feed
* [git pull] drm fixes
@ 2015-02-27  4:42 ` Dave Airlie
  0 siblings, 0 replies; 31+ messages in thread
From: Dave Airlie @ 2015-02-27  4:42 UTC (permalink / raw)
  To: torvalds; +Cc: DRI mailing list, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5045 bytes --]


Hi Linus,

just general fixes pull, radeon, i915, atmel, tegra, amdkfd
and one core fix.

Dave.

The following changes since commit c517d838eb7d07bbe9507871fab3931deccff539:

  Linux 4.0-rc1 (2015-02-22 18:21:14 -0800)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 21689a440bdc90650b7ee3803ec11cfabd21409e:

  Merge branch 'drm-atmel-hlcdc-fixes' of git://github.com/bbrezillon/linux-at91 into drm-fixes (2015-02-27 10:31:40 +1000)

----------------------------------------------------------------

Alex Deucher (6):
      drm/radeon: use drm_mode_vrefresh() rather than mode->vrefresh
      drm/radeon: disable mclk switching with 120hz+ monitors
      drm/radeon: dump full IB if we hit a packet error
      drm/radeon: fix 1 RB harvest config setup for TN/RL
      drm/radeon: fix atom aux payload size check for writes (v2)
      drm/radeon: only enable DP audio if the monitor supports it

Boris Brezillon (2):
      drm: atmel-hlcdc: reset layer A2Q and UPDATE bits when disabling it
      drm: atmel-hlcdc: remove useless pm_runtime_put_sync in probe

Chris Wilson (1):
      drm/i915: Check obj->vma_list under the struct_mutex

Christian König (2):
      drm/radeon: enable SRBM timeout interrupt on SI
      drm/radeon: enable SRBM timeout interrupt on EG/NI

Daniel Vetter (2):
      drm: Fix deadlock due to getconnector locking changes
      drm/i915: Align initial plane backing objects correctly

Dave Airlie (5):
      Merge tag 'drm/tegra/for-3.20-rc1-fixes' of git://anongit.freedesktop.org/tegra/linux into drm-fixes
      Merge tag 'drm-amdkfd-fixes-2015-02-23' of git://people.freedesktop.org/~gabbayo/linux into drm-fixes
      Merge branch 'drm-fixes-4.0' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
      Merge tag 'drm-intel-fixes-2015-02-26' of git://anongit.freedesktop.org/drm-intel into drm-fixes
      Merge branch 'drm-atmel-hlcdc-fixes' of git://github.com/bbrezillon/linux-at91 into drm-fixes

Imre Deak (1):
      drm/i915: avoid processing spurious/shared interrupts in low-power states

Jani Nikula (2):
      drm/i915/skl: handle all pixel formats in skylake_update_primary_plane()
      drm/i915: Dell Chromebook 11 has PWM backlight

Leo Liu (1):
      drm/radeon: enable SRBM timeout interrupt on CIK v2

Nathan-J. Hirschauer (1):
      drm/radeon: enable native backlight control on old macs

Nick Hoath (1):
      drm/i915: Fix a use after free, and unbalanced refcounting

Nicolas Ferre (1):
      drm: atmel-hlcdc: remove clock polarity from crtc driver

Oded Gabbay (2):
      drm/amdkfd: Initialize only amdkfd's assigned pipelines
      drm/amdkfd: don't set get_pipes_num() as inline

Rodrigo Vivi (2):
      drm/i915/bdw: PCI IDs ending in 0xb are ULT.
      drm/i915: Fix frontbuffer false positve.

Thierry Reding (4):
      drm/tegra: hdmi: Explicitly set clock rate
      drm/tegra: dc: Reset state's active_changed field
      drm/tegra: dc: Wire up CRTC parent of atomic state
      drm/tegra: dc: Move more code into ->init()

 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 10 ++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |  8 +--
 .../drm/amd/amdkfd/kfd_device_queue_manager_cik.c  |  2 +-
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c     |  2 +-
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c       |  2 -
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_layer.c    |  3 +-
 drivers/gpu/drm/drm_crtc.c                         |  3 +-
 drivers/gpu/drm/i915/i915_drv.h                    | 15 +++-
 drivers/gpu/drm/i915/i915_gem.c                    |  3 +-
 drivers/gpu/drm/i915/i915_gem_stolen.c             |  6 +-
 drivers/gpu/drm/i915/i915_gem_tiling.c             |  7 +-
 drivers/gpu/drm/i915/i915_irq.c                    | 22 ++++++
 drivers/gpu/drm/i915/intel_display.c               | 33 ++++++---
 drivers/gpu/drm/i915/intel_lrc.c                   |  8 +--
 drivers/gpu/drm/radeon/atombios_dp.c               |  7 ++
 drivers/gpu/drm/radeon/atombios_encoders.c         | 21 ++++--
 drivers/gpu/drm/radeon/cik.c                       |  8 +++
 drivers/gpu/drm/radeon/cikd.h                      |  4 ++
 drivers/gpu/drm/radeon/evergreen.c                 |  7 ++
 drivers/gpu/drm/radeon/evergreend.h                |  4 ++
 drivers/gpu/drm/radeon/ni.c                        | 10 +--
 drivers/gpu/drm/radeon/nid.h                       |  4 ++
 drivers/gpu/drm/radeon/r600_dpm.c                  |  2 +-
 drivers/gpu/drm/radeon/radeon_cs.c                 | 16 ++++-
 drivers/gpu/drm/radeon/radeon_encoders.c           |  3 +
 drivers/gpu/drm/radeon/radeon_pm.c                 |  6 ++
 drivers/gpu/drm/radeon/si.c                        | 22 ++++--
 drivers/gpu/drm/radeon/sid.h                       |  4 ++
 drivers/gpu/drm/tegra/dc.c                         | 79 +++++++++++-----------
 drivers/gpu/drm/tegra/hdmi.c                       |  8 +++
 include/drm/i915_pciids.h                          |  4 +-
 31 files changed, 235 insertions(+), 98 deletions(-)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [git pull] drm fixes
@ 2015-02-27  4:42 ` Dave Airlie
  0 siblings, 0 replies; 31+ messages in thread
From: Dave Airlie @ 2015-02-27  4:42 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, DRI mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5161 bytes --]


Hi Linus,

just general fixes pull, radeon, i915, atmel, tegra, amdkfd
and one core fix.

Dave.

The following changes since commit c517d838eb7d07bbe9507871fab3931deccff539:

  Linux 4.0-rc1 (2015-02-22 18:21:14 -0800)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 21689a440bdc90650b7ee3803ec11cfabd21409e:

  Merge branch 'drm-atmel-hlcdc-fixes' of git://github.com/bbrezillon/linux-at91 into drm-fixes (2015-02-27 10:31:40 +1000)

----------------------------------------------------------------

Alex Deucher (6):
      drm/radeon: use drm_mode_vrefresh() rather than mode->vrefresh
      drm/radeon: disable mclk switching with 120hz+ monitors
      drm/radeon: dump full IB if we hit a packet error
      drm/radeon: fix 1 RB harvest config setup for TN/RL
      drm/radeon: fix atom aux payload size check for writes (v2)
      drm/radeon: only enable DP audio if the monitor supports it

Boris Brezillon (2):
      drm: atmel-hlcdc: reset layer A2Q and UPDATE bits when disabling it
      drm: atmel-hlcdc: remove useless pm_runtime_put_sync in probe

Chris Wilson (1):
      drm/i915: Check obj->vma_list under the struct_mutex

Christian König (2):
      drm/radeon: enable SRBM timeout interrupt on SI
      drm/radeon: enable SRBM timeout interrupt on EG/NI

Daniel Vetter (2):
      drm: Fix deadlock due to getconnector locking changes
      drm/i915: Align initial plane backing objects correctly

Dave Airlie (5):
      Merge tag 'drm/tegra/for-3.20-rc1-fixes' of git://anongit.freedesktop.org/tegra/linux into drm-fixes
      Merge tag 'drm-amdkfd-fixes-2015-02-23' of git://people.freedesktop.org/~gabbayo/linux into drm-fixes
      Merge branch 'drm-fixes-4.0' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
      Merge tag 'drm-intel-fixes-2015-02-26' of git://anongit.freedesktop.org/drm-intel into drm-fixes
      Merge branch 'drm-atmel-hlcdc-fixes' of git://github.com/bbrezillon/linux-at91 into drm-fixes

Imre Deak (1):
      drm/i915: avoid processing spurious/shared interrupts in low-power states

Jani Nikula (2):
      drm/i915/skl: handle all pixel formats in skylake_update_primary_plane()
      drm/i915: Dell Chromebook 11 has PWM backlight

Leo Liu (1):
      drm/radeon: enable SRBM timeout interrupt on CIK v2

Nathan-J. Hirschauer (1):
      drm/radeon: enable native backlight control on old macs

Nick Hoath (1):
      drm/i915: Fix a use after free, and unbalanced refcounting

Nicolas Ferre (1):
      drm: atmel-hlcdc: remove clock polarity from crtc driver

Oded Gabbay (2):
      drm/amdkfd: Initialize only amdkfd's assigned pipelines
      drm/amdkfd: don't set get_pipes_num() as inline

Rodrigo Vivi (2):
      drm/i915/bdw: PCI IDs ending in 0xb are ULT.
      drm/i915: Fix frontbuffer false positve.

Thierry Reding (4):
      drm/tegra: hdmi: Explicitly set clock rate
      drm/tegra: dc: Reset state's active_changed field
      drm/tegra: dc: Wire up CRTC parent of atomic state
      drm/tegra: dc: Move more code into ->init()

 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 10 ++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |  8 +--
 .../drm/amd/amdkfd/kfd_device_queue_manager_cik.c  |  2 +-
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c     |  2 +-
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c       |  2 -
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_layer.c    |  3 +-
 drivers/gpu/drm/drm_crtc.c                         |  3 +-
 drivers/gpu/drm/i915/i915_drv.h                    | 15 +++-
 drivers/gpu/drm/i915/i915_gem.c                    |  3 +-
 drivers/gpu/drm/i915/i915_gem_stolen.c             |  6 +-
 drivers/gpu/drm/i915/i915_gem_tiling.c             |  7 +-
 drivers/gpu/drm/i915/i915_irq.c                    | 22 ++++++
 drivers/gpu/drm/i915/intel_display.c               | 33 ++++++---
 drivers/gpu/drm/i915/intel_lrc.c                   |  8 +--
 drivers/gpu/drm/radeon/atombios_dp.c               |  7 ++
 drivers/gpu/drm/radeon/atombios_encoders.c         | 21 ++++--
 drivers/gpu/drm/radeon/cik.c                       |  8 +++
 drivers/gpu/drm/radeon/cikd.h                      |  4 ++
 drivers/gpu/drm/radeon/evergreen.c                 |  7 ++
 drivers/gpu/drm/radeon/evergreend.h                |  4 ++
 drivers/gpu/drm/radeon/ni.c                        | 10 +--
 drivers/gpu/drm/radeon/nid.h                       |  4 ++
 drivers/gpu/drm/radeon/r600_dpm.c                  |  2 +-
 drivers/gpu/drm/radeon/radeon_cs.c                 | 16 ++++-
 drivers/gpu/drm/radeon/radeon_encoders.c           |  3 +
 drivers/gpu/drm/radeon/radeon_pm.c                 |  6 ++
 drivers/gpu/drm/radeon/si.c                        | 22 ++++--
 drivers/gpu/drm/radeon/sid.h                       |  4 ++
 drivers/gpu/drm/tegra/dc.c                         | 79 +++++++++++-----------
 drivers/gpu/drm/tegra/hdmi.c                       |  8 +++
 include/drm/i915_pciids.h                          |  4 +-
 31 files changed, 235 insertions(+), 98 deletions(-)

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
  2015-02-27  4:42 ` Dave Airlie
@ 2015-03-01  5:40   ` Linus Torvalds
  -1 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-01  5:40 UTC (permalink / raw)
  To: Dave Airlie, Daniel Vetter, Jani Nikula
  Cc: DRI mailing list, Linux Kernel Mailing List, intel-gfx

Hmm. I haven't updated the old Mac Mini I have in a *long* time, but
today I decided to try.

And it causes problems in drm.

I'm not sure how new these problems are, I think the previous kernel I
booted on this machine was 3.16. But I thought I'd better report them
as-is, because bisection on this thing takes *forever*, and it's quite
possible that you or one of the i915 guys goes "ahh, of course", so no
point in wasting time on bisection unless absolutely required.

Anyway, I have two warnings, and then had to do a work-around to avoid an oops.

Warning #1:

    ...
    fbcon: inteldrmfb (fb0) is primary device
    random: nonblocking pool is initialized
    [drm] Setting output timings on SDVOB failed
    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 15 at drivers/gpu/drm/i915/i915_gem.c:4524
i915_gem_free_object+0x22d/0x250()
    WARN_ON(obj->frontbuffer_bits)
    CPU: 1 PID: 15 Comm: kworker/u4:1 Tainted: G        W
4.0.0-rc1-00129-g6380ad5-dirty #3
    Hardware name: Apple Computer, Inc. Macmini1,1/Mac-F4208EC8, BIOS
   MM11.88Z.0055.B03.0604071521 04/07/06
    Workqueue: events_unbound async_run_entry_fn
    Call Trace:
      dump_stack+0x41/0x52
      warn_slowpath_common+0x7f/0xb0
      warn_slowpath_fmt+0x2e/0x30
      i915_gem_free_object+0x22d/0x250
      drm_gem_object_free+0x1a/0x20
      intel_user_framebuffer_destroy+0x5d/0x80
      drm_framebuffer_free+0x43/0x60
      drm_framebuffer_unreference+0x28/0x60
      drm_mode_set_config_internal+0x8e/0xc0
      restore_fbdev_mode+0xc2/0xf0
      drm_fb_helper_restore_fbdev_mode_unlocked+0x24/0x70
      drm_fb_helper_set_par+0x1d/0x40
      intel_fbdev_set_par+0x18/0x60
      fbcon_init+0x4e2/0x530
      do_bind_con_driver+0x15e/0x330
      do_take_over_console+0xd5/0x160
      do_fbcon_takeover+0x5d/0xc0
      fbcon_event_notify+0x5dd/0x760
      notifier_call_chain+0x41/0x60
      __blocking_notifier_call_chain+0x46/0x60
      blocking_notifier_call_chain+0x1a/0x20
      fb_notifier_call_chain+0x11/0x20
      register_framebuffer+0x1f2/0x2f0
      drm_fb_helper_initial_config+0x2aa/0x370
      intel_fbdev_initial_config+0x14/0x20
      async_run_entry_fn+0x31/0xd0
      process_one_work+0xee/0x2b0
      worker_thread+0x39/0x400
      kthread+0xac/0xd0
      ret_from_kernel_thread+0x21/0x30
    ---[ end trace e533d8d502f4f45e ]---
    Console: switching to colour frame buffer device 210x65
    ...

but things seemed to work despite it. The more scary warning is #2:

    ...
    dracut: Starting plymouth daemon
    usb 5-1: new full-speed USB device number 2 using uhci_hcd
    setfont (1129) used greatest stack depth: 6448 bytes left
    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 1127 at include/linux/kref.h:47
drm_framebuffer_reference+0x39/0x70()
    CPU: 1 PID: 1127 Comm: plymouthd Tainted: G        W
4.0.0-rc1-00129-g6380ad5-dirty #3
    Hardware name: Apple Computer, Inc. Macmini1,1/Mac-F4208EC8, BIOS
   MM11.88Z.0055.B03.0604071521 04/07/06
    Call Trace:
      dump_stack+0x41/0x52
      warn_slowpath_common+0x7f/0xb0
      warn_slowpath_null+0x1d/0x20
      drm_framebuffer_reference+0x39/0x70
      intel_plane_duplicate_state+0x36/0x70
      drm_plane_helper_update+0x24/0xc0
      __intel_set_mode+0x71c/0x9c0
      intel_set_mode+0x6b/0x90
      intel_get_load_detect_pipe+0x332/0x470
      intel_tv_detect+0x84/0x4e0
      drm_helper_probe_single_connector_modes_merge_bits+0x1a3/0x440
      drm_helper_probe_single_connector_modes+0x12/0x20
      drm_mode_getconnector+0x2e7/0x320
      drm_ioctl+0x1e5/0x560
      do_vfs_ioctl+0x71/0x590
      SyS_ioctl+0x92/0xa0
      syscall_call+0x7/0x7
    ---[ end trace e533d8d502f4f460 ]---
    usb 5-1: New USB device found, idVendor=05ac, idProduct=1000
    usb 5-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
    ...

which is scary because it implies some bad reference counting problem.
And in fact, that warning was followed by a NULL pointer oops, which I
worked around with this crazy patch:

    diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
    index 6b6b07f..28527ac 100644
    --- a/drivers/gpu/drm/drm_crtc.c
    +++ b/drivers/gpu/drm/drm_crtc.c
    @@ -452,7 +452,8 @@ static void drm_framebuffer_free(struct kref *kref)
            }
            mutex_unlock(&dev->mode_config.fb_lock);

    -       fb->funcs->destroy(fb);
    +       if (fb->funcs)
    +               fb->funcs->destroy(fb);
     }

     static struct drm_framebuffer *__drm_framebuffer_lookup(struct
drm_device *dev,

because "fb->funcs" was NULL. I assume it was NULL because some fb
freeing had cleared them already due to the kref going down to zero.

Any ideas? This is an ancient box with ancient user land, and not
getting a lot of testing. But it *used* to work.

                         Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
@ 2015-03-01  5:40   ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-01  5:40 UTC (permalink / raw)
  To: Dave Airlie, Daniel Vetter, Jani Nikula
  Cc: intel-gfx, Linux Kernel Mailing List, DRI mailing list

Hmm. I haven't updated the old Mac Mini I have in a *long* time, but
today I decided to try.

And it causes problems in drm.

I'm not sure how new these problems are, I think the previous kernel I
booted on this machine was 3.16. But I thought I'd better report them
as-is, because bisection on this thing takes *forever*, and it's quite
possible that you or one of the i915 guys goes "ahh, of course", so no
point in wasting time on bisection unless absolutely required.

Anyway, I have two warnings, and then had to do a work-around to avoid an oops.

Warning #1:

    ...
    fbcon: inteldrmfb (fb0) is primary device
    random: nonblocking pool is initialized
    [drm] Setting output timings on SDVOB failed
    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 15 at drivers/gpu/drm/i915/i915_gem.c:4524
i915_gem_free_object+0x22d/0x250()
    WARN_ON(obj->frontbuffer_bits)
    CPU: 1 PID: 15 Comm: kworker/u4:1 Tainted: G        W
4.0.0-rc1-00129-g6380ad5-dirty #3
    Hardware name: Apple Computer, Inc. Macmini1,1/Mac-F4208EC8, BIOS
   MM11.88Z.0055.B03.0604071521 04/07/06
    Workqueue: events_unbound async_run_entry_fn
    Call Trace:
      dump_stack+0x41/0x52
      warn_slowpath_common+0x7f/0xb0
      warn_slowpath_fmt+0x2e/0x30
      i915_gem_free_object+0x22d/0x250
      drm_gem_object_free+0x1a/0x20
      intel_user_framebuffer_destroy+0x5d/0x80
      drm_framebuffer_free+0x43/0x60
      drm_framebuffer_unreference+0x28/0x60
      drm_mode_set_config_internal+0x8e/0xc0
      restore_fbdev_mode+0xc2/0xf0
      drm_fb_helper_restore_fbdev_mode_unlocked+0x24/0x70
      drm_fb_helper_set_par+0x1d/0x40
      intel_fbdev_set_par+0x18/0x60
      fbcon_init+0x4e2/0x530
      do_bind_con_driver+0x15e/0x330
      do_take_over_console+0xd5/0x160
      do_fbcon_takeover+0x5d/0xc0
      fbcon_event_notify+0x5dd/0x760
      notifier_call_chain+0x41/0x60
      __blocking_notifier_call_chain+0x46/0x60
      blocking_notifier_call_chain+0x1a/0x20
      fb_notifier_call_chain+0x11/0x20
      register_framebuffer+0x1f2/0x2f0
      drm_fb_helper_initial_config+0x2aa/0x370
      intel_fbdev_initial_config+0x14/0x20
      async_run_entry_fn+0x31/0xd0
      process_one_work+0xee/0x2b0
      worker_thread+0x39/0x400
      kthread+0xac/0xd0
      ret_from_kernel_thread+0x21/0x30
    ---[ end trace e533d8d502f4f45e ]---
    Console: switching to colour frame buffer device 210x65
    ...

but things seemed to work despite it. The more scary warning is #2:

    ...
    dracut: Starting plymouth daemon
    usb 5-1: new full-speed USB device number 2 using uhci_hcd
    setfont (1129) used greatest stack depth: 6448 bytes left
    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 1127 at include/linux/kref.h:47
drm_framebuffer_reference+0x39/0x70()
    CPU: 1 PID: 1127 Comm: plymouthd Tainted: G        W
4.0.0-rc1-00129-g6380ad5-dirty #3
    Hardware name: Apple Computer, Inc. Macmini1,1/Mac-F4208EC8, BIOS
   MM11.88Z.0055.B03.0604071521 04/07/06
    Call Trace:
      dump_stack+0x41/0x52
      warn_slowpath_common+0x7f/0xb0
      warn_slowpath_null+0x1d/0x20
      drm_framebuffer_reference+0x39/0x70
      intel_plane_duplicate_state+0x36/0x70
      drm_plane_helper_update+0x24/0xc0
      __intel_set_mode+0x71c/0x9c0
      intel_set_mode+0x6b/0x90
      intel_get_load_detect_pipe+0x332/0x470
      intel_tv_detect+0x84/0x4e0
      drm_helper_probe_single_connector_modes_merge_bits+0x1a3/0x440
      drm_helper_probe_single_connector_modes+0x12/0x20
      drm_mode_getconnector+0x2e7/0x320
      drm_ioctl+0x1e5/0x560
      do_vfs_ioctl+0x71/0x590
      SyS_ioctl+0x92/0xa0
      syscall_call+0x7/0x7
    ---[ end trace e533d8d502f4f460 ]---
    usb 5-1: New USB device found, idVendor=05ac, idProduct=1000
    usb 5-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
    ...

which is scary because it implies some bad reference counting problem.
And in fact, that warning was followed by a NULL pointer oops, which I
worked around with this crazy patch:

    diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
    index 6b6b07f..28527ac 100644
    --- a/drivers/gpu/drm/drm_crtc.c
    +++ b/drivers/gpu/drm/drm_crtc.c
    @@ -452,7 +452,8 @@ static void drm_framebuffer_free(struct kref *kref)
            }
            mutex_unlock(&dev->mode_config.fb_lock);

    -       fb->funcs->destroy(fb);
    +       if (fb->funcs)
    +               fb->funcs->destroy(fb);
     }

     static struct drm_framebuffer *__drm_framebuffer_lookup(struct
drm_device *dev,

because "fb->funcs" was NULL. I assume it was NULL because some fb
freeing had cleared them already due to the kref going down to zero.

Any ideas? This is an ancient box with ancient user land, and not
getting a lot of testing. But it *used* to work.

                         Linus
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
  2015-03-01  5:40   ` Linus Torvalds
@ 2015-03-01  6:08     ` Linus Torvalds
  -1 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-01  6:08 UTC (permalink / raw)
  To: Dave Airlie, Daniel Vetter, Jani Nikula
  Cc: DRI mailing list, Linux Kernel Mailing List, intel-gfx

On Sat, Feb 28, 2015 at 9:40 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'm not sure how new these problems are, I think the previous kernel I
> booted on this machine was 3.16.

Hmm. 3.19 works fine, even if it ends up spewing

    WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/drm_irq.c:1121
drm_wait_one_vblank+0x125/0x130()
    vblank not available on crtc 1, ret=-22

a lot. But it doesn't have the problems I saw with current -git.

So it's new to this merge window.

I'll see how painful it is to bisect it,

           Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
@ 2015-03-01  6:08     ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-01  6:08 UTC (permalink / raw)
  To: Dave Airlie, Daniel Vetter, Jani Nikula
  Cc: intel-gfx, Linux Kernel Mailing List, DRI mailing list

On Sat, Feb 28, 2015 at 9:40 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'm not sure how new these problems are, I think the previous kernel I
> booted on this machine was 3.16.

Hmm. 3.19 works fine, even if it ends up spewing

    WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/drm_irq.c:1121
drm_wait_one_vblank+0x125/0x130()
    vblank not available on crtc 1, ret=-22

a lot. But it doesn't have the problems I saw with current -git.

So it's new to this merge window.

I'll see how painful it is to bisect it,

           Linus
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
  2015-03-01  6:08     ` Linus Torvalds
@ 2015-03-01  7:27       ` Linus Torvalds
  -1 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-01  7:27 UTC (permalink / raw)
  To: Dave Airlie, Daniel Vetter, Jani Nikula
  Cc: DRI mailing list, Linux Kernel Mailing List, intel-gfx

On Sat, Feb 28, 2015 at 10:08 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'll see how painful it is to bisect it,

Not surprisingly, it went right for the drm merge.

Commit 8c334ce8f0fe ("Merge branch 'timers-core-for-linus'..") is
good, while the next merge commit 796e1c55717e ("Merge branch
'drm-next' ..") is bad and doesn't boot.

I'll try to narrow it down at least a *bit* more, even if it's a
painfully slow machine.

           Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
@ 2015-03-01  7:27       ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-01  7:27 UTC (permalink / raw)
  To: Dave Airlie, Daniel Vetter, Jani Nikula
  Cc: intel-gfx, Linux Kernel Mailing List, DRI mailing list

On Sat, Feb 28, 2015 at 10:08 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'll see how painful it is to bisect it,

Not surprisingly, it went right for the drm merge.

Commit 8c334ce8f0fe ("Merge branch 'timers-core-for-linus'..") is
good, while the next merge commit 796e1c55717e ("Merge branch
'drm-next' ..") is bad and doesn't boot.

I'll try to narrow it down at least a *bit* more, even if it's a
painfully slow machine.

           Linus
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
  2015-03-01  7:27       ` Linus Torvalds
@ 2015-03-01 20:35         ` Linus Torvalds
  -1 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-01 20:35 UTC (permalink / raw)
  To: Dave Airlie, Daniel Vetter, Jani Nikula
  Cc: DRI mailing list, Linux Kernel Mailing List, intel-gfx

On Sat, Feb 28, 2015 at 11:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'll try to narrow it down at least a *bit* more, even if it's a
> painfully slow machine.

It was brought in by either

  Merge tag 'topic/atomic-core-2015-01-05'

or

  Merge tag 'topic/core-stuff-2014-12-19'

with the atomic stuff being the prime suspect (b46004b70367 is good,
so is 4b08eae52f2f and dafffda023b0).

The compiles should be going faster now that I'm getting closer, so
I'll have a tighter bisect soon. Knock wood.

           Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
@ 2015-03-01 20:35         ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-01 20:35 UTC (permalink / raw)
  To: Dave Airlie, Daniel Vetter, Jani Nikula
  Cc: intel-gfx, Linux Kernel Mailing List, DRI mailing list

On Sat, Feb 28, 2015 at 11:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'll try to narrow it down at least a *bit* more, even if it's a
> painfully slow machine.

It was brought in by either

  Merge tag 'topic/atomic-core-2015-01-05'

or

  Merge tag 'topic/core-stuff-2014-12-19'

with the atomic stuff being the prime suspect (b46004b70367 is good,
so is 4b08eae52f2f and dafffda023b0).

The compiles should be going faster now that I'm getting closer, so
I'll have a tighter bisect soon. Knock wood.

           Linus
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
  2015-03-01 20:35         ` Linus Torvalds
@ 2015-03-01 21:00           ` Linus Torvalds
  -1 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-01 21:00 UTC (permalink / raw)
  To: Dave Airlie, Daniel Vetter, Jani Nikula
  Cc: DRI mailing list, Linux Kernel Mailing List, intel-gfx

On Sun, Mar 1, 2015 at 12:35 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The compiles should be going faster now that I'm getting closer, so
> I'll have a tighter bisect soon. Knock wood.

Grr.

I found:

  ccfc08655d5fd5076828f45fb09194c070f2f63a is the first bad commit

but that boot failure is apparently fixed by 2caa80e72b57.

So my bisect seems to be worthless, because there were two independent
boot failures, and it found the known one.

Back to the drawing board.

            Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
@ 2015-03-01 21:00           ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-01 21:00 UTC (permalink / raw)
  To: Dave Airlie, Daniel Vetter, Jani Nikula
  Cc: intel-gfx, Linux Kernel Mailing List, DRI mailing list

On Sun, Mar 1, 2015 at 12:35 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The compiles should be going faster now that I'm getting closer, so
> I'll have a tighter bisect soon. Knock wood.

Grr.

I found:

  ccfc08655d5fd5076828f45fb09194c070f2f63a is the first bad commit

but that boot failure is apparently fixed by 2caa80e72b57.

So my bisect seems to be worthless, because there were two independent
boot failures, and it found the known one.

Back to the drawing board.

            Linus
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
  2015-03-01 21:00           ` Linus Torvalds
  (?)
@ 2015-03-02  1:59           ` Linus Torvalds
  2015-03-02  9:04               ` Daniel Vetter
  -1 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2015-03-02  1:59 UTC (permalink / raw)
  To: Dave Airlie, Daniel Vetter, Jani Nikula, Matt Roper,
	Ander Conselvan de Oliveira
  Cc: DRI mailing list, Linux Kernel Mailing List, intel-gfx

[-- Attachment #1: Type: text/plain, Size: 1497 bytes --]

On Sun, Mar 1, 2015 at 1:00 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Back to the drawing board.

Ok, many hours later, but I found it.

The bisection was a disaster, having to work around other bugs in this
area, but it ended up getting "close enough" that I figured out what
went wrong.

The "intel_plane_duplicate_state()" is horribly horribly buggy. It
looks at the state->fb pointer, but it may have been free'd already.

This workaround "works for me", but it's really still very
questionable, because while the "kref_get_unless_zero()" works
correctly when the last reference has been dropped, I'm not sure that
there is any guarantee that the whole allocation even exists any more,
so I think the *correct* thing to do would be to clear state->fb when
dropping the kref. But this was the smallest working patch I could
come up with. Somebody who actually knows the code should start
looking at the places that do drm_framebuffer_unreference(), and
actually clear that pointer instead.

Added Matt Roper and Ander Conselvan de Oliveira to the discussion,
since they are the ones git says are involved with the original broken
intel_plane_duplicate_state().

Anyway, attached is

 (a) the patch with a big comment

 (b) the warnings I get on that machine that show where this problem
triggers (and another warning earlier).

Comments? I'm sure this probably only triggers with *old* X servers
that don't do all the modern dri stuff.

                               Linus

[-- Attachment #2: 0001-Workaround-for-drm-bug.patch --]
[-- Type: text/x-patch, Size: 1424 bytes --]

From c182b15c3abee75cdc9d9564b6ab826403690f4e Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@localhost.localdomain>
Date: Sat, 28 Feb 2015 21:44:48 -0800
Subject: [PATCH] Workaround for drm bug

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

---
 drivers/gpu/drm/i915/intel_atomic_plane.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_atomic_plane.c b/drivers/gpu/drm/i915/intel_atomic_plane.c
index 9e6f727..72714d3 100644
--- a/drivers/gpu/drm/i915/intel_atomic_plane.c
+++ b/drivers/gpu/drm/i915/intel_atomic_plane.c
@@ -85,8 +85,23 @@ intel_plane_duplicate_state(struct drm_plane *plane)
 		return NULL;
 
 	state = &intel_state->base;
-	if (state->fb)
-		drm_framebuffer_reference(state->fb);
+
+	/*
+	 * We cannot do drm_framebuffer_reference(), because the reference
+	 * may already have been dropped.
+	 *
+	 * So we do what drm_framebuffer_lookup() does, namely do a
+	 * kref_get_unless_zero(). Even that is somewhat questionable,
+	 * in that maybe the 'fb' already got free'd. So warn loudly
+	 * about this.
+	 *
+	 * Maybe the base.fb should be cleared by whatever drops the
+	 * reference?
+	 */
+	if (state->fb && !kref_get_unless_zero(&state->fb->refcount)) {
+		state->fb = NULL;
+		WARN_ONCE(1, "intel_plane_duplicate_state got plane with dead frame buffer");
+	}
 
 	return state;
 }
-- 
2.3.1.167.g7f4ba4b


[-- Attachment #3: drm-bug-dmesg --]
[-- Type: application/octet-stream, Size: 6501 bytes --]

...
[    0.898565] random: nonblocking pool is initialized
[    0.932621] [drm] Setting output timings on SDVOB failed
[    1.082274] ------------[ cut here ]------------
[    1.082283] WARNING: CPU: 0 PID: 15 at drivers/gpu/drm/i915/i915_gem.c:4524 i915_gem_free_object+0x22d/0x250()
[    1.082291] WARN_ON(obj->frontbuffer_bits)
[    1.082291] CPU: 0 PID: 15 Comm: kworker/u4:1 Tainted: G        W       4.0.0-rc1-00154-g7c7766e #39
[    1.082293] Hardware name: Apple Computer, Inc. Macmini1,1/Mac-F4208EC8, BIOS     MM11.88Z.0055.B03.0604071521 04/07/06
[    1.082299] Workqueue: events_unbound async_run_entry_fn
[    1.082304]  00000000 00000000 f68cfb64 c166d337 c17ee3c4 f68cfb94 c103e3bf c17ee764
[    1.082309]  f68cfbc0 0000000f c17ee3c4 000011ac c12fe52d c12fe52d f69e0d00 00000000
[    1.082313]  f69e0d30 f68cfbac c103e43e 00000009 f68cfba4 c17ee764 f68cfbc0 f68cfbd8
[    1.082314] Call Trace:
[    1.082321]  [<c166d337>] dump_stack+0x41/0x52
[    1.082328]  [<c103e3bf>] warn_slowpath_common+0x7f/0xb0
[    1.082331]  [<c12fe52d>] ? i915_gem_free_object+0x22d/0x250
[    1.082334]  [<c12fe52d>] ? i915_gem_free_object+0x22d/0x250
[    1.082337]  [<c103e43e>] warn_slowpath_fmt+0x2e/0x30
[    1.082340]  [<c12fe52d>] i915_gem_free_object+0x22d/0x250
[    1.082344]  [<c166ed04>] ? mutex_lock+0x14/0x40
[    1.082348]  [<c12bafea>] drm_gem_object_free+0x1a/0x20
[    1.082353]  [<c1326d1d>] intel_user_framebuffer_destroy+0x5d/0x80
[    1.082356]  [<c166ed04>] ? mutex_lock+0x14/0x40
[    1.082360]  [<c12c3f33>] drm_framebuffer_free+0x43/0x60
[    1.082365]  [<c12c4588>] drm_framebuffer_unreference+0x28/0x60
[    1.082369]  [<c12c5c7e>] drm_mode_set_config_internal+0x8e/0xc0
[    1.082371]  [<c12b4e32>] restore_fbdev_mode+0xc2/0xf0
[    1.082375]  [<c12d3276>] ? __drm_modeset_lock_all+0xa6/0xe0
[    1.082379]  [<c12b7314>] drm_fb_helper_restore_fbdev_mode_unlocked+0x24/0x70
[    1.082382]  [<c12b737d>] drm_fb_helper_set_par+0x1d/0x40
[    1.082386]  [<c13452c8>] intel_fbdev_set_par+0x18/0x60
[    1.082390]  [<c1232b72>] fbcon_init+0x4e2/0x530
[    1.082395]  [<c115c230>] ? __kernfs_create_file+0x60/0x90
[    1.082400]  [<c129068e>] do_bind_con_driver+0x15e/0x330
[    1.082403]  [<c115cac7>] ? sysfs_create_file_ns+0x27/0x30
[    1.082406]  [<c136ecbf>] ? device_create_file+0x2f/0xa0
[    1.082411]  [<c1291a15>] do_take_over_console+0xd5/0x160
[    1.082414]  [<c1232c1d>] do_fbcon_takeover+0x5d/0xc0
[    1.082418]  [<c123757d>] fbcon_event_notify+0x5dd/0x760
[    1.082421]  [<c115c8fc>] ? sysfs_add_file_mode_ns+0x6c/0x1e0
[    1.082424]  [<c1054fe1>] notifier_call_chain+0x41/0x60
[    1.082427]  [<c10552b6>] __blocking_notifier_call_chain+0x46/0x60
[    1.082430]  [<c10552ea>] blocking_notifier_call_chain+0x1a/0x20
[    1.082432]  [<c123c501>] fb_notifier_call_chain+0x11/0x20
[    1.082435]  [<c123ebf2>] register_framebuffer+0x1f2/0x2f0
[    1.082439]  [<c12b713a>] drm_fb_helper_initial_config+0x2aa/0x370
[    1.082443]  [<c1345d14>] intel_fbdev_initial_config+0x14/0x20
[    1.082445]  [<c10565a1>] async_run_entry_fn+0x31/0xd0
[    1.082452]  [<c104f6c9>] ? pwq_dec_nr_in_flight+0x39/0x90
[    1.082455]  [<c104f80e>] process_one_work+0xee/0x2b0
[    1.082458]  [<c104fa39>] worker_thread+0x39/0x400
[    1.082461]  [<c105427c>] kthread+0xac/0xd0
[    1.082464]  [<c104fa00>] ? process_scheduled_works+0x30/0x30
[    1.082467]  [<c1670781>] ret_from_kernel_thread+0x21/0x30
[    1.082470]  [<c10541d0>] ? __kthread_parkme+0x60/0x60
[    1.082472] ---[ end trace 6309eba3c7a2324a ]---
[    1.094040] Console: switching to colour frame buffer device 210x65
...
[    2.903705] setfont (1128) used greatest stack depth: 6448 bytes left
[    2.915551] ------------[ cut here ]------------
[    2.917985] WARNING: CPU: 1 PID: 1126 at drivers/gpu/drm/i915/intel_atomic_plane.c:103 intel_plane_duplicate_state+0xc9/0xe0()
[    2.920518] intel_plane_duplicate_state got plane with dead frame buffer
[    2.920580] CPU: 1 PID: 1126 Comm: plymouthd Tainted: G        W       4.0.0-rc1-00154-g7c7766e #39
[    2.925701] Hardware name: Apple Computer, Inc. Macmini1,1/Mac-F4208EC8, BIOS     MM11.88Z.0055.B03.0604071521 04/07/06
[    2.928344]  00000000 00000000 f628ba4c c166d337 c17f5fd0 f628ba7c c103e3bf c17f5ffc
[    2.931048]  f628baa8 00000466 c17f5fd0 00000067 c1348ae9 c1348ae9 f6b47f04 f6ba5480
[    2.933771]  f6b47380 f628ba94 c103e43e 00000009 f628ba8c c17f5ffc f628baa8 f628bab4
[    2.936491] Call Trace:
[    2.939141]  [<c166d337>] dump_stack+0x41/0x52
[    2.941776]  [<c103e3bf>] warn_slowpath_common+0x7f/0xb0
[    2.944445]  [<c1348ae9>] ? intel_plane_duplicate_state+0xc9/0xe0
[    2.947117]  [<c1348ae9>] ? intel_plane_duplicate_state+0xc9/0xe0
[    2.949800]  [<c103e43e>] warn_slowpath_fmt+0x2e/0x30
[    2.952480]  [<c1348ae9>] intel_plane_duplicate_state+0xc9/0xe0
[    2.955192]  [<c12ae154>] drm_plane_helper_update+0x24/0xc0
[    2.957927]  [<c1330d5c>] __intel_set_mode+0x71c/0x9c0
[    2.960635]  [<c13366cb>] intel_set_mode+0x6b/0x90
[    2.963342]  [<c1336b12>] intel_get_load_detect_pipe+0x332/0x470
[    2.966026]  [<c11fffff>] ? sg_last+0x2f/0x40
[    2.968698]  [<c11fb85a>] ? snprintf+0x1a/0x20
[    2.971334]  [<c12caf24>] ? drm_mode_set_name+0x44/0x50
[    2.973968]  [<c1367b24>] intel_tv_detect+0x84/0x4e0
[    2.976599]  [<c105fed3>] ? update_cfs_rq_blocked_load+0x123/0x170
[    2.979249]  [<c12ad573>] drm_helper_probe_single_connector_modes_merge_bits+0x1a3/0x440
[    2.981936]  [<c166ed04>] ? mutex_lock+0x14/0x40
[    2.993074]  [<c12ad842>] drm_helper_probe_single_connector_modes+0x12/0x20
[    2.995760]  [<c12c7db7>] drm_mode_getconnector+0x2e7/0x320
[    2.998431]  [<c1038d72>] ? kmap_atomic_prot+0x42/0x110
[    3.001086]  [<c12bc6f5>] drm_ioctl+0x1e5/0x560
[    3.003725]  [<c12c7ad0>] ? get_properties+0xd0/0xd0
[    3.006376]  [<c10ea55b>] ? page_add_new_anon_rmap+0x5b/0x70
[    3.009000]  [<c11a3e3a>] ? avc_has_perm+0x2a/0x110
[    3.011614]  [<c10e25a1>] ? handle_mm_fault+0xcf1/0xea0
[    3.014189]  [<c12bc510>] ? drm_setversion+0x170/0x170
[    3.016766]  [<c110fd21>] do_vfs_ioctl+0x71/0x590
[    3.019327]  [<c11a8f3c>] ? inode_has_perm.clone.37+0x2c/0x40
[    3.021877]  [<c11a8fe1>] ? file_has_perm+0x91/0xa0
[    3.024430]  [<c11aa7e2>] ? selinux_file_ioctl+0x42/0xe0
[    3.026978]  [<c11102d2>] SyS_ioctl+0x92/0xa0
[    3.029532]  [<c1670912>] syscall_call+0x7/0x7
[    3.032058] ---[ end trace 6309eba3c7a2324c ]---
[    3.102207] usb 1-6.1: new full-speed USB device number 5 using ehci-pci
..

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
  2015-03-02  1:59           ` Linus Torvalds
@ 2015-03-02  9:04               ` Daniel Vetter
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2015-03-02  9:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Airlie, Daniel Vetter, Jani Nikula, Matt Roper,
	Ander Conselvan de Oliveira, intel-gfx,
	Linux Kernel Mailing List, DRI mailing list

On Sun, Mar 01, 2015 at 05:59:53PM -0800, Linus Torvalds wrote:
> On Sun, Mar 1, 2015 at 1:00 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Back to the drawing board.
> 
> Ok, many hours later, but I found it.
> 
> The bisection was a disaster, having to work around other bugs in this
> area, but it ended up getting "close enough" that I figured out what
> went wrong.

Sorry for all the bisect work. Ville dug as far as the load detect being
unhappy due to atomic last week. But since I general don't real mail on
w/e I've only seen this thread now.

> The "intel_plane_duplicate_state()" is horribly horribly buggy. It
> looks at the state->fb pointer, but it may have been free'd already.
> 
> This workaround "works for me", but it's really still very
> questionable, because while the "kref_get_unless_zero()" works
> correctly when the last reference has been dropped, I'm not sure that
> there is any guarantee that the whole allocation even exists any more,
> so I think the *correct* thing to do would be to clear state->fb when
> dropping the kref. But this was the smallest working patch I could
> come up with. Somebody who actually knows the code should start
> looking at the places that do drm_framebuffer_unreference(), and
> actually clear that pointer instead.
> 
> Added Matt Roper and Ander Conselvan de Oliveira to the discussion,
> since they are the ones git says are involved with the original broken
> intel_plane_duplicate_state().
> 
> Anyway, attached is
> 
>  (a) the patch with a big comment
> 
>  (b) the warnings I get on that machine that show where this problem
> triggers (and another warning earlier).
> 
> Comments? I'm sure this probably only triggers with *old* X servers
> that don't do all the modern dri stuff.

It's not old X servers but old machines which don't have hotplug
interrupts (or still have tv-out, which doesn't have hpd support
anywhere).

I'll dig into this now just fyi the rules about how plane->state should be
handled:
- you need plane->lock
- it's invariant once assigned, updates should only be done with a
  duplicate_state(), update state you want to change and the going through
  the atomic commit machinery
- drm_plane_state->fb should always be a full reference

But switching to atomic amounts to a full rewrite of the drm core and
drivers (semantics for modeset updates are totally different). So there's
lots of glue and ducttape going on to keep up the illusion for both old
code and partially transitioned driver. It's all a bit wobbly atm and
don't loook too closely at some of the hacks we have.

And can you please attach a bactrace of the WARN in your patch, just to
double-check you blow up at the same spot?

Thanks, Daniel

>                                Linus

> From c182b15c3abee75cdc9d9564b6ab826403690f4e Mon Sep 17 00:00:00 2001
> From: Linus Torvalds <torvalds@localhost.localdomain>
> Date: Sat, 28 Feb 2015 21:44:48 -0800
> Subject: [PATCH] Workaround for drm bug
> 
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> ---
>  drivers/gpu/drm/i915/intel_atomic_plane.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_atomic_plane.c b/drivers/gpu/drm/i915/intel_atomic_plane.c
> index 9e6f727..72714d3 100644
> --- a/drivers/gpu/drm/i915/intel_atomic_plane.c
> +++ b/drivers/gpu/drm/i915/intel_atomic_plane.c
> @@ -85,8 +85,23 @@ intel_plane_duplicate_state(struct drm_plane *plane)
>  		return NULL;
>  
>  	state = &intel_state->base;
> -	if (state->fb)
> -		drm_framebuffer_reference(state->fb);
> +
> +	/*
> +	 * We cannot do drm_framebuffer_reference(), because the reference
> +	 * may already have been dropped.
> +	 *
> +	 * So we do what drm_framebuffer_lookup() does, namely do a
> +	 * kref_get_unless_zero(). Even that is somewhat questionable,
> +	 * in that maybe the 'fb' already got free'd. So warn loudly
> +	 * about this.
> +	 *
> +	 * Maybe the base.fb should be cleared by whatever drops the
> +	 * reference?
> +	 */
> +	if (state->fb && !kref_get_unless_zero(&state->fb->refcount)) {
> +		state->fb = NULL;
> +		WARN_ONCE(1, "intel_plane_duplicate_state got plane with dead frame buffer");
> +	}
>  
>  	return state;
>  }
> -- 
> 2.3.1.167.g7f4ba4b
> 


> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
@ 2015-03-02  9:04               ` Daniel Vetter
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2015-03-02  9:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: intel-gfx, Linux Kernel Mailing List, DRI mailing list, Daniel Vetter

On Sun, Mar 01, 2015 at 05:59:53PM -0800, Linus Torvalds wrote:
> On Sun, Mar 1, 2015 at 1:00 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Back to the drawing board.
> 
> Ok, many hours later, but I found it.
> 
> The bisection was a disaster, having to work around other bugs in this
> area, but it ended up getting "close enough" that I figured out what
> went wrong.

Sorry for all the bisect work. Ville dug as far as the load detect being
unhappy due to atomic last week. But since I general don't real mail on
w/e I've only seen this thread now.

> The "intel_plane_duplicate_state()" is horribly horribly buggy. It
> looks at the state->fb pointer, but it may have been free'd already.
> 
> This workaround "works for me", but it's really still very
> questionable, because while the "kref_get_unless_zero()" works
> correctly when the last reference has been dropped, I'm not sure that
> there is any guarantee that the whole allocation even exists any more,
> so I think the *correct* thing to do would be to clear state->fb when
> dropping the kref. But this was the smallest working patch I could
> come up with. Somebody who actually knows the code should start
> looking at the places that do drm_framebuffer_unreference(), and
> actually clear that pointer instead.
> 
> Added Matt Roper and Ander Conselvan de Oliveira to the discussion,
> since they are the ones git says are involved with the original broken
> intel_plane_duplicate_state().
> 
> Anyway, attached is
> 
>  (a) the patch with a big comment
> 
>  (b) the warnings I get on that machine that show where this problem
> triggers (and another warning earlier).
> 
> Comments? I'm sure this probably only triggers with *old* X servers
> that don't do all the modern dri stuff.

It's not old X servers but old machines which don't have hotplug
interrupts (or still have tv-out, which doesn't have hpd support
anywhere).

I'll dig into this now just fyi the rules about how plane->state should be
handled:
- you need plane->lock
- it's invariant once assigned, updates should only be done with a
  duplicate_state(), update state you want to change and the going through
  the atomic commit machinery
- drm_plane_state->fb should always be a full reference

But switching to atomic amounts to a full rewrite of the drm core and
drivers (semantics for modeset updates are totally different). So there's
lots of glue and ducttape going on to keep up the illusion for both old
code and partially transitioned driver. It's all a bit wobbly atm and
don't loook too closely at some of the hacks we have.

And can you please attach a bactrace of the WARN in your patch, just to
double-check you blow up at the same spot?

Thanks, Daniel

>                                Linus

> From c182b15c3abee75cdc9d9564b6ab826403690f4e Mon Sep 17 00:00:00 2001
> From: Linus Torvalds <torvalds@localhost.localdomain>
> Date: Sat, 28 Feb 2015 21:44:48 -0800
> Subject: [PATCH] Workaround for drm bug
> 
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> ---
>  drivers/gpu/drm/i915/intel_atomic_plane.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_atomic_plane.c b/drivers/gpu/drm/i915/intel_atomic_plane.c
> index 9e6f727..72714d3 100644
> --- a/drivers/gpu/drm/i915/intel_atomic_plane.c
> +++ b/drivers/gpu/drm/i915/intel_atomic_plane.c
> @@ -85,8 +85,23 @@ intel_plane_duplicate_state(struct drm_plane *plane)
>  		return NULL;
>  
>  	state = &intel_state->base;
> -	if (state->fb)
> -		drm_framebuffer_reference(state->fb);
> +
> +	/*
> +	 * We cannot do drm_framebuffer_reference(), because the reference
> +	 * may already have been dropped.
> +	 *
> +	 * So we do what drm_framebuffer_lookup() does, namely do a
> +	 * kref_get_unless_zero(). Even that is somewhat questionable,
> +	 * in that maybe the 'fb' already got free'd. So warn loudly
> +	 * about this.
> +	 *
> +	 * Maybe the base.fb should be cleared by whatever drops the
> +	 * reference?
> +	 */
> +	if (state->fb && !kref_get_unless_zero(&state->fb->refcount)) {
> +		state->fb = NULL;
> +		WARN_ONCE(1, "intel_plane_duplicate_state got plane with dead frame buffer");
> +	}
>  
>  	return state;
>  }
> -- 
> 2.3.1.167.g7f4ba4b
> 


> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
  2015-03-01  6:08     ` Linus Torvalds
@ 2015-03-02  9:44       ` Paul Bolle
  -1 siblings, 0 replies; 31+ messages in thread
From: Paul Bolle @ 2015-03-02  9:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Airlie, Daniel Vetter, Jani Nikula, DRI mailing list,
	Linux Kernel Mailing List, intel-gfx

On Sat, 2015-02-28 at 22:08 -0800, Linus Torvalds wrote:
> Hmm. 3.19 works fine, even if it ends up spewing
> 
>     WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/drm_irq.c:1121
> drm_wait_one_vblank+0x125/0x130()
>     vblank not available on crtc 1, ret=-22
> 
> a lot.

For what it's worth, that stream of WARNINGs was reported for v3.19-rc1
in http://lkml.kernel.org/r/20150131211609.GA3710@yulia-desktop .

Commit f9b61ff6bce9 ("drm/i915: Push vblank enable/disable past
encoder->enable/disable") silenced it again in v4.0-rc1. I guess that
commit will end up in v3.19-stable in due time.


Paul Bolle


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
@ 2015-03-02  9:44       ` Paul Bolle
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Bolle @ 2015-03-02  9:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: intel-gfx, Linux Kernel Mailing List, DRI mailing list, Daniel Vetter

On Sat, 2015-02-28 at 22:08 -0800, Linus Torvalds wrote:
> Hmm. 3.19 works fine, even if it ends up spewing
> 
>     WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/drm_irq.c:1121
> drm_wait_one_vblank+0x125/0x130()
>     vblank not available on crtc 1, ret=-22
> 
> a lot.

For what it's worth, that stream of WARNINGs was reported for v3.19-rc1
in http://lkml.kernel.org/r/20150131211609.GA3710@yulia-desktop .

Commit f9b61ff6bce9 ("drm/i915: Push vblank enable/disable past
encoder->enable/disable") silenced it again in v4.0-rc1. I guess that
commit will end up in v3.19-stable in due time.


Paul Bolle

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
  2015-03-02  9:44       ` Paul Bolle
@ 2015-03-02 10:26         ` Jani Nikula
  -1 siblings, 0 replies; 31+ messages in thread
From: Jani Nikula @ 2015-03-02 10:26 UTC (permalink / raw)
  To: Paul Bolle, Linus Torvalds
  Cc: Dave Airlie, Daniel Vetter, DRI mailing list,
	Linux Kernel Mailing List, intel-gfx

On Mon, 02 Mar 2015, Paul Bolle <pebolle@tiscali.nl> wrote:
> On Sat, 2015-02-28 at 22:08 -0800, Linus Torvalds wrote:
>> Hmm. 3.19 works fine, even if it ends up spewing
>> 
>>     WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/drm_irq.c:1121
>> drm_wait_one_vblank+0x125/0x130()
>>     vblank not available on crtc 1, ret=-22
>> 
>> a lot.
>
> For what it's worth, that stream of WARNINGs was reported for v3.19-rc1
> in http://lkml.kernel.org/r/20150131211609.GA3710@yulia-desktop .
>
> Commit f9b61ff6bce9 ("drm/i915: Push vblank enable/disable past
> encoder->enable/disable") silenced it again in v4.0-rc1. I guess that
> commit will end up in v3.19-stable in due time.

Yes; the commit lacked cc: stable, I just requested a backport moments
before your mail [1].

BR,
Jani.


[1] http://mid.gmane.org/874mq3oif5.fsf@intel.com


-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
@ 2015-03-02 10:26         ` Jani Nikula
  0 siblings, 0 replies; 31+ messages in thread
From: Jani Nikula @ 2015-03-02 10:26 UTC (permalink / raw)
  To: Paul Bolle, Linus Torvalds
  Cc: Dave Airlie, Daniel Vetter, intel-gfx, Linux Kernel Mailing List,
	DRI mailing list

On Mon, 02 Mar 2015, Paul Bolle <pebolle@tiscali.nl> wrote:
> On Sat, 2015-02-28 at 22:08 -0800, Linus Torvalds wrote:
>> Hmm. 3.19 works fine, even if it ends up spewing
>> 
>>     WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/drm_irq.c:1121
>> drm_wait_one_vblank+0x125/0x130()
>>     vblank not available on crtc 1, ret=-22
>> 
>> a lot.
>
> For what it's worth, that stream of WARNINGs was reported for v3.19-rc1
> in http://lkml.kernel.org/r/20150131211609.GA3710@yulia-desktop .
>
> Commit f9b61ff6bce9 ("drm/i915: Push vblank enable/disable past
> encoder->enable/disable") silenced it again in v4.0-rc1. I guess that
> commit will end up in v3.19-stable in due time.

Yes; the commit lacked cc: stable, I just requested a backport moments
before your mail [1].

BR,
Jani.


[1] http://mid.gmane.org/874mq3oif5.fsf@intel.com


-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [git pull] drm fixes
  2015-03-02  9:44       ` Paul Bolle
@ 2015-03-02 10:33         ` Daniel Vetter
  -1 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2015-03-02 10:33 UTC (permalink / raw)
  To: Paul Bolle
  Cc: Linus Torvalds, Dave Airlie, intel-gfx,
	Linux Kernel Mailing List, DRI mailing list, Daniel Vetter

On Mon, Mar 02, 2015 at 10:44:16AM +0100, Paul Bolle wrote:
> On Sat, 2015-02-28 at 22:08 -0800, Linus Torvalds wrote:
> > Hmm. 3.19 works fine, even if it ends up spewing
> > 
> >     WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/drm_irq.c:1121
> > drm_wait_one_vblank+0x125/0x130()
> >     vblank not available on crtc 1, ret=-22
> > 
> > a lot.
> 
> For what it's worth, that stream of WARNINGs was reported for v3.19-rc1
> in http://lkml.kernel.org/r/20150131211609.GA3710@yulia-desktop .
> 
> Commit f9b61ff6bce9 ("drm/i915: Push vblank enable/disable past
> encoder->enable/disable") silenced it again in v4.0-rc1. I guess that
> commit will end up in v3.19-stable in due time.

Yeah we've forgotten to tag it as stable (it missed 3.19 by a notch and
we didn't add teh tag when moving it to the 3.20^W4.0 branch) but Jani
just sent out the mail for that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
@ 2015-03-02 10:33         ` Daniel Vetter
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2015-03-02 10:33 UTC (permalink / raw)
  To: Paul Bolle
  Cc: Dave Airlie, intel-gfx, Linux Kernel Mailing List,
	DRI mailing list, Daniel Vetter, Linus Torvalds

On Mon, Mar 02, 2015 at 10:44:16AM +0100, Paul Bolle wrote:
> On Sat, 2015-02-28 at 22:08 -0800, Linus Torvalds wrote:
> > Hmm. 3.19 works fine, even if it ends up spewing
> > 
> >     WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/drm_irq.c:1121
> > drm_wait_one_vblank+0x125/0x130()
> >     vblank not available on crtc 1, ret=-22
> > 
> > a lot.
> 
> For what it's worth, that stream of WARNINGs was reported for v3.19-rc1
> in http://lkml.kernel.org/r/20150131211609.GA3710@yulia-desktop .
> 
> Commit f9b61ff6bce9 ("drm/i915: Push vblank enable/disable past
> encoder->enable/disable") silenced it again in v4.0-rc1. I guess that
> commit will end up in v3.19-stable in due time.

Yeah we've forgotten to tag it as stable (it missed 3.19 by a notch and
we didn't add teh tag when moving it to the 3.20^W4.0 branch) but Jani
just sent out the mail for that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
  2015-03-02  9:04               ` Daniel Vetter
@ 2015-03-02 16:53                 ` Linus Torvalds
  -1 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-02 16:53 UTC (permalink / raw)
  To: Linus Torvalds, Dave Airlie, Daniel Vetter, Jani Nikula,
	Matt Roper, Ander Conselvan de Oliveira, intel-gfx,
	Linux Kernel Mailing List, DRI mailing list

On Mon, Mar 2, 2015 at 1:04 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>
> And can you please attach a bactrace of the WARN in your patch, just to
> double-check you blow up at the same spot?

So the dmesg I attached had a backtrace for the new WARN_ONCE() (in
addition to an unrelated(?) one from i915_gem_free_object()).

Or did you mean a backtrace of the oops when things go wrong, when my
patch is *not* applied? My first email had that with the kref.h
warning from drm_framebuffer_reference, which is otherwise the same
thing.

And after things go wrong, and the plane handling thing uses a
framebuffer that has already been freed, then the resulting oopses end
up being kind of random. Which was partly why it ended up beign so
hard to finally bisect, and I actually eventually gave up on it -
because sometimes things would just happen to work, sometimes things
would blow up and oops when X started (resulting in just a dead
machine), sometimes things would oops at bootup.

The most common oops was something going wrong when the framebuffer
was free'd for the second time, and it would cause a NULL pointer
dereference in drm_framebuffer_free() or one of the routines it called
(so often a NULL pointer dereference in
"mutex_lock(&dev->mode_config.fb_lock)" because "dev" was NULL, or the
call to "fb->funcs->destroy(fb)" would oops because "fb->funcs" was
NULL.

                         Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [git pull] drm fixes
@ 2015-03-02 16:53                 ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-03-02 16:53 UTC (permalink / raw)
  To: Linus Torvalds, Dave Airlie, Daniel Vetter, Jani Nikula,
	Matt Roper, Ander Conselvan de Oliveira, intel-gfx,
	Linux Kernel Mailing List, DRI mailing list

On Mon, Mar 2, 2015 at 1:04 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>
> And can you please attach a bactrace of the WARN in your patch, just to
> double-check you blow up at the same spot?

So the dmesg I attached had a backtrace for the new WARN_ONCE() (in
addition to an unrelated(?) one from i915_gem_free_object()).

Or did you mean a backtrace of the oops when things go wrong, when my
patch is *not* applied? My first email had that with the kref.h
warning from drm_framebuffer_reference, which is otherwise the same
thing.

And after things go wrong, and the plane handling thing uses a
framebuffer that has already been freed, then the resulting oopses end
up being kind of random. Which was partly why it ended up beign so
hard to finally bisect, and I actually eventually gave up on it -
because sometimes things would just happen to work, sometimes things
would blow up and oops when X started (resulting in just a dead
machine), sometimes things would oops at bootup.

The most common oops was something going wrong when the framebuffer
was free'd for the second time, and it would cause a NULL pointer
dereference in drm_framebuffer_free() or one of the routines it called
(so often a NULL pointer dereference in
"mutex_lock(&dev->mode_config.fb_lock)" because "dev" was NULL, or the
call to "fb->funcs->destroy(fb)" would oops because "fb->funcs" was
NULL.

                         Linus
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [git pull] drm fixes
  2015-03-02 16:53                 ` Linus Torvalds
@ 2015-03-02 17:23                   ` Daniel Vetter
  -1 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2015-03-02 17:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Airlie, Daniel Vetter, Jani Nikula, Matt Roper,
	Ander Conselvan de Oliveira, intel-gfx,
	Linux Kernel Mailing List, DRI mailing list

On Mon, Mar 2, 2015 at 5:53 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Mar 2, 2015 at 1:04 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> And can you please attach a bactrace of the WARN in your patch, just to
>> double-check you blow up at the same spot?
>
> So the dmesg I attached had a backtrace for the new WARN_ONCE() (in
> addition to an unrelated(?) one from i915_gem_free_object()).
>
> Or did you mean a backtrace of the oops when things go wrong, when my
> patch is *not* applied? My first email had that with the kref.h
> warning from drm_framebuffer_reference, which is otherwise the same
> thing.

I've mixed things up with the other reporter which was full of the
subsequent oopses. But after I've sorted out why drm-intel-next
doesn't blow up the same way I see the bug now. Still baffled that we
underrun the refcount apparently since the same pile of legacy code +
atomic glue is used for the old modeset ioctl. But obviously something
is different, so still digging.

The gem_free_object backtrace is a completely unrelated issue. Fix for
that is in drm-intel-fixes and on the way to you:

commit 62e537f8d568347bbe4e00d7803a838750cdc618
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date:   Tue Feb 24 13:37:54 2015 -0800

    drm/i915: Fix frontbuffer false positve.

If that one doesn't help please scream ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [git pull] drm fixes
@ 2015-03-02 17:23                   ` Daniel Vetter
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2015-03-02 17:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: intel-gfx, Linux Kernel Mailing List, DRI mailing list, Daniel Vetter

On Mon, Mar 2, 2015 at 5:53 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Mar 2, 2015 at 1:04 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> And can you please attach a bactrace of the WARN in your patch, just to
>> double-check you blow up at the same spot?
>
> So the dmesg I attached had a backtrace for the new WARN_ONCE() (in
> addition to an unrelated(?) one from i915_gem_free_object()).
>
> Or did you mean a backtrace of the oops when things go wrong, when my
> patch is *not* applied? My first email had that with the kref.h
> warning from drm_framebuffer_reference, which is otherwise the same
> thing.

I've mixed things up with the other reporter which was full of the
subsequent oopses. But after I've sorted out why drm-intel-next
doesn't blow up the same way I see the bug now. Still baffled that we
underrun the refcount apparently since the same pile of legacy code +
atomic glue is used for the old modeset ioctl. But obviously something
is different, so still digging.

The gem_free_object backtrace is a completely unrelated issue. Fix for
that is in drm-intel-fixes and on the way to you:

commit 62e537f8d568347bbe4e00d7803a838750cdc618
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date:   Tue Feb 24 13:37:54 2015 -0800

    drm/i915: Fix frontbuffer false positve.

If that one doesn't help please scream ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] drm/i915: Fix modeset state confusion in the load detect code
  2015-03-02  9:44       ` Paul Bolle
                         ` (2 preceding siblings ...)
  (?)
@ 2015-03-03 16:31       ` Daniel Vetter
  2015-03-03 17:03         ` Linus Torvalds
  2015-03-04 12:41         ` shuang.he
  -1 siblings, 2 replies; 31+ messages in thread
From: Daniel Vetter @ 2015-03-03 16:31 UTC (permalink / raw)
  To: Intel Graphics Development
  Cc: Paul Bolle, Daniel Vetter, DRI Development, Daniel Vetter,
	Linus Torvalds

This is a tricky story of the new atomic state handling and the legacy
code fighting over each another. The bug at hand is an underrun of the
framebuffer reference with subsequent hilarity caused by the load
detect code. Which is peculiar since the the exact same code works
fine as the implementation of the legacy setcrtc ioctl.

Let's look at the ingredients:

- Currently our code is a crazy mix of legacy modeset interfaces to
  set the parameters and half-baked atomic state tracking underneath.
  While this transition is going we're using the transitional plane
  helpers to update the atomic side (drm_plane_helper_disable/update
  and friends), i.e. plane->state->fb. Since the state structure owns
  the fb those functions take care of that themselves.

  The legacy state (specifically crtc->primary->fb) is still managed
  by the old code (and mostly by the drm core), with the fb reference
  counting done by callers (core drm for the ioctl or the i915 load
  detect code). The relevant commit is

  commit ea2c67bb4affa84080c616920f3899f123786e56
  Author: Matt Roper <matthew.d.roper@intel.com>
  Date:   Tue Dec 23 10:41:52 2014 -0800

      drm/i915: Move to atomic plane helpers (v9)

- drm_plane_helper_disable has special code to handle multiple calls
  in a row - it checks plane->crtc == NULL and bails out. This is to
  match the proper atomic implementation which needs the crtc to get
  at the implied locking context atomic updates always need. See

  commit acf24a395c5a9290189b080383564437101d411c
  Author: Daniel Vetter <daniel.vetter@ffwll.ch>
  Date:   Tue Jul 29 15:33:05 2014 +0200

      drm/plane-helper: transitional atomic plane helpers

- The universal plane code split out the implicit primary plane from
  the CRTC into it's own full-blown drm_plane object. As part of that
  the setcrtc ioctl (which updated both the crtc mode and primary
  plane) learned to set crtc->primary->crtc on modeset to make sure
  the plane->crtc assignments statate up to date in

  commit e13161af80c185ecd8dc4641d0f5df58f9e3e0af
  Author: Matt Roper <matthew.d.roper@intel.com>
  Date:   Tue Apr 1 15:22:38 2014 -0700

      drm: Add drm_crtc_init_with_planes() (v2)

  Unfortunately we've forgotten to update the load detect code. Which
  wasn't a problem since the load detect modeset is temporary and
  always undone before we drop the locks.

- Finally there is a organically grown history (i.e. don't ask) around
  who sets the legacy plane->fb for the various driver entry points.
  Originally updating that was the drivers duty, but for almost all
  places we've moved that (plus updating the refcounts) into the core.
  Again the exception is the load detect code.

Taking all together the following happens:
- The load detect code doesn't set crtc->primary->crtc. This is only
  really an issue on crtcs never before used or when userspace
  explicitly disabled the primary plane.

- The plane helper glue code short-circuits because of that and leaves
  a non-NULL fb behind in plane->state->fb and plane->fb. The state
  fb isn't a real problem (it's properly refcounted on its own), it's
  just the canary.

- Load detect code drops the reference for that fb, but doesn't set
  plane->fb = NULL. This is ok since it's still living in that old
  world where drivers had to clear the pointer but the core/callers
  handled the refcounting.

- On the next modeset the drm core notices plane->fb and takes care of
  refcounting it properly by doing another unref. This drops the
  refcount to zero, leaving state->plane now pointing at freed memory.

- intel_plane_duplicate_state still assume it owns a reference to that
  very state->fb and bad things start to happen.

Fix this all by applying the same duct-tape as for the legacy setcrtc
ioctl code and set crtc->primary->crtc properly.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Paulo Zanoni <przanoni@gmail.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index cc3305e30c1b..d116caf98a72 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -8859,6 +8859,7 @@ retry:
 			old->release_fb->funcs->destroy(old->release_fb);
 		goto fail;
 	}
+	crtc->primary->crtc = crtc;
 
 	/* let the connector get through one full cycle before testing */
 	intel_wait_for_vblank(dev, intel_crtc->pipe);
-- 
2.1.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] drm/i915: Fix modeset state confusion in the load detect code
  2015-03-03 16:31       ` [PATCH] drm/i915: Fix modeset state confusion in the load detect code Daniel Vetter
@ 2015-03-03 17:03         ` Linus Torvalds
  2015-03-03 17:11           ` Daniel Vetter
  2015-03-04 12:41         ` shuang.he
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2015-03-03 17:03 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Paul Bolle, Intel Graphics Development, DRI Development, Daniel Vetter

On Tue, Mar 3, 2015 at 8:31 AM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> Fix this all by applying the same duct-tape as for the legacy setcrtc
> ioctl code and set crtc->primary->crtc properly.

Ack. Tests fine on the machine that showed issues for me.

I'll apply it manually directly to my tree, since I want to release
rc2 and I was holding that up in the hope this would get released (I
hate releasing even early rc's with issues that I know about and that
break machines I have access to).

If it ends up in your tree too and I'll get a duplicate commit later,
git will sort it all out, so it's fine.

Thanks,
                            Linus
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] drm/i915: Fix modeset state confusion in the load detect code
  2015-03-03 17:03         ` Linus Torvalds
@ 2015-03-03 17:11           ` Daniel Vetter
  2015-03-03 17:12             ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Vetter @ 2015-03-03 17:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Bolle, Daniel Vetter, Intel Graphics Development,
	DRI Development, Daniel Vetter

On Tue, Mar 03, 2015 at 09:03:32AM -0800, Linus Torvalds wrote:
> On Tue, Mar 3, 2015 at 8:31 AM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > Fix this all by applying the same duct-tape as for the legacy setcrtc
> > ioctl code and set crtc->primary->crtc properly.
> 
> Ack. Tests fine on the machine that showed issues for me.
> 
> I'll apply it manually directly to my tree, since I want to release
> rc2 and I was holding that up in the hope this would get released (I
> hate releasing even early rc's with issues that I know about and that
> break machines I have access to).

Fine with me. If you haven't pushed out yet can you maybe clarify the
commit message?

The setcrtc step must be one that disables the crtc, only then will we
(again) skip the book-keeping for plane->state->fb and unref to 0 in the
drm core. If you directly switch to another fb/mode then we'll unref to 0
in the state duplicate and then blow up in the drm core when doing the
unref for plane->old_fb.

But since X generally disable everything before enabling the new config
the previuos one is how we more likely blow up, and hence also why
duplicate_state blows up.

> If it ends up in your tree too and I'll get a duplicate commit later,
> git will sort it all out, so it's fine.

The revert and another patch to make the load detect code testable in an
automated fashion on modern platfroms to avoid another occurence of this
mess are fully independent of this one line. So no conflicts to be
expected.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] drm/i915: Fix modeset state confusion in the load detect code
  2015-03-03 17:11           ` Daniel Vetter
@ 2015-03-03 17:12             ` Linus Torvalds
  2015-03-03 17:19               ` Daniel Vetter
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2015-03-03 17:12 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Paul Bolle, Daniel Vetter, Intel Graphics Development,
	DRI Development, Daniel Vetter

On Tue, Mar 3, 2015 at 9:11 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>
> Fine with me. If you haven't pushed out yet can you maybe clarify the
> commit message?

Oh well. I already applied and tagged it, so it's what it is..

                     Linus
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] drm/i915: Fix modeset state confusion in the load detect code
  2015-03-03 17:12             ` Linus Torvalds
@ 2015-03-03 17:19               ` Daniel Vetter
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2015-03-03 17:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Bolle, Daniel Vetter, Intel Graphics Development,
	DRI Development, Daniel Vetter

On Tue, Mar 03, 2015 at 09:12:47AM -0800, Linus Torvalds wrote:
> On Tue, Mar 3, 2015 at 9:11 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > Fine with me. If you haven't pushed out yet can you maybe clarify the
> > commit message?
> 
> Oh well. I already applied and tagged it, so it's what it is..

No biggie, the explanation is correct and just glosses a bit over that
detail.  In case someone hit this the other way round and fell all over
plane->fb being freed instead of state->fb. But really unlikely - these
old machines don't tend to run fancy non-X compositors ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] drm/i915: Fix modeset state confusion in the load detect code
  2015-03-03 16:31       ` [PATCH] drm/i915: Fix modeset state confusion in the load detect code Daniel Vetter
  2015-03-03 17:03         ` Linus Torvalds
@ 2015-03-04 12:41         ` shuang.he
  1 sibling, 0 replies; 31+ messages in thread
From: shuang.he @ 2015-03-04 12:41 UTC (permalink / raw)
  To: shuang.he, ethan.gao, intel-gfx, daniel.vetter

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5879
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                 -6              278/278              272/278
ILK                                  308/308              308/308
SNB                 -1              284/284              283/284
IVB                 -1              380/380              379/380
BYT                                  294/294              294/294
HSW                                  387/387              387/387
BDW                 -1              316/316              315/316
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
 PNV  igt_gem_userptr_blits_coherency-sync      NO_RESULT(1)CRASH(8)NRUN(1)PASS(8)      CRASH(1)NRUN(1)
 PNV  igt_gem_userptr_blits_coherency-unsync      FAIL(1)NO_RESULT(1)CRASH(6)PASS(7)      CRASH(1)PASS(1)
 PNV  igt_gem_userptr_blits_minor-unsync-interruptible      DMESG_WARN(1)PASS(5)      DMESG_WARN(1)PASS(1)
 PNV  igt_gen3_render_linear_blits      FAIL(6)NRUN(1)DMESG_WARN(1)PASS(9)      FAIL(2)
 PNV  igt_gen3_render_mixed_blits      FAIL(9)PASS(9)      FAIL(2)
 PNV  igt_gem_fence_thrash_bo-write-verify-threaded-none      FAIL(2)CRASH(4)PASS(7)      CRASH(2)
*SNB  igt_gem_fence_thrash_bo-write-verify-y      PASS(5)      DMESG_WARN(1)PASS(1)
*IVB  igt_gem_storedw_batches_loop_secure-dispatch      PASS(3)      DMESG_WARN(1)PASS(1)
*BDW  igt_gem_gtt_hog      PASS(19)      DMESG_WARN(1)PASS(1)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-03-04 12:41 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-27  4:42 [git pull] drm fixes Dave Airlie
2015-02-27  4:42 ` Dave Airlie
2015-03-01  5:40 ` Linus Torvalds
2015-03-01  5:40   ` Linus Torvalds
2015-03-01  6:08   ` Linus Torvalds
2015-03-01  6:08     ` Linus Torvalds
2015-03-01  7:27     ` Linus Torvalds
2015-03-01  7:27       ` Linus Torvalds
2015-03-01 20:35       ` Linus Torvalds
2015-03-01 20:35         ` Linus Torvalds
2015-03-01 21:00         ` Linus Torvalds
2015-03-01 21:00           ` Linus Torvalds
2015-03-02  1:59           ` Linus Torvalds
2015-03-02  9:04             ` Daniel Vetter
2015-03-02  9:04               ` Daniel Vetter
2015-03-02 16:53               ` Linus Torvalds
2015-03-02 16:53                 ` Linus Torvalds
2015-03-02 17:23                 ` [Intel-gfx] " Daniel Vetter
2015-03-02 17:23                   ` Daniel Vetter
2015-03-02  9:44     ` Paul Bolle
2015-03-02  9:44       ` Paul Bolle
2015-03-02 10:26       ` Jani Nikula
2015-03-02 10:26         ` Jani Nikula
2015-03-02 10:33       ` [Intel-gfx] " Daniel Vetter
2015-03-02 10:33         ` Daniel Vetter
2015-03-03 16:31       ` [PATCH] drm/i915: Fix modeset state confusion in the load detect code Daniel Vetter
2015-03-03 17:03         ` Linus Torvalds
2015-03-03 17:11           ` Daniel Vetter
2015-03-03 17:12             ` Linus Torvalds
2015-03-03 17:19               ` Daniel Vetter
2015-03-04 12:41         ` shuang.he

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.