* intel_fbdev_restore_mode derefence null pointer
@ 2016-02-03  7:49 Li, Weinan Z
  2016-02-03  9:17 ` [PATCH] drm/i915: Protect fbdev across slow or failed initialisation Chris Wilson
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Li, Weinan Z @ 2016-02-03  7:49 UTC
  To: intel-gfx


I hit a kernel panic in iVGT; it can usually be reproduced easily with multiple VMs.

unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [<ffffffffa025a52b>] intel_fbdev_restore_mode+0x57/0x73 [i915]

The function that accesses drm_device->dev_private->fbdev->fb runs before that field has been initialized.
Because "intel_fbdev_initial_config" runs via "async_schedule", an access through
drm_release -> drm_lastclose -> i915_driver_lastclose -> intel_fbdev_restore_mode can occur before ifbdev->fb is initialized, which causes the kernel panic.
Do we need to add a NULL pointer check or a call to async_synchronize_cookie() to avoid this issue?
I also found a similar issue on bugs.freedesktop.org:
https://bugs.freedesktop.org/show_bug.cgi?id=93580
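
The two options raised above (a NULL check, or waiting for the asynchronous initialization) can be compared in a small user-space model. This is only a sketch under loud assumptions: it is single-threaded, the deferred work stands in for what async_schedule() would queue, and every `mock_` name is hypothetical rather than part of i915 or the kernel async API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real i915 types. */
struct mock_fb { int id; };

struct mock_fbdev {
    struct mock_fb *fb;      /* stays NULL until the deferred init has run */
    bool init_pending;       /* models work that has been queued but not run */
    struct mock_fb storage;  /* backing object handed out by the init step */
};

/* Models queuing the initial configuration: nothing is initialized yet. */
static void mock_schedule_init(struct mock_fbdev *ifbdev)
{
    ifbdev->fb = NULL;
    ifbdev->init_pending = true;
}

/* Models waiting for the queued work: run it to completion if still pending. */
static void mock_synchronize_init(struct mock_fbdev *ifbdev)
{
    if (ifbdev->init_pending) {
        ifbdev->storage.id = 1;
        ifbdev->fb = &ifbdev->storage;
        ifbdev->init_pending = false;
    }
}

/* Option 1: bail out while fb is not set up. Returns true if fb was used. */
static bool mock_restore_mode_checked(struct mock_fbdev *ifbdev)
{
    if (ifbdev == NULL || ifbdev->fb == NULL)
        return false;
    return ifbdev->fb->id != 0;
}

/* Option 2: wait for the init first, then use fb unconditionally. */
static bool mock_restore_mode_synced(struct mock_fbdev *ifbdev)
{
    if (ifbdev == NULL)
        return false;
    mock_synchronize_init(ifbdev);
    return ifbdev->fb->id != 0;
}
```

In this model, option 1 silently skips the restore when it is called too early, while option 2 always performs it at the cost of waiting for the pending work.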


Below is the error log:
(d29) init: failsafe main process (1412) killed by TERM signal
(d29) init: bluetooth main process (1574) terminated with status 1
(d29) init: bluetooth main process ended, respawning
(d29) init: bluetooth main process (1642) terminated with status 1
(d29) init: bluetooth main process ended, respawning
(d29) init: bluetooth main process (1689) terminated with status 1
(d29) init: bluetooth respawning too fast, stopped
(d29) e1000: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
(d29) [drm] failed to retrieve link info, disabling eDP
(d29) i915 0000:00:02.0: Direct firmware load for i915/skl_guc_ver4.bin failed with error -2
(d29) SUBSYSTEM=pci
(d29) DEVICE=+pci:0000:00:02.0
(d29) [drm:intel_guc_ucode_init [i915]] *ERROR* Failed to fetch GuC firmware from i915/skl_guc_ver4.bin (error -2)
(d29) [drm] VGT ballooning configuration:
(d29) [drm] Mappable graphic memory: base 0x1e000000 size 131072KiB
(d29) [drm] Unmappable graphic memory: base 0x88000000 size 393216KiB
(d29) [drm] balloon space: range [ 0x40000000 - 0x88000000 ] 1179648 KiB.
(d29) [drm] balloon space: range [ 0xa0000000 - 0xfffff000 ] 1572860 KiB.
(d29) [drm] balloon space: range [ 0x0 - 0x1e000000 ] 491520 KiB.
(d29) [drm] balloon space: range [ 0x26000000 - 0x40000000 ] 425984 KiB.
(d29) [drm] VGT balloon successfully
[ 4506.568318] vGT info:(ring_pp_mode_write:744) EXECLIST enabling on ring 0.
[ 4506.576307] vGT-3: add to render run queue!
[ 4506.582888] vGT info:(ring_pp_mode_write:744) EXECLIST enabling on ring 1.
[ 4506.591714] vGT info:(ring_pp_mode_write:744) EXECLIST enabling on ring 2.
[ 4506.600092] vGT info:(ring_pp_mode_write:744) EXECLIST enabling on ring 3.
[ 4506.608928] vGT info:(ring_pp_mode_write:744) EXECLIST enabling on ring 4.
(d29) [drm:intel_opregion_init [i915]] *ERROR* No ACPI video bus found
(d29) [drm] Initialized i915 1.6.0 20151010 for 0000:00:02.0 on minor 0
(d29) BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
(d29) IP: [<ffffffffa025a52b>] intel_fbdev_restore_mode+0x57/0x73 [i915]
(d29) PGD 38a79067 PUD 38a78067 PMD 0
(d29) Oops: 0000 [#1] SMP
(d29) Modules linked in: fuse microcode parport_pc i915 drm_kms_helper i2c_algo_bit serio_raw acpi_cpufreq ppdev drm i2c_piix4 lp parport ext4 crc16 jbd2 mbcache e1000 uhci_hcd ata_generic pata_acpi
(d29) CPU: 1 PID: 1749 Comm: gpu-manager Tainted: G     U          4.3.0-rc6-vgt+ #1
(d29) Hardware name: Xen HVM domU, BIOS 4.6.0 01/15/2016
(d29) task: ffff88003be9d700 ti: ffff880038a80000 task.ti: ffff880038a80000
(d29) RIP: 0010:[<ffffffffa025a52b>]  [<ffffffffa025a52b>] intel_fbdev_restore_mode+0x57/0x73 [i915]
(d29) RSP: 0018:ffff880038a83d48  EFLAGS: 00010246
(d29) RAX: 0000000000000000 RBX: ffff8800357cb800 RCX: ffff88003d2c2400
(d29) RDX: 0000000080000000 RSI: 0000000000000000 RDI: ffff88003c139060
(d29) RBP: ffff880038a83d50 R08: 00000000ffffffff R09: ffff88003cc00000
(d29) R10: ffff88003cc001b0 R11: ffffffffa012c9a3 R12: ffff88003c139000
(d29) R13: ffff88003c139060 R14: ffff88003ba7d8e0 R15: ffff88003c139088
(d29) FS:  00007fd3bdce5740(0000) GS:ffff88003d620000(0000) knlGS:0000000000000000
(d29) CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
(d29) CR2: 00000000000000a0 CR3: 0000000000078000 CR4: 00000000003406e0
(d29) DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
(d29) DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
(d29) Stack:
(d29)  ffff88003c139000 ffff880038a83d60 ffffffffa028696c ffff880038a83d80
(d29)  ffffffffa01170f7 0000000000000000 ffff88003c139000 ffff880038a83de0
(d29)  ffffffffa0117592 ffff880038b87210 ffff88003c139198 0000000000000246
(d29) Call Trace:
(d29)  [<ffffffffa028696c>] i915_driver_lastclose+0x9/0xb [i915]
(d29)  [<ffffffffa01170f7>] drm_lastclose+0x3a/0x103 [drm]
(d29)  [<ffffffffa0117592>] drm_release+0x3d2/0x40b [drm]
(d29)  [<ffffffff81147d26>] __fput+0xec/0x1a7
(d29)  [<ffffffff81147e0d>] ____fput+0x9/0xb
(d29)  [<ffffffff8106ac15>] task_work_run+0x62/0x78
(d29)  [<ffffffff8100380c>] prepare_exit_to_usermode+0x93/0xaf
(d29)  [<ffffffff8100398d>] syscall_return_slowpath+0x165/0x19e
(d29)  [<ffffffff81155125>] ? do_vfs_ioctl+0x360/0x41a
(d29)  [<ffffffff8106ab44>] ? task_work_add+0x3f/0x4e
(d29)  [<ffffffff81147e87>] ? fput+0x78/0x7f
(d29)  [<ffffffff81144f91>] ? filp_close+0x63/0x6d
(d29)  [<ffffffff814f09cc>] int_ret_from_sys_call+0x25/0x8f
(d29) Code: c6 9c 29 2f a0 48 c7 c7 90 24 2d a0 31 c0 e8 2e 0e ec ff eb 2f 48 8b 43 08 48 8d 78 60 e8 68 49 29 e1 48 8b 83 a0 00 00 00 31 f6 <48> 8b b8 a0 00 00 00 e8 1b 5f ff ff 48 8b 7b 08 48 83 c7 60 e8
(d29) RIP  [<ffffffffa025a52b>] intel_fbdev_restore_mode+0x57/0x73 [i915]
(d29)  RSP <ffff880038a83d48>
(d29) CR2: 00000000000000a0
(d29) ---[ end trace e656291822c44c35 ]---
(d29) Kernel panic - not syncing: Fatal exception
(d29) Kernel Offset: disabled
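
One hedged reading of the register dump: the faulting bytes `<48> 8b b8 a0 00 00 00` decode as a 64-bit load from RAX+0xa0 into RDI, and RAX is 0, so the access lands at address 0xa0, which matches CR2. That is consistent with a NULL pointer (ifbdev->fb, per the analysis earlier in this mail) being dereferenced to read a member at offset 0xa0. The sketch below only reproduces that arithmetic in user space; `struct mock_fb` and `fault_address` are hypothetical names, and the padding is an assumption made purely to place a member at 0xa0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the 0xa0 offset is taken from the oops. */
struct mock_fb {
    unsigned char pad[0xa0]; /* filler so that the next member sits at 0xa0 */
    void *obj;               /* stand-in for the member being read */
};

/* Address touched by a load of base->member, given the member's byte offset. */
static uintptr_t fault_address(uintptr_t base, size_t member_offset)
{
    return base + member_offset;
}
```

Calling fault_address(0, offsetof(struct mock_fb, obj)) gives the address a NULL base would fault on, which can then be compared with the CR2 value in the log.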

BRs,
Weinan Li



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH] drm/i915: Protect fbdev across slow or failed initialisation
@ 2016-02-04  9:21 Li, Weinan Z
  2016-02-05  0:27 ` Lukas Wunner
  0 siblings, 1 reply; 18+ messages in thread
From: Li, Weinan Z @ 2016-02-04  9:21 UTC
  To: gustav.fagerlind, Lukas Wunner; +Cc: intel-gfx



Hi Wilson,
We still need this patch. It seems that 54632abe8ca3 ("drm/i915: Fix oops caused by fbdev initialization
failure") and 366e39b4d2c5 ("drm/i915: Tear down fbdev if
initialization fails") do not cover this case. Here ifbdev->fb is accessed before initialization has
finished, not after initialization has failed. Unless there are other related patches or code updates, the issue may still exist in 4.5.

It would be better to also add a NULL check for info, since it is initialized in the async queue as well:
>       info = ifbdev->helper.fbdev;
> +     if (info == NULL)
> +            return false;
>       if (!info->screen_base)
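
For reference, this is how the intel_fbdev_active() helper from the patch under discussion might look with the extra check folded in. It is a user-space sketch, not the final kernel code: the struct definitions are minimal stand-ins that carry only the fields the helper reads, and the value given to FBINFO_STATE_RUNNING is an assumption for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in value and types for this sketch; the real ones live in the kernel. */
#define FBINFO_STATE_RUNNING 0

struct fb_info {
    char *screen_base; /* NULL until the framebuffer has been mapped */
    int state;
};

struct drm_fb_helper { struct fb_info *fbdev; };
struct intel_fbdev { struct drm_fb_helper helper; };

/* True only when every pointer on the path is set and the fbdev is running. */
static bool intel_fbdev_active(struct intel_fbdev *ifbdev)
{
    struct fb_info *info;

    if (ifbdev == NULL)
        return false;

    info = ifbdev->helper.fbdev;
    if (info == NULL)          /* the additional check proposed above */
        return false;
    if (!info->screen_base)
        return false;

    return info->state == FBINFO_STATE_RUNNING;
}
```

With the extra check, a caller that arrives before the async queue has set helper.fbdev gets false back rather than dereferencing a NULL info.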

BRs,
Weinan Li


From: Li, Weinan Z
Sent: Thursday, February 04, 2016 10:34 AM
To: 'gustav.fagerlind@gmail.com'; Lukas Wunner
Cc: Chris Wilson; intel-gfx@lists.freedesktop.org
Subject: RE: [Intel-gfx] [PATCH] drm/i915: Protect fbdev across slow or failed initialisation

Thanks for your quick response.
Yes, it is not easy to reproduce natively. In iVGT we start up several VMs simultaneously, and it can be reproduced within several cycles, at a failure rate of roughly 1/10.
Reproducing the issue requires GUI mode rather than text mode.

BRs,
Weinan Li


From: Gustav Fägerlind [mailto:gustav.fagerlind@gmail.com]
Sent: Thursday, February 04, 2016 1:08 AM
To: Lukas Wunner
Cc: Chris Wilson; intel-gfx@lists.freedesktop.org; Li, Weinan Z
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Protect fbdev across slow or failed initialisation

Cool, thank you.
I don't believe I can easily reproduce it; it has only happened a few times (and I reboot my lappy >2 times per day).

//
Gustav

2016-02-03 14:25 GMT+01:00 Lukas Wunner <lukas@wunner.de>:
Hi,

On Wed, Feb 03, 2016 at 09:17:37AM +0000, Chris Wilson wrote:
> If the initialisation fails, we may be left with a dangling pointer with
> an incomplete fbdev structure.

This shouldn't happen with 4.5, the fbdev is now clobbered if initialization
fails, the existing "if (dev_priv->fbdev)" checks should thus be sufficient.
See 54632abe8ca3 ("drm/i915: Fix oops caused by fbdev initialization
failure") as well as 366e39b4d2c5 ("drm/i915: Tear down fbdev if
initialization fails").

Gustav Fagerlind and Li Weinan both reported this for 4.3. It would be
interesting to know if it can be reproduced at all with 4.5-rc2.

Best regards,

Lukas

> Here we want to disable internal calls
> into fbdev. Similarly, the initialisation may be slow and we haven't yet
> enabled the fbdev (e.g. quick suspend or last-close before the async init
> completes).
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> Reported-by: "Li, Weinan Z" <weinan.z.li@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_fbdev.c | 41 ++++++++++++++++++++++++--------------
>  1 file changed, 26 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> index 09840f4380f9..6218bc5370a1 100644
> --- a/drivers/gpu/drm/i915/intel_fbdev.c
> +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> @@ -114,6 +114,20 @@ static struct fb_ops intelfb_ops = {
>       .fb_debug_leave = drm_fb_helper_debug_leave,
>  };
>
> +static bool intel_fbdev_active(struct intel_fbdev *ifbdev)
> +{
> +     struct fb_info *info;
> +
> +     if (ifbdev == NULL)
> +             return false;
> +
> +     info = ifbdev->helper.fbdev;
> +     if (!info->screen_base)
> +             return false;
> +
> +     return info->state == FBINFO_STATE_RUNNING;
> +}
> +
>  static int intelfb_alloc(struct drm_fb_helper *helper,
>                        struct drm_fb_helper_surface_size *sizes)
>  {
> @@ -753,6 +767,8 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
>               return;
>
>       info = ifbdev->helper.fbdev;
> +     if (!info->screen_base)
> +             return;
>
>       if (synchronous) {
>               /* Flush any pending work to turn the console on, and then
> @@ -794,29 +810,24 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
>
>  void intel_fbdev_output_poll_changed(struct drm_device *dev)
>  {
> -     struct drm_i915_private *dev_priv = dev->dev_private;
> -     if (dev_priv->fbdev)
> +     struct drm_i915_private *dev_priv = to_i915(dev);
> +
> +     if (intel_fbdev_active(dev_priv->fbdev))
>               drm_fb_helper_hotplug_event(&dev_priv->fbdev->helper);
>  }
>
>  void intel_fbdev_restore_mode(struct drm_device *dev)
>  {
> -     int ret;
> -     struct drm_i915_private *dev_priv = dev->dev_private;
> -     struct intel_fbdev *ifbdev = dev_priv->fbdev;
> -     struct drm_fb_helper *fb_helper;
> +     struct intel_fbdev *ifbdev = to_i915(dev)->fbdev;
>
> -     if (!ifbdev)
> +     if (!intel_fbdev_active(ifbdev))
>               return;
>
> -     fb_helper = &ifbdev->helper;
> -
> -     ret = drm_fb_helper_restore_fbdev_mode_unlocked(fb_helper);
> -     if (ret) {
> -             DRM_DEBUG("failed to restore crtc mode\n");
> -     } else {
> -             mutex_lock(&fb_helper->dev->struct_mutex);
> +     if (drm_fb_helper_restore_fbdev_mode_unlocked(&ifbdev->helper) == 0) {
> +             mutex_lock(&dev->struct_mutex);
>               intel_fb_obj_invalidate(ifbdev->fb->obj, ORIGIN_GTT);
> -             mutex_unlock(&fb_helper->dev->struct_mutex);
> +             mutex_unlock(&dev->struct_mutex);
> +     } else {
> +             DRM_DEBUG("failed to restore crtc mode\n");
>       }
>  }
> --
> 2.7.0
>


* [REGRESSION] system hang on ILK/SNB/IVB
@ 2016-03-30 17:20 Gabriel Feceoru
  2016-03-30 17:57 ` [PATCH] drm/i915: Protect fbdev across slow or failed initialisation Chris Wilson
  0 siblings, 1 reply; 18+ messages in thread
From: Gabriel Feceoru @ 2016-03-30 17:20 UTC
  To: lukas, daniel.vetter; +Cc: Tomi Sarvela, intel-gfx

This commit causes a hang while running kms suspend tests 
(kms_pipe_crc_basic@suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.

This is probably the same problem as the one described under v2, but on older HW.


commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
Author: Lukas Wunner <lukas@wunner.de>
Date:   Wed Mar 9 12:52:53 2016 +0100

     drm/i915: Fix races on fbdev

     The ->lastclose callback invokes intel_fbdev_restore_mode() and has
     been witnessed to run before intel_fbdev_initial_config_async()
     has finished.

     We might likewise receive hotplug events before we've had a chance to
     fully set up the fbdev.

     Fix by waiting for the asynchronous thread to finish.

     v2:
     An async_synchronize_full() was also added to intel_fbdev_set_suspend()
     in v1 which turned out to be entirely gratuitous. It caused a deadlock
     on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
     for CI support) and was unnecessary since a device is never suspended
     until its ->probe callback (and all asynchronous tasks it scheduled)
     have finished. See dpm_prepare(), which calls wait_for_device_probe(),
     which calls async_synchronize_full().

     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
     Reported-by: Gustav Fägerlind <gustav.fagerlind@gmail.com>
     Reported-by: "Li, Weinan Z" <weinan.z.li@intel.com>
     Cc: Chris Wilson <chris@chris-wilson.co.uk>
     Cc: stable@vger.kernel.org
     Signed-off-by: Lukas Wunner <lukas@wunner.de>
     Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
     Link: http://patchwork.freedesktop.org/patch/msgid/20160309115147.67B2B6E0D3@gabe.freedesktop.org


Regards,
Gabriel

Thread overview: 18+ messages
2016-02-03  7:49 intel_fbdev_restore_mode derefence null pointer Li, Weinan Z
2016-02-03  9:17 ` [PATCH] drm/i915: Protect fbdev across slow or failed initialisation Chris Wilson
2016-02-03 13:25   ` Lukas Wunner
2016-02-03 17:08     ` Gustav Fägerlind
2016-02-04  2:34       ` Li, Weinan Z
2016-02-03 11:19 ` ✓ Fi.CI.BAT: success for " Patchwork
2016-02-03 11:54 ` Patchwork
2016-02-03 12:23 ` ✗ Fi.CI.BAT: failure " Patchwork
2016-02-04  9:21 [PATCH] " Li, Weinan Z
2016-02-05  0:27 ` Lukas Wunner
2016-02-05 11:09   ` Chris Wilson
2016-02-05 14:58     ` Lukas Wunner
2016-02-15 16:32       ` Daniel Vetter
2016-03-30 17:20 [REGRESSION] system hang on ILK/SNB/IVB Gabriel Feceoru
2016-03-30 17:57 ` [PATCH] drm/i915: Protect fbdev across slow or failed initialisation Chris Wilson
2016-03-30 18:10   ` kbuild test robot
2016-03-30 18:10   ` kbuild test robot
2016-03-30 18:26   ` kbuild test robot
2016-03-30 18:30   ` Chris Wilson
