From: Marius Vlad <marius.c.vlad@intel.com>
To: intel-gfx@lists.freedesktop.org
Subject: Re: ✗ Fi.CI.BAT: failure for series starting with [v3] mm/vmap: Add a notifier for when we run out of vmap address space (rev3)
Date: Tue, 29 Mar 2016 15:51:12 +0300
Message-ID: <20160329125111.GA32238@mcvlad-wk.rb.intel.com>
In-Reply-To: <20160329083436.15280.36915@emeril.freedesktop.org>


We're not catching it, but this gives a deadlock trace when running
kms_pipe_crc_basic@suspend-read-crc-pipe-A (it happens on BSW):

[  132.555497] kms_pipe_crc_basic: starting subtest suspend-read-crc-pipe-A
[  132.734041] PM: Syncing filesystems ... done.
[  132.751624] Freezing user space processes ... (elapsed 0.003 seconds) done.
[  132.755240] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
[  132.758372] Suspending console(s) (use no_console_suspend to debug)
[  132.768157] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  132.780482] sd 0:0:0:0: [sda] Stopping disk
[  132.889902] PM: suspend of devices complete after 129.133 msecs
[  132.924359] PM: late suspend of devices complete after 34.440 msecs
[  132.932433] r8169 0000:03:00.0: System wakeup enabled by ACPI
[  132.938105] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
[  132.948029] PM: noirq suspend of devices complete after 23.660 msecs
[  132.948073] ACPI: Preparing to enter system sleep state S3
[  132.960567] PM: Saving platform NVS memory
[  132.960803] Disabling non-boot CPUs ...
[  132.999863] Broke affinity for irq 116
[  133.002229] smpboot: CPU 1 is now offline

[  133.022915] ======================================================
[  133.022916] [ INFO: possible circular locking dependency detected ]
[  133.022921] 4.5.0-gfxbench-Patchwork_315+ #1 Tainted: G     U         
[  133.022922] -------------------------------------------------------
[  133.022925] rtcwake/5998 is trying to acquire lock:
[  133.022942]  (s_active#6){++++.+}, at: [<ffffffff81252b10>] kernfs_remove_by_name_ns+0x40/0x90
[  133.022943] 
but task is already holding lock:
[  133.022953]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8107908d>] cpu_hotplug_begin+0x6d/0xc0
[  133.022954] 
which lock already depends on the new lock.

[  133.022955] 
the existing dependency chain (in reverse order) is:
[  133.022962] 
-> #3 (cpu_hotplug.lock){+.+.+.}:
[  133.022968]        [<ffffffff810ce2a1>] lock_acquire+0xb1/0x200
[  133.022974]        [<ffffffff817c3a72>] mutex_lock_nested+0x62/0x3c0
[  133.022978]        [<ffffffff81078d41>] get_online_cpus+0x61/0x80
[  133.022983]        [<ffffffff8111a2cb>] stop_machine+0x1b/0xe0
[  133.023046]        [<ffffffffa012a3ad>] gen8_ggtt_insert_entries__BKL+0x2d/0x30 [i915]
[  133.023096]        [<ffffffffa012dc06>] ggtt_bind_vma+0x46/0x70 [i915]
[  133.023146]        [<ffffffffa012f55d>] i915_vma_bind+0xed/0x260 [i915]
[  133.023197]        [<ffffffffa0137153>] i915_gem_object_do_pin+0x873/0xb20 [i915]
[  133.023248]        [<ffffffffa0137428>] i915_gem_object_pin+0x28/0x30 [i915]
[  133.023301]        [<ffffffffa014b5f5>] intel_init_pipe_control+0xb5/0x200 [i915]
[  133.023354]        [<ffffffffa01483be>] intel_logical_rings_init+0x14e/0x1080 [i915]
[  133.023405]        [<ffffffffa0137ee3>] i915_gem_init+0xf3/0x130 [i915]
[  133.023462]        [<ffffffffa01bcfeb>] i915_driver_load+0xbeb/0x1950 [i915]
[  133.023470]        [<ffffffff8151b7f4>] drm_dev_register+0xa4/0xb0
[  133.023474]        [<ffffffff8151d9de>] drm_get_pci_dev+0xce/0x1d0
[  133.023519]        [<ffffffffa00f72ff>] i915_pci_probe+0x2f/0x50 [i915]
[  133.023525]        [<ffffffff814490a5>] pci_device_probe+0x85/0xf0
[  133.023530]        [<ffffffff8153fd67>] driver_probe_device+0x227/0x440
[  133.023534]        [<ffffffff81540003>] __driver_attach+0x83/0x90
[  133.023538]        [<ffffffff8153d8b1>] bus_for_each_dev+0x61/0xa0
[  133.023542]        [<ffffffff8153f4c9>] driver_attach+0x19/0x20
[  133.023546]        [<ffffffff8153efb9>] bus_add_driver+0x1e9/0x280
[  133.023550]        [<ffffffff81540bab>] driver_register+0x5b/0xd0
[  133.023555]        [<ffffffff81447ffb>] __pci_register_driver+0x5b/0x60
[  133.023559]        [<ffffffff8151dbb6>] drm_pci_init+0xd6/0x100
[  133.023563]        [<ffffffffa0230092>] 0xffffffffa0230092
[  133.023571]        [<ffffffff810003d6>] do_one_initcall+0xa6/0x1d0
[  133.023577]        [<ffffffff8115c832>] do_init_module+0x5a/0x1c8
[  133.023585]        [<ffffffff81108d9d>] load_module+0x1efd/0x25a0
[  133.023591]        [<ffffffff81109658>] SyS_finit_module+0x98/0xc0
[  133.023598]        [<ffffffff817c82db>] entry_SYSCALL_64_fastpath+0x16/0x6f
[  133.023603] 
-> #2 (&dev->struct_mutex){+.+.+.}:
[  133.023608]        [<ffffffff810ce2a1>] lock_acquire+0xb1/0x200
[  133.023616]        [<ffffffff81516d0f>] drm_gem_mmap+0x19f/0x2a0
[  133.023622]        [<ffffffff8119adb9>] mmap_region+0x389/0x5f0
[  133.023626]        [<ffffffff8119b38a>] do_mmap+0x36a/0x420
[  133.023632]        [<ffffffff8117f03d>] vm_mmap_pgoff+0x6d/0xa0
[  133.023638]        [<ffffffff811992c3>] SyS_mmap_pgoff+0x183/0x220
[  133.023645]        [<ffffffff8100a246>] SyS_mmap+0x16/0x20
[  133.023649]        [<ffffffff817c82db>] entry_SYSCALL_64_fastpath+0x16/0x6f
[  133.023656] 
-> #1 (&mm->mmap_sem){++++++}:
[  133.023661]        [<ffffffff810ce2a1>] lock_acquire+0xb1/0x200
[  133.023667]        [<ffffffff8118fda5>] __might_fault+0x75/0xa0
[  133.023673]        [<ffffffff8125350a>] kernfs_fop_write+0x8a/0x180
[  133.023679]        [<ffffffff811d5d03>] __vfs_write+0x23/0xe0
[  133.023684]        [<ffffffff811d6ac2>] vfs_write+0xa2/0x190
[  133.023687]        [<ffffffff811d7914>] SyS_write+0x44/0xb0
[  133.023691]        [<ffffffff817c82db>] entry_SYSCALL_64_fastpath+0x16/0x6f
[  133.023697] 
-> #0 (s_active#6){++++.+}:
[  133.023701]        [<ffffffff810cd961>] __lock_acquire+0x1e81/0x1ef0
[  133.023705]        [<ffffffff810ce2a1>] lock_acquire+0xb1/0x200
[  133.023709]        [<ffffffff81251d81>] __kernfs_remove+0x241/0x320
[  133.023713]        [<ffffffff81252b10>] kernfs_remove_by_name_ns+0x40/0x90
[  133.023717]        [<ffffffff812544a0>] sysfs_remove_file_ns+0x10/0x20
[  133.023722]        [<ffffffff8153b504>] device_del+0x124/0x240
[  133.023726]        [<ffffffff8153b639>] device_unregister+0x19/0x60
[  133.023731]        [<ffffffff81545d42>] cpu_cache_sysfs_exit+0x52/0xb0
[  133.023735]        [<ffffffff81546318>] cacheinfo_cpu_callback+0x38/0x70
[  133.023739]        [<ffffffff8109bc59>] notifier_call_chain+0x39/0xa0
[  133.023743]        [<ffffffff8109bcc9>] __raw_notifier_call_chain+0x9/0x10
[  133.023748]        [<ffffffff8107900e>] cpu_notify_nofail+0x1e/0x30
[  133.023751]        [<ffffffff81079320>] _cpu_down+0x200/0x330
[  133.023756]        [<ffffffff8107987a>] disable_nonboot_cpus+0xaa/0x3b0
[  133.023761]        [<ffffffff810d47f8>] suspend_devices_and_enter+0x478/0xc30
[  133.023765]        [<ffffffff810d54c5>] pm_suspend+0x515/0x9e0
[  133.023771]        [<ffffffff810d3617>] state_store+0x77/0xe0
[  133.023777]        [<ffffffff81403eaf>] kobj_attr_store+0xf/0x20
[  133.023781]        [<ffffffff81254200>] sysfs_kf_write+0x40/0x50
[  133.023785]        [<ffffffff812535bc>] kernfs_fop_write+0x13c/0x180
[  133.023790]        [<ffffffff811d5d03>] __vfs_write+0x23/0xe0
[  133.023794]        [<ffffffff811d6ac2>] vfs_write+0xa2/0x190
[  133.023798]        [<ffffffff811d7914>] SyS_write+0x44/0xb0
[  133.023802]        [<ffffffff817c82db>] entry_SYSCALL_64_fastpath+0x16/0x6f
[  133.023805] 
other info that might help us debug this:

[  133.023813] Chain exists of:
  s_active#6 --> &dev->struct_mutex --> cpu_hotplug.lock

[  133.023814]  Possible unsafe locking scenario:

[  133.023815]        CPU0                    CPU1
[  133.023816]        ----                    ----
[  133.023819]   lock(cpu_hotplug.lock);
[  133.023823]                                lock(&dev->struct_mutex);
[  133.023826]                                lock(cpu_hotplug.lock);
[  133.023830]   lock(s_active#6);
[  133.023831] 
 *** DEADLOCK ***

[  133.023835] 8 locks held by rtcwake/5998:
[  133.023847]  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffff811da0f2>] __sb_start_write+0xb2/0xf0
[  133.023855]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff812534e1>] kernfs_fop_write+0x61/0x180
[  133.023863]  #2:  (s_active#105){.+.+.+}, at: [<ffffffff812534e9>] kernfs_fop_write+0x69/0x180
[  133.023871]  #3:  (pm_mutex){+.+...}, at: [<ffffffff810d501f>] pm_suspend+0x6f/0x9e0
[  133.023882]  #4:  (acpi_scan_lock){+.+.+.}, at: [<ffffffff8147c1ef>] acpi_scan_lock_acquire+0x12/0x14
[  133.023891]  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810797f4>] disable_nonboot_cpus+0x24/0x3b0
[  133.023899]  #6:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffff81079020>] cpu_hotplug_begin+0x0/0xc0
[  133.023906]  #7:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8107908d>] cpu_hotplug_begin+0x6d/0xc0
[  133.023907] 
stack backtrace:
[  133.023913] CPU: 0 PID: 5998 Comm: rtcwake Tainted: G     U          4.5.0-gfxbench-Patchwork_315+ #1
[  133.023915] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0043.2015.0904.1904 09/04/2015
[  133.023922]  0000000000000000 ffff880274967870 ffffffff81401d15 ffffffff825c52d0
[  133.023927]  ffffffff82586bd0 ffff8802749678b0 ffffffff810ca1b0 ffff880274967900
[  133.023932]  ffff880273845328 ffff880273844b00 ffff880273845440 0000000000000008
[  133.023935] Call Trace:
[  133.023941]  [<ffffffff81401d15>] dump_stack+0x67/0x92
[  133.023946]  [<ffffffff810ca1b0>] print_circular_bug+0x1e0/0x2e0
[  133.023949]  [<ffffffff810cd961>] __lock_acquire+0x1e81/0x1ef0
[  133.023953]  [<ffffffff810ce2a1>] lock_acquire+0xb1/0x200
[  133.023959]  [<ffffffff81252b10>] ? kernfs_remove_by_name_ns+0x40/0x90
[  133.023962]  [<ffffffff81251d81>] __kernfs_remove+0x241/0x320
[  133.023965]  [<ffffffff81252b10>] ? kernfs_remove_by_name_ns+0x40/0x90
[  133.023969]  [<ffffffff812518a7>] ? kernfs_find_ns+0x97/0x140
[  133.023972]  [<ffffffff81252b10>] kernfs_remove_by_name_ns+0x40/0x90
[  133.023975]  [<ffffffff812544a0>] sysfs_remove_file_ns+0x10/0x20
[  133.023979]  [<ffffffff8153b504>] device_del+0x124/0x240
[  133.023982]  [<ffffffff810cb82d>] ? trace_hardirqs_on+0xd/0x10
[  133.023986]  [<ffffffff8153b639>] device_unregister+0x19/0x60
[  133.023989]  [<ffffffff81545d42>] cpu_cache_sysfs_exit+0x52/0xb0
[  133.023992]  [<ffffffff81546318>] cacheinfo_cpu_callback+0x38/0x70
[  133.023995]  [<ffffffff8109bc59>] notifier_call_chain+0x39/0xa0
[  133.023999]  [<ffffffff8109bcc9>] __raw_notifier_call_chain+0x9/0x10
[  133.024002]  [<ffffffff8107900e>] cpu_notify_nofail+0x1e/0x30
[  133.024005]  [<ffffffff81079320>] _cpu_down+0x200/0x330
[  133.024011]  [<ffffffff810e5d70>] ? __call_rcu.constprop.58+0x2f0/0x2f0
[  133.024014]  [<ffffffff810e5dd0>] ? call_rcu_bh+0x20/0x20
[  133.024019]  [<ffffffff810e1830>] ? trace_raw_output_rcu_utilization+0x60/0x60
[  133.024023]  [<ffffffff810e1830>] ? trace_raw_output_rcu_utilization+0x60/0x60
[  133.024027]  [<ffffffff8107987a>] disable_nonboot_cpus+0xaa/0x3b0
[  133.024031]  [<ffffffff810d47f8>] suspend_devices_and_enter+0x478/0xc30
[  133.024035]  [<ffffffff810d54c5>] pm_suspend+0x515/0x9e0
[  133.024038]  [<ffffffff810d3617>] state_store+0x77/0xe0
[  133.024043]  [<ffffffff81403eaf>] kobj_attr_store+0xf/0x20
[  133.024046]  [<ffffffff81254200>] sysfs_kf_write+0x40/0x50
[  133.024049]  [<ffffffff812535bc>] kernfs_fop_write+0x13c/0x180
[  133.024054]  [<ffffffff811d5d03>] __vfs_write+0x23/0xe0
[  133.024059]  [<ffffffff810c7882>] ? percpu_down_read+0x52/0x90
[  133.024062]  [<ffffffff811da0f2>] ? __sb_start_write+0xb2/0xf0
[  133.024065]  [<ffffffff811da0f2>] ? __sb_start_write+0xb2/0xf0
[  133.024069]  [<ffffffff811d6ac2>] vfs_write+0xa2/0x190
[  133.024073]  [<ffffffff811f549a>] ? __fget_light+0x6a/0x90
[  133.024076]  [<ffffffff811d7914>] SyS_write+0x44/0xb0
[  133.024080]  [<ffffffff817c82db>] entry_SYSCALL_64_fastpath+0x16/0x6f
[  133.027055] ACPI: Low-level resume complete
[  133.027294] PM: Restoring platform NVS memory
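
The chain above collapses to the two-lock inversion in lockdep's scenario
diagram: the suspend path holds cpu_hotplug.lock and then wants the s_active
kernfs reference to remove the cacheinfo sysfs files, while the recorded
history (#1..#3, through mmap_sem and &dev->struct_mutex down to the
stop_machine() in gen8_ggtt_insert_entries__BKL) means a holder of s_active
can transitively end up waiting on cpu_hotplug.lock. A minimal userspace
sketch of that shape, using pthread mutexes as stand-ins for the kernel
locks (all names below are illustrative only, not the kernel API; build
with cc -pthread):

/* Minimal sketch of the ABBA inversion lockdep reports above; the
 * intermediate mmap_sem/struct_mutex links are collapsed into one edge. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t cpu_hotplug_lock = PTHREAD_MUTEX_INITIALIZER; /* A */
static pthread_mutex_t s_active         = PTHREAD_MUTEX_INITIALIZER; /* B */

/* CPU0 in the diagram: suspend path doing _cpu_down() */
static void *suspend_path(void *arg)
{
	pthread_mutex_lock(&cpu_hotplug_lock);	/* cpu_hotplug_begin() */
	usleep(1000);				/* widen the race window */
	pthread_mutex_lock(&s_active);		/* kernfs_remove_by_name_ns() */
	puts("suspend path got both locks");
	pthread_mutex_unlock(&s_active);
	pthread_mutex_unlock(&cpu_hotplug_lock);
	return NULL;
}

/* CPU1: a sysfs writer whose dependency history ends in stop_machine() */
static void *sysfs_writer(void *arg)
{
	pthread_mutex_lock(&s_active);		/* kernfs_fop_write() */
	usleep(1000);
	pthread_mutex_lock(&cpu_hotplug_lock);	/* get_online_cpus() */
	puts("sysfs writer got both locks");
	pthread_mutex_unlock(&cpu_hotplug_lock);
	pthread_mutex_unlock(&s_active);
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;
	pthread_create(&t0, NULL, suspend_path, NULL);
	pthread_create(&t1, NULL, sysfs_writer, NULL);
	pthread_join(t0, NULL);	/* with unlucky timing, never returns */
	pthread_join(t1, NULL);
	return 0;
}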


On Tue, Mar 29, 2016 at 08:34:36AM +0000, Patchwork wrote:
> == Series Details ==
> 
> Series: series starting with [v3] mm/vmap: Add a notifier for when we run out of vmap address space (rev3)
> URL   : https://patchwork.freedesktop.org/series/4569/
> State : failure
> 
> == Summary ==
> 
> Series 4569v3 Series without cover letter
> http://patchwork.freedesktop.org/api/1.0/series/4569/revisions/3/mbox/
> 
> Test kms_flip:
>         Subgroup basic-flip-vs-wf_vblank:
>                 pass       -> FAIL       (snb-x220t)
> Test pm_rpm:
>         Subgroup basic-pci-d3-state:
>                 pass       -> DMESG-WARN (bsw-nuc-2)
>         Subgroup basic-rte:
>                 dmesg-warn -> PASS       (byt-nuc) UNSTABLE
> 
> bdw-nuci7        total:192  pass:179  dwarn:0   dfail:0   fail:1   skip:12 
> bdw-ultra        total:192  pass:170  dwarn:0   dfail:0   fail:1   skip:21 
> bsw-nuc-2        total:192  pass:154  dwarn:1   dfail:0   fail:0   skip:37 
> byt-nuc          total:192  pass:157  dwarn:0   dfail:0   fail:0   skip:35 
> hsw-brixbox      total:192  pass:170  dwarn:0   dfail:0   fail:0   skip:22 
> hsw-gt2          total:192  pass:175  dwarn:0   dfail:0   fail:0   skip:17 
> ivb-t430s        total:192  pass:167  dwarn:0   dfail:0   fail:0   skip:25 
> skl-i7k-2        total:192  pass:169  dwarn:0   dfail:0   fail:0   skip:23 
> skl-nuci5        total:192  pass:181  dwarn:0   dfail:0   fail:0   skip:11 
> snb-dellxps      total:192  pass:158  dwarn:0   dfail:0   fail:0   skip:34 
> snb-x220t        total:192  pass:157  dwarn:0   dfail:0   fail:2   skip:33 
> 
> Results at /archive/results/CI_IGT_test/Patchwork_1725/
> 
> f5d413cccefa1f93d64c34f357151d42add63a84 drm-intel-nightly: 2016y-03m-24d-14h-34m-29s UTC integration manifest
> d9f9cda1b4e8a64ad1ac9bef0392e2c701b0d9f7 drm/i915/shrinker: Hook up vmap allocation failure notifier
> 93698e6141bccbabfc898a7d2cad5577f5af893b mm/vmap: Add a notifier for when we run out of vmap address space
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Thread overview: 27+ messages
2016-03-17 11:59 [PATCH 1/2] mm/vmap: Add a notifier for when we run out of vmap address space Chris Wilson
2016-03-17 11:59 ` Chris Wilson
2016-03-17 11:59 ` Chris Wilson
2016-03-17 11:59 ` [PATCH 2/2] drm/i915/shrinker: Hook up vmap allocation failure notifier Chris Wilson
2016-03-17 11:59   ` Chris Wilson
2016-03-17 12:37 ` [PATCH 1/2] mm/vmap: Add a notifier for when we run out of vmap address space Roman Peniaev
2016-03-17 12:37   ` Roman Peniaev
2016-03-17 12:57   ` Chris Wilson
2016-03-17 12:57     ` Chris Wilson
2016-03-17 13:21     ` Roman Peniaev
2016-03-17 13:21       ` Roman Peniaev
2016-03-17 13:30       ` Chris Wilson
2016-03-17 13:30         ` Chris Wilson
2016-03-17 13:34 ` [PATCH v2] " Chris Wilson
2016-03-17 13:34   ` Chris Wilson
2016-03-17 13:34   ` Chris Wilson
2016-03-17 13:41   ` Chris Wilson
2016-03-17 13:41     ` Chris Wilson
2016-03-28 23:15     ` Andrew Morton
2016-03-28 23:15       ` Andrew Morton
2016-03-29  8:16       ` [PATCH v3] " Chris Wilson
2016-03-29  8:16         ` Chris Wilson
2016-03-29  8:16         ` Chris Wilson
2016-03-18  7:03 ` ✗ Fi.CI.BAT: failure for series starting with [v2] mm/vmap: Add a notifier for when we run out of vmap address space (rev2) Patchwork
2016-03-29  8:34 ` ✗ Fi.CI.BAT: failure for series starting with [v3] mm/vmap: Add a notifier for when we run out of vmap address space (rev3) Patchwork
2016-03-29 12:51   ` Marius Vlad [this message]
2016-03-29 12:56     ` Chris Wilson
