All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug 196223] New: jdb2 journal - hung after suspend-resume test
@ 2017-06-30  6:11 bugzilla-daemon
  2017-06-30  7:41 ` [Bug 196223] " bugzilla-daemon
                   ` (25 more replies)
  0 siblings, 26 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-06-30  6:11 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

            Bug ID: 196223
           Summary: jdb2 journal - hung after suspend-resume test
           Product: File System
           Version: 2.5
    Kernel Version: 4.12.0-rc7-next-20170629
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: blocking
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@kernel-bugs.osdl.org
          Reporter: marta.lofstedt@intel.com
        Regression: No

Created attachment 257243
  --> https://bugzilla.kernel.org/attachment.cgi?id=257243&action=edit
backtraces jdb2/ext4 hang

While running IGT graphics test on linux-next various x86 machines hang. The
issue is reproducible with running the fastfeedback.testlist of
IGT(https://cgit.freedesktop.org/xorg/app/intel-gpu-tools)
Example backtrace from BDW NUCi5, more backtraces from other machines in
attachment.

<3>[ 1330.158734] INFO: task jbd2/sda2-8:208 blocked for more than 120 seconds.
<3>[ 1330.158814]       Tainted: G     U  W       4.12.0-rc7-next-20170629 #1
<3>[ 1330.158877] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
<6>[ 1330.158950] jbd2/sda2-8     D    0   208      2 0x00000000
<4>[ 1330.158956] Call Trace:
<4>[ 1330.158969]  __schedule+0x3c7/0x830
<4>[ 1330.158974]  schedule+0x36/0x80
<4>[ 1330.158979]  io_schedule+0x16/0x40
<4>[ 1330.158984]  wait_on_page_bit_common+0x10e/0x1d0
<4>[ 1330.158988]  ? page_cache_tree_insert+0xf0/0xf0
<4>[ 1330.158992]  __filemap_fdatawait_range+0x114/0x180
<4>[ 1330.158998]  ? submit_bio+0x73/0x150
<4>[ 1330.159002]  ? submit_bio+0x73/0x150
<4>[ 1330.159007]  ? jbd2_journal_begin_ordered_truncate+0xb0/0xb0
<4>[ 1330.159012]  filemap_fdatawait_keep_errors+0x27/0x50
<4>[ 1330.159016]  jbd2_journal_commit_transaction+0x89e/0x16d0
<4>[ 1330.159024]  kjournald2+0xd2/0x270
<4>[ 1330.159028]  ? kjournald2+0xd2/0x270
<4>[ 1330.159033]  ? wait_woken+0x80/0x80
<4>[ 1330.159039]  kthread+0x109/0x140
<4>[ 1330.159044]  ? commit_timeout+0x10/0x10
<4>[ 1330.159049]  ? kthread_create_on_node+0x70/0x70
<4>[ 1330.159053]  ret_from_fork+0x25/0x30
<4>[ 1330.159057] NMI backtrace for cpu 0
<4>[ 1330.159061] CPU: 0 PID: 36 Comm: khungtaskd Tainted: G     U  W      
4.12.0-rc7-next-20170629 #1
<4>[ 1330.159063] Hardware name:                  /NUC5i5RYB, BIOS
RYBDWi35.86A.0249.2015.0529.1640 05/29/2015
<4>[ 1330.159064] Call Trace:
<4>[ 1330.159070]  dump_stack+0x63/0x8d
<4>[ 1330.159076]  nmi_cpu_backtrace+0xca/0xd0
<4>[ 1330.159081]  ? irq_force_complete_move+0x140/0x140
<4>[ 1330.159087]  nmi_trigger_cpumask_backtrace+0xde/0x110
<4>[ 1330.159092]  arch_trigger_cpumask_backtrace+0x19/0x20
<4>[ 1330.159096]  watchdog+0x2dd/0x360
<4>[ 1330.159103]  kthread+0x109/0x140
<4>[ 1330.159106]  ? reset_hung_task_detector+0x20/0x20
<4>[ 1330.159111]  ? kthread_create_on_node+0x70/0x70
<4>[ 1330.159115]  ret_from_fork+0x25/0x30

This issue is blocking linux-next testing for i915 driver.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
@ 2017-06-30  7:41 ` bugzilla-daemon
  2017-06-30  7:51 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-06-30  7:41 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #1 from Marta Löfstedt (marta.lofstedt@intel.com) ---
reproduced with 4.12.0-rc7-next-20170630

[ 1209.338769] INFO: task systemd:1 blocked for more than 120 seconds.
[ 1209.338848]       Tainted: G     U  W       4.12.0-rc7-next-20170630 #2
[ 1209.338916] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 1209.338996] systemd         D    0     1      0 0x00000000
[ 1209.339002] Call Trace:
[ 1209.339015]  __schedule+0x3c7/0x830
[ 1209.339022]  ? bit_wait+0x60/0x60
[ 1209.339026]  schedule+0x36/0x80
[ 1209.339031]  io_schedule+0x16/0x40
[ 1209.339036]  bit_wait_io+0x11/0x60
[ 1209.339041]  __wait_on_bit+0x58/0x90
[ 1209.339047]  ? submit_bio+0x73/0x150
[ 1209.339052]  out_of_line_wait_on_bit+0x8e/0xb0
[ 1209.339058]  ? bit_waitqueue+0x40/0x40
[ 1209.339064]  __wait_on_buffer+0x32/0x40
[ 1209.339070]  __ext4_get_inode_loc+0x1b3/0x400
[ 1209.339076]  ext4_iget+0x92/0xbc0
[ 1209.339082]  ext4_iget_normal+0x2f/0x40
[ 1209.339087]  ext4_lookup+0xea/0x270
[ 1209.339091]  lookup_slow+0xa5/0x160
[ 1209.339096]  walk_component+0x1bf/0x370
[ 1209.339099]  ? path_init+0x1e4/0x3a0
[ 1209.339103]  path_lookupat+0x73/0x220
[ 1209.339107]  filename_lookup+0xb8/0x1a0
[ 1209.339113]  ? __check_object_size+0xb3/0x190
[ 1209.339119]  ? strncpy_from_user+0x4d/0x170
[ 1209.339124]  user_path_at_empty+0x36/0x40
[ 1209.339127]  ? user_path_at_empty+0x36/0x40
[ 1209.339132]  vfs_statx+0x83/0xf0
[ 1209.339137]  SYSC_newstat+0x3d/0x70
[ 1209.339144]  ? __do_page_fault+0x266/0x4e0
[ 1209.339149]  SyS_newstat+0xe/0x10
[ 1209.339153]  entry_SYSCALL_64_fastpath+0x1e/0xa9
[ 1209.339157] RIP: 0033:0x7f3dc48a4215
[ 1209.339160] RSP: 002b:00007ffe95210158 EFLAGS: 00000246 ORIG_RAX:
0000000000000004
[ 1209.339164] RAX: ffffffffffffffda RBX: 00007ffe95210360 RCX:
00007f3dc48a4215
[ 1209.339166] RDX: 00007ffe952101f0 RSI: 00007ffe952101f0 RDI:
00007f3dc4938038
[ 1209.339168] RBP: 0000000000000001 R08: 000000512bfda2a0 R09:
0000000000000020
[ 1209.339170] R10: 2ce33e6c02ce33e7 R11: 0000000000000246 R12:
0000000000000000
[ 1209.339172] R13: 0000000000000000 R14: 00007ffe95210370 R15:
000000512bfb9c88

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
  2017-06-30  7:41 ` [Bug 196223] " bugzilla-daemon
@ 2017-06-30  7:51 ` bugzilla-daemon
  2017-06-30 14:55 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-06-30  7:51 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

Martin Peres (martin.peres@free.fr) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |martin.peres@free.fr

--- Comment #2 from Martin Peres (martin.peres@free.fr) ---
The attached backtrace also contains lockdep information about the locks taken.
Unfortunately, it did not pinpoint to any specific issue, but a filesystem
maintainer may get what is going on.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
  2017-06-30  7:41 ` [Bug 196223] " bugzilla-daemon
  2017-06-30  7:51 ` bugzilla-daemon
@ 2017-06-30 14:55 ` bugzilla-daemon
  2017-07-03  6:15 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-06-30 14:55 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

Theodore Tso (tytso@mit.edu) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu

--- Comment #3 from Theodore Tso (tytso@mit.edu) ---
Since this commit has been implicated in another problem (with quite different
symptoms) it's a bit of a long shot, but since you have a quick and easy
reproduction, can you try reverting 81378da64de6: "jbd2: mark the transaction
context with the scope GFP_NOFS context" and see if that makes the hang go
away?

Failing that, I might have to ask you to do a bisect on the ext4 patches added
during the 4.12 merge window.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (2 preceding siblings ...)
  2017-06-30 14:55 ` bugzilla-daemon
@ 2017-07-03  6:15 ` bugzilla-daemon
  2017-07-03 13:49 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-03  6:15 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #4 from Marta Löfstedt (marta.lofstedt@intel.com) ---
(In reply to Theodore Tso from comment #3)
> Since this commit has been implicated in another problem (with quite
> different symptoms) it's a bit of a long shot, but since you have a quick
> and easy reproduction, can you try reverting 81378da64de6: "jbd2: mark the
> transaction context with the scope GFP_NOFS context" and see if that makes
> the hang go away?
> 
> Failing that, I might have to ask you to do a bisect on the ext4 patches
> added during the 4.12 merge window.

Doesn't help:
[  846.830596] INFO: task systemd:1 blocked for more than 120 seconds.
[  846.830705]       Tainted: G     U  W       4.12.0-rc7-next-20170630+ #3
[  846.830774] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  846.830854] systemd         D    0     1      0 0x00000000
[  846.830860] Call Trace:
[  846.830873]  __schedule+0x3c7/0x830
[  846.830880]  ? bit_wait+0x60/0x60
[  846.830884]  schedule+0x36/0x80
[  846.830889]  io_schedule+0x16/0x40
[  846.830893]  bit_wait_io+0x11/0x60
[  846.830898]  __wait_on_bit+0x58/0x90
[  846.830903]  ? submit_bio+0x73/0x150
[  846.830908]  out_of_line_wait_on_bit+0x8e/0xb0
[  846.830914]  ? bit_waitqueue+0x40/0x40
[  846.830920]  __wait_on_buffer+0x32/0x40
[  846.830926]  __ext4_get_inode_loc+0x1b3/0x400
[  846.830932]  ext4_iget+0x92/0xbc0
[  846.830938]  ext4_iget_normal+0x2f/0x40
[  846.830942]  ext4_lookup+0xea/0x270
[  846.830946]  lookup_slow+0xa5/0x160
[  846.830951]  walk_component+0x1bf/0x370
[  846.830954]  ? path_init+0x1e4/0x3a0
[  846.830958]  path_lookupat+0x73/0x220
[  846.830962]  filename_lookup+0xb8/0x1a0
[  846.830966]  ? __d_alloc+0x27/0x1d0
[  846.830972]  ? __check_object_size+0xb3/0x190
[  846.830977]  ? strncpy_from_user+0x4d/0x170
[  846.830981]  user_path_at_empty+0x36/0x40
[  846.830985]  ? user_path_at_empty+0x36/0x40
[  846.830990]  vfs_statx+0x83/0xf0
[  846.830995]  SYSC_newstat+0x3d/0x70
[  846.831001]  ? ep_read_events_proc+0xd0/0xd0
[  846.831006]  SyS_newstat+0xe/0x10
[  846.831010]  entry_SYSCALL_64_fastpath+0x1e/0xa9

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (3 preceding siblings ...)
  2017-07-03  6:15 ` bugzilla-daemon
@ 2017-07-03 13:49 ` bugzilla-daemon
  2017-07-04  5:56 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-03 13:49 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #5 from Marta Löfstedt (marta.lofstedt@intel.com) ---
Unfortunately, this is not a fast way to bisect. Running the testlist takes ~20
minutes. I usually hit the issue on the second repetition and usually on a
suspend-resume related tests. So, I am still searching for a reasonably new
good commit to start git bisect.

4.12.0-rc7-next-20170703 is bad
4.12.0-rc7-next-20170628 is bad 

I will spin 4.12.0-rc7-next-20170627 over night.

Are there some fs/ext4 specific test that may give a faster reproducer?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (4 preceding siblings ...)
  2017-07-03 13:49 ` bugzilla-daemon
@ 2017-07-04  5:56 ` bugzilla-daemon
  2017-07-04 22:49 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-04  5:56 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #6 from Marta Löfstedt (marta.lofstedt@intel.com) ---
4.12.0-rc7-next-20170627 is good

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (5 preceding siblings ...)
  2017-07-04  5:56 ` bugzilla-daemon
@ 2017-07-04 22:49 ` bugzilla-daemon
  2017-07-05 13:12 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-04 22:49 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #7 from Theodore Tso (tytso@mit.edu) ---
Hmm...  there aren't any significant differences between
4.12.0-rc7-next-20170627 and 4.12.0-rc7-next-20170628 in fs/jbd2 or fs/ext4.

I've taken a quick look at the changes in arch/x86, kernel, and block but
nothing has leapt out at me.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (6 preceding siblings ...)
  2017-07-04 22:49 ` bugzilla-daemon
@ 2017-07-05 13:12 ` bugzilla-daemon
  2017-07-07  8:20 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-05 13:12 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #8 from Marta Löfstedt (marta.lofstedt@intel.com) ---
FYI, for the BDW NUCi5 I am using to manually reproduce this issue, I can't not
reproduce with next-20170704 nor next-20170705. However, we are still seeing
the issue on a lot of machines in our i915 CI lab on the same tags built with
the same configuration. 
Unfortunately, this weakens my statement about next-20170627 being a good tag,
since this tag has not been run in our lab and now the results for my local
machine has started to diverge from the lab results.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (7 preceding siblings ...)
  2017-07-05 13:12 ` bugzilla-daemon
@ 2017-07-07  8:20 ` bugzilla-daemon
  2017-07-07 14:14 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-07  8:20 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #9 from Marta Löfstedt (marta.lofstedt@intel.com) ---
I have had no luck with bisecting, ending up with a commit only changing docs.
So, I guess my reproducer isn't good, which I have already discussed above.
I.e. I believe I have failed to establish a good base. Anyways, the back-traces
speak for themselves, we just don't know when this issue started. We will be
trying to do some runs on earlier tags on the next branch to establish a good
base.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (8 preceding siblings ...)
  2017-07-07  8:20 ` bugzilla-daemon
@ 2017-07-07 14:14 ` bugzilla-daemon
  2017-07-08 18:58 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-07 14:14 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

Tomi Sarvela (tomi.p.sarvela@intel.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tomi.p.sarvela@intel.com

--- Comment #10 from Tomi Sarvela (tomi.p.sarvela@intel.com) ---
Out of linux-next builds, 20170626 is known good, results copied out:

https://intel-gfx-ci.01.org/CI/next-issues.html
https://intel-gfx-ci.01.org/CI/next-all.html

20170627 couldn't be tested because the locking issue (IOMMU panic)

20170628 has been tested bad earlier.

The newest build 20170707 has collected the following traces:

https://intel-gfx-ci.01.org/CI/next-20170707/fi-ivb-3520m/dmesg-149942861_Panic_1.log
https://intel-gfx-ci.01.org/CI/next-20170707/fi-hsw-4770r/dmesg-149942860_Panic_1.log
https://intel-gfx-ci.01.org/CI/next-20170707/fi-bxt-j4205/dmesg-149942868_Panic_1.log
https://intel-gfx-ci.01.org/CI/next-20170707/fi-skl-x1585l/dmesg-149942869_Panic_1.log
https://intel-gfx-ci.01.org/CI/next-20170707/fi-glk-2a/dmesg-149942868_Panic_1.log

Marta has left for vacation. I probably don't have time for bisection before
next Tuesday.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (9 preceding siblings ...)
  2017-07-07 14:14 ` bugzilla-daemon
@ 2017-07-08 18:58 ` bugzilla-daemon
  2017-07-08 19:22 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-08 18:58 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #11 from Theodore Tso (tytso@mit.edu) ---
Do you have any problems with 4.12?  Or is it only with the linux-next branch?

Can you try the dev branch of ext4: ff95015648df44

Does it reproduce on this commit?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (10 preceding siblings ...)
  2017-07-08 18:58 ` bugzilla-daemon
@ 2017-07-08 19:22 ` bugzilla-daemon
  2017-07-09  5:05 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-08 19:22 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #12 from Theodore Tso (tytso@mit.edu) ---
It looks like the kernel is blocked waiting for I/O to complete.  So this could
be anything, up to and including the interrupt controller getting FUBAR'ed.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (11 preceding siblings ...)
  2017-07-08 19:22 ` bugzilla-daemon
@ 2017-07-09  5:05 ` bugzilla-daemon
  2017-07-09  5:27 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-09  5:05 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #13 from Theodore Tso (tytso@mit.edu) ---
I tried downloading intel-gpu-tools and tried running the igt tests using
piglet and the instructions in the README on my Lenovo T470 laptop (which has
an intel graphics chipset).  The results were not promising:

4.9: got to around test 55 (out of some 60,000+ tests) before the system reset
into the BIOS

4.11.0: got to test 17 before the test locked up the kernel so hard that
magic-sysrq wasn't working

4.12 plus the ext4 patches in linux-next: got to test 45 before test locked up
the kernel so hard that magic-sysrq wasn't working

Given that it's not working on 4.9 and 4.11, I'm not going to be able to
distinguish the regression you're seeing with the natural failure to run the
tests on a Lenovo T470.

I will say that on both my Lenovo T470 and T540p, for the past year or two
(since 4.4 or perhaps earlier) suspend resume after docking with using a MST
dock talking to a Dell 30" monitor has always had about a 10-20% chance of
locking up on resume.   It used to work just fine, but sometime in the early
4.x kernels there was a regression.

I tried reporting it as a bug on various i915 and drm lists back when I first
noticed, but it seems that MST doesn't get much love, and so I've never
considered the i915 driver in combination with the Dock and an MST display to
be terribly stable.  And I've been too cheap to spend a thousand bucks or so to
get a new 30" monitor which is non-MST, assuming it would work with the Lenovo
Dock, which I guess there is no guarantee.  :-(

(And hmm, come to think of it I guess I was running the tests while docked to
the Dell 30" monitor.  I've just gotten so used to the fact that if I suspend
while being docked and using the Dell monitor, the laptop might never come
back, so I just save all my buffers beforehand and hope I get lucky on resume. 
I've gotten so used to the lossage I forgot I probably should have tried the
test repro experiment without the dock and the Dell monitor attached.   I guess
when I have time later this week I can try rerunning the tests when I'm not
docked and see if I have any better luck.)

If anyone can give me more specific repro instructions for IGT --- is there a
specific test or subtest I should be running, to avoid running into other i915
bugs that might be unrelated?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (12 preceding siblings ...)
  2017-07-09  5:05 ` bugzilla-daemon
@ 2017-07-09  5:27 ` bugzilla-daemon
  2017-07-10  5:04 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-09  5:27 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #14 from Theodore Tso (tytso@mit.edu) ---
Also for the record, there is also no difference in any ext4/jbd2 code between
next-20170626, next-20170627, next-20170628, and next-20170630.   So if
next-20170726 is working for you, and next-201706028 is not, it very well be
the case that the fact that we are blocked waiting for I/O in ext4 code is a
symptom, not the cause of the problem, and if ext4 is the cause of the problem,
it must be indirectly.

So I'm not sure there's anything more I can do, especially since I don't have a
reliable repro that I can run at this point.

Something else that might be worth trying.  What's in linux-next is slowing
getting picked up by Linus.  You might see if the tip of Linus's tree is
passing your tests or not.  If not, it will be much easier to do a true git
bisection, unlike with linux-next since many patches get rebased.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (13 preceding siblings ...)
  2017-07-09  5:27 ` bugzilla-daemon
@ 2017-07-10  5:04 ` bugzilla-daemon
  2017-07-10  5:10 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-10  5:04 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #15 from Tomi Sarvela (tomi.p.sarvela@intel.com) ---
This S3 issue has appeared only on linux-next tree. Agree that it might be
timing or ordering problem on resume and not jbd2/ext4 issue.

I'm running linus' tree on every rc-tag. Most of the testing here is with
DRM-Tip tree, which will pick the 4.12 changes in short time.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (14 preceding siblings ...)
  2017-07-10  5:04 ` bugzilla-daemon
@ 2017-07-10  5:10 ` bugzilla-daemon
  2017-07-10  5:14 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-10  5:10 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #16 from Tomi Sarvela (tomi.p.sarvela@intel.com) ---
For the future IGT runs: basic set ("BAT") is run with -t basic. Our short
testset can be run with -T tests/intel-ci/fast-feedback.testlist.

Most of the igt-all 60k tests are gem_concurrent_blit ones (50k+). Everything
else should be in better shape. We're working to get all tests runnable in
reasonable time and successfully, at the moment expected time taken for igt all
is 3 weeks on one host.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (15 preceding siblings ...)
  2017-07-10  5:10 ` bugzilla-daemon
@ 2017-07-10  5:14 ` bugzilla-daemon
  2017-07-10 12:58 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-10  5:14 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #17 from Tomi Sarvela (tomi.p.sarvela@intel.com) ---
Right, and the helper command which expects piglit to be present is:
./scripts/run-tests.sh

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (16 preceding siblings ...)
  2017-07-10  5:14 ` bugzilla-daemon
@ 2017-07-10 12:58 ` bugzilla-daemon
  2017-07-10 22:52 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-10 12:58 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #18 from Tomi Sarvela (tomi.p.sarvela@intel.com) ---
I managed to reproduce the bug on next-20170626 and earlier by running
igt@gem_exec_suspend@basic-s3 test back-to-back, takes ~5 runs. Later
next-kernels reproduce this issue on 1-2 runs.

./scripts/run-tests.sh -vt igt@gem_exec_suspend@basic-s3 -x devices

Issue is seen when igt test returns 'pass', but doesn't drop back to command
line. Repeated Ctrl-C gets to the command prompt. Cached commands work, but
anything that needs disk access doesn't.

linus-v4.12 handled ~10 runs without a problem, then same issue.

Dmesg shows in usual case something like:

[  137.351829] nvme nvme0: I/O 197 QID 0 timeout, disable controller
[  137.464115] nvme nvme0: failed to set APST feature (-4)
[  137.861403] EXT4-fs warning (device nvme0n1p2): ext4_end_bio:322: I/O error
10 writing to inode 1453376 (offset 0 size 4096 starting block 5949696)
[  137.861406] Buffer I/O error on device nvme0n1p2, logical block 5818368

I'm concerned about the 'disable controller' part.

Setup on this testing: KBL-7700K on MSI Z170 mb. Ubuntu base, Intel 600p M.2,
one DP connected to monitor.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (17 preceding siblings ...)
  2017-07-10 12:58 ` bugzilla-daemon
@ 2017-07-10 22:52 ` bugzilla-daemon
  2017-07-11  8:12 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-10 22:52 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #19 from Theodore Tso (tytso@mit.edu) ---
The nvme failure followed by the ext4 I/O error warning makes it pretty clear
this is a storage device issue.  So the next question is whether all of your CI
hardware has nvme storage devices (in which case the problem might be in the
nvme device driver not handling suspend/resume)?   Or perhaps after the
suspend/resume something much more fundamental is getting screwed up (e.g.,
like interrupts getting disabled, or some such).

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (18 preceding siblings ...)
  2017-07-10 22:52 ` bugzilla-daemon
@ 2017-07-11  8:12 ` bugzilla-daemon
  2017-07-11  8:40 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-11  8:12 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #20 from Tomi Sarvela (tomi.p.sarvela@intel.com) ---
Most of our CI machines have standard SATA SSD disks, from 3 different vendors.
I picked just this one to test manually because it's the fastest to boot.

I'm trying to reproduce the issue on todays linux-next on other machines and
get a trace what's going on.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (19 preceding siblings ...)
  2017-07-11  8:12 ` bugzilla-daemon
@ 2017-07-11  8:40 ` bugzilla-daemon
  2017-07-11  9:59 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-11  8:40 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #21 from Tomi Sarvela (tomi.p.sarvela@intel.com) ---
There is probably two issues here:
- disabled NVMe controller on kbl-7700k with NVMe

This issue is easily reproduced with gem_exec_suspend@basic-s3, just loop it.
Has been triggered in linus-v4.12, linux-next and DRM-Tip builds, just fastest
to hit on linux-next.

- 60sec timeout and panic from watchdog on most of the other hosts

This might be controller / disk issue too, but I can't read the log well enough
to tell if that's the case.

This latter issue is probably what Marta was tracking, and it's much harder to
reproduce manually. IGT fast-feedback runs seem to hit it very regularly, for
some reason. There's nothing magical in what CI is doing: Open SSH connection
which is kept up, start Jenkins-CI Java client, this starts IGT run through a
shell script. This all within ~30 seconds of testhost OpenSSH starting up.

Collected traces from todays 20170711 run, very different hosts:

https://intel-gfx-ci.01.org/CI/next-20170711/fi-ivb-3520m/dmesg-149974972_Panic_1.log
fi-ivb-3520m: HP Elitebook 8440p laptop with Intel SSD 520

https://intel-gfx-ci.01.org/CI/next-20170711/fi-ivb-3770/dmesg-149974971_Panic_1.log
fi-ivb-3770: HP Pro 3500 desktop with Intel SSD 530

https://intel-gfx-ci.01.org/CI/next-20170711/fi-hsw-4770r/dmesg-149974972_Panic_1.log
fi-hsw-4770r: Gigabyte "Brixbox" NUC with Samsung SSD 840 EVO

https://intel-gfx-ci.01.org/CI/next-20170711/fi-bxt-j4205/dmesg-149974980_Panic_1.log
fi-bxt-j4205: Asrock J4205-ITX MB with Intel SSD 530

I'll continue to look for a way to reproduce this issue more easily.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (20 preceding siblings ...)
  2017-07-11  8:40 ` bugzilla-daemon
@ 2017-07-11  9:59 ` bugzilla-daemon
  2017-07-11 13:15 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-11  9:59 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #22 from Tomi Sarvela (tomi.p.sarvela@intel.com) ---
It seems that Broxton is hanging to this issue each fast-feedback run, with
MCE. Takes 5 minutes to reproduce.

I'll try bisecting with this setup.

- cmdline:
testrunner@bxt-j3405:/opt/igt$ ./scripts/run-tests.sh -vT
tests/intel-ci/fast-feedback.testlist 

- output before hang
running: igt/gem_exec_suspend/basic
pass: igt/gem_exec_suspend/basic   
running: igt/gem_exec_suspend/basic-s3
pass: igt/gem_exec_suspend/basic-s3   
[108/279] skip: 11, pass: 97 \

Note that basic-s3 passes before host hangs

- serial
bxt-j3405 login: [  273.799025] mce: [Hardware Error]: CPU 0: Machine Check: 0
Bank 4: a600000000028
[  273.799033] mce: [Hardware Error]: TSC 0 ADDR fef13b80 
[  273.799057] mce: [Hardware Error]: PROCESSOR 0:506c9 TIME 1499766163 SOCKET
0 APIC 0 microcode 1c
[  369.635514] INFO: task NetworkManager:582 blocked for more than 60 seconds.
[  369.643203]       Tainted: G     U         
4.12.0-next-20170711-CI-next-20170711 #1
[  369.652000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  369.662019] Kernel panic - not syncing: hung_task: blocked tasks
[  369.668287] CPU: 0 PID: 36 Comm: khungtaskd Tainted: G     U         
4.12.0-next-20170711-CI-ne1
[  369.678913] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./J3455-ITX, BIOS P1.10 06
[  369.689036] Call Trace:
[  369.691613]  dump_stack+0x67/0x97
[  369.695146]  panic+0xcf/0x212
[  369.698314]  watchdog+0x469/0x620
[  369.701849]  kthread+0x10f/0x150
[  369.705267]  ? reset_hung_task_detector+0x10/0x10
[  369.710259]  ? kthread_create_on_node+0x40/0x40
[  369.715079]  ret_from_fork+0x27/0x40
[  369.718888] Kernel Offset: disabled
[  369.722627] Rebooting in 1 seconds..
[  370.726512] ACPI MEMORY or I/O RESET_REG.

- dmesg
[  276.985613] r8169 0000:01:00.0 enp1s0: link up
[  333.940312] usb 1-6: USB disconnect, device number 7
[  335.471502] usb 1-6: new low-speed USB device number 8 using xhci_hcd
[  335.644636] usb 1-6: New USB device found, idVendor=046d, idProduct=c077
[  335.644652] usb 1-6: New USB device strings: Mfr=1, Product=2,
SerialNumber=0
[  335.644665] usb 1-6: Product: USB Optical Mouse
[  335.644676] usb 1-6: Manufacturer: Logitech
[  335.655340] input: Logitech USB Optical Mouse as
/devices/pci0000:00/0000:00:15.0/usb1/1-6/1-6:1.0/0003:046D:C077.0008/input/input21
[  335.708842] hid-generic 0003:046D:C077.0008: input,hidraw2: USB HID v1.11
Mouse [Logitech USB Optical Mouse] on usb-0000:00:15.0-6/input0
[  369.635514] INFO: task NetworkManager:582 blocked for more than 60 seconds.
[  369.643203]       Tainted: G     U         
4.12.0-next-20170711-CI-next-20170711 #1
[  369.652000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  369.660444] NetworkManager  D    0   582      1 0x00000000
[  369.660454] Call Trace:
[  369.660470]  __schedule+0x405/0xb40
[  369.660479]  ? bit_wait+0x50/0x50
[  369.660484]  schedule+0x3b/0x90
[  369.660491]  io_schedule+0x11/0x40
[  369.660496]  bit_wait_io+0xc/0x50
[  369.660501]  __wait_on_bit+0x5c/0x90
[  369.660508]  out_of_line_wait_on_bit+0x7c/0x80
[  369.660515]  ? bit_waitqueue+0x30/0x30
[  369.660522]  __wait_on_buffer+0x3b/0x40
[  369.660530]  ext4_find_entry+0x3f8/0x5b0
[  369.660543]  ? d_alloc+0x84/0xc0
[  369.660553]  ext4_lookup+0x46/0x1c0
[  369.660560]  __lookup_hash+0x53/0x80
[  369.660566]  do_unlinkat+0x12c/0x2a0
[  369.660575]  SyS_unlink+0x11/0x20
[  369.660580]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  369.660585] RIP: 0033:0x7f2e06056a57
[  369.660590] RSP: 002b:00007fff23acbac8 EFLAGS: 00000206 ORIG_RAX:
0000000000000057
[  369.660600] RAX: ffffffffffffffda RBX: ffffffff8146e023 RCX:
00007f2e06056a57
[  369.660605] RDX: 0000000002260d90 RSI: 00007f2e06322b28 RDI:
0000000002237be0
[  369.660610] RBP: ffffc9000067bf88 R08: 0000000002260df0 R09:
0000000000000100
[  369.660614] R10: 00000000000004c9 R11: 0000000000000206 R12:
0000000002221980
[  369.660618] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000002237be0
[  369.660626]  ? __this_cpu_preempt_check+0x13/0x20
[  369.660635] 
               Showing all locks held in the system:
[  369.660650] 2 locks held by khungtaskd/36:
[  369.660668]  #0:  (rcu_read_lock){......}, at: [<ffffffff81138c2c>]
watchdog+0x9c/0x620
[  369.660690]  #1:  (tasklist_lock){.+.+..}, at: [<ffffffff810d76ad>]
debug_show_all_locks+0x3d/0x1a0
[  369.660720] 2 locks held by NetworkManager/582:
[  369.660724]  #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff8122d39f>]
mnt_want_write+0x1f/0x50
[  369.660747]  #1:  (&type->i_mutex_dir_key/1){+.+.+.}, at:
[<ffffffff8121865c>] do_unlinkat+0x11c/0x2a0
[  369.660770] 1 lock held by in:imklog/612:
[  369.660774]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff8122a59e>]
__fdget_pos+0x3e/0x50
[  369.660794] 1 lock held by rs:main Q:Reg/613:
[  369.660797]  #0:  (&type->i_mutex_dir_key){++++++}, at: [<ffffffff81212ca2>]
lookup_slow+0xb2/0x1e0
[  369.660822] 1 lock held by dmesg/1402:
[  369.660826]  #0:  (&user->lock){+.+...}, at: [<ffffffff810ecc88>]
devkmsg_read+0x38/0x2d0
[  369.660852] 2 locks held by kworker/u8:29/2982:
[  369.660856]  #0:  ("events_unbound"){.+.+.+}, at: [<ffffffff8109c208>]
process_one_work+0x178/0x670
[  369.660876]  #1:  ((&entry->work)){+.+.+.}, at: [<ffffffff8109c208>]
process_one_work+0x178/0x670

[  369.660903] =============================================

[  369.660908] NMI backtrace for cpu 0
[  369.660914] CPU: 0 PID: 36 Comm: khungtaskd Tainted: G     U         
4.12.0-next-20170711-CI-next-20170711 #1
[  369.660917] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./J3455-ITX, BIOS P1.10 09/29/2016
[  369.660921] Call Trace:
[  369.660927]  dump_stack+0x67/0x97
[  369.660933]  nmi_cpu_backtrace+0xbf/0xd0
[  369.660939]  ? irq_force_complete_move+0xf0/0xf0
[  369.660945]  nmi_trigger_cpumask_backtrace+0xc0/0x100
[  369.660951]  arch_trigger_cpumask_backtrace+0x14/0x20
[  369.660955]  watchdog+0x45d/0x620
[  369.660962]  kthread+0x10f/0x150
[  369.660966]  ? reset_hung_task_detector+0x10/0x10
[  369.660972]  ? kthread_create_on_node+0x40/0x40
[  369.660977]  ret_from_fork+0x27/0x40
[  369.660989] Sending NMI from CPU 0 to CPUs 1-3:
[  369.661001] NMI backtrace for cpu 1
[  369.661004] CPU: 1 PID: 195 Comm: systemd-journal Tainted: G     U         
4.12.0-next-20170711-CI-next-20170711 #1
[  369.661004] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./J3455-ITX, BIOS P1.10 09/29/2016
[  369.661005] task: ffff8802743b8040 task.stack: ffffc90000224000
[  369.661007] RIP: 0033:0x7ffca352ba3d
[  369.661008] RSP: 002b:00007ffca349de10 EFLAGS: 00000202
[  369.661010] RAX: 0000000000000000 RBX: 0000000000000007 RCX:
00007ffca352ba3d
[  369.661010] RDX: 0000000000000000 RSI: 00007ffca349de40 RDI:
0000000000000007
[  369.661011] RBP: 00007ffca349de10 R08: 0000000015fe7e40 R09:
0000000000000001
[  369.661012] R10: 0027183014d2c5aa R11: 0000000000000202 R12:
00007ffca349de70
[  369.661013] R13: 0000000000000001 R14: 00007ffca349e0f0 R15:
0000000000000001
[  369.661014] FS:  00007fb99d7d1840(0000) GS:ffff88027fc80000(0000)
knlGS:0000000000000000
[  369.661015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  369.661016] CR2: 00007fb99b6ee010 CR3: 0000000272577000 CR4:
00000000003406e0
[  369.661020] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffff8188d448
[  369.661023] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffff8188d448
[  369.662019] Kernel panic - not syncing: hung_task: blocked tasks

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (21 preceding siblings ...)
  2017-07-11  9:59 ` bugzilla-daemon
@ 2017-07-11 13:15 ` bugzilla-daemon
  2017-07-11 17:17 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-11 13:15 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #23 from Martin Peres (martin.peres@free.fr) ---
Thanks Tomi, I filed another bug here for the nvme issue:
https://bugzilla.kernel.org/show_bug.cgi?id=196325

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (22 preceding siblings ...)
  2017-07-11 13:15 ` bugzilla-daemon
@ 2017-07-11 17:17 ` bugzilla-daemon
  2017-07-12  7:52 ` bugzilla-daemon
  2017-07-12 14:17 ` bugzilla-daemon
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-11 17:17 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #24 from Tomi Sarvela (tomi.p.sarvela@intel.com) ---
Bisected by marking the merge bases good, and got the commit which looks like a
reasonable culprit:

5c279bd9e40624f4ab6e688671026d6005b066fa is the first bad commit
commit 5c279bd9e40624f4ab6e688671026d6005b066fa
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri Jun 16 10:27:55 2017 +0200

    scsi: default to scsi-mq

    Remove the SCSI_MQ_DEFAULT config option and default to the blk-mq I/O
    path now that we had plenty of testing, and have I/O schedulers for
    blk-mq.  The module option to disable the blk-mq path is kept around for
    now.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

:040000 040000 57ec7d5d2ba76592a695f533a69f747700c31966
c79f6ecb070acc4fadf6fc05ca9ba32bc9c0c665 M      drivers

I'm using 'make olddefconfig' on kernel. Will doublecheck with different
kconfig options tomorrow.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (23 preceding siblings ...)
  2017-07-11 17:17 ` bugzilla-daemon
@ 2017-07-12  7:52 ` bugzilla-daemon
  2017-07-12 14:17 ` bugzilla-daemon
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-12  7:52 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #25 from Tomi Sarvela (tomi.p.sarvela@intel.com) ---
Shortest testlist I could find to reproduce the issue with is

scsi-mq.testlist:
igt@gem_exec_gttfill@basic
igt@gem_exec_suspend@basic-s3

Ran with ./scripts/run-tests.sh -vT scsi-mq.testlist

Using kernel option scsi_mod.use_blk_mq=0 makes this testlist pass.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Bug 196223] jdb2 journal - hung after suspend-resume test
  2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
                   ` (24 preceding siblings ...)
  2017-07-12  7:52 ` bugzilla-daemon
@ 2017-07-12 14:17 ` bugzilla-daemon
  25 siblings, 0 replies; 27+ messages in thread
From: bugzilla-daemon @ 2017-07-12 14:17 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=196223

--- Comment #26 from Theodore Tso (tytso@mit.edu) ---
OK, so this is a block_mq failure.   I recommend you send a report to
linux-block@vger.kernel.org and cc Jens Axboe <axboe@kernel.dk>

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-07-12 14:17 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-30  6:11 [Bug 196223] New: jdb2 journal - hung after suspend-resume test bugzilla-daemon
2017-06-30  7:41 ` [Bug 196223] " bugzilla-daemon
2017-06-30  7:51 ` bugzilla-daemon
2017-06-30 14:55 ` bugzilla-daemon
2017-07-03  6:15 ` bugzilla-daemon
2017-07-03 13:49 ` bugzilla-daemon
2017-07-04  5:56 ` bugzilla-daemon
2017-07-04 22:49 ` bugzilla-daemon
2017-07-05 13:12 ` bugzilla-daemon
2017-07-07  8:20 ` bugzilla-daemon
2017-07-07 14:14 ` bugzilla-daemon
2017-07-08 18:58 ` bugzilla-daemon
2017-07-08 19:22 ` bugzilla-daemon
2017-07-09  5:05 ` bugzilla-daemon
2017-07-09  5:27 ` bugzilla-daemon
2017-07-10  5:04 ` bugzilla-daemon
2017-07-10  5:10 ` bugzilla-daemon
2017-07-10  5:14 ` bugzilla-daemon
2017-07-10 12:58 ` bugzilla-daemon
2017-07-10 22:52 ` bugzilla-daemon
2017-07-11  8:12 ` bugzilla-daemon
2017-07-11  8:40 ` bugzilla-daemon
2017-07-11  9:59 ` bugzilla-daemon
2017-07-11 13:15 ` bugzilla-daemon
2017-07-11 17:17 ` bugzilla-daemon
2017-07-12  7:52 ` bugzilla-daemon
2017-07-12 14:17 ` bugzilla-daemon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.