[Bug 32982] Kernel locks up a few minutes after boot

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
@ 2011-04-11 23:05 ` bugzilla-daemon
  2011-04-11 23:09 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-11 23:05 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982


Andrew Morton <akpm@linux-foundation.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |akpm@linux-foundation.org
          Component|Video(Other)                |Video(DRI - non Intel)
         AssignedTo|drivers_video-other@kernel- |drivers_video-dri@kernel-bu
                   |bugs.osdl.org               |gs.osdl.org




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
  2011-04-11 23:05 ` [Bug 32982] Kernel locks up a few minutes after boot bugzilla-daemon
@ 2011-04-11 23:09 ` bugzilla-daemon
  2011-04-11 23:14 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-11 23:09 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982


Rafael J. Wysocki <rjw@sisk.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |florian@mickler.org,
                   |                            |maciej.rutecki@gmail.com,
                   |                            |rjw@sisk.pl
             Blocks|                            |32012





^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
  2011-04-11 23:05 ` [Bug 32982] Kernel locks up a few minutes after boot bugzilla-daemon
  2011-04-11 23:09 ` bugzilla-daemon
@ 2011-04-11 23:14 ` bugzilla-daemon
  2011-04-11 23:15 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-11 23:14 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982


Alex Deucher <alexdeucher@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com




--- Comment #1 from Alex Deucher <alexdeucher@gmail.com>  2011-04-11 23:14:27 ---
Is this a regression?  Did previous kernels work ok?  If so, can you bisect?


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
                   ` (2 preceding siblings ...)
  2011-04-11 23:14 ` bugzilla-daemon
@ 2011-04-11 23:15 ` bugzilla-daemon
  2011-04-12 18:34 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-11 23:15 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982





--- Comment #2 from Alex Deucher <alexdeucher@gmail.com>  2011-04-11 23:15:20 ---
Does it work ok if you remove:
vga=0x31a nomodesetting
from your grub entry?
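
For reference, a minimal sketch of what dropping those two parameters from a kernel command line amounts to. The strip_params helper and the sample command line are hypothetical (only vga=0x31a, nomodesetting and root=/dev/md0 appear in this thread); the real change is made by editing the kernel line of the GRUB entry by hand.

```shell
# strip_params CMDLINE PARAM...
# Prints CMDLINE with every word that exactly matches one of PARAM removed.
# Hypothetical helper for illustration; it does not touch any GRUB file.
strip_params() {
    cmdline=$1
    shift
    out=
    # Unquoted on purpose: split the command line into words
    # (assumes the command line contains no glob characters).
    for word in $cmdline; do
        keep=yes
        for drop in "$@"; do
            if [ "$word" = "$drop" ]; then
                keep=no
            fi
        done
        if [ "$keep" = yes ]; then
            out="${out:+$out }$word"
        fi
    done
    printf '%s\n' "$out"
}

# Sample command line; the "ro" word is an assumption.
strip_params "root=/dev/md0 ro vga=0x31a nomodesetting" vga=0x31a nomodesetting
```

With the sample input above, only the root= and ro words should remain.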


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
                   ` (3 preceding siblings ...)
  2011-04-11 23:15 ` bugzilla-daemon
@ 2011-04-12 18:34 ` bugzilla-daemon
  2011-04-12 18:40 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-12 18:34 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982





--- Comment #3 from Bart Van Assche <bart.vanassche@gmail.com>  2011-04-12 18:34:48 ---
(In reply to comment #1)
> Is this a regression?  Did previous kernels work ok?  If so, can you bisect?

Yes, it's a regression - 2.6.38 and 2.6.38.2 run perfectly on the same system.

I haven't had much luck with the bisect though - halfway bisecting I
encountered a commit that made my system unbootable because reassembling the
RAID1 /boot partition failed. The log I have so far is:

$ git bisect start 'drivers/video'
# good: [521cb40b0c44418a4fd36dc633f575813d59a43d] Linux 2.6.38
git bisect good 521cb40b0c44418a4fd36dc633f575813d59a43d
# bad: [94c8a984ae2adbd9a9626fb42e0f2faf3e36e86f] Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
git bisect bad 94c8a984ae2adbd9a9626fb42e0f2faf3e36e86f
# good: [da49252fb0392d8196833ef3da92e48fb371f8d7] Merge branch 'for-paul' of git://gitorious.org/linux-omap-dss2/linux
git bisect good da49252fb0392d8196833ef3da92e48fb371f8d7
# good: [bf5f0019046d596d613caf74722ba4994e153899] video, sm501: add I/O functions for use on powerpc
git bisect good bf5f0019046d596d613caf74722ba4994e153899
# good: [6b794743b2c5e21825d35b5d5dd57d6fcc388198] unicore32 framebuffer fix: get videomemory by __get_free_pages() and make it floatable
git bisect good 6b794743b2c5e21825d35b5d5dd57d6fcc388198

unknown because of boot failure: 21cd72e7cb424f1686855602ec0fdc6e5830f249


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
                   ` (4 preceding siblings ...)
  2011-04-12 18:34 ` bugzilla-daemon
@ 2011-04-12 18:40 ` bugzilla-daemon
  2011-04-12 18:42 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-12 18:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982





--- Comment #4 from Alex Deucher <alexdeucher@gmail.com>  2011-04-12 18:40:28 ---
'git bisect skip' to skip problematic commits.
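
As a sketch of how skipping fits into a bisect: at a commit that cannot be tested, run `git bisect skip` instead of `git bisect good` or `git bisect bad`. When the untestable state can be detected by a script (a build failure, say), `git bisect run` treats exit status 125 from that script as "skip". The bisect_step wrapper below and the commands passed to it are hypothetical; a boot failure like the one described in comment #3 would still need a manual `git bisect skip`.

```shell
# bisect_step BUILD_CMD TEST_CMD
# Hypothetical wrapper meant to be called from a script given to
# `git bisect run`:
#   status 125 -> commit cannot be tested, git bisect skips it
#   status 1   -> commit is bad
#   status 0   -> commit is good
bisect_step() {
    if ! sh -c "$1"; then
        return 125
    fi
    if ! sh -c "$2"; then
        return 1
    fi
    return 0
}

# Manual equivalent at an untestable commit:
#   git bisect skip
```

A script that sources this function and ends with a call such as `bisect_step "make" "./run-test"` (both commands are placeholders) could then be handed to `git bisect run`.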


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
                   ` (5 preceding siblings ...)
  2011-04-12 18:40 ` bugzilla-daemon
@ 2011-04-12 18:42 ` bugzilla-daemon
  2011-04-13 18:49 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-12 18:42 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982





--- Comment #5 from Alex Deucher <alexdeucher@gmail.com>  2011-04-12 18:42:33 ---
Also, the drm is not in drivers/video, it's in drivers/gpu
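
A sketch of a bisect limited to that directory, reusing the good and bad commit IDs from comment #3. The run helper and the DRY_RUN variable are assumptions, added so the command can be printed without a kernel tree at hand. The bug was later moved to the Block Layer component, so this only illustrates the path-limiting syntax.

```shell
# Commit IDs taken from the bisect log in comment #3.
GOOD=521cb40b0c44418a4fd36dc633f575813d59a43d   # Linux 2.6.38
BAD=94c8a984ae2adbd9a9626fb42e0f2faf3e36e86f

# run: print the command when DRY_RUN is set, execute it otherwise.
# (Hypothetical helper, not part of git.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# start_drm_bisect: only consider commits that touch drivers/gpu.
start_drm_bisect() {
    run git bisect start "$BAD" "$GOOD" -- drivers/gpu
}

# Print the command instead of running it.
DRY_RUN=1
start_drm_bisect
```

Unsetting DRY_RUN inside a kernel checkout would start the bisect for real.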


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
                   ` (6 preceding siblings ...)
  2011-04-12 18:42 ` bugzilla-daemon
@ 2011-04-13 18:49 ` bugzilla-daemon
  2011-04-14 15:57 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-13 18:49 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982





--- Comment #6 from Bart Van Assche <bart.vanassche@gmail.com>  2011-04-13 18:49:13 ---
Although I'm still busy bisecting, I'd like to report that I got the following
hung task report with head b73a21fc66fee35b41db755abebfacba48b2fc76 (had
already seen something similar before with 2.6.39-rc2):

INFO: task kjournald:918 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald       D ffff880131b9ddb8     0   918      2 0x00000000
 ffff880131b9dd20 0000000000000046 ffff880131b9dca0 ffffffff8108cd6d
 0000000000000282 ffff880131b9dfd8 ffff880137729f40 ffff880131b9dfd8
 ffff880131b9c000 ffff880131b9c000 ffff880131b9c000 ffff880131b9dfd8
Call Trace:
 [<ffffffff8108cd6d>] ? trace_hardirqs_on_caller+0x14d/0x190
 [<ffffffff81048419>] ? sub_preempt_count+0xa9/0xe0
 [<ffffffffa00c615e>] journal_commit_transaction+0x13e/0x1590 [jbd]
 [<ffffffff813e2535>] ? _raw_spin_unlock_irqrestore+0x65/0x80
 [<ffffffff81048419>] ? sub_preempt_count+0xa9/0xe0
 [<ffffffff81075280>] ? wake_up_bit+0x40/0x40
 [<ffffffff8106258a>] ? del_timer_sync+0x8a/0xc0
 [<ffffffff81062500>] ? try_to_del_timer_sync+0x110/0x110
 [<ffffffffa00ca1a1>] kjournald+0xf1/0x250 [jbd]
 [<ffffffff81075280>] ? wake_up_bit+0x40/0x40
 [<ffffffffa00ca0b0>] ? commit_timeout+0x10/0x10 [jbd]
 [<ffffffff81074c66>] kthread+0x96/0xa0
 [<ffffffff813e4294>] kernel_thread_helper+0x4/0x10
 [<ffffffff8103cebb>] ? finish_task_switch+0x7b/0xe0
 [<ffffffff813e24ab>] ? _raw_spin_unlock_irq+0x3b/0x60
 [<ffffffff813e2944>] ? retint_restore_args+0xe/0xe
 [<ffffffff81074bd0>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff813e4290>] ? gs_change+0xb/0xb
no locks held by kjournald/918.
INFO: task klauncher:5744 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
klauncher       D 00000001000297b4     0  5744   5743 0x00000000
 ffff88011dd73938 0000000000000046 ffff880100000000 ffffffff8108cbef
 ffffffff813e2535 ffff88011dd73fd8 ffff8801382e1f40 ffff88011dd73fd8
 ffff88011dd72000 ffff88011dd72000 ffff88011dd72000 ffff88011dd73fd8
Call Trace:
 [<ffffffff8108cbef>] ? mark_held_locks+0x6f/0xa0
 [<ffffffff813e2535>] ? _raw_spin_unlock_irqrestore+0x65/0x80
 [<ffffffff8115b830>] ? __wait_on_buffer+0x30/0x30
 [<ffffffff813df0e9>] io_schedule+0x59/0x80
 [<ffffffff8115b83e>] sleep_on_buffer+0xe/0x20
 [<ffffffff813df89a>] __wait_on_bit_lock+0x5a/0xc0
 [<ffffffff8115b830>] ? __wait_on_buffer+0x30/0x30
 [<ffffffff813df978>] out_of_line_wait_on_bit_lock+0x78/0x90
 [<ffffffff810752d0>] ? autoremove_wake_function+0x50/0x50
 [<ffffffff8115b886>] __lock_buffer+0x36/0x40
 [<ffffffffa00c56ad>] do_get_write_access+0x64d/0x660 [jbd]
 [<ffffffff81048419>] ? sub_preempt_count+0xa9/0xe0
 [<ffffffffa00c33e0>] ? start_this_handle+0x370/0x470 [jbd]
 [<ffffffffa00cb594>] ? journal_add_journal_head+0xf4/0x220 [jbd]
 [<ffffffffa00c58f1>] journal_get_write_access+0x31/0x50 [jbd]
 [<ffffffffa00efe6d>] __ext3_journal_get_write_access+0x2d/0x60 [ext3]
 [<ffffffffa00e2423>] ext3_reserve_inode_write+0x83/0xb0 [ext3]
 [<ffffffffa00e2494>] ext3_mark_inode_dirty+0x44/0x70 [ext3]
 [<ffffffffa00e4ffe>] ext3_dirty_inode+0x5e/0xa0 [ext3]
 [<ffffffff8115412f>] __mark_inode_dirty+0x3f/0x250
 [<ffffffff8114627c>] file_update_time+0xec/0x170
 [<ffffffff813dfecd>] ? mutex_lock_nested+0x27d/0x3a0
 [<ffffffff810e4138>] __generic_file_aio_write+0x1f8/0x440
 [<ffffffff810e43f5>] generic_file_aio_write+0x75/0xf0
 [<ffffffff8112ca6a>] do_sync_write+0xda/0x120
 [<ffffffff8110a2d7>] ? remove_vma+0x77/0x90
 [<ffffffff8108cdbd>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8110a2d7>] ? remove_vma+0x77/0x90
 [<ffffffff8112d186>] vfs_write+0xc6/0x170
 [<ffffffff8112d481>] sys_write+0x51/0x90
 [<ffffffff813e31eb>] system_call_fastpath+0x16/0x1b
2 locks held by klauncher/5744:
 #0:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff810e43d9>]
generic_file_aio_write+0x59/0xf0
 #1:  (jbd_handle){+.+...}, at: [<ffffffffa00c33e0>]
start_this_handle+0x370/0x470 [jbd]
INFO: task okular:4180 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
okular          D 000000010002a251     0  4180   5743 0x00000000
 ffff880041d13aa8 0000000000000046 ffff880000000000 ffffffff8108cd6d
 0000000000000282 ffff880041d13fd8 ffff880037a59f40 ffff880041d13fd8
 ffff880041d12000 ffff880041d12000 ffff880041d12000 ffff880041d13fd8
Call Trace:
 [<ffffffff8108cd6d>] ? trace_hardirqs_on_caller+0x14d/0x190
 [<ffffffffa00c32b4>] start_this_handle+0x244/0x470 [jbd]
 [<ffffffff8109a633>] ? is_module_address+0x33/0x60
 [<ffffffff81075280>] ? wake_up_bit+0x40/0x40
 [<ffffffffa00c370b>] journal_start+0xdb/0x120 [jbd]
 [<ffffffffa00eb436>] ext3_journal_start_sb+0x36/0x70 [ext3]
 [<ffffffffa00e2663>] ext3_setattr+0x1a3/0x210 [ext3]
 [<ffffffff81148556>] notify_change+0x116/0x360
 [<ffffffff8112b803>] do_truncate+0x63/0x90
 [<ffffffff81048419>] ? sub_preempt_count+0xa9/0xe0
 [<ffffffff8113abac>] do_last+0x42c/0x820
 [<ffffffff8113c3b0>] path_openat+0xd0/0x410
 [<ffffffff81102ad3>] ? might_fault+0x53/0xb0
 [<ffffffff8113c76f>] do_filp_open+0x7f/0xa0
 [<ffffffff81048419>] ? sub_preempt_count+0xa9/0xe0
 [<ffffffff813e2585>] ? _raw_spin_unlock+0x35/0x60
 [<ffffffff81149834>] ? alloc_fd+0xf4/0x150
 [<ffffffff8112c651>] do_sys_open+0x101/0x1e0
 [<ffffffff8112c750>] sys_open+0x20/0x30
 [<ffffffff813e31eb>] system_call_fastpath+0x16/0x1b
2 locks held by okular/4180:
 #0:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8112b7f7>]
do_truncate+0x57/0x90
 #1:  (&sb->s_type->i_alloc_sem_key#4){+.+...}, at: [<ffffffff811486e0>]
notify_change+0x2a0/0x360
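
To summarize a report like the one above, the blocked tasks can be pulled out of the log with a small filter. A minimal sketch, assuming the report lines have the "INFO: task NAME:PID blocked for more than N seconds." form shown here; the hung_tasks function name is made up for this example.

```shell
# hung_tasks: read kernel log text on stdin and print "NAME PID SECONDS"
# for every hung-task report line. All other lines are dropped.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds\..*/\1 \2 \3/p'
}

# The three report lines from the log above, plus one unrelated line.
hung_tasks <<'EOF'
INFO: task kjournald:918 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task klauncher:5744 blocked for more than 120 seconds.
INFO: task okular:4180 blocked for more than 120 seconds.
EOF
```

On a live system the same filter could be fed from `dmesg` output instead of the here-document.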


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
                   ` (7 preceding siblings ...)
  2011-04-13 18:49 ` bugzilla-daemon
@ 2011-04-14 15:57 ` bugzilla-daemon
  2011-04-14 19:24 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-14 15:57 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982





--- Comment #7 from Bart Van Assche <bart.vanassche@gmail.com>  2011-04-14 15:57:40 ---
Still occurs with 85f2e689a5c8fb6ed8fdbee00109e7f6e5fefcb6 (2.6.39-rc3+). Note:
I'm still trying to find the offending commit via bisecting.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
                   ` (8 preceding siblings ...)
  2011-04-14 15:57 ` bugzilla-daemon
@ 2011-04-14 19:24 ` bugzilla-daemon
  2011-04-15 10:23 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-14 19:24 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982





--- Comment #8 from Bart Van Assche <bart.vanassche@gmail.com>  2011-04-14 19:24:34 ---
The result of the bisect process - not sure it's useful:

# git bisect skip
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
62e0ff1ef2d8ea0814487f73a7de431396a1e914
1fcf0069f4715f6f811466db68a547a348b4d5a9
94e948e6e43cd34e0e2ca496d5e90e4ff0d884f9
53f358a81e10e798f44af896ffacaedd7ac0269b
e9c5db0b8dce1bcdc99ad26e718230810d6b5cff
b73a21fc66fee35b41db755abebfacba48b2fc76
We cannot bisect more!

git bisect start
# skip: [1fcf0069f4715f6f811466db68a547a348b4d5a9] Merge branch 'common/fbdev' of master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6
git bisect skip 1fcf0069f4715f6f811466db68a547a348b4d5a9
# skip: [e9c5db0b8dce1bcdc99ad26e718230810d6b5cff] efifb: support AMD Radeon HD 6490
git bisect skip e9c5db0b8dce1bcdc99ad26e718230810d6b5cff
# skip: [21cd72e7cb424f1686855602ec0fdc6e5830f249] savagefb: Set up I2C based on chip family instead of card id
git bisect skip 21cd72e7cb424f1686855602ec0fdc6e5830f249
# skip: [47c87d930f3db4fc3a30505075e07f5597e2e953] fb: Reduce priority of resource conflict message
git bisect skip 47c87d930f3db4fc3a30505075e07f5597e2e953
# good: [899631c7916b231ba6509c90dbc33221e9194029] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good 899631c7916b231ba6509c90dbc33221e9194029
# bad: [b73a21fc66fee35b41db755abebfacba48b2fc76] video: s3c-fb: fix checkpatch errors and warning
git bisect bad b73a21fc66fee35b41db755abebfacba48b2fc76
# skip: [6c5103890057b1bb781b26b7aae38d33e4c517d8] Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block
git bisect skip 6c5103890057b1bb781b26b7aae38d33e4c517d8
# good: [247f99386100d1d1c369ba98120d2edebf5426fc] fbdev: sh_mobile_lcdcfb: fix module lock acquisition
git bisect good 247f99386100d1d1c369ba98120d2edebf5426fc
# skip: [53f358a81e10e798f44af896ffacaedd7ac0269b] Merge branch 'viafb-next' of git://github.com/schandinat/linux-2.6
git bisect skip 53f358a81e10e798f44af896ffacaedd7ac0269b
# good: [3f086fe93f734ba76f2e130777687f81e0cbb318] viafb: initialize margins correct
git bisect good 3f086fe93f734ba76f2e130777687f81e0cbb318
# skip: [62e0ff1ef2d8ea0814487f73a7de431396a1e914] fbcon: Remove unused 'display *p' variable from fb_flashcursor()
git bisect skip 62e0ff1ef2d8ea0814487f73a7de431396a1e914
# skip: [94e948e6e43cd34e0e2ca496d5e90e4ff0d884f9] s3fb: fix Virge/GX2
git bisect skip 94e948e6e43cd34e0e2ca496d5e90e4ff0d884f9


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
                   ` (9 preceding siblings ...)
  2011-04-14 19:24 ` bugzilla-daemon
@ 2011-04-15 10:23 ` bugzilla-daemon
  2011-04-17 18:20 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-15 10:23 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982


Bart Van Assche <bart.vanassche@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Video(DRI - non Intel)      |Block Layer
            Product|Drivers                     |IO/Storage





^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
                   ` (10 preceding siblings ...)
  2011-04-15 10:23 ` bugzilla-daemon
@ 2011-04-17 18:20 ` bugzilla-daemon
  2011-05-01  9:35 ` bugzilla-daemon
  2011-05-01 11:22 ` bugzilla-daemon
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-04-17 18:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982





--- Comment #9 from Rafael J. Wysocki <rjw@sisk.pl>  2011-04-17 18:20:39 ---
On Sunday, April 17, 2011, Linus Torvalds wrote:
> Is this machine running a RAID5 setup or something like that?
> 
> There is a known interaction with the new block layer plugging code
> and MD. The "hung task" report in that bugzilla looks very much like
> that issue. And you do have "root=/dev/md0", so clearly there's some
> md thing going on.
> 
> And bisecting might not work all that well for it, because I suspect
> it ends up being very much a matter of IO patterns how it triggers.
> 
> Neil supposedly has a patch for it, but I haven't seen it yet. Neil, Jens?
> 
>                                    Linus
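
A quick way to answer the question about md, sketched under the assumption that the kernel command line is readable from /proc/cmdline and that /proc/mdstat is present when md arrays are active. The root_on_md helper is hypothetical.

```shell
# root_on_md CMDLINE
# Succeeds if CMDLINE contains a root=/dev/md* parameter.
root_on_md() {
    # Unquoted on purpose: split the command line into words.
    for word in $1; do
        case $word in
            root=/dev/md*) return 0 ;;
        esac
    done
    return 1
}

if [ -r /proc/cmdline ] && root_on_md "$(cat /proc/cmdline)"; then
    echo "root file system is on an md device"
    if [ -r /proc/mdstat ]; then
        # Lists the active arrays and their RAID personalities.
        cat /proc/mdstat
    fi
fi
```

The RAID level shown in /proc/mdstat tells whether the array is RAID5 or, as for the /boot partition mentioned in comment #3, RAID1.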


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
                   ` (11 preceding siblings ...)
  2011-04-17 18:20 ` bugzilla-daemon
@ 2011-05-01  9:35 ` bugzilla-daemon
  2011-05-01 11:22 ` bugzilla-daemon
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-05-01  9:35 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982


Bart Van Assche <bvanassche@acm.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |CODE_FIX




--- Comment #10 from Bart Van Assche <bvanassche@acm.org>  2011-05-01 09:35:17 ---
Was fixed in 2.6.39-rc4.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug 32982] Kernel locks up a few minutes after boot
       [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
                   ` (12 preceding siblings ...)
  2011-05-01  9:35 ` bugzilla-daemon
@ 2011-05-01 11:22 ` bugzilla-daemon
  13 siblings, 0 replies; 60+ messages in thread
From: bugzilla-daemon @ 2011-05-01 11:22 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=32982


Rafael J. Wysocki <rjw@sisk.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED





^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-05-01 17:01         ` Linus Torvalds
  0 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2011-05-01 17:01 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler

On Sun, May 1, 2011 at 2:55 AM, Bart Van Assche <bvanassche@acm.org> wrote:
>
> There is something else and completely unrelated that is puzzling me though:
> on two almost identical systems, one always recognizes all internal PCIe
> cards but the other does not. This seldom happened with 2.6.34 but happens
> frequently with 2.6.38 and 2.6.39-rcx. What I see is that during boot either
> both InfiniBand PCIe cards are recognized, or one specific card is not
> recognized and doesn't even show up in the lspci output. A BIOS upgrade
> didn't help. Any idea where I should start looking to find the cause of this
> issue?

So it has happened sporadically before, but happens much more commonly
now? That very much implies some timing issue in PCI probing.

It could be, for example, that the card has a very slow reset
sequence, and doesn't respond to PCI config cycles until it has
internally booted fully. If so, a faster boot by the kernel might just
cause the Linux PCI enumeration to be done before the card is ready.

(That's a really unlikely scenario - I'm not seriously suggesting that
the card would be quite *that* stupid and slow. But there might
be similar issues at a much lower level, i.e. if the Linux PCIe port
driver might be resetting the port and then trying to read the card
too quickly afterwards, and you'd want some added delay there).

Have you tried whether "pcie_ports=compat" (or "native") makes any difference?

But you should probably contact Jesse Barnes and the linux-pci mailing
list and see if anybody has any smarter ideas.

                       Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
  2011-04-30 19:42   ` Rafael J. Wysocki
@ 2011-04-30 19:51     ` Linus Torvalds
  -1 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2011-04-30 19:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Maciej Rutecki,
	Florian Mickler, Bart Van Assche

I think we had all assumed that this was the MD problem that should
have been fixed in rc4 (the symptoms matched), but I don't think we
got any confirmation from Bart on that.

Bart? Does the problem still persist in current -git?

                 Linus

On Sat, Apr 30, 2011 at 12:42 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> This message has been generated automatically as a part of a summary report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.38.  Please verify if it still should be listed and let the tracking team
> know (either way).
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=32982
> Subject         : Kernel locks up a few minutes after boot
> Submitter       : Bart Van Assche <bart.vanassche@gmail.com>
> Date            : 2011-04-10 19:55 (21 days old)
>
>
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Bug #32982] Kernel locks up a few minutes after boot
  2011-04-30 19:42 2.6.39-rc5-git4: Reported regressions from 2.6.38 Rafael J. Wysocki
@ 2011-04-30 19:42   ` Rafael J. Wysocki
  0 siblings, 0 replies; 60+ messages in thread
From: Rafael J. Wysocki @ 2011-04-30 19:42 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Maciej Rutecki, Florian Mickler,
	Bart Van Assche, Linus Torvalds

This message has been generated automatically as a part of a summary report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.38.  Please verify if it still should be listed and let the tracking team
know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=32982
Subject		: Kernel locks up a few minutes after boot
Submitter	: Bart Van Assche <bart.vanassche@gmail.com>
Date		: 2011-04-10 19:55 (21 days old)



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
  2011-04-19 16:39               ` Bart Van Assche
  (?)
@ 2011-04-21  0:38               ` Dave Dillow
  -1 siblings, 0 replies; 60+ messages in thread
From: Dave Dillow @ 2011-04-21  0:38 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Linux Kernel Mailing List

On 4/19/2011 12:39 PM, Bart Van Assche wrote:
> On Tue, Apr 19, 2011 at 5:32 AM, David Dillow <dave@thedillows.org> wrote:
>> The mapping code for ib_srp changed in 2.6.39-rc1, but it showed
>> improved IOPS for a similar setup in my testing so I'd be surprised if
>> it is the culprit. Still, it wouldn't hurt to check. Do you have time to
>> try the new ib_srp code with 2.6.38.3 to eliminate it from the equation?
> Hello Dave,
>
> I just ran a test with the most important 2.6.39-specific ib_srp
> commits reverted but that didn't yield a measurable performance
> difference for this specific test:

Thanks for giving it a whirl,
Dave

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-19 17:43                         ` Jens Axboe
  0 siblings, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2011-04-19 17:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bart Van Assche, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown,
	David Dillow

On 2011-04-19 18:32, Linus Torvalds wrote:
> On Tue, Apr 19, 2011 at 9:13 AM, Bart Van Assche <bvanassche@acm.org> wrote:
>>
>> The same test with an initiator running 2.6.39-rc4 +
>> git://git.kernel.dk/linux-2.6-block.git for-linus + the above patch
>> yields about 155.000 IOPS on my test setup, or the same performance as
>> with 2.6.38.3. I'm running the above patch through an I/O stress test
>> now.
> 
> Goodie. So not only does that patch get back the 11%, it removes the
> crazy QUEUE_FLAG_REENTER flag that was broken to begin with. AND it
> removes a number of complicated lines.
> 
> Halleluja.

Indeed, coming your way soonish.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-19 17:43                       ` Jens Axboe
  0 siblings, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2011-04-19 17:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown,
	David Dillow

On 2011-04-19 18:13, Bart Van Assche wrote:
> The same test with an initiator running 2.6.39-rc4 +
> git://git.kernel.dk/linux-2.6-block.git for-linus + the above patch
> yields about 155.000 IOPS on my test setup, or the same performance as
> with 2.6.38.3. I'm running the above patch through an I/O stress test
> now.

OK, so parity, that's good. With the above patch, I can take a single
device from ~400K IOPS on 2.6.38 to ~440K IOPS on 2.6.39-rc4+patches.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-19 17:06                       ` Jens Axboe
  0 siblings, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2011-04-19 17:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bart Van Assche, Linus Torvalds, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Maciej Rutecki,
	Florian Mickler, Neil Brown

On 2011-04-19 18:48, Christoph Hellwig wrote:
>> +		blk_run_queue_async(sdev->request_queue);
> 
> This doesn't even have to be async except when scsi drivers call
> cmd->scsi_done directly.  It seems like if this always went through the
> softirq (or kblockd) we could still run it in context for the others.

Exactly. I'll pass an 'optimize' patch past James.

>> +	/*
>> +	 * This get/put dance makes no sense
>> +	 */
>>  	get_device(&rport->dev);
>> -
>> -	spin_lock_irqsave(rport->rqst_q->queue_lock, flags);
>> -	flagset = test_bit(QUEUE_FLAG_REENTER, &rport->rqst_q->queue_flags) &&
>> -		  !test_bit(QUEUE_FLAG_REENTER, &rport->rqst_q->queue_flags);
>> -	if (flagset)
>> -		queue_flag_set(QUEUE_FLAG_REENTER, rport->rqst_q);
>> -	__blk_run_queue(rport->rqst_q);
>> -	if (flagset)
>> -		queue_flag_clear(QUEUE_FLAG_REENTER, rport->rqst_q);
>> -	spin_unlock_irqrestore(rport->rqst_q->queue_lock, flags);
>> -
>> +	blk_run_queue_async(rport->rqst_q);
> 
> And the QUEUE_FLAG_REENTER mess here never made sense either as it
> tested for a bit being set and not set at the same time.  So this one
> actually should be able to be replaced by a plain blk_run_queue.

Yep, it's completely broken as-is.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-19 16:48                     ` Christoph Hellwig
  0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2011-04-19 16:48 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, Linus Torvalds, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Maciej Rutecki,
	Florian Mickler, Neil Brown

> +		blk_run_queue_async(sdev->request_queue);

This doesn't even have to be async except when scsi drivers call
cmd->scsi_done directly.  It seems like if this always went through the
softirq (or kblockd) we could still run it in context for the others.

> +	/*
> +	 * This get/put dance makes no sense
> +	 */
>  	get_device(&rport->dev);
> -
> -	spin_lock_irqsave(rport->rqst_q->queue_lock, flags);
> -	flagset = test_bit(QUEUE_FLAG_REENTER, &rport->rqst_q->queue_flags) &&
> -		  !test_bit(QUEUE_FLAG_REENTER, &rport->rqst_q->queue_flags);
> -	if (flagset)
> -		queue_flag_set(QUEUE_FLAG_REENTER, rport->rqst_q);
> -	__blk_run_queue(rport->rqst_q);
> -	if (flagset)
> -		queue_flag_clear(QUEUE_FLAG_REENTER, rport->rqst_q);
> -	spin_unlock_irqrestore(rport->rqst_q->queue_lock, flags);
> -
> +	blk_run_queue_async(rport->rqst_q);

And the QUEUE_FLAG_REENTER mess here never made sense either as it
tested for a bit being set and not set at the same time.  So this one
actually should be able to be replaced by a plain blk_run_queue.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-19 16:39               ` Bart Van Assche
  0 siblings, 0 replies; 60+ messages in thread
From: Bart Van Assche @ 2011-04-19 16:39 UTC (permalink / raw)
  To: David Dillow
  Cc: Jens Axboe, Linus Torvalds, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Maciej Rutecki,
	Florian Mickler, Neil Brown

On Tue, Apr 19, 2011 at 5:32 AM, David Dillow <dave@thedillows.org> wrote:
>
> On Mon, 2011-04-18 at 20:21 +0200, Bart Van Assche wrote:
> > On Mon, Apr 18, 2011 at 1:44 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
> > > Bart, can you try and pull:
> > >
> > > git://git.kernel.dk/linux-2.6-block.git for-linus
> > >
> > > into Linus' tree and see if that works? This has, among other things,
> > > Neil's fixes for MD.
> >
> > md seems to be stable with the resulting tree, but it looks like there is
> > a performance regression in the block layer not related to the md
> > issue. If I run a small block IOPS test on a block device created by
> > ib_srp (NOOP scheduler) I see about 11% fewer IOPS than with 2.6.38.3
> > (155.000 IOPS with 2.6.38.3 and 140.000 IOPS with 2.6.39-rc3+).
>
> The mapping code for ib_srp changed in 2.6.39-rc1, but it showed
> improved IOPS for a similar setup in my testing so I'd be surprised if
> it is the culprit. Still, it wouldn't hurt to check. Do you have time to
> try the new ib_srp code with 2.6.38.3 to eliminate it from the equation?

Hello Dave,

I just ran a test with the most important 2.6.39-specific ib_srp
commits reverted but that didn't yield a measurable performance
difference for this specific test:

$ git show --format=format:%s 7f9e5c48c1078507747434d4c182ab10925bf98a \
    be8b981453a4904399cb090c1660618e250092d8 \
    c07d424d6118d528ef71b22b7424bfc359c307a5 \
    8f26c9ff9cd0317ad867bce972f69e0c6c2cbe3c \
    961e0be89a5120a1409ebc525cca6f603615a8a8 \
    8c4037b501acd2ec3abc7925e66af8af40a2da9d | grep '^IB'
IB: Increase DMA max_segment_size on Mellanox hardware
IB/srp: try to use larger FMR sizes to cover our mappings
IB/srp: add support for indirect tables that don't fit in SRP_CMD
IB/srp: rework mapping engine to use multiple FMR entries
IB/srp: move IB CM setup completion into its own function
IB/srp: always avoid non-zero offsets into an FMR

Bart.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-19 16:32                       ` Linus Torvalds
  0 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2011-04-19 16:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown,
	David Dillow

On Tue, Apr 19, 2011 at 9:13 AM, Bart Van Assche <bvanassche@acm.org> wrote:
>
> The same test with an initiator running 2.6.39-rc4 +
> git://git.kernel.dk/linux-2.6-block.git for-linus + the above patch
> yields about 155.000 IOPS on my test setup, or the same performance as
> with 2.6.38.3. I'm running the above patch through an I/O stress test
> now.

Goodie. So not only does that patch get back the 11%, it removes the
crazy QUEUE_FLAG_REENTER flag that was broken to begin with. AND it
removes a number of complicated lines.

Halleluja.

                        Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
  2011-04-19 11:16                   ` Jens Axboe
  (?)
@ 2011-04-19 16:13                   ` Bart Van Assche
  2011-04-19 16:32                       ` Linus Torvalds
  2011-04-19 17:43                       ` Jens Axboe
  -1 siblings, 2 replies; 60+ messages in thread
From: Bart Van Assche @ 2011-04-19 16:13 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown,
	David Dillow

On Tue, Apr 19, 2011 at 1:16 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
> On 2011-04-19 11:09, Jens Axboe wrote:
> > On 2011-04-18 20:32, Bart Van Assche wrote:
> >> On Mon, Apr 18, 2011 at 8:28 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
> >>> On 2011-04-18 20:21, Bart Van Assche wrote:
> >>>> a performance regression in the block layer not related to the md
> >>>> issue. If I run a small block IOPS test on a block device created by
> >>>> ib_srp (NOOP scheduler) I see about 11% less IOPS than with 2.6.38.3
> >>>> (155.000 IOPS with 2.6.38.3 and 140.000 IOPS with 2.6.39-rc3+).
> >>>
> >>> That's not good. What's the test case?
> >>
> >> Nothing more than a fio IOPS test:
> >>
> >> fio --bs=512 --ioengine=libaio --buffered=0 --rw=read --thread \
> >>     --iodepth=64 --numjobs=2 --loops=10000 --group_reporting --size=1G \
> >>     --gtod_reduce=1 --name=iops-test --filename=/dev/${dev} --invalidate=1
> >
> > Bart, can you try the below:
>
> Here's a more complete variant. James, let's get rid of this REENTER
> crap. It's completely bogus and triggers falsely for a variety of
> reasons. The below will work, but there may be room for improvement on
> the SCSI side.
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5fa3dd2..4e49665 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -303,15 +303,7 @@ void __blk_run_queue(struct request_queue *q)
>        if (unlikely(blk_queue_stopped(q)))
>                return;
>
> -       /*
> -        * Only recurse once to avoid overrunning the stack, let the unplug
> -        * handling reinvoke the handler shortly if we already got there.
> -        */
> -       if (!queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) {
> -               q->request_fn(q);
> -               queue_flag_clear(QUEUE_FLAG_REENTER, q);
> -       } else
> -               queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
> +       q->request_fn(q);
>  }
>  EXPORT_SYMBOL(__blk_run_queue);
>
> @@ -328,6 +320,7 @@ void blk_run_queue_async(struct request_queue *q)
>        if (likely(!blk_queue_stopped(q)))
>                queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
>  }
> +EXPORT_SYMBOL(blk_run_queue_async);
>
>  /**
>  * blk_run_queue - run a single device queue
> diff --git a/block/blk.h b/block/blk.h
> index c9df8fc..6126346 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -22,7 +22,6 @@ void blk_rq_timed_out_timer(unsigned long data);
>  void blk_delete_timer(struct request *);
>  void blk_add_timer(struct request *);
>  void __generic_unplug_device(struct request_queue *);
> -void blk_run_queue_async(struct request_queue *q);
>
>  /*
>  * Internal atomic flags for request handling
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index ab55c2f..e9901b8 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -411,8 +411,6 @@ static void scsi_run_queue(struct request_queue *q)
>        list_splice_init(&shost->starved_list, &starved_list);
>
>        while (!list_empty(&starved_list)) {
> -               int flagset;
> -
>                /*
>                 * As long as shost is accepting commands and we have
>                 * starved queues, call blk_run_queue. scsi_request_fn
> @@ -435,20 +433,7 @@ static void scsi_run_queue(struct request_queue *q)
>                        continue;
>                }
>
> -               spin_unlock(shost->host_lock);
> -
> -               spin_lock(sdev->request_queue->queue_lock);
> -               flagset = test_bit(QUEUE_FLAG_REENTER, &q->queue_flags) &&
> -                               !test_bit(QUEUE_FLAG_REENTER,
> -                                       &sdev->request_queue->queue_flags);
> -               if (flagset)
> -                       queue_flag_set(QUEUE_FLAG_REENTER, sdev->request_queue);
> -               __blk_run_queue(sdev->request_queue);
> -               if (flagset)
> -                       queue_flag_clear(QUEUE_FLAG_REENTER, sdev->request_queue);
> -               spin_unlock(sdev->request_queue->queue_lock);
> -
> -               spin_lock(shost->host_lock);
> +               blk_run_queue_async(sdev->request_queue);
>        }
>        /* put any unprocessed entries back */
>        list_splice(&starved_list, &shost->starved_list);
> diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
> index 28c3350..815069d 100644
> --- a/drivers/scsi/scsi_transport_fc.c
> +++ b/drivers/scsi/scsi_transport_fc.c
> @@ -3816,28 +3816,17 @@ fail_host_msg:
>  static void
>  fc_bsg_goose_queue(struct fc_rport *rport)
>  {
> -       int flagset;
> -       unsigned long flags;
> -
>        if (!rport->rqst_q)
>                return;
>
> +       /*
> +        * This get/put dance makes no sense
> +        */
>        get_device(&rport->dev);
> -
> -       spin_lock_irqsave(rport->rqst_q->queue_lock, flags);
> -       flagset = test_bit(QUEUE_FLAG_REENTER, &rport->rqst_q->queue_flags) &&
> -                 !test_bit(QUEUE_FLAG_REENTER, &rport->rqst_q->queue_flags);
> -       if (flagset)
> -               queue_flag_set(QUEUE_FLAG_REENTER, rport->rqst_q);
> -       __blk_run_queue(rport->rqst_q);
> -       if (flagset)
> -               queue_flag_clear(QUEUE_FLAG_REENTER, rport->rqst_q);
> -       spin_unlock_irqrestore(rport->rqst_q->queue_lock, flags);
> -
> +       blk_run_queue_async(rport->rqst_q);
>        put_device(&rport->dev);
>  }
>
> -
>  /**
>  * fc_bsg_rport_dispatch - process rport bsg requests and dispatch to LLDD
>  * @q:         rport request queue
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index cbbfd98..2ad95fa 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -388,20 +388,19 @@ struct request_queue
>  #define        QUEUE_FLAG_SYNCFULL     3       /* read queue has been filled */
>  #define QUEUE_FLAG_ASYNCFULL   4       /* write queue has been filled */
>  #define QUEUE_FLAG_DEAD                5       /* queue being torn down */
> -#define QUEUE_FLAG_REENTER     6       /* Re-entrancy avoidance */
> -#define QUEUE_FLAG_ELVSWITCH   7       /* don't use elevator, just do FIFO */
> -#define QUEUE_FLAG_BIDI                8       /* queue supports bidi requests */
> -#define QUEUE_FLAG_NOMERGES     9      /* disable merge attempts */
> -#define QUEUE_FLAG_SAME_COMP   10      /* force complete on same CPU */
> -#define QUEUE_FLAG_FAIL_IO     11      /* fake timeout */
> -#define QUEUE_FLAG_STACKABLE   12      /* supports request stacking */
> -#define QUEUE_FLAG_NONROT      13      /* non-rotational device (SSD) */
> +#define QUEUE_FLAG_ELVSWITCH   6       /* don't use elevator, just do FIFO */
> +#define QUEUE_FLAG_BIDI                7       /* queue supports bidi requests */
> +#define QUEUE_FLAG_NOMERGES     8      /* disable merge attempts */
> +#define QUEUE_FLAG_SAME_COMP   9       /* force complete on same CPU */
> +#define QUEUE_FLAG_FAIL_IO     10      /* fake timeout */
> +#define QUEUE_FLAG_STACKABLE   11      /* supports request stacking */
> +#define QUEUE_FLAG_NONROT      12      /* non-rotational device (SSD) */
>  #define QUEUE_FLAG_VIRT        QUEUE_FLAG_NONROT /* paravirt device */
> -#define QUEUE_FLAG_IO_STAT     15      /* do IO stats */
> -#define QUEUE_FLAG_DISCARD     16      /* supports DISCARD */
> -#define QUEUE_FLAG_NOXMERGES   17      /* No extended merges */
> -#define QUEUE_FLAG_ADD_RANDOM  18      /* Contributes to random pool */
> -#define QUEUE_FLAG_SECDISCARD  19      /* supports SECDISCARD */
> +#define QUEUE_FLAG_IO_STAT     13      /* do IO stats */
> +#define QUEUE_FLAG_DISCARD     14      /* supports DISCARD */
> +#define QUEUE_FLAG_NOXMERGES   15      /* No extended merges */
> +#define QUEUE_FLAG_ADD_RANDOM  16      /* Contributes to random pool */
> +#define QUEUE_FLAG_SECDISCARD  17      /* supports SECDISCARD */
>
>  #define QUEUE_FLAG_DEFAULT     ((1 << QUEUE_FLAG_IO_STAT) |            \
>                                 (1 << QUEUE_FLAG_STACKABLE)    |       \
> @@ -699,6 +698,7 @@ extern void blk_sync_queue(struct request_queue *q);
>  extern void __blk_stop_queue(struct request_queue *q);
>  extern void __blk_run_queue(struct request_queue *q);
>  extern void blk_run_queue(struct request_queue *);
> +extern void blk_run_queue_async(struct request_queue *q);
>  extern int blk_rq_map_user(struct request_queue *, struct request *,
>                           struct rq_map_data *, void __user *, unsigned long,
>                           gfp_t);

Hello Jens,

The same test with an initiator running 2.6.39-rc4 +
git://git.kernel.dk/linux-2.6-block.git for-linus + the above patch
yields about 155.000 IOPS on my test setup, or the same performance as
with 2.6.38.3. I'm running the above patch through an I/O stress test
now.

Bart.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-19 11:16                   ` Jens Axboe
  0 siblings, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2011-04-19 11:16 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown

On 2011-04-19 11:09, Jens Axboe wrote:
> On 2011-04-18 20:32, Bart Van Assche wrote:
>> On Mon, Apr 18, 2011 at 8:28 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
>>> On 2011-04-18 20:21, Bart Van Assche wrote:
>>>> a performance regression in the block layer not related to the md
>>>> issue. If I run a small block IOPS test on a block device created by
>>>> ib_srp (NOOP scheduler) I see about 11% less IOPS than with 2.6.38.3
>>>> (155.000 IOPS with 2.6.38.3 and 140.000 IOPS with 2.6.39-rc3+).
>>>
>>> That's not good. What's the test case?
>>
>> Nothing more than a fio IOPS test:
>>
>> fio --bs=512 --ioengine=libaio --buffered=0 --rw=read --thread
>> --iodepth=64 --numjobs=2 --loops=10000 --group_reporting --size=1G
>>     --gtod_reduce=1 --name=iops-test --filename=/dev/${dev} --invalidate=1
> 
> Bart, can you try the below:

Here's a more complete variant. James, let's get rid of this REENTER
crap. It's completely bogus and triggers falsely for a variety of
reasons. The below will work, but there may be room for improvement on
the SCSI side.

diff --git a/block/blk-core.c b/block/blk-core.c
index 5fa3dd2..4e49665 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -303,15 +303,7 @@ void __blk_run_queue(struct request_queue *q)
 	if (unlikely(blk_queue_stopped(q)))
 		return;
 
-	/*
-	 * Only recurse once to avoid overrunning the stack, let the unplug
-	 * handling reinvoke the handler shortly if we already got there.
-	 */
-	if (!queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) {
-		q->request_fn(q);
-		queue_flag_clear(QUEUE_FLAG_REENTER, q);
-	} else
-		queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
+	q->request_fn(q);
 }
 EXPORT_SYMBOL(__blk_run_queue);
 
@@ -328,6 +320,7 @@ void blk_run_queue_async(struct request_queue *q)
 	if (likely(!blk_queue_stopped(q)))
 		queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
 }
+EXPORT_SYMBOL(blk_run_queue_async);
 
 /**
  * blk_run_queue - run a single device queue
diff --git a/block/blk.h b/block/blk.h
index c9df8fc..6126346 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -22,7 +22,6 @@ void blk_rq_timed_out_timer(unsigned long data);
 void blk_delete_timer(struct request *);
 void blk_add_timer(struct request *);
 void __generic_unplug_device(struct request_queue *);
-void blk_run_queue_async(struct request_queue *q);
 
 /*
  * Internal atomic flags for request handling
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ab55c2f..e9901b8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -411,8 +411,6 @@ static void scsi_run_queue(struct request_queue *q)
 	list_splice_init(&shost->starved_list, &starved_list);
 
 	while (!list_empty(&starved_list)) {
-		int flagset;
-
 		/*
 		 * As long as shost is accepting commands and we have
 		 * starved queues, call blk_run_queue. scsi_request_fn
@@ -435,20 +433,7 @@ static void scsi_run_queue(struct request_queue *q)
 			continue;
 		}
 
-		spin_unlock(shost->host_lock);
-
-		spin_lock(sdev->request_queue->queue_lock);
-		flagset = test_bit(QUEUE_FLAG_REENTER, &q->queue_flags) &&
-				!test_bit(QUEUE_FLAG_REENTER,
-					&sdev->request_queue->queue_flags);
-		if (flagset)
-			queue_flag_set(QUEUE_FLAG_REENTER, sdev->request_queue);
-		__blk_run_queue(sdev->request_queue);
-		if (flagset)
-			queue_flag_clear(QUEUE_FLAG_REENTER, sdev->request_queue);
-		spin_unlock(sdev->request_queue->queue_lock);
-
-		spin_lock(shost->host_lock);
+		blk_run_queue_async(sdev->request_queue);
 	}
 	/* put any unprocessed entries back */
 	list_splice(&starved_list, &shost->starved_list);
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 28c3350..815069d 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3816,28 +3816,17 @@ fail_host_msg:
 static void
 fc_bsg_goose_queue(struct fc_rport *rport)
 {
-	int flagset;
-	unsigned long flags;
-
 	if (!rport->rqst_q)
 		return;
 
+	/*
+	 * This get/put dance makes no sense
+	 */
 	get_device(&rport->dev);
-
-	spin_lock_irqsave(rport->rqst_q->queue_lock, flags);
-	flagset = test_bit(QUEUE_FLAG_REENTER, &rport->rqst_q->queue_flags) &&
-		  !test_bit(QUEUE_FLAG_REENTER, &rport->rqst_q->queue_flags);
-	if (flagset)
-		queue_flag_set(QUEUE_FLAG_REENTER, rport->rqst_q);
-	__blk_run_queue(rport->rqst_q);
-	if (flagset)
-		queue_flag_clear(QUEUE_FLAG_REENTER, rport->rqst_q);
-	spin_unlock_irqrestore(rport->rqst_q->queue_lock, flags);
-
+	blk_run_queue_async(rport->rqst_q);
 	put_device(&rport->dev);
 }
 
-
 /**
  * fc_bsg_rport_dispatch - process rport bsg requests and dispatch to LLDD
  * @q:		rport request queue
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index cbbfd98..2ad95fa 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -388,20 +388,19 @@ struct request_queue
 #define	QUEUE_FLAG_SYNCFULL	3	/* read queue has been filled */
 #define QUEUE_FLAG_ASYNCFULL	4	/* write queue has been filled */
 #define QUEUE_FLAG_DEAD		5	/* queue being torn down */
-#define QUEUE_FLAG_REENTER	6	/* Re-entrancy avoidance */
-#define QUEUE_FLAG_ELVSWITCH	7	/* don't use elevator, just do FIFO */
-#define QUEUE_FLAG_BIDI		8	/* queue supports bidi requests */
-#define QUEUE_FLAG_NOMERGES     9	/* disable merge attempts */
-#define QUEUE_FLAG_SAME_COMP   10	/* force complete on same CPU */
-#define QUEUE_FLAG_FAIL_IO     11	/* fake timeout */
-#define QUEUE_FLAG_STACKABLE   12	/* supports request stacking */
-#define QUEUE_FLAG_NONROT      13	/* non-rotational device (SSD) */
+#define QUEUE_FLAG_ELVSWITCH	6	/* don't use elevator, just do FIFO */
+#define QUEUE_FLAG_BIDI		7	/* queue supports bidi requests */
+#define QUEUE_FLAG_NOMERGES     8	/* disable merge attempts */
+#define QUEUE_FLAG_SAME_COMP	9	/* force complete on same CPU */
+#define QUEUE_FLAG_FAIL_IO     10	/* fake timeout */
+#define QUEUE_FLAG_STACKABLE   11	/* supports request stacking */
+#define QUEUE_FLAG_NONROT      12	/* non-rotational device (SSD) */
 #define QUEUE_FLAG_VIRT        QUEUE_FLAG_NONROT /* paravirt device */
-#define QUEUE_FLAG_IO_STAT     15	/* do IO stats */
-#define QUEUE_FLAG_DISCARD     16	/* supports DISCARD */
-#define QUEUE_FLAG_NOXMERGES   17	/* No extended merges */
-#define QUEUE_FLAG_ADD_RANDOM  18	/* Contributes to random pool */
-#define QUEUE_FLAG_SECDISCARD  19	/* supports SECDISCARD */
+#define QUEUE_FLAG_IO_STAT     13	/* do IO stats */
+#define QUEUE_FLAG_DISCARD     14	/* supports DISCARD */
+#define QUEUE_FLAG_NOXMERGES   15	/* No extended merges */
+#define QUEUE_FLAG_ADD_RANDOM  16	/* Contributes to random pool */
+#define QUEUE_FLAG_SECDISCARD  17	/* supports SECDISCARD */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -699,6 +698,7 @@ extern void blk_sync_queue(struct request_queue *q);
 extern void __blk_stop_queue(struct request_queue *q);
 extern void __blk_run_queue(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
+extern void blk_run_queue_async(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
 			   struct rq_map_data *, void __user *, unsigned long,
 			   gfp_t);

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 60+ messages in thread
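
The patch above moves the "run inline or defer" decision out of a per-queue flag and into the caller. A minimal userspace sketch of that idea follows. It is not kernel code: struct toy_queue, the toy_* helpers and the two nested_fn_* callbacks are hypothetical names invented for illustration, and the deferred counter merely stands in for work that the real code hands to kblockd through queue_delayed_work().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct request_queue; every name here is hypothetical. */
struct toy_queue {
	bool reenter;	/* models QUEUE_FLAG_REENTER */
	int direct;	/* request_fn calls made inline */
	int deferred;	/* runs handed to a worker (kblockd in the kernel) */
	void (*request_fn)(struct toy_queue *q);
};

/* Old scheme: the first caller runs inline, any nested caller is deferred. */
static void toy_run_queue_old(struct toy_queue *q)
{
	if (!q->reenter) {
		q->reenter = true;
		q->direct++;
		q->request_fn(q);
		q->reenter = false;
	} else {
		q->deferred++;
	}
}

/* New scheme: always run inline ... */
static void toy_run_queue_new(struct toy_queue *q)
{
	q->direct++;
	q->request_fn(q);
}

/* ... and let callers that may be nested ask for a deferred run explicitly. */
static void toy_run_queue_async(struct toy_queue *q)
{
	q->deferred++;
}

/* A callback that re-runs its own queue and relies on the flag to stop recursion. */
static void nested_fn_old(struct toy_queue *q)
{
	toy_run_queue_old(q);
}

/* The same callback after conversion: it requests an async run instead. */
static void nested_fn_new(struct toy_queue *q)
{
	assert(!q->reenter);	/* the flag is never touched in the new scheme */
	toy_run_queue_async(q);
}
```

Running one queue through each scheme ends with one inline call and one deferred run in both cases. The difference is that the old helper also defers whenever the flag happens to be set for any other reason, even if the call is not nested. Whether that accounts for the IOPS drop reported in this thread is not something this sketch can show.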

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-19  9:09                 ` Jens Axboe
  0 siblings, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2011-04-19  9:09 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown

On 2011-04-18 20:32, Bart Van Assche wrote:
> On Mon, Apr 18, 2011 at 8:28 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
>> On 2011-04-18 20:21, Bart Van Assche wrote:
>>> a performance regression in the block layer not related to the md
>>> issue. If I run a small block IOPS test on a block device created by
>>> ib_srp (NOOP scheduler) I see about 11% less IOPS than with 2.6.38.3
>>> (155.000 IOPS with 2.6.38.3 and 140.000 IOPS with 2.6.39-rc3+).
>>
>> That's not good. What's the test case?
> 
> Nothing more than a fio IOPS test:
> 
> fio --bs=512 --ioengine=libaio --buffered=0 --rw=read --thread
> --iodepth=64 --numjobs=2 --loops=10000 --group_reporting --size=1G
>     --gtod_reduce=1 --name=iops-test --filename=/dev/${dev} --invalidate=1

Bart, can you try the below:

diff --git a/block/blk-core.c b/block/blk-core.c
index 5fa3dd2..9b41da1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -307,11 +307,7 @@ void __blk_run_queue(struct request_queue *q)
 	 * Only recurse once to avoid overrunning the stack, let the unplug
 	 * handling reinvoke the handler shortly if we already got there.
 	 */
-	if (!queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) {
-		q->request_fn(q);
-		queue_flag_clear(QUEUE_FLAG_REENTER, q);
-	} else
-		queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
+	q->request_fn(q);
 }
 EXPORT_SYMBOL(__blk_run_queue);
 

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-19  3:32             ` David Dillow
  0 siblings, 0 replies; 60+ messages in thread
From: David Dillow @ 2011-04-19  3:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Linus Torvalds, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Maciej Rutecki,
	Florian Mickler, Neil Brown

On Mon, 2011-04-18 at 20:21 +0200, Bart Van Assche wrote:
> On Mon, Apr 18, 2011 at 1:44 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
> > Bart, can you try and pull:
> >
> > git://git.kernel.dk/linux-2.6-block.git for-linus
> >
> > into Linus' tree and see if that works? This has, among other things,
> > Neils fixes for MD.
> 
> md seems to work stable with the resulting tree, but it looks there is
> a performance regression in the block layer not related to the md
> issue. If I run a small block IOPS test on a block device created by
> ib_srp (NOOP scheduler) I see about 11% less IOPS than with 2.6.38.3
> (155.000 IOPS with 2.6.38.3 and 140.000 IOPS with 2.6.39-rc3+).

The mapping code for ib_srp changed in 2.6.39-rc1, but it showed
improved IOPS for a similar setup in my testing so I'd be surprised if
it is the culprit. Still, it wouldn't hurt to check. Do you have time to
try the new ib_srp code with 2.6.38.3 to eliminate it from the equation?

Thanks,
Dave


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-18 18:38                 ` Jens Axboe
  0 siblings, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2011-04-18 18:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown

On 2011-04-18 20:32, Bart Van Assche wrote:
> On Mon, Apr 18, 2011 at 8:28 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
>> On 2011-04-18 20:21, Bart Van Assche wrote:
>>> a performance regression in the block layer not related to the md
>>> issue. If I run a small block IOPS test on a block device created by
>>> ib_srp (NOOP scheduler) I see about 11% less IOPS than with 2.6.38.3
>>> (155.000 IOPS with 2.6.38.3 and 140.000 IOPS with 2.6.39-rc3+).
>>
>> That's not good. What's the test case?
> 
> Nothing more than a fio IOPS test:
> 
> fio --bs=512 --ioengine=libaio --buffered=0 --rw=read --thread
> --iodepth=64 --numjobs=2 --loops=10000 --group_reporting --size=1G
>     --gtod_reduce=1 --name=iops-test --filename=/dev/${dev} --invalidate=1

Interesting, I'll have to check if we regressed with all these recent
changes. Comparing your .38 to .39-rc3+, are you using more/less CPU,
more/less sys%, etc?

A quick perf record -fg / perf report -g for both kernels would be nice
to see.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-18 18:32               ` Bart Van Assche
  0 siblings, 0 replies; 60+ messages in thread
From: Bart Van Assche @ 2011-04-18 18:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown

On Mon, Apr 18, 2011 at 8:28 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
> On 2011-04-18 20:21, Bart Van Assche wrote:
>> a performance regression in the block layer not related to the md
>> issue. If I run a small block IOPS test on a block device created by
>> ib_srp (NOOP scheduler) I see about 11% less IOPS than with 2.6.38.3
>> (155.000 IOPS with 2.6.38.3 and 140.000 IOPS with 2.6.39-rc3+).
>
> That's not good. What's the test case?

Nothing more than a fio IOPS test:

fio --bs=512 --ioengine=libaio --buffered=0 --rw=read --thread
--iodepth=64 --numjobs=2 --loops=10000 --group_reporting --size=1G
    --gtod_reduce=1 --name=iops-test --filename=/dev/${dev} --invalidate=1

Bart.

^ permalink raw reply	[flat|nested] 60+ messages in thread
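
The two figures quoted above can be turned into a percentage in two ways, depending on which kernel is taken as the baseline. The small C sketch below computes both, reading the period in "155.000" and "140.000" as a thousands separator (an assumption, though a 512-byte read test over ib_srp makes any other reading implausible). The helper names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage by which `after` falls short of `before`. */
static double drop_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* Percentage by which `before` exceeds `after`. */
static double excess_pct(double before, double after)
{
	assert(after > 0.0);
	return (before - after) / after * 100.0;
}

/* Print both readings of the regression for one pair of measurements. */
static void report(const char *old_name, double old_iops,
		   const char *new_name, double new_iops)
{
	printf("%s is %.1f%% slower than %s\n",
	       new_name, drop_pct(old_iops, new_iops), old_name);
	printf("%s is %.1f%% faster than %s\n",
	       old_name, excess_pct(old_iops, new_iops), new_name);
}
```

Calling report("2.6.38.3", 155000.0, "2.6.39-rc3+", 140000.0) gives roughly 9.7% with 2.6.38.3 as the baseline and roughly 10.7% with 2.6.39-rc3+ as the baseline. The "about 11%" in the report corresponds to the second reading.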

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-18 18:28             ` Jens Axboe
  0 siblings, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2011-04-18 18:28 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown

On 2011-04-18 20:21, Bart Van Assche wrote:
> On Mon, Apr 18, 2011 at 1:44 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
>> Bart, can you try and pull:
>>
>> git://git.kernel.dk/linux-2.6-block.git for-linus
>>
>> into Linus' tree and see if that works? This has, among other things,
>> Neils fixes for MD.
> 
> md seems to work stable with the resulting tree, but it looks there is

OK, that's the most important bit.

> a performance regression in the block layer not related to the md
> issue. If I run a small block IOPS test on a block device created by
> ib_srp (NOOP scheduler) I see about 11% less IOPS than with 2.6.38.3
> (155.000 IOPS with 2.6.38.3 and 140.000 IOPS with 2.6.39-rc3+).

That's not good. What's the test case?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-18 18:21           ` Bart Van Assche
  0 siblings, 0 replies; 60+ messages in thread
From: Bart Van Assche @ 2011-04-18 18:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown

On Mon, Apr 18, 2011 at 1:44 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
> Bart, can you try and pull:
>
> git://git.kernel.dk/linux-2.6-block.git for-linus
>
> into Linus' tree and see if that works? This has, among other things,
> Neil's fixes for MD.

md seems to be stable with the resulting tree, but it looks like there is
a performance regression in the block layer not related to the md
issue. If I run a small block IOPS test on a block device created by
ib_srp (NOOP scheduler) I see about 11% fewer IOPS than with 2.6.38.3
(155,000 IOPS with 2.6.38.3 and 140,000 IOPS with 2.6.39-rc3+).

Bart.
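
As a quick check on the two IOPS figures in the report above, the relative
change can be computed against either kernel as the baseline. This is only
a sketch: the helper name rel_diff is made up for illustration, and only
the two IOPS numbers come from the report.

```shell
# rel_diff OLD NEW BASE: print (OLD - NEW) / BASE * 100 with one decimal.
# Hypothetical helper; not part of fio or any kernel tooling.
rel_diff() {
    awk -v o="$1" -v n="$2" -v b="$3" \
        'BEGIN { printf "%.1f\n", (o - n) / b * 100 }'
}

old=155000   # IOPS reported for 2.6.38.3
new=140000   # IOPS reported for 2.6.39-rc3+

rel_diff "$old" "$new" "$old"   # difference relative to 2.6.38.3     -> 9.7
rel_diff "$old" "$new" "$new"   # difference relative to 2.6.39-rc3+  -> 10.7
```

Measured against the 2.6.38.3 result the drop is roughly 9.7%, so the
"about 11%" in the report appears to be the difference taken relative to
the 2.6.39-rc3+ result.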

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-18 11:44         ` Jens Axboe
  0 siblings, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2011-04-18 11:44 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown

On 2011-04-17 20:37, Bart Van Assche wrote:
> On Sun, Apr 17, 2011 at 7:03 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Sun, Apr 17, 2011 at 5:57 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>>> This message has been generated automatically as a part of a summary report
>>> of recent regressions.
>>>
>>> The following bug entry is on the current list of known regressions
>>> from 2.6.38.  Please verify if it still should be listed and let the tracking team
>>> know (either way).
>>>
>>>
>>> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=32982
>>> Subject         : Kernel locks up a few minutes after boot
>>> Submitter       : Bart Van Assche <bart.vanassche@gmail.com>
>>> Date            : 2011-04-10 19:55 (8 days old)
>>
>> Is this machine running a RAID5 setup or something like that?
>>
>> There is a known interaction with the new block layer plugging code
>> and MD. The "hung task" report in that bugzilla looks very much like
>> that issue. And you do have "root=/dev/md0", so clearly there's some
>> md thing going on.
>>
>> And bisecting might not work all that well for it, because I suspect
>> it ends up being very much a matter of IO patterns how it triggers.
>>
>> Neil supposedly has a patch for it, but I haven't seen it yet. Neil, Jens?
> 
> (converted top-posting into bottom-posting)
> 
> Hello Linus,
> 
> On the system on which bug #32982 has been triggered md0, md1 and md2
> have been configured as two-disk RAID1 (mirroring).
> 
> I've done my best to trigger enough I/O in order to obtain reliable
> bisect results. A difficulty I encountered during bisecting though was
> that I encountered unbootable kernels (all skipped revisions).

Bart, can you try and pull:

git://git.kernel.dk/linux-2.6-block.git for-linus

into Linus' tree and see if that works? This has, among other things,
Neil's fixes for MD.

-- 
Jens Axboe
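
The pull requested above could be scripted roughly as follows. This is a
sketch under assumptions: the function name merge_block_fixes and its
dry-run argument are invented for illustration, the path to the local
clone of Linus' tree is whatever you pass in, and the git:// URL is the
one from the message and may no longer be reachable.

```shell
# merge_block_fixes [TREE] [RUNNER]
#   TREE   - path to a local clone of Linus' tree (assumed; defaults to ".")
#   RUNNER - pass "echo" to print the command instead of running it
# Hypothetical helper; only the URL and branch name come from the message.
merge_block_fixes() {
    tree=${1:-.}
    run=${2:-}
    # $run is intentionally unquoted so an empty value disappears.
    $run git -C "$tree" pull git://git.kernel.dk/linux-2.6-block.git for-linus
}

# Dry run, printing the command that would be executed:
#   merge_block_fixes ~/src/linux-2.6 echo
```

After the pull, the kernel is rebuilt and booted as usual before rerunning
the md workload.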


* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-17 22:20           ` NeilBrown
  0 siblings, 0 replies; 60+ messages in thread
From: NeilBrown @ 2011-04-17 22:20 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Jens Axboe

On Mon, 18 Apr 2011 07:07:11 +1000 NeilBrown <neilb@suse.de> wrote:

> On Sun, 17 Apr 2011 20:37:39 +0200 Bart Van Assche <bvanassche@acm.org> wrote:
> 
> > On Sun, Apr 17, 2011 at 7:03 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > On Sun, Apr 17, 2011 at 5:57 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > > > This message has been generated automatically as a part of a summary report
> > > > of recent regressions.
> > > >
> > > > The following bug entry is on the current list of known regressions
> > > > from 2.6.38.  Please verify if it still should be listed and let the tracking team
> > > > know (either way).
> > > >
> > > >
> > > > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=32982
> > > > Subject         : Kernel locks up a few minutes after boot
> > > > Submitter       : Bart Van Assche <bart.vanassche@gmail.com>
> > > > Date            : 2011-04-10 19:55 (8 days old)
> > >
> > > Is this machine running a RAID5 setup or something like that?
> > >
> > > There is a known interaction with the new block layer plugging code
> > > and MD. The "hung task" report in that bugzilla looks very much like
> > > that issue. And you do have "root=/dev/md0", so clearly there's some
> > > md thing going on.
> > >
> > > And bisecting might not work all that well for it, because I suspect
> > > it ends up being very much a matter of IO patterns how it triggers.
> > >
> > > Neil supposedly has a patch for it, but I haven't seen it yet. Neil, Jens?
> > 
> > (converted top-posting into bottom-posting)
> > 
> > Hello Linus,
> > 
> > On the system on which bug #32982 has been triggered md0, md1 and md2
> > have been configured as two-disk RAID1 (mirroring).
> 
> If any of those have write-intent bitmaps then I definitely know what the
> problem is and I'll be posting patches later today (probably not much later).
> 

Actually it won't be today.  The new block device plugging is still unusable
for MD - so I won't be able to fix this until that gets sorted out.

NeilBrown

* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-17 21:07         ` NeilBrown
  0 siblings, 0 replies; 60+ messages in thread
From: NeilBrown @ 2011-04-17 21:07 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Jens Axboe

On Sun, 17 Apr 2011 20:37:39 +0200 Bart Van Assche <bvanassche@acm.org> wrote:

> On Sun, Apr 17, 2011 at 7:03 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Sun, Apr 17, 2011 at 5:57 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > > This message has been generated automatically as a part of a summary report
> > > of recent regressions.
> > >
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.38.  Please verify if it still should be listed and let the tracking team
> > > know (either way).
> > >
> > >
> > > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=32982
> > > Subject         : Kernel locks up a few minutes after boot
> > > Submitter       : Bart Van Assche <bart.vanassche@gmail.com>
> > > Date            : 2011-04-10 19:55 (8 days old)
> >
> > Is this machine running a RAID5 setup or something like that?
> >
> > There is a known interaction with the new block layer plugging code
> > and MD. The "hung task" report in that bugzilla looks very much like
> > that issue. And you do have "root=/dev/md0", so clearly there's some
> > md thing going on.
> >
> > And bisecting might not work all that well for it, because I suspect
> > it ends up being very much a matter of IO patterns how it triggers.
> >
> > Neil supposedly has a patch for it, but I haven't seen it yet. Neil, Jens?
> 
> (converted top-posting into bottom-posting)
> 
> Hello Linus,
> 
> On the system on which bug #32982 has been triggered md0, md1 and md2
> have been configured as two-disk RAID1 (mirroring).

If any of those have write-intent bitmaps then I definitely know what the
problem is and I'll be posting patches later today (probably not much later).

If not .. then I'm less sure but it would certainly be worth testing after
applying the promised fixes.

NeilBrown


> 
> I've done my best to trigger enough I/O in order to obtain reliable
> bisect results. A difficulty I encountered during bisecting though was
> that I encountered unbootable kernels (all skipped revisions).
> 
> Bart.
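
Neil's question about write-intent bitmaps can usually be answered from
/proc/mdstat, where an array with a bitmap has an extra "bitmap:" line in
its entry. The helper below is a minimal sketch: the function name is made
up, and it assumes array names of the form mdN (md0, md1, ...), which
matches the setup described in this thread but not every system.

```shell
# arrays_with_bitmap [FILE]
# Print the name of every md array whose mdstat entry contains a
# "bitmap:" line. FILE defaults to /proc/mdstat.
# Hypothetical helper; assumes array names like md0, md1, md2.
arrays_with_bitmap() {
    awk '/^md[0-9]+ :/ { dev = $1 }
         /bitmap:/     { print dev }' "${1:-/proc/mdstat}"
}

# Usage on the affected machine:
#   arrays_with_bitmap
```

An empty result would mean none of the arrays has a write-intent bitmap,
which is the case Neil is less sure about.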


* Re: [Bug #32982] Kernel locks up a few minutes after boot
@ 2011-04-17 18:37       ` Bart Van Assche
  0 siblings, 0 replies; 60+ messages in thread
From: Bart Van Assche @ 2011-04-17 18:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, Florian Mickler, Neil Brown,
	Jens Axboe

On Sun, Apr 17, 2011 at 7:03 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Apr 17, 2011 at 5:57 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a summary report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.38.  Please verify if it still should be listed and let the tracking team
> > know (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=32982
> > Subject         : Kernel locks up a few minutes after boot
> > Submitter       : Bart Van Assche <bart.vanassche@gmail.com>
> > Date            : 2011-04-10 19:55 (8 days old)
>
> Is this machine running a RAID5 setup or something like that?
>
> There is a known interaction with the new block layer plugging code
> and MD. The "hung task" report in that bugzilla looks very much like
> that issue. And you do have "root=/dev/md0", so clearly there's some
> md thing going on.
>
> And bisecting might not work all that well for it, because I suspect
> it ends up being very much a matter of IO patterns how it triggers.
>
> Neil supposedly has a patch for it, but I haven't seen it yet. Neil, Jens?

(converted top-posting into bottom-posting)

Hello Linus,

On the system on which bug #32982 has been triggered md0, md1 and md2
have been configured as two-disk RAID1 (mirroring).

I've done my best to trigger enough I/O in order to obtain reliable
bisect results. A difficulty I encountered during bisecting though was
that I encountered unbootable kernels (all skipped revisions).

Bart.

* Re: [Bug #32982] Kernel locks up a few minutes after boot
  2011-04-17 12:57   ` Rafael J. Wysocki
@ 2011-04-17 17:03     ` Linus Torvalds
  -1 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2011-04-17 17:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Maciej Rutecki,
	Florian Mickler, Bart Van Assche, Neil Brown, Jens Axboe

Is this machine running a RAID5 setup or something like that?

There is a known interaction with the new block layer plugging code
and MD. The "hung task" report in that bugzilla looks very much like
that issue. And you do have "root=/dev/md0", so clearly there's some
md thing going on.

And bisecting might not work all that well for it, because I suspect
it ends up being very much a matter of IO patterns how it triggers.

Neil supposedly has a patch for it, but I haven't seen it yet. Neil, Jens?

                                   Linus

On Sun, Apr 17, 2011 at 5:57 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> This message has been generated automatically as a part of a summary report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.38.  Please verify if it still should be listed and let the tracking team
> know (either way).
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=32982
> Subject         : Kernel locks up a few minutes after boot
> Submitter       : Bart Van Assche <bart.vanassche@gmail.com>
> Date            : 2011-04-10 19:55 (8 days old)
>
>
>

* [Bug #32982] Kernel locks up a few minutes after boot
  2011-04-17 12:52 2.6.39-rc3-git7: Reported regressions from 2.6.38 Rafael J. Wysocki
@ 2011-04-17 12:57   ` Rafael J. Wysocki
  0 siblings, 0 replies; 60+ messages in thread
From: Rafael J. Wysocki @ 2011-04-17 12:57 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Maciej Rutecki, Florian Mickler,
	Bart Van Assche, Linus Torvalds

This message has been generated automatically as a part of a summary report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.38.  Please verify if it still should be listed and let the tracking team
know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=32982
Subject		: Kernel locks up a few minutes after boot
Submitter	: Bart Van Assche <bart.vanassche@gmail.com>
Date		: 2011-04-10 19:55 (8 days old)



Thread overview: 60+ messages
     [not found] <bug-32982-2300@https.bugzilla.kernel.org/>
2011-04-11 23:05 ` [Bug 32982] Kernel locks up a few minutes after boot bugzilla-daemon
2011-04-11 23:09 ` bugzilla-daemon
2011-04-11 23:14 ` bugzilla-daemon
2011-04-11 23:15 ` bugzilla-daemon
2011-04-12 18:34 ` bugzilla-daemon
2011-04-12 18:40 ` bugzilla-daemon
2011-04-12 18:42 ` bugzilla-daemon
2011-04-13 18:49 ` bugzilla-daemon
2011-04-14 15:57 ` bugzilla-daemon
2011-04-14 19:24 ` bugzilla-daemon
2011-04-15 10:23 ` bugzilla-daemon
2011-04-17 18:20 ` bugzilla-daemon
2011-05-01  9:35 ` bugzilla-daemon
2011-05-01 11:22 ` bugzilla-daemon
2011-04-17 12:52 2.6.39-rc3-git7: Reported regressions from 2.6.38 Rafael J. Wysocki
2011-04-17 12:57 ` [Bug #32982] Kernel locks up a few minutes after boot Rafael J. Wysocki
2011-04-17 12:57   ` Rafael J. Wysocki
2011-04-17 17:03   ` Linus Torvalds
2011-04-17 17:03     ` Linus Torvalds
2011-04-17 18:37     ` Bart Van Assche
2011-04-17 18:37       ` Bart Van Assche
2011-04-17 21:07       ` NeilBrown
2011-04-17 21:07         ` NeilBrown
2011-04-17 22:20         ` NeilBrown
2011-04-17 22:20           ` NeilBrown
2011-04-18 11:44       ` Jens Axboe
2011-04-18 11:44         ` Jens Axboe
2011-04-18 18:21         ` Bart Van Assche
2011-04-18 18:21           ` Bart Van Assche
2011-04-18 18:28           ` Jens Axboe
2011-04-18 18:28             ` Jens Axboe
2011-04-18 18:32             ` Bart Van Assche
2011-04-18 18:32               ` Bart Van Assche
2011-04-18 18:38               ` Jens Axboe
2011-04-18 18:38                 ` Jens Axboe
2011-04-19  9:09               ` Jens Axboe
2011-04-19  9:09                 ` Jens Axboe
2011-04-19 11:16                 ` Jens Axboe
2011-04-19 11:16                   ` Jens Axboe
2011-04-19 16:13                   ` Bart Van Assche
2011-04-19 16:32                     ` Linus Torvalds
2011-04-19 16:32                       ` Linus Torvalds
2011-04-19 17:43                       ` Jens Axboe
2011-04-19 17:43                         ` Jens Axboe
2011-04-19 17:43                     ` Jens Axboe
2011-04-19 17:43                       ` Jens Axboe
2011-04-19 16:48                   ` Christoph Hellwig
2011-04-19 16:48                     ` Christoph Hellwig
2011-04-19 17:06                     ` Jens Axboe
2011-04-19 17:06                       ` Jens Axboe
2011-04-19  3:32           ` David Dillow
2011-04-19  3:32             ` David Dillow
2011-04-19 16:39             ` Bart Van Assche
2011-04-19 16:39               ` Bart Van Assche
2011-04-21  0:38               ` Dave Dillow
2011-04-30 19:42 2.6.39-rc5-git4: Reported regressions from 2.6.38 Rafael J. Wysocki
2011-04-30 19:42 ` [Bug #32982] Kernel locks up a few minutes after boot Rafael J. Wysocki
2011-04-30 19:42   ` Rafael J. Wysocki
2011-04-30 19:51   ` Linus Torvalds
2011-04-30 19:51     ` Linus Torvalds
     [not found]     ` <BANLkTik_aeVn9Jf_cWnoY0fNUm+tjMnixA@mail.gmail.com>
2011-05-01 17:01       ` Linus Torvalds
2011-05-01 17:01         ` Linus Torvalds
