* kblockd/1: page allocation failure in 2.6.9
@ 2004-12-21  7:39 Frank Steiner
  2004-12-23 11:26 ` Frank Steiner
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Steiner @ 2004-12-21  7:39 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Hi,

we got something that looks similar to, but not exactly like, a kernel oops on
one of our hosts, saying "kblockd/1: page allocation failure. order:0, mode:0x21"
(the full output is below). The kernel is 2.6.9.
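
As an aside for readers, the two numbers in that message can be split up by hand. The sketch below is only an illustration: the bit table is an assumption recalled from include/linux/gfp.h of 2.6.9-era kernels, the 4 KiB page size assumes i386, and the names decode_failure and GFP_BITS are made up for this example.

```python
# Assumed bit values, recalled from include/linux/gfp.h of 2.6.9-era kernels.
# Verify them against the exact source tree before trusting the result.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}
PAGE_SIZE = 4096  # assumption: i386 with 4 KiB pages


def decode_failure(order, mode):
    """Split the order/mode pair of a 'page allocation failure' line."""
    flags = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    known = 0
    for bit in GFP_BITS:
        known |= bit
    return {
        "pages": 1 << order,          # an order-n request asks for 2**n pages
        "bytes": PAGE_SIZE << order,
        "flags": flags,
        "unknown_bits": mode & ~known,  # bits not covered by the assumed table
    }


if __name__ == "__main__":
    print(decode_failure(0, 0x21))
```

If the assumed table is right, order:0 with mode:0x21 would be a request for a single 4 KiB page with __GFP_HIGH and __GFP_DMA set and __GFP_WAIT clear, i.e. an allocation that may not sleep and has to come from the DMA zone.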

After that happened, "free" told me that 1.7GB of the 2GB was used and nothing
was buffered (i.e., the "-/+ buffers" line showed the same amount of used memory),
which was strange because there was no process running that did any real
work. A Tomcat server and a MySQL database are running on this machine, but
the database is still empty and Tomcat does not yet offer any services.

Also, "ps -aux" showed only 2/3 of the processes and then hung until I killed it.
I've never seen "ps -aux" hang! I had to reboot to get rid of this.

I guess the problem occurred after copying a large amount of data to the disks.
The host is a dual Xeon 1.7GHz with an internal 147GB RAID-1 (ICP vortex, driven
by the gdth module) and two external 600GB IDE RAID-5 arrays, connected to an
Adaptec 29160 with the aic7xxx module.

Can someone tell me what these kblockd messages mean? What is kblockd? Could
that point to a hardware problem (CPU, memory, mainboard) or a driver problem
with one of the SCSI controllers or something? The log says something about
scsi_get_command. And are these messages something like a kernel oops or not?
I've never seen anything like this before, so I'm a bit lost :-)

Thanks for any hints!
Best regards,
Frank


Dec 20 13:17:21 turing kernel: kblockd/1: page allocation failure. order:0, mode:0x21
Dec 20 13:17:21 turing kernel:  [__alloc_pages+824/832] __alloc_pages+0x338/0x340
Dec 20 13:17:21 turing kernel:  [<c0142648>] __alloc_pages+0x338/0x340
Dec 20 13:17:21 turing kernel:  [__get_free_pages+24/48] __get_free_pages+0x18/0x30
Dec 20 13:17:21 turing kernel:  [<c0142668>] __get_free_pages+0x18/0x30
Dec 20 13:17:21 turing kernel:  [kmem_getpages+36/224] kmem_getpages+0x24/0xe0
Dec 20 13:17:21 turing kernel:  [<c0145884>] kmem_getpages+0x24/0xe0
Dec 20 13:17:21 turing kernel:  [cache_grow+168/352] cache_grow+0xa8/0x160
Dec 20 13:17:21 turing kernel:  [<c0146578>] cache_grow+0xa8/0x160
Dec 20 13:17:21 turing kernel:  [cache_alloc_refill+354/560] cache_alloc_refill+0x162/0x230
Dec 20 13:17:21 turing kernel:  [<c0146792>] cache_alloc_refill+0x162/0x230
Dec 20 13:17:21 turing kernel:  [kmem_cache_alloc+52/64] kmem_cache_alloc+0x34/0x40
Dec 20 13:17:22 turing kernel:  [<c0146a54>] kmem_cache_alloc+0x34/0x40
Dec 20 13:17:22 turing kernel:  [__scsi_get_command+21/96] __scsi_get_command+0x15/0x60
Dec 20 13:17:22 turing kernel:  [<c02eba35>] __scsi_get_command+0x15/0x60
Dec 20 13:17:22 turing kernel:  [scsi_get_command+15/144] scsi_get_command+0xf/0x90
Dec 20 13:17:22 turing kernel:  [<c02eba8f>] scsi_get_command+0xf/0x90
Dec 20 13:17:22 turing kernel:  [scsi_prep_fn+227/448] scsi_prep_fn+0xe3/0x1c0
Dec 20 13:17:22 turing kernel:  [<c02f0f93>] scsi_prep_fn+0xe3/0x1c0
Dec 20 13:17:22 turing kernel:  [elv_next_request+85/224] elv_next_request+0x55/0xe0
Dec 20 13:17:22 turing kernel:  [<c029a795>] elv_next_request+0x55/0xe0
Dec 20 13:17:22 turing kernel:  [scsi_request_fn+501/960] scsi_request_fn+0x1f5/0x3c0
Dec 20 13:17:22 turing kernel:  [<c02f1265>] scsi_request_fn+0x1f5/0x3c0
Dec 20 13:17:22 turing kernel:  [__generic_unplug_device+46/48] __generic_unplug_device+0x2e/0x30
Dec 20 13:17:22 turing kernel:  [<c029c29e>] __generic_unplug_device+0x2e/0x30
Dec 20 13:17:22 turing kernel:  [generic_unplug_device+21/48] generic_unplug_device+0x15/0x30
Dec 20 13:17:22 turing kernel:  [<c029c2b5>] generic_unplug_device+0x15/0x30
Dec 20 13:17:22 turing kernel:  [blk_unplug_work+6/16] blk_unplug_work+0x6/0x10
Dec 20 13:17:22 turing kernel:  [<c029c2f6>] blk_unplug_work+0x6/0x10
Dec 20 13:17:22 turing kernel:  [worker_thread+424/560] worker_thread+0x1a8/0x230
Dec 20 13:17:22 turing kernel:  [<c0130048>] worker_thread+0x1a8/0x230
Dec 20 13:17:22 turing kernel:  [blk_unplug_work+0/16] blk_unplug_work+0x0/0x10
Dec 20 13:17:22 turing kernel:  [<c029c2f0>] blk_unplug_work+0x0/0x10
Dec 20 13:17:22 turing kernel:  [default_wake_function+0/16] default_wake_function+0x0/0x10
Dec 20 13:17:22 turing kernel:  [<c011e470>] default_wake_function+0x0/0x10
Dec 20 13:17:22 turing kernel:  [default_wake_function+0/16] default_wake_function+0x0/0x10
Dec 20 13:17:22 turing kernel:  [<c011e470>] default_wake_function+0x0/0x10
Dec 20 13:17:22 turing kernel:  [worker_thread+0/560] worker_thread+0x0/0x230
Dec 20 13:17:22 turing kernel:  [<c012fea0>] worker_thread+0x0/0x230
Dec 20 13:17:22 turing kernel:  [kthread+134/176] kthread+0x86/0xb0
Dec 20 13:17:22 turing kernel:  [<c0133b46>] kthread+0x86/0xb0
Dec 20 13:17:22 turing kernel:  [kthread+0/176] kthread+0x0/0xb0
Dec 20 13:17:22 turing kernel:  [<c0133ac0>] kthread+0x0/0xb0
Dec 20 13:17:22 turing kernel:  [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Dec 20 13:17:22 turing kernel:  [<c0105275>] kernel_thread_helper+0x5/0x10
Dec 20 13:17:22 turing kernel: klogd: page allocation failure. order:0, mode:0x21
Dec 20 13:17:22 turing kernel:  [__alloc_pages+824/832] __alloc_pages+0x338/0x340
Dec 20 13:17:22 turing kernel:  [<c0142648>] __alloc_pages+0x338/0x340
Dec 20 13:17:22 turing kernel:  [__get_free_pages+24/48] __get_free_pages+0x18/0x30
Dec 20 13:17:22 turing kernel:  [<c0142668>] __get_free_pages+0x18/0x30
Dec 20 13:17:22 turing kernel:  [kmem_getpages+36/224] kmem_getpages+0x24/0xe0
Dec 20 13:17:22 turing kernel:  [<c0145884>] kmem_getpages+0x24/0xe0
Dec 20 13:17:22 turing kernel:  [cache_grow+168/352] cache_grow+0xa8/0x160
Dec 20 13:17:22 turing kernel:  [<c0146578>] cache_grow+0xa8/0x160
Dec 20 13:17:22 turing kernel:  [cache_alloc_refill+354/560] cache_alloc_refill+0x162/0x230
Dec 20 13:17:22 turing kernel:  [<c0146792>] cache_alloc_refill+0x162/0x230
Dec 20 13:17:22 turing kernel:  [kmem_cache_alloc+52/64] kmem_cache_alloc+0x34/0x40
Dec 20 13:17:22 turing kernel:  [<c0146a54>] kmem_cache_alloc+0x34/0x40
Dec 20 13:17:22 turing kernel:  [__scsi_get_command+21/96] __scsi_get_command+0x15/0x60
Dec 20 13:17:22 turing kernel:  [<c02eba35>] __scsi_get_command+0x15/0x60
Dec 20 13:17:22 turing kernel:  [scsi_get_command+15/144] scsi_get_command+0xf/0x90
Dec 20 13:17:22 turing kernel:  [<c02eba8f>] scsi_get_command+0xf/0x90
Dec 20 13:17:22 turing kernel:  [scsi_prep_fn+227/448] scsi_prep_fn+0xe3/0x1c0
Dec 20 13:17:22 turing kernel:  [<c02f0f93>] scsi_prep_fn+0xe3/0x1c0
Dec 20 13:17:22 turing kernel:  [elv_next_request+85/224] elv_next_request+0x55/0xe0
Dec 20 13:17:22 turing kernel:  [<c029a795>] elv_next_request+0x55/0xe0
Dec 20 13:17:22 turing kernel:  [__generic_unplug_device+34/48] __generic_unplug_device+0x22/0x30
Dec 20 13:17:22 turing kernel:  [<c029c292>] __generic_unplug_device+0x22/0x30
Dec 20 13:17:22 turing kernel:  [generic_unplug_device+21/48] generic_unplug_device+0x15/0x30
Dec 20 13:17:22 turing kernel:  [<c029c2b5>] generic_unplug_device+0x15/0x30
Dec 20 13:17:22 turing kernel:  [blk_backing_dev_unplug+18/32] blk_backing_dev_unplug+0x12/0x20
Dec 20 13:17:22 turing kernel:  [<c029c2e2>] blk_backing_dev_unplug+0x12/0x20
Dec 20 13:17:22 turing kernel:  [swap_unplug_io_fn+90/144] swap_unplug_io_fn+0x5a/0x90
Dec 20 13:17:22 turing kernel:  [<c015347a>] swap_unplug_io_fn+0x5a/0x90
Dec 20 13:17:22 turing kernel:  [swap_unplug_io_fn+0/144] swap_unplug_io_fn+0x0/0x90
Dec 20 13:17:22 turing kernel:  [<c0153420>] swap_unplug_io_fn+0x0/0x90
Dec 20 13:17:22 turing kernel:  [block_sync_page+68/80] block_sync_page+0x44/0x50
Dec 20 13:17:22 turing kernel:  [<c015f284>] block_sync_page+0x44/0x50
Dec 20 13:17:22 turing kernel:  [__lock_page+235/256] __lock_page+0xeb/0x100
Dec 20 13:17:22 turing kernel:  [<c013e42b>] __lock_page+0xeb/0x100
Dec 20 13:17:22 turing kernel:  [page_wake_function+0/64] page_wake_function+0x0/0x40
Dec 20 13:17:22 turing kernel:  [<c013e130>] page_wake_function+0x0/0x40
Dec 20 13:17:22 turing kernel:  [page_wake_function+0/64] page_wake_function+0x0/0x40
Dec 20 13:17:22 turing kernel:  [<c013e130>] page_wake_function+0x0/0x40
Dec 20 13:17:22 turing kernel:  [grab_swap_token+158/176] grab_swap_token+0x9e/0xb0
Dec 20 13:17:22 turing kernel:  [<c015584e>] grab_swap_token+0x9e/0xb0
Dec 20 13:17:22 turing kernel:  [do_swap_page+327/656] do_swap_page+0x147/0x290
Dec 20 13:17:22 turing kernel:  [<c014d077>] do_swap_page+0x147/0x290
Dec 20 13:17:22 turing kernel:  [handle_mm_fault+244/336] handle_mm_fault+0xf4/0x150
Dec 20 13:17:22 turing kernel:  [<c014d7b4>] handle_mm_fault+0xf4/0x150
Dec 20 13:17:22 turing kernel:  [do_page_fault+443/1400] do_page_fault+0x1bb/0x578
Dec 20 13:17:22 turing kernel:  [<c011af4b>] do_page_fault+0x1bb/0x578
Dec 20 13:17:22 turing kernel:  [avc_has_perm_noaudit+188/416] avc_has_perm_noaudit+0xbc/0x1a0
Dec 20 13:17:22 turing kernel:  [<c020fb2c>] avc_has_perm_noaudit+0xbc/0x1a0
Dec 20 13:17:22 turing kernel:  [recalc_task_prio+330/480] recalc_task_prio+0x14a/0x1e0
Dec 20 13:17:22 turing kernel:  [<c011c5ea>] recalc_task_prio+0x14a/0x1e0
Dec 20 13:17:22 turing kernel:  [finish_task_switch+50/112] finish_task_switch+0x32/0x70
Dec 20 13:17:22 turing kernel:  [<c011cfa2>] finish_task_switch+0x32/0x70
Dec 20 13:17:22 turing kernel:  [schedule+848/2832] schedule+0x350/0xb10
Dec 20 13:17:22 turing kernel:  [<c03908b0>] schedule+0x350/0xb10
Dec 20 13:17:22 turing kernel:  [do_page_fault+0/1400] do_page_fault+0x0/0x578
Dec 20 13:17:22 turing kernel:  [<c011ad90>] do_page_fault+0x0/0x578
Dec 20 13:17:22 turing kernel:  [error_code+45/64] error_code+0x2d/0x40
Dec 20 13:17:22 turing kernel:  [<c0107f5d>] error_code+0x2d/0x40
Dec 20 13:17:22 turing kernel:  [do_syslog+779/992] do_syslog+0x30b/0x3e0
Dec 20 13:17:22 turing kernel:  [<c012197b>] do_syslog+0x30b/0x3e0
Dec 20 13:17:22 turing kernel:  [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30
Dec 20 13:17:22 turing kernel:  [<c011f9e0>] autoremove_wake_function+0x0/0x30
Dec 20 13:17:22 turing kernel:  [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30
Dec 20 13:17:22 turing kernel:  [<c011f9e0>] autoremove_wake_function+0x0/0x30
Dec 20 13:17:22 turing kernel:  [kmsg_read+0/64] kmsg_read+0x0/0x40
Dec 20 13:17:22 turing kernel:  [<c018a620>] kmsg_read+0x0/0x40
Dec 20 13:17:22 turing kernel:  [vfs_read+174/240] vfs_read+0xae/0xf0
Dec 20 13:17:22 turing kernel:  [<c015aa8e>] vfs_read+0xae/0xf0
Dec 20 13:17:22 turing kernel:  [sys_read+60/112] sys_read+0x3c/0x70
Dec 20 13:17:22 turing kernel:  [<c015acec>] sys_read+0x3c/0x70
Dec 20 13:17:22 turing kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Dec 20 13:17:22 turing kernel:  [<c0106e27>] syscall_call+0x7/0xb
Dec 20 13:17:22 turing kernel: klogd: page allocation failure. order:0, mode:0x21
Dec 20 13:17:22 turing kernel:  [__alloc_pages+824/832] __alloc_pages+0x338/0x340
Dec 20 13:17:22 turing kernel:  [<c0142648>] __alloc_pages+0x338/0x340
Dec 20 13:17:22 turing kernel:  [__get_free_pages+24/48] __get_free_pages+0x18/0x30
Dec 20 13:17:22 turing kernel:  [<c0142668>] __get_free_pages+0x18/0x30
Dec 20 13:17:22 turing kernel:  [kmem_getpages+36/224] kmem_getpages+0x24/0xe0
Dec 20 13:17:22 turing kernel:  [<c0145884>] kmem_getpages+0x24/0xe0
Dec 20 13:17:22 turing kernel:  [cache_grow+168/352] cache_grow+0xa8/0x160
Dec 20 13:17:22 turing kernel:  [<c0146578>] cache_grow+0xa8/0x160
Dec 20 13:17:22 turing kernel:  [cache_alloc_refill+354/560] cache_alloc_refill+0x162/0x230
Dec 20 13:17:22 turing kernel:  [<c0146792>] cache_alloc_refill+0x162/0x230
Dec 20 13:17:22 turing kernel:  [kmem_cache_alloc+52/64] kmem_cache_alloc+0x34/0x40
Dec 20 13:17:22 turing kernel:  [<c0146a54>] kmem_cache_alloc+0x34/0x40
Dec 20 13:17:22 turing kernel:  [__scsi_get_command+21/96] __scsi_get_command+0x15/0x60
Dec 20 13:17:22 turing kernel:  [<c02eba35>] __scsi_get_command+0x15/0x60
Dec 20 13:17:22 turing kernel:  [scsi_get_command+15/144] scsi_get_command+0xf/0x90
Dec 20 13:17:22 turing kernel:  [<c02eba8f>] scsi_get_command+0xf/0x90
Dec 20 13:17:22 turing kernel:  [scsi_prep_fn+227/448] scsi_prep_fn+0xe3/0x1c0
Dec 20 13:17:22 turing kernel:  [<c02f0f93>] scsi_prep_fn+0xe3/0x1c0
Dec 20 13:17:22 turing kernel:  [elv_next_request+85/224] elv_next_request+0x55/0xe0
Dec 20 13:17:22 turing kernel:  [<c029a795>] elv_next_request+0x55/0xe0
Dec 20 13:17:22 turing kernel:  [__generic_unplug_device+34/48] __generic_unplug_device+0x22/0x30
Dec 20 13:17:22 turing kernel:  [<c029c292>] __generic_unplug_device+0x22/0x30
Dec 20 13:17:22 turing kernel:  [generic_unplug_device+21/48] generic_unplug_device+0x15/0x30
Dec 20 13:17:22 turing kernel:  [<c029c2b5>] generic_unplug_device+0x15/0x30
Dec 20 13:17:22 turing kernel:  [blk_backing_dev_unplug+0/32] blk_backing_dev_unplug+0x0/0x20
Dec 20 13:17:22 turing kernel:  [<c029c2d0>] blk_backing_dev_unplug+0x0/0x20
Dec 20 13:17:22 turing kernel:  [blk_backing_dev_unplug+18/32] blk_backing_dev_unplug+0x12/0x20
Dec 20 13:17:22 turing kernel:  [<c029c2e2>] blk_backing_dev_unplug+0x12/0x20
Dec 20 13:17:22 turing kernel:  [block_sync_page+68/80] block_sync_page+0x44/0x50
Dec 20 13:17:22 turing kernel:  [<c015f284>] block_sync_page+0x44/0x50
Dec 20 13:17:22 turing kernel:  [__lock_page+235/256] __lock_page+0xeb/0x100
Dec 20 13:17:22 turing kernel:  [<c013e42b>] __lock_page+0xeb/0x100
Dec 20 13:17:22 turing kernel:  [page_wake_function+0/64] page_wake_function+0x0/0x40
Dec 20 13:17:22 turing kernel:  [<c013e130>] page_wake_function+0x0/0x40
Dec 20 13:17:22 turing kernel:  [page_wake_function+0/64] page_wake_function+0x0/0x40
Dec 20 13:17:22 turing kernel:  [<c013e130>] page_wake_function+0x0/0x40
Dec 20 13:17:22 turing kernel:  [find_get_page+57/80] find_get_page+0x39/0x50
Dec 20 13:17:22 turing kernel:  [<c013e479>] find_get_page+0x39/0x50
Dec 20 13:17:22 turing kernel:  [filemap_nopage+697/864] filemap_nopage+0x2b9/0x360
Dec 20 13:17:22 turing kernel:  [<c013f489>] filemap_nopage+0x2b9/0x360
Dec 20 13:17:22 turing kernel:  [do_no_page+180/688] do_no_page+0xb4/0x2b0
Dec 20 13:17:22 turing kernel:  [<c014d3f4>] do_no_page+0xb4/0x2b0
Dec 20 13:17:22 turing kernel:  [handle_mm_fault+272/336] handle_mm_fault+0x110/0x150
Dec 20 13:17:22 turing kernel:  [<c014d7d0>] handle_mm_fault+0x110/0x150
Dec 20 13:17:22 turing kernel:  [do_page_fault+443/1400] do_page_fault+0x1bb/0x578
Dec 20 13:17:22 turing kernel:  [<c011af4b>] do_page_fault+0x1bb/0x578
Dec 20 13:17:22 turing kernel:  [do_syslog+722/992] do_syslog+0x2d2/0x3e0
Dec 20 13:17:22 turing kernel:  [<c0121942>] do_syslog+0x2d2/0x3e0
Dec 20 13:17:22 turing kernel:  [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30
Dec 20 13:17:22 turing kernel:  [<c011f9e0>] autoremove_wake_function+0x0/0x30
Dec 20 13:17:22 turing kernel:  [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30
Dec 20 13:17:22 turing kernel:  [<c011f9e0>] autoremove_wake_function+0x0/0x30
Dec 20 13:17:22 turing kernel:  [kmsg_read+0/64] kmsg_read+0x0/0x40
Dec 20 13:17:22 turing kernel:  [<c018a620>] kmsg_read+0x0/0x40
Dec 20 13:17:22 turing kernel:  [dnotify_parent+31/112] dnotify_parent+0x1f/0x70
Dec 20 13:17:22 turing kernel:  [<c017463f>] dnotify_parent+0x1f/0x70
Dec 20 13:17:22 turing kernel:  [vfs_read+194/240] vfs_read+0xc2/0xf0
Dec 20 13:17:22 turing kernel:  [<c015aaa2>] vfs_read+0xc2/0xf0
Dec 20 13:17:22 turing kernel:  [sys_read+60/112] sys_read+0x3c/0x70
Dec 20 13:17:22 turing kernel:  [<c015acec>] sys_read+0x3c/0x70
Dec 20 13:17:22 turing kernel:  [do_page_fault+0/1400] do_page_fault+0x0/0x578
Dec 20 13:17:22 turing kernel:  [<c011ad90>] do_page_fault+0x0/0x578
Dec 20 13:17:22 turing kernel:  [error_code+45/64] error_code+0x2d/0x40
Dec 20 13:17:22 turing kernel:  [<c0107f5d>] error_code+0x2d/0x40
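
Every frame appears twice in this dump, once with a symbolic [name+offset/size] tag and once with the raw [<address>]. To make the call chains easier to compare, here is a minimal Python sketch that keeps only the address lines and groups them per failure message. It is an illustration written for this log format only; the names split_traces, HEADER and FRAME are hypothetical and do not belong to any kernel tool.

```python
import re

# Matches "kblockd/1: page allocation failure. order:0, mode:0x21" style lines.
HEADER = re.compile(
    r"kernel: (\S+): page allocation failure\. order:(\d+), mode:(0x[0-9a-f]+)"
)
# Matches "[<c0142648>] __alloc_pages+0x338/0x340" style frames.
FRAME = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+")


def split_traces(lines):
    """Group stack frames under the failure message that precedes them."""
    traces = []
    for line in lines:
        header = HEADER.search(line)
        if header:
            traces.append({
                "task": header.group(1),
                "order": int(header.group(2)),
                "mode": int(header.group(3), 16),
                "frames": [],
            })
            continue
        frame = FRAME.search(line)
        if frame and traces:
            # Store (symbol, offset); the symbolic duplicate lines never match.
            traces[-1]["frames"].append((frame.group(2), int(frame.group(3), 16)))
    return traces


if __name__ == "__main__":
    import sys
    for trace in split_traces(sys.stdin):
        chain = " <- ".join(name for name, _ in trace["frames"])
        print(trace["task"], hex(trace["mode"]), chain)
```

Feeding the log above to the script on standard input prints one line per failure, with the innermost function first.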

-- 
Dipl.-Inform. Frank Steiner   Web:  http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *


* Re: kblockd/1: page allocation failure in 2.6.9
  2004-12-21  7:39 kblockd/1: page allocation failure in 2.6.9 Frank Steiner
@ 2004-12-23 11:26 ` Frank Steiner
  2004-12-23 15:51   ` Denis Vlasenko
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Steiner @ 2004-12-23 11:26 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Is no one able to give me at least some hints about what this message could mean?
Is it something like a kernel oops or something else? Could it point to
a hardware error, a kernel bug, a kernel configuration problem, or something similar?

As long as I don't even know what this message means or where it comes
from, I cannot really debug things :-(

cu,
Frank

-- 
Dipl.-Inform. Frank Steiner   Web:  http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *


* Re: kblockd/1: page allocation failure in 2.6.9
  2004-12-23 11:26 ` Frank Steiner
@ 2004-12-23 15:51   ` Denis Vlasenko
  2004-12-23 15:55     ` Frank Steiner
  0 siblings, 1 reply; 10+ messages in thread
From: Denis Vlasenko @ 2004-12-23 15:51 UTC (permalink / raw)
  To: Frank Steiner, Linux Kernel Mailing List

On Thursday 23 December 2004 11:26, Frank Steiner wrote:
> Is no one able to give me at least some hints about what this message could mean?
> Is it something like a kernel oops or something else? Could it point to
> a hardware error, a kernel bug, a kernel configuration problem, or something similar?
>
> As long as I don't even know what this message means or where it comes
> from, I cannot really debug things :-(

Details?
-- 
vda


* Re: kblockd/1: page allocation failure in 2.6.9
  2004-12-23 15:51   ` Denis Vlasenko
@ 2004-12-23 15:55     ` Frank Steiner
  2004-12-24 13:20       ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Steiner @ 2004-12-23 15:55 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: Linux Kernel Mailing List

Hi,

Denis Vlasenko wrote

> On Thursday 23 December 2004 11:26, Frank Steiner wrote:
>>Is no one able to give me at least some hints about what this message could mean?
>>Is it something like a kernel oops or something else? Could it point to
>>a hardware error, a kernel bug, a kernel configuration problem, or something similar?
>>
>>As long as I don't even know what this message means or where it comes
>>from, I cannot really debug things :-(
> 
> Details?

What exactly do you mean by details? The details of the kernel messages?
They were in my first email, but I post them again below in case the first
mail got lost! Or do you need more details about my system?

I quote the complete first mail to the LKML below, because it contains some
details about the host, too.

cu,
Frank

Frank Steiner wrote

> Hi,
> 
> we got something that looks similar to, but not exactly like, a kernel oops on
> one of our hosts, saying "kblockd/1: page allocation failure. order:0, mode:0x21"
> (the full output is below). The kernel is 2.6.9.
> 
> After that happened, "free" told me that 1.7GB of the 2GB was used and nothing
> was buffered (i.e., the "-/+ buffers" line showed the same amount of used memory),
> which was strange because there was no process running that did any real
> work. A Tomcat server and a MySQL database are running on this machine, but
> the database is still empty and Tomcat does not yet offer any services.
> 
> Also, "ps -aux" showed only 2/3 of the processes and then hung until I killed it.
> I've never seen "ps -aux" hang! I had to reboot to get rid of this.
> 
> I guess the problem occurred after copying a large amount of data to the disks.
> The host is a dual Xeon 1.7GHz with an internal 147GB RAID-1 (ICP vortex, driven
> by the gdth module) and two external 600GB IDE RAID-5 arrays, connected to an
> Adaptec 29160 with the aic7xxx module.
> 
> Can someone tell me what these kblockd messages mean? What is kblockd? Could
> that point to a hardware problem (CPU, memory, mainboard) or a driver problem
> with one of the SCSI controllers or something? The log says something about
> scsi_get_command. And are these messages something like a kernel oops or not?
> I've never seen anything like this before, so I'm a bit lost :-)
> 
> Thanks for any hints!
> Best regards,
> Frank
> 
> 
> Dec 20 13:17:21 turing kernel: kblockd/1: page allocation failure. order:0, mode:0x21
> Dec 20 13:17:21 turing kernel:  [__alloc_pages+824/832] __alloc_pages+0x338/0x340
> Dec 20 13:17:21 turing kernel:  [<c0142648>] __alloc_pages+0x338/0x340
> Dec 20 13:17:21 turing kernel:  [__get_free_pages+24/48] __get_free_pages+0x18/0x30
> Dec 20 13:17:21 turing kernel:  [<c0142668>] __get_free_pages+0x18/0x30
> Dec 20 13:17:21 turing kernel:  [kmem_getpages+36/224] kmem_getpages+0x24/0xe0
> Dec 20 13:17:21 turing kernel:  [<c0145884>] kmem_getpages+0x24/0xe0
> Dec 20 13:17:21 turing kernel:  [cache_grow+168/352] cache_grow+0xa8/0x160
> Dec 20 13:17:21 turing kernel:  [<c0146578>] cache_grow+0xa8/0x160
> Dec 20 13:17:21 turing kernel:  [cache_alloc_refill+354/560] cache_alloc_refill+0x162/0x230
> Dec 20 13:17:21 turing kernel:  [<c0146792>] cache_alloc_refill+0x162/0x230
> Dec 20 13:17:21 turing kernel:  [kmem_cache_alloc+52/64] kmem_cache_alloc+0x34/0x40
> Dec 20 13:17:22 turing kernel:  [<c0146a54>] kmem_cache_alloc+0x34/0x40
> Dec 20 13:17:22 turing kernel:  [__scsi_get_command+21/96] __scsi_get_command+0x15/0x60
> Dec 20 13:17:22 turing kernel:  [<c02eba35>] __scsi_get_command+0x15/0x60
> Dec 20 13:17:22 turing kernel:  [scsi_get_command+15/144] scsi_get_command+0xf/0x90
> Dec 20 13:17:22 turing kernel:  [<c02eba8f>] scsi_get_command+0xf/0x90
> Dec 20 13:17:22 turing kernel:  [scsi_prep_fn+227/448] scsi_prep_fn+0xe3/0x1c0
> Dec 20 13:17:22 turing kernel:  [<c02f0f93>] scsi_prep_fn+0xe3/0x1c0
> Dec 20 13:17:22 turing kernel:  [elv_next_request+85/224] elv_next_request+0x55/0xe0
> Dec 20 13:17:22 turing kernel:  [<c029a795>] elv_next_request+0x55/0xe0
> Dec 20 13:17:22 turing kernel:  [scsi_request_fn+501/960] scsi_request_fn+0x1f5/0x3c0
> Dec 20 13:17:22 turing kernel:  [<c02f1265>] scsi_request_fn+0x1f5/0x3c0
> Dec 20 13:17:22 turing kernel:  [__generic_unplug_device+46/48] __generic_unplug_device+0x2e/0x30
> Dec 20 13:17:22 turing kernel:  [<c029c29e>] __generic_unplug_device+0x2e/0x30
> Dec 20 13:17:22 turing kernel:  [generic_unplug_device+21/48] generic_unplug_device+0x15/0x30
> Dec 20 13:17:22 turing kernel:  [<c029c2b5>] generic_unplug_device+0x15/0x30
> Dec 20 13:17:22 turing kernel:  [blk_unplug_work+6/16] blk_unplug_work+0x6/0x10
> Dec 20 13:17:22 turing kernel:  [<c029c2f6>] blk_unplug_work+0x6/0x10
> Dec 20 13:17:22 turing kernel:  [worker_thread+424/560] worker_thread+0x1a8/0x230
> Dec 20 13:17:22 turing kernel:  [<c0130048>] worker_thread+0x1a8/0x230
> Dec 20 13:17:22 turing kernel:  [blk_unplug_work+0/16] blk_unplug_work+0x0/0x10
> Dec 20 13:17:22 turing kernel:  [<c029c2f0>] blk_unplug_work+0x0/0x10
> Dec 20 13:17:22 turing kernel:  [default_wake_function+0/16] default_wake_function+0x0/0x10
> Dec 20 13:17:22 turing kernel:  [<c011e470>] default_wake_function+0x0/0x10
> Dec 20 13:17:22 turing kernel:  [default_wake_function+0/16] default_wake_function+0x0/0x10
> Dec 20 13:17:22 turing kernel:  [<c011e470>] default_wake_function+0x0/0x10
> Dec 20 13:17:22 turing kernel:  [worker_thread+0/560] worker_thread+0x0/0x230
> Dec 20 13:17:22 turing kernel:  [<c012fea0>] worker_thread+0x0/0x230
> Dec 20 13:17:22 turing kernel:  [kthread+134/176] kthread+0x86/0xb0
> Dec 20 13:17:22 turing kernel:  [<c0133b46>] kthread+0x86/0xb0
> Dec 20 13:17:22 turing kernel:  [kthread+0/176] kthread+0x0/0xb0
> Dec 20 13:17:22 turing kernel:  [<c0133ac0>] kthread+0x0/0xb0
> Dec 20 13:17:22 turing kernel:  [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
> Dec 20 13:17:22 turing kernel:  [<c0105275>] kernel_thread_helper+0x5/0x10
> Dec 20 13:17:22 turing kernel: klogd: page allocation failure. order:0, mode:0x21
> Dec 20 13:17:22 turing kernel:  [__alloc_pages+824/832] __alloc_pages+0x338/0x340
> Dec 20 13:17:22 turing kernel:  [<c0142648>] __alloc_pages+0x338/0x340
> Dec 20 13:17:22 turing kernel:  [__get_free_pages+24/48] __get_free_pages+0x18/0x30
> Dec 20 13:17:22 turing kernel:  [<c0142668>] __get_free_pages+0x18/0x30
> Dec 20 13:17:22 turing kernel:  [kmem_getpages+36/224] kmem_getpages+0x24/0xe0
> Dec 20 13:17:22 turing kernel:  [<c0145884>] kmem_getpages+0x24/0xe0
> Dec 20 13:17:22 turing kernel:  [cache_grow+168/352] cache_grow+0xa8/0x160
> Dec 20 13:17:22 turing kernel:  [<c0146578>] cache_grow+0xa8/0x160
> Dec 20 13:17:22 turing kernel:  [cache_alloc_refill+354/560] cache_alloc_refill+0x162/0x230
> Dec 20 13:17:22 turing kernel:  [<c0146792>] cache_alloc_refill+0x162/0x230
> Dec 20 13:17:22 turing kernel:  [kmem_cache_alloc+52/64] kmem_cache_alloc+0x34/0x40
> Dec 20 13:17:22 turing kernel:  [<c0146a54>] kmem_cache_alloc+0x34/0x40
> Dec 20 13:17:22 turing kernel:  [__scsi_get_command+21/96] __scsi_get_command+0x15/0x60
> Dec 20 13:17:22 turing kernel:  [<c02eba35>] __scsi_get_command+0x15/0x60
> Dec 20 13:17:22 turing kernel:  [scsi_get_command+15/144] scsi_get_command+0xf/0x90
> Dec 20 13:17:22 turing kernel:  [<c02eba8f>] scsi_get_command+0xf/0x90
> Dec 20 13:17:22 turing kernel:  [scsi_prep_fn+227/448] scsi_prep_fn+0xe3/0x1c0
> Dec 20 13:17:22 turing kernel:  [<c02f0f93>] scsi_prep_fn+0xe3/0x1c0
> Dec 20 13:17:22 turing kernel:  [elv_next_request+85/224] elv_next_request+0x55/0xe0
> Dec 20 13:17:22 turing kernel:  [<c029a795>] elv_next_request+0x55/0xe0
> Dec 20 13:17:22 turing kernel:  [__generic_unplug_device+34/48] __generic_unplug_device+0x22/0x30
> Dec 20 13:17:22 turing kernel:  [<c029c292>] __generic_unplug_device+0x22/0x30
> Dec 20 13:17:22 turing kernel:  [generic_unplug_device+21/48] generic_unplug_device+0x15/0x30
> Dec 20 13:17:22 turing kernel:  [<c029c2b5>] generic_unplug_device+0x15/0x30
> Dec 20 13:17:22 turing kernel:  [blk_backing_dev_unplug+18/32] blk_backing_dev_unplug+0x12/0x20
> Dec 20 13:17:22 turing kernel:  [<c029c2e2>] blk_backing_dev_unplug+0x12/0x20
> Dec 20 13:17:22 turing kernel:  [swap_unplug_io_fn+90/144] swap_unplug_io_fn+0x5a/0x90
> Dec 20 13:17:22 turing kernel:  [<c015347a>] swap_unplug_io_fn+0x5a/0x90
> Dec 20 13:17:22 turing kernel:  [swap_unplug_io_fn+0/144] swap_unplug_io_fn+0x0/0x90
> Dec 20 13:17:22 turing kernel:  [<c0153420>] swap_unplug_io_fn+0x0/0x90
> Dec 20 13:17:22 turing kernel:  [block_sync_page+68/80] block_sync_page+0x44/0x50
> Dec 20 13:17:22 turing kernel:  [<c015f284>] block_sync_page+0x44/0x50
> Dec 20 13:17:22 turing kernel:  [__lock_page+235/256] __lock_page+0xeb/0x100
> Dec 20 13:17:22 turing kernel:  [<c013e42b>] __lock_page+0xeb/0x100
> Dec 20 13:17:22 turing kernel:  [page_wake_function+0/64] page_wake_function+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [<c013e130>] page_wake_function+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [page_wake_function+0/64] page_wake_function+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [<c013e130>] page_wake_function+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [grab_swap_token+158/176] grab_swap_token+0x9e/0xb0
> Dec 20 13:17:22 turing kernel:  [<c015584e>] grab_swap_token+0x9e/0xb0
> Dec 20 13:17:22 turing kernel:  [do_swap_page+327/656] do_swap_page+0x147/0x290
> Dec 20 13:17:22 turing kernel:  [<c014d077>] do_swap_page+0x147/0x290
> Dec 20 13:17:22 turing kernel:  [handle_mm_fault+244/336] handle_mm_fault+0xf4/0x150
> Dec 20 13:17:22 turing kernel:  [<c014d7b4>] handle_mm_fault+0xf4/0x150
> Dec 20 13:17:22 turing kernel:  [do_page_fault+443/1400] do_page_fault+0x1bb/0x578
> Dec 20 13:17:22 turing kernel:  [<c011af4b>] do_page_fault+0x1bb/0x578
> Dec 20 13:17:22 turing kernel:  [avc_has_perm_noaudit+188/416] avc_has_perm_noaudit+0xbc/0x1a0
> Dec 20 13:17:22 turing kernel:  [<c020fb2c>] avc_has_perm_noaudit+0xbc/0x1a0
> Dec 20 13:17:22 turing kernel:  [recalc_task_prio+330/480] recalc_task_prio+0x14a/0x1e0
> Dec 20 13:17:22 turing kernel:  [<c011c5ea>] recalc_task_prio+0x14a/0x1e0
> Dec 20 13:17:22 turing kernel:  [finish_task_switch+50/112] finish_task_switch+0x32/0x70
> Dec 20 13:17:22 turing kernel:  [<c011cfa2>] finish_task_switch+0x32/0x70
> Dec 20 13:17:22 turing kernel:  [schedule+848/2832] schedule+0x350/0xb10
> Dec 20 13:17:22 turing kernel:  [<c03908b0>] schedule+0x350/0xb10
> Dec 20 13:17:22 turing kernel:  [do_page_fault+0/1400] do_page_fault+0x0/0x578
> Dec 20 13:17:22 turing kernel:  [<c011ad90>] do_page_fault+0x0/0x578
> Dec 20 13:17:22 turing kernel:  [error_code+45/64] error_code+0x2d/0x40
> Dec 20 13:17:22 turing kernel:  [<c0107f5d>] error_code+0x2d/0x40
> Dec 20 13:17:22 turing kernel:  [do_syslog+779/992] do_syslog+0x30b/0x3e0
> Dec 20 13:17:22 turing kernel:  [<c012197b>] do_syslog+0x30b/0x3e0
> Dec 20 13:17:22 turing kernel:  [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30
> Dec 20 13:17:22 turing kernel:  [<c011f9e0>] autoremove_wake_function+0x0/0x30
> Dec 20 13:17:22 turing kernel:  [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30
> Dec 20 13:17:22 turing kernel:  [<c011f9e0>] autoremove_wake_function+0x0/0x30
> Dec 20 13:17:22 turing kernel:  [kmsg_read+0/64] kmsg_read+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [<c018a620>] kmsg_read+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [vfs_read+174/240] vfs_read+0xae/0xf0
> Dec 20 13:17:22 turing kernel:  [<c015aa8e>] vfs_read+0xae/0xf0
> Dec 20 13:17:22 turing kernel:  [sys_read+60/112] sys_read+0x3c/0x70
> Dec 20 13:17:22 turing kernel:  [<c015acec>] sys_read+0x3c/0x70
> Dec 20 13:17:22 turing kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
> Dec 20 13:17:22 turing kernel:  [<c0106e27>] syscall_call+0x7/0xb
> Dec 20 13:17:22 turing kernel: klogd: page allocation failure. order:0, mode:0x21
> Dec 20 13:17:22 turing kernel:  [__alloc_pages+824/832] __alloc_pages+0x338/0x340
> Dec 20 13:17:22 turing kernel:  [<c0142648>] __alloc_pages+0x338/0x340
> Dec 20 13:17:22 turing kernel:  [__get_free_pages+24/48] __get_free_pages+0x18/0x30
> Dec 20 13:17:22 turing kernel:  [<c0142668>] __get_free_pages+0x18/0x30
> Dec 20 13:17:22 turing kernel:  [kmem_getpages+36/224] kmem_getpages+0x24/0xe0
> Dec 20 13:17:22 turing kernel:  [<c0145884>] kmem_getpages+0x24/0xe0
> Dec 20 13:17:22 turing kernel:  [cache_grow+168/352] cache_grow+0xa8/0x160
> Dec 20 13:17:22 turing kernel:  [<c0146578>] cache_grow+0xa8/0x160
> Dec 20 13:17:22 turing kernel:  [cache_alloc_refill+354/560] cache_alloc_refill+0x162/0x230
> Dec 20 13:17:22 turing kernel:  [<c0146792>] cache_alloc_refill+0x162/0x230
> Dec 20 13:17:22 turing kernel:  [kmem_cache_alloc+52/64] kmem_cache_alloc+0x34/0x40
> Dec 20 13:17:22 turing kernel:  [<c0146a54>] kmem_cache_alloc+0x34/0x40
> Dec 20 13:17:22 turing kernel:  [__scsi_get_command+21/96] __scsi_get_command+0x15/0x60
> Dec 20 13:17:22 turing kernel:  [<c02eba35>] __scsi_get_command+0x15/0x60
> Dec 20 13:17:22 turing kernel:  [scsi_get_command+15/144] scsi_get_command+0xf/0x90
> Dec 20 13:17:22 turing kernel:  [<c02eba8f>] scsi_get_command+0xf/0x90
> Dec 20 13:17:22 turing kernel:  [scsi_prep_fn+227/448] scsi_prep_fn+0xe3/0x1c0
> Dec 20 13:17:22 turing kernel:  [<c02f0f93>] scsi_prep_fn+0xe3/0x1c0
> Dec 20 13:17:22 turing kernel:  [elv_next_request+85/224] elv_next_request+0x55/0xe0
> Dec 20 13:17:22 turing kernel:  [<c029a795>] elv_next_request+0x55/0xe0
> Dec 20 13:17:22 turing kernel:  [__generic_unplug_device+34/48] __generic_unplug_device+0x22/0x30
> Dec 20 13:17:22 turing kernel:  [<c029c292>] __generic_unplug_device+0x22/0x30
> Dec 20 13:17:22 turing kernel:  [generic_unplug_device+21/48] generic_unplug_device+0x15/0x30
> Dec 20 13:17:22 turing kernel:  [<c029c2b5>] generic_unplug_device+0x15/0x30
> Dec 20 13:17:22 turing kernel:  [blk_backing_dev_unplug+0/32] blk_backing_dev_unplug+0x0/0x20
> Dec 20 13:17:22 turing kernel:  [<c029c2d0>] blk_backing_dev_unplug+0x0/0x20
> Dec 20 13:17:22 turing kernel:  [blk_backing_dev_unplug+18/32] blk_backing_dev_unplug+0x12/0x20
> Dec 20 13:17:22 turing kernel:  [<c029c2e2>] blk_backing_dev_unplug+0x12/0x20
> Dec 20 13:17:22 turing kernel:  [block_sync_page+68/80] block_sync_page+0x44/0x50
> Dec 20 13:17:22 turing kernel:  [<c015f284>] block_sync_page+0x44/0x50
> Dec 20 13:17:22 turing kernel:  [__lock_page+235/256] __lock_page+0xeb/0x100
> Dec 20 13:17:22 turing kernel:  [<c013e42b>] __lock_page+0xeb/0x100
> Dec 20 13:17:22 turing kernel:  [page_wake_function+0/64] page_wake_function+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [<c013e130>] page_wake_function+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [page_wake_function+0/64] page_wake_function+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [<c013e130>] page_wake_function+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [find_get_page+57/80] find_get_page+0x39/0x50
> Dec 20 13:17:22 turing kernel:  [<c013e479>] find_get_page+0x39/0x50
> Dec 20 13:17:22 turing kernel:  [filemap_nopage+697/864] filemap_nopage+0x2b9/0x360
> Dec 20 13:17:22 turing kernel:  [<c013f489>] filemap_nopage+0x2b9/0x360
> Dec 20 13:17:22 turing kernel:  [do_no_page+180/688] do_no_page+0xb4/0x2b0
> Dec 20 13:17:22 turing kernel:  [<c014d3f4>] do_no_page+0xb4/0x2b0
> Dec 20 13:17:22 turing kernel:  [handle_mm_fault+272/336] handle_mm_fault+0x110/0x150
> Dec 20 13:17:22 turing kernel:  [<c014d7d0>] handle_mm_fault+0x110/0x150
> Dec 20 13:17:22 turing kernel:  [do_page_fault+443/1400] do_page_fault+0x1bb/0x578
> Dec 20 13:17:22 turing kernel:  [<c011af4b>] do_page_fault+0x1bb/0x578
> Dec 20 13:17:22 turing kernel:  [do_syslog+722/992] do_syslog+0x2d2/0x3e0
> Dec 20 13:17:22 turing kernel:  [<c0121942>] do_syslog+0x2d2/0x3e0
> Dec 20 13:17:22 turing kernel:  [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30
> Dec 20 13:17:22 turing kernel:  [<c011f9e0>] autoremove_wake_function+0x0/0x30
> Dec 20 13:17:22 turing kernel:  [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30
> Dec 20 13:17:22 turing kernel:  [<c011f9e0>] autoremove_wake_function+0x0/0x30
> Dec 20 13:17:22 turing kernel:  [kmsg_read+0/64] kmsg_read+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [<c018a620>] kmsg_read+0x0/0x40
> Dec 20 13:17:22 turing kernel:  [dnotify_parent+31/112] dnotify_parent+0x1f/0x70
> Dec 20 13:17:22 turing kernel:  [<c017463f>] dnotify_parent+0x1f/0x70
> Dec 20 13:17:22 turing kernel:  [vfs_read+194/240] vfs_read+0xc2/0xf0
> Dec 20 13:17:22 turing kernel:  [<c015aaa2>] vfs_read+0xc2/0xf0
> Dec 20 13:17:22 turing kernel:  [sys_read+60/112] sys_read+0x3c/0x70
> Dec 20 13:17:22 turing kernel:  [<c015acec>] sys_read+0x3c/0x70
> Dec 20 13:17:22 turing kernel:  [do_page_fault+0/1400] do_page_fault+0x0/0x578
> Dec 20 13:17:22 turing kernel:  [<c011ad90>] do_page_fault+0x0/0x578
> Dec 20 13:17:22 turing kernel:  [error_code+45/64] error_code+0x2d/0x40
> Dec 20 13:17:22 turing kernel:  [<c0107f5d>] error_code+0x2d/0x40
> 



-- 
Dipl.-Inform. Frank Steiner   Web:  http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *


* Re: kblockd/1: page allocation failure in 2.6.9
  2004-12-23 15:55     ` Frank Steiner
@ 2004-12-24 13:20       ` Jens Axboe
  2004-12-24 19:28         ` James Bottomley
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2004-12-24 13:20 UTC (permalink / raw)
  To: Frank Steiner
  Cc: Denis Vlasenko, Linux Kernel Mailing List, James Bottomley,
	Andrew Morton

On Thu, Dec 23 2004, Frank Steiner wrote:
> >Dec 20 13:17:21 turing kernel: kblockd/1: page allocation failure. order:0, mode:0x21

This looks fishy - this is GFP_ATOMIC | GFP_DMA, where it should only be
GFP_ATOMIC. gdth should not end up with shost->cmd_pool->gfp_mask ==
GFP_DMA; that looks like a bug in the driver.
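The mode value printed in the failure message can be decoded bit by bit. The following is only a minimal sketch: the flag values are assumed from memory of the 2.6-era include/linux/gfp.h (where GFP_ATOMIC is just __GFP_HIGH) and should be checked against the header of the exact kernel in use. The helper name decode_gfp is made up for illustration.

```python
# Flag bits as assumed from 2.6-era include/linux/gfp.h.
# These values are an assumption; verify them against your kernel tree.
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",   # GFP_ATOMIC is defined as __GFP_HIGH in that era
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}


def decode_gfp(mode):
    """Return the names of the known flag bits set in an allocation mode."""
    names = []
    known = 0
    for bit, name in sorted(GFP_FLAGS.items()):
        known |= bit
        if mode & bit:
            names.append(name)
    leftover = mode & ~known
    if leftover:
        # Bits this table does not cover are reported rather than dropped.
        names.append("unknown:" + hex(leftover))
    return names


if __name__ == "__main__":
    print(decode_gfp(0x21))
```

Under those assumed values, mode:0x21 is the atomic flag plus the DMA-zone flag, which is the combination questioned above.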

Apart from that, the trace looks sane and the SCSI mid layer should
recover from this condition and not cause a hung io subsystem. The only
way I can see this fail is if the scsi host free_list is not filled for
some reason during init, or if the commands allocated from there are
lost or never finished by the hardware.
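To illustrate the recovery path described above, here is a toy model, not the actual mid layer code: a command pool whose normal allocator may fail, backed by a small per-host reserve (standing in for the free_list). The class and function names are hypothetical. It shows why I/O keeps moving while the reserve exists, and why it stalls if the reserve was never filled.

```python
from collections import deque


class CommandPool:
    """Toy model of a command pool with a per-host reserve (the free_list)."""

    def __init__(self, reserve=1):
        self.reserve_size = reserve
        self.free_list = deque(object() for _ in range(reserve))

    def get(self, slab_alloc):
        # Try the normal allocator first, fall back to the reserve on failure.
        cmd = slab_alloc()
        if cmd is None and self.free_list:
            cmd = self.free_list.popleft()
        return cmd

    def put(self, cmd):
        # A finished command refills the reserve before anything else.
        if len(self.free_list) < self.reserve_size:
            self.free_list.append(cmd)


def dispatch_all(requests, pool, slab_alloc, max_attempts=100):
    """Run requests one at a time; a failed allocation leaves the request queued."""
    pending = deque(requests)
    completed = []
    attempts = 0
    while pending and attempts < max_attempts:
        attempts += 1
        cmd = pool.get(slab_alloc)
        if cmd is None:
            continue  # nothing available, retry on the next queue run
        completed.append(pending.popleft())
        pool.put(cmd)  # models the hardware completing the command
    return completed, list(pending)


def always_fail():
    """Allocator stand-in that models a machine with no free pages."""
    return None


if __name__ == "__main__":
    print(dispatch_all(["a", "b", "c"], CommandPool(reserve=1), always_fail))
    print(dispatch_all(["a", "b", "c"], CommandPool(reserve=0), always_fail))
```

With a reserve of one, every request completes even though each normal allocation fails; with an empty reserve, all requests stay pending, which corresponds to the failure case named above.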

-- 
Jens Axboe



* Re: kblockd/1: page allocation failure in 2.6.9
  2004-12-24 13:20       ` Jens Axboe
@ 2004-12-24 19:28         ` James Bottomley
  2004-12-25 22:49           ` Frank Steiner
  0 siblings, 1 reply; 10+ messages in thread
From: James Bottomley @ 2004-12-24 19:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Frank Steiner, Denis Vlasenko, Linux Kernel, Andrew Morton

On Fri, 2004-12-24 at 14:20 +0100, Jens Axboe wrote:
> This looks fishy - this is GFP_ATOMIC | GFP_DMA, where it should only be
> GFP_ATOMIC. gdth should not end up with shost->cmd_pool->gfp_mask ==
> GFP_DMA; that looks like a bug in the driver.

This may be a fault in the gdth driver (if that's where the trace came
from).  It sets unchecked_isa_dma to 1 in its template and then resets
it at various places in the driver; it may have been reset too late to
prevent the host from getting the GFP_DMA scsi command pool.

> Apart from that, the trace looks sane and the SCSI mid layer should
> recover from this condition and not cause a hung io subsystem. The only
> way I can see this fail is if the scsi host free_list is not filled for
> some reason during init, or if the commands allocated from there are
> lost or never finished by the hardware.

Yes, I've checked this out in SCSI by artificially simulating a memory
allocation failure in scsi_get_command().  My system behaves nicely even
as I rack up the failures.  Is there anything else unusual in the log
that may indicate what the problem is?

James




* Re: kblockd/1: page allocation failure in 2.6.9
  2004-12-24 19:28         ` James Bottomley
@ 2004-12-25 22:49           ` Frank Steiner
  2004-12-26 15:46             ` James Bottomley
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Steiner @ 2004-12-25 22:49 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Frank Steiner, Denis Vlasenko, Linux Kernel, Andrew Morton

Hi all,

thanks for caring! I don't fully understand what you are talking about
in detail :-), but maybe I can give some more information that could help.

- If you suspect the gdth driver of causing the error, it must be some very
   special situation on this host causing it. We have 2 other hosts
   with the same icp vortex GDT8514RZ controller as the host
   where the kblockd message occurred. They all have internal raid1 disks
   (73GB or 146GB). One is our main NFS server (it has two raid1 with 146GB
   each) and it has a lot of I/O, sometimes 50GB or more a day with peaks
   up to 200MB per second (reading), and we never saw any kblockd message
   in the logs (I just checked them all).

- there were no messages "around" the kblockd messages in /var/log/messages
   but the usual ones about remote ssh login, cron jobs etc., but the messages
   were all more than 10 minutes "away" before and after the kblockd happened.

- not much I/O can have taken place on the internal disks attached to the
   icp controller when the bug was triggered, because all the I/O for
   e.g. updates or backups happens only in the night for all hosts except
   the NFS servers.

- the host where the error occurred is the only one that (in addition to the icp
   controller with the internal raid1) has two external SCSI-to-IDE-Raids
   attached to the adaptec 29160 controller that runs with the aic7xxx module.

- According to the user working a lot on this host, it is possible that he
   did a dump of a large mysql database on the external SCSI-to-IDE raids
   around the time when the kblockd messages occurred. He can't tell
   for sure if it was the same time.
   Since we never had any problems on the other hosts with the icp
   controllers and the gdth module, maybe the bug occurs in the aic7xxx
   module? Or if it occurs in the gdth, maybe it's caused by some interaction
   between the gdth and the aic7xxx driver both accessing the scsi bus?
   The gdth driver is compiled into the kernel, the aic7xxx loads as module.

- I did a "dd if=/dev/sd? of=/dev/null bs=500M" for all disks (sda on gdth,
   sdb and sdc on aic7xxx) to check if it could be some disk error or sth.,
   but those dd runs went fine without triggering the bug.

Don't know if this info helps...

Please let me know if there is something I can do to help find
the bug. I don't mind compiling a special kernel for this host if I can
turn on some debugging options. I saw some DEBUG_GDTH variable in gdth.c,
but I don't know how to turn this on exactly, would I have to define the
variable in the header file somehow? (Sorry, I'm not very familiar with
C :-() For the aic7xxx I found two config options AIC7XXX_DEBUG_ENABLE
and AIC7XXX_DEBUG_MASK. Could that help you identify the bug if I have all
this enabled when the bug shows up again?

Thanks!

Frank



-- 
Dipl.-Inform. Frank Steiner   Web:  http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr 17            Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:              -4054
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *


* Re: kblockd/1: page allocation failure in 2.6.9
  2004-12-25 22:49           ` Frank Steiner
@ 2004-12-26 15:46             ` James Bottomley
  2004-12-26 22:31               ` Frank Steiner
  0 siblings, 1 reply; 10+ messages in thread
From: James Bottomley @ 2004-12-26 15:46 UTC (permalink / raw)
  To: Frank Steiner; +Cc: Jens Axboe, Denis Vlasenko, Linux Kernel, Andrew Morton

On Sat, 2004-12-25 at 23:49 +0100, Frank Steiner wrote:
> - If you suspect the gdth driver of causing the error, it must be some very
>    special situation on this host causing it. We have 2 other hosts
>    with the same icp vortex GDT8514RZ controller as the host
>    where the kblockd message occurred. They all have internal raid1 disks
>    (73GB or 146GB). One is our main NFS server (it has two raid1 with 146GB
>    each) and it has a lot of I/O, sometimes 50GB or more a day with peaks
>    up to 200MB per second (reading), and we never saw any kblockd message
>    in the logs (I just checked them all).

The kblockd message is just a symptom of the machine running low on
memory and starting to fail normal kernel memory allocations.  There's
always a potential for hangs when something can't allocate memory:
usually it's in the middle of a transaction and just forgets about it;
what should happen (as we just verified SCSI does) is that the
transaction should be rolled back and retried.

> - there were no messages "around" the kblockd messages in /var/log/messages
>    but the usual ones about remote ssh login, cron jobs etc., but the messages
>    were all more than 10 minutes "away" before and after the kblockd happened.

That's unfortunate.  It means that whatever caused this left no trace.
The best working theory is still a memory allocation failure somewhere.
If it occurs again, could you get a full system process trace (<alt>-
<sysrq>-t) and send that?  That might give a better clue as to what went
on.

James




* Re: kblockd/1: page allocation failure in 2.6.9
  2004-12-26 15:46             ` James Bottomley
@ 2004-12-26 22:31               ` Frank Steiner
  2005-01-10  9:19                 ` Frank Steiner
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Steiner @ 2004-12-26 22:31 UTC (permalink / raw)
  To: James Bottomley
  Cc: Frank Steiner, Jens Axboe, Denis Vlasenko, Linux Kernel, Andrew Morton

James Bottomley wrote

> The kblockd message is just a symptom of the machine running low on
> memory and starting to fail normal kernel memory allocations.  There's
> always a potential for hangs when something can't allocate memory:
> usually it's in the middle of a transaction and just forgets about it;
> what should happen (as we just verified SCSI does) is that the
> transaction should be rolled back and retried.

Ah, ok. Now at least I know what these messages mean! Indeed, as I wrote
in the first mail, 1.7GB of the 2GB memory of the machine were in use,
and that was "real" usage, i.e., without any buffers. But that was already
three hours after the failure occurred. Unfortunately we didn't check which
process was using how much memory because we just rebooted as quickly as
possible to get the server processes running again.
It might be a memory leak in some application, because neither the mysql
database nor the tomcat server should take more than a few hundred MB
of memory altogether. I will ask the user to redo the mysql database
dump; maybe that was it and we can trigger the failure again. But that
will take two weeks until everyone is back in the office :-)

> 
>>- there were no messages "around" the kblockd messages in /var/log/messages
>>   but the usual ones about remote ssh login, cron jobs etc., but the messages
>>   were all more than 10 minutes "away" before and after the kblockd happened.
> 
> That's unfortunate.  It means that whatever caused this left no trace.
> The best working theory is still a memory allocation failure somewhere.
> If it occurs again, could you get a full system process trace (<alt>-
> <sysrq>-t) and send that?  That might give a better clue as to what went
> on.

Yes, sure! If we can reproduce it, I will send the log and try to figure
out which process takes so much memory. Thanks for your help!
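One way to record which process holds the memory before rebooting is to read the per-process status files directly, which also avoids depending on ps. This is only a rough sketch: it assumes a /proc layout with "Name:" and "VmRSS:" lines in /proc/<pid>/status, and the function names are made up for illustration.

```python
import os


def parse_status(text):
    """Extract (process name, resident set size in kB) from a status file body."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    name = fields.get("Name", "?")
    rss = fields.get("VmRSS")
    # Kernel threads have no VmRSS line; count them as 0 kB.
    rss_kb = int(rss.split()[0]) if rss else 0
    return name, rss_kb


def top_memory_users(proc_root="/proc", limit=10):
    """Return up to `limit` tuples (rss_kb, pid, name), largest first."""
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        usage.append((rss_kb, int(entry), name))
    return sorted(usage, reverse=True)[:limit]


if __name__ == "__main__":
    for rss_kb, pid, name in top_memory_users():
        print(pid, name, rss_kb, "kB")
```

Running it from cron every few minutes and appending the output to a log would leave a trace of memory growth even if the machine has to be rebooted quickly.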

cu,
Frank

-- 
Dipl.-Inform. Frank Steiner   Web:  http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr 17            Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:              -4054
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *


* Re: kblockd/1: page allocation failure in 2.6.9
  2004-12-26 22:31               ` Frank Steiner
@ 2005-01-10  9:19                 ` Frank Steiner
  0 siblings, 0 replies; 10+ messages in thread
From: Frank Steiner @ 2005-01-10  9:19 UTC (permalink / raw)
  To: Frank Steiner
  Cc: James Bottomley, Jens Axboe, Denis Vlasenko, Linux Kernel, Andrew Morton

Hi,

we can now reproduce the problem, but it looks like the problems are
not caused by the kblockd.

The problem on this host is that from time to time "ps -aux" just hangs
and starts eating up all memory. When it has taken enough, either some
kblockd message occurs, and/or the oom killer jumps in and starts killing
threads.
So, the ps -aux hangs *before* the kblockd messages occur, and is not
caused by it (like I assumed before). And since I don't get any disk
errors etc. after the kblockd messages, I guess everything is fine and
the scsi operation indeed recovers the way you said it should.

Now we just need to find out why ps -aux hangs. Seems to be a problem
with the nfsd, because it hangs when showing the [nfsd] entries and
works after restarting the nfs server. In case someone is interested
in this issue, I described it in more detail on the nfs list at 
http://marc.theaimsgroup.com/?l=linux-nfs&m=110509676609987&w=2

Thanks for your help!
cu,
Frank




-- 
Dipl.-Inform. Frank Steiner   Web:  http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *


end of thread, other threads:[~2005-01-10  9:19 UTC | newest]

Thread overview: 10+ messages
2004-12-21  7:39 kblockd/1: page allocation failure in 2.6.9 Frank Steiner
2004-12-23 11:26 ` Frank Steiner
2004-12-23 15:51   ` Denis Vlasenko
2004-12-23 15:55     ` Frank Steiner
2004-12-24 13:20       ` Jens Axboe
2004-12-24 19:28         ` James Bottomley
2004-12-25 22:49           ` Frank Steiner
2004-12-26 15:46             ` James Bottomley
2004-12-26 22:31               ` Frank Steiner
2005-01-10  9:19                 ` Frank Steiner
