* 2.4.20: Proccess stuck in __lock_page ...
From: manish @ 2003-05-27 3:41 UTC
To: linux-kernel, manish

Hello!

I am running the 2.4.20 kernel on a dual-CPU system with 3.5 GB of RAM.
I am running bonnie across four drives in parallel:

bonnie -s 1000 -d /<dir-name>

bdflush settings on this system:

[root@dyn-10-123-130-235 vm]# cat bdflush
2 50 32 100 50 300 1 0 0

All the bonnie processes, and any other process (df, ps -ef, etc.), are
hung in __lock_page. Breaking into kdb, I see the following stack for one
such bonnie process:

schedule(..)
__lock_page(..)
lock_page(..)
do_generic_file_read(..)
generic_file_read(..)

After this, the processes never recover from the hang. At times a couple
of bonnie processes complete, but the remaining ones, along with the other
processes, still hang.

I tried the 2.5.33 kernel (from the 2.5 series), and the hang does not
occur there. If I run only two bonnie processes, they never get stuck.
Running four mke2fs processes in parallel also makes them get stuck.

Any clues where this could be happening?

Thanks
-Manish
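The workload above can be reproduced with a small launcher that starts all four bonnie runs at the same moment and then waits for them. This is a minimal sketch assuming a POSIX system with bonnie on PATH; it is not a tool used in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_JOBS 16

/* Start every job at once (one fork/exec per argv vector), then reap them all.
 * Returns how many jobs exited with status 0; a failed exec exits with 127. */
int run_parallel(char *const *jobs[], int njobs)
{
    pid_t pids[MAX_JOBS];
    int ok = 0;

    assert(njobs > 0 && njobs <= MAX_JOBS);
    for (int i = 0; i < njobs; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {                 /* child: become the job */
            execvp(jobs[i][0], jobs[i]);
            _exit(127);
        }
        if (pids[i] < 0)
            perror("fork");
    }
    for (int i = 0; i < njobs; i++) {       /* parent: wait for each child */
        int status;
        if (pids[i] > 0 && waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

For this report each job would be an argv such as {"bonnie", "-s", "1000", "-d", "/mnt/d1", NULL}, where /mnt/d1 through /mnt/d4 are hypothetical mount points standing in for the elided /<dir-name>. Starting all children before reaping any keeps the four runs overlapping, which is the condition the report says triggers the hang.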
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marcelo Tosatti @ 2003-05-27 4:03 UTC
To: manish; +Cc: linux-kernel

On Mon, 26 May 2003, manish wrote:
> All the bonnie process and any other process (like df, ps -ef etc.) are
> hung in __lock_page.
[...]
> Any clues where this could be happening?

Hi,

Are you sure there is no disk activity?

Run vmstat and check that, please.
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: manish @ 2003-05-27 4:25 UTC
To: Marcelo Tosatti; +Cc: linux-kernel

Marcelo Tosatti wrote:
>Are you sure there is no disk activity ?
>
>Run vmstat and check that, please.

Hello!

Thanks for the response.

The light on the controller does not blink at all. Initially it does
blink, but after the hang it does not blink at all.

vmstat after the hang:

   procs            memory                swap       io   system         cpu
 r  b  w   swpd    free   buff   cache  si  so  bi   bo   in  cs  us  sy  id
 1  1  0    780 2056892   5784 1415324   0   0   0    4  102   7  49   1  50
 1  1  0    780 2056892   5784 1415324   0   0   0    4  102   9  49   1  50
 1  1  0    780 2056892   5784 1415324   0   0   0    5  104  10  29  21  50
 0  1  0    780 2056708   5784 1415324   0   0   0    1  104  12   0  13  86
 1  1  0    780 2222904   5784 1249396   0   0   0  172  126  25   0   4  96
 0  1  0    780 3081052   5784  391324   0   0   0  403  161  43   0  12  88
 0  1  0    780 3080952   5788  391408   0   0  29    9  120  72   0   0 100
 0  1  0    780 3080952   5788  391408   0   0   0    0  111  19   0   0 100
 0  1  0    780 3080952   5788  391408   0   0   0    1  103   9   0   0 100
 0  1  0    780 3080952   5788  391408   0   0   0    0  101   9   0   0 100
 0  1  0    780 3080952   5788  391408   0   0   0    0  101   7   0   0 100
 0  1  0    780 3080952   5788  391408   0   0   0    0  101   9   0   0 100
 0  1  0    780 3080952   5788  391408   0   0   0    0  102   9   0   0 100
 0  1  0    780 3080952   5788  391408   0   0   0    1  101   8   0   0 100
 0  1  0    780 3081308   5788  391420   0   0   0  231  150  92   3   0  97
 0  1  0    780 3081308   5788  391420   0   0   0    0  102   7   0   0 100
 0  1  0    780 3081308   5788  391420   0   0   0    0  102   7   0   0 100
 0  1  0    780 3081304   5788  391420   0   0   0    0  101   9   0   0 100
 0  1  0    780 3081304   5788  391420   0   0   0    0  102   8   0   0 100
 0  1  0    780 3081300   5788  391420   0   0   0    0  101   8   0   0 100
 0  1  0    780 3081300   5788  391420   0   0   0    0  101   9   0   0 100
 0  1  0    780 3081296   5788  391420   0   0   0    0  101   7   0   0 100
 0  1  0    780 3081296   5788  391420   0   0   0    0  101   9   0   0 100
 0  1  0    780 3081292   5788  391420   0   0   0    0  102   9   0   0 100
 0  1  0    780 3081292   5788  391420   0   0   0    0  101   8   0   0 100
 0  1  0    780 3081288   5788  391420   0   0   0    0  102   9   0   0 100
 0  1  0    780 3081288   5788  391420   0   0   0    0  102   7   0   0 100
 0  1  0    780 3081284   5788  391420   0   0   0    0  102   9   0   0 100
 0  1  0    780 3081284   5788  391420   0   0   0    0  102   8   0   0 100
 0  1  0    780 3081280   5788  391420   0   0   0    0  101   8   0   0 100
 0  1  0    780 3081276   5788  391420   0   0   0    0  102   9   0   0 100
 0  1  0    780 3081260   5788  391420   0   0   0    0  235  30   0   0 100
 0  1  0    780 3081260   5788  391420   0   0   0    0  101   9   0   0 100
 0  1  0    780 3081256   5788  391420   0   0   0    0  101   7   0   0 100
 0  1  0    780 3081248   5788  391424   0   0   0  169  137  54   3   1  97
 0  1  0    780 3081248   5788  391424   0   0   0    0  101   9   0   0 100
 0  1  0    780 3081248   5788  391424   0   0   0    0  101   8   0   0 100
 0  1  0    780 3081248   5788  391424   0   0   0    0  101   9   0   0 100

One bonnie process is hung.
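Marcelo's vmstat check can be made mechanical. One heuristic (an assumption of this sketch, not a kernel diagnostic) is to flag rows where at least one task is blocked (b > 0) while no blocks are read or written (bi == bo == 0). The sketch parses the 16-column, 2.4-era vmstat layout quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Column order of the vmstat output quoted above. */
enum { COL_R, COL_B, COL_W, COL_SWPD, COL_FREE, COL_BUFF, COL_CACHE,
       COL_SI, COL_SO, COL_BI, COL_BO, COL_IN, COL_CS,
       COL_US, COL_SY, COL_ID, NCOLS };

/* Parse exactly NCOLS integers; header or blank lines return -1. */
static int parse_row(const char *row, long v[NCOLS])
{
    for (int i = 0; i < NCOLS; i++) {
        int used;
        if (sscanf(row, "%ld%n", &v[i], &used) != 1)
            return -1;
        row += used;
    }
    return 0;
}

/* 1 = a task is blocked (b > 0) while no blocks moved (bi == bo == 0),
 * 0 = not stalled, -1 = not a data row. */
int row_is_stalled(const char *row)
{
    long v[NCOLS];
    if (parse_row(row, v) != 0)
        return -1;
    return v[COL_B] > 0 && v[COL_BI] == 0 && v[COL_BO] == 0;
}

/* Length of the longest run of consecutive stalled rows;
 * any non-stalled or unparsable line ends a run. */
int longest_stall(const char *const rows[], int n)
{
    int best = 0, cur = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        cur = (row_is_stalled(rows[i]) == 1) ? cur + 1 : 0;
        if (cur > best)
            best = cur;
    }
    return best;
}
```

On the rows above, most samples after the first few satisfy this test. The occasional rows with small bo values (1, 231, 169) end a run, which suggests some writes still trickle out even while the blocked task stays stuck.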
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marcelo Tosatti @ 2003-05-27 4:59 UTC
To: manish; +Cc: linux-kernel

On Mon, 26 May 2003, manish wrote:
> The light on the controller does not blink at all.
[...]
> vmstat after the hang
[vmstat output trimmed]

Ok, and does it happen with the stock kernel?
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: manish @ 2003-05-27 15:29 UTC
To: Marcelo Tosatti; +Cc: linux-kernel

Marcelo Tosatti wrote:
>Ok, and does it happen with the stock kernel?

Yes, with the stock kernel too, but only after many hours of runtime.

Thanks
-Manish
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marcelo Tosatti @ 2003-05-27 16:59 UTC
To: manish; +Cc: lkml

On Tue, 27 May 2003, manish wrote:
> >Ok, and does it happen with the stock kernel?
> Yes, with the stock kernel too but after long hrs of runtime ..

Could you press Alt+SysRq+T on the locked-up STOCK kernel and send us the
output, please?
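Where pressing Alt+SysRq+T at a console is not practical, kernels that provide /proc/sysrq-trigger accept the same command letter written to that file. I am not certain 2.4.20 has that file, so check your tree. This sketch also assumes CONFIG_MAGIC_SYSRQ is enabled, SysRq is enabled via /proc/sys/kernel/sysrq, and the caller is root; the task dump goes to the kernel log, not back to the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write one SysRq command character to `path`.
 * With path "/proc/sysrq-trigger" and cmd 't', a kernel that supports it
 * dumps every task's state and stack, as Alt+SysRq+T does.
 * Returns 0 on success, -1 on any failure. */
int sysrq_write(const char *path, char cmd)
{
    int fd;
    ssize_t w;

    assert(path != NULL);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    w = write(fd, &cmd, 1);
    if (close(fd) != 0 || w != 1)
        return -1;
    return 0;
}
```

Usage would be sysrq_write("/proc/sysrq-trigger", 't') followed by reading the kernel log (for example with dmesg); the path is a parameter so the function can be exercised against an ordinary file.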
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: manish @ 2003-05-27 4:31 UTC
To: Marcelo Tosatti; +Cc: linux-kernel

Marcelo Tosatti wrote:
>Are you sure there is no disk activity ?
>
>Run vmstat and check that, please.

Hello!

My bad. This is one of the kernels in which the I/O subsystem was modified
to replace the io_request_lock with a finer-grained host_lock and
queue_lock.

I also noticed that the hang occurs when the bdflush settings are the
following:

root@dyn-10-123-130-235 vm]# cat bdflush
30 50 32 100 50 300 60 0 0

Thanks
-Manish
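The two bdflush lines quoted in this thread differ in only two positions. The meaning of each position depends on the kernel version (see the bdflush parameter definitions in fs/buffer.c of the running tree), so this sketch deliberately reports 1-based positions rather than guessing field names.

```c
#include <assert.h>
#include <stdio.h>

#define BDFLUSH_NFIELDS 9   /* both lines quoted in this thread have nine values */

/* Parse up to BDFLUSH_NFIELDS integers; returns how many were read. */
int bdflush_parse(const char *s, int out[BDFLUSH_NFIELDS])
{
    int n = 0, used;

    while (n < BDFLUSH_NFIELDS && sscanf(s, "%d%n", &out[n], &used) == 1) {
        s += used;
        n++;
    }
    return n;
}

/* Store the 1-based positions where lines a and b differ in diff[].
 * Returns the number of differences, or -1 if either line is malformed. */
int bdflush_diff(const char *a, const char *b, int diff[BDFLUSH_NFIELDS])
{
    int x[BDFLUSH_NFIELDS], y[BDFLUSH_NFIELDS], k = 0;

    assert(a && b && diff);
    if (bdflush_parse(a, x) != BDFLUSH_NFIELDS ||
        bdflush_parse(b, y) != BDFLUSH_NFIELDS)
        return -1;
    for (int i = 0; i < BDFLUSH_NFIELDS; i++)
        if (x[i] != y[i])
            diff[k++] = i + 1;
    return k;
}
```

Comparing "2 50 32 100 50 300 1 0 0" with "30 50 32 100 50 300 60 0 0" reports positions 1 and 7, so the hang was seen with both values of those two fields and the remaining seven unchanged.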
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Carl-Daniel Hailfinger @ 2003-05-27 14:14 UTC
To: Marcelo Tosatti
Cc: manish, linux-kernel, Christian Klose, Marc-Christian Petersen, William Lee Irwin III

Christian,

this looks suspiciously like the problem you have been experiencing since
2.4.19-pre. Maybe we can fix this for good.

Marcelo Tosatti wrote:
> On Mon, 26 May 2003, manish wrote:
>>All the bonnie process and any other process (like df, ps -ef etc.) are
>>hung in __lock_page. Breaking into kdb, I observe the following for one
[...]

Following is SysRq-T output for stuck processes during such a pause, from
Christian Klose. Only processes in D state are listed, for brevity. The
last two call traces are especially interesting.

kjournald D C15C7240 4 122 1 123 120 (L-TLB)
Call Trace: [__get_request_wait+197/208] [__make_request+392/1472]
  [generic_make_request+226/304] [submit_bh+80/112] [ll_rw_block+263/432]
  [journal_commit_transaction+4017/4416] [kjournald+277/464]
  [commit_timeout+0/16] [kernel_thread+46/64] [kjournald+0/464]

kmail D D73E9360 2656 1960 1 1978 (NOTLB)
Call Trace: [sleep_on+56/96] [log_wait_commit+56/80] [journal_stop+345/480]
  [journal_force_commit+60/64] [ext3_force_commit+35/48]
  [ext3_sync_file+132/176] [ext3_writepage+0/672] [sys_fsync+151/208]
  [system_call+51/56]

mc D C016B338 0 2177 2152 2179 (NOTLB)
Call Trace: [journal_stop+328/480] [__lock_page+149/192] [lock_page+26/32]
  [do_generic_file_read+653/1104] [file_read_actor+0/160]
  [generic_file_read+178/368] [file_read_actor+0/160] [sys_read+163/320]
  [system_call+51/56]

kmail D 00200282 2656 1960 1 1978 (NOTLB)
Call Trace: [sleep_on+56/96] [log_wait_commit+56/80] [journal_stop+345/480]
  [journal_force_commit+60/64] [ext3_force_commit+35/48]
  [ext3_sync_file+132/176] [ext3_writepage+0/672] [sys_fsync+151/208]
  [system_call+51/56]

mc D C016B338 0 2177 2152 2179 (NOTLB)
Call Trace: [journal_stop+328/480] [__lock_page+149/192] [lock_page+26/32]
  [do_generic_file_read+653/1104] [file_read_actor+0/160]
  [generic_file_read+178/368] [file_read_actor+0/160] [sys_read+163/320]
  [system_call+51/56]

grep D DFD7E120 0 3243 1470 3244 (NOTLB)
Call Trace: [__wait_on_buffer+93/144] [bread+123/144] [ext3_get_branch+106/240]
  [ext3_get_block_handle+120/688] [create_buffers+107/224]
  [ext3_get_block+74/144] [block_read_full_page+541/624]
  [__alloc_pages+75/400] [page_cache_read+173/208] [ext3_get_block+0/144]
  [read_cluster_nonblocking+57/80] [filemap_nopage+285/560]
  [do_no_page+137/480] [do_page_fault+376/1246] [handle_mm_fault+119/256]
  [do_page_fault+376/1246] [rb_insert_color+210/240] [do_page_fault+0/1246]
  [error_code+52/60] [clear_user+51/80] [do_page_fault+0/1246]
  [error_code+52/60] [clear_user+51/80] [padzero+40/48]
  [load_elf_binary+1179/2848] [load_elf_binary+0/2848]
  [search_binary_handler+269/400] [copy_strings+440/560]
  [do_execve+365/544] [sys_execve+66/128] [system_call+51/56]

grep D C02508D4 0 3244 1470 3245 3243 (NOTLB)
Call Trace: [__lock_page+149/192] [lock_page+26/32] [filemap_nopage+305/560]
  [do_no_page+137/480] [do_page_fault+376/1246] [handle_mm_fault+119/256]
  [do_page_fault+376/1246] [rb_insert_color+210/240] [do_page_fault+0/1246]
  [error_code+52/60] [clear_user+51/80] [do_page_fault+0/1246]
  [error_code+52/60] [clear_user+51/80] [padzero+40/48]
  [load_elf_binary+1179/2848] [__lock_page+175/192] [file_read_actor+0/160]
  [load_elf_binary+0/2848] [search_binary_handler+269/400]
  [copy_strings+440/560] [do_execve+365/544] [sys_execve+66/128]
  [system_call+51/56]

grep D C02508D4 0 3245 1470 3244 (NOTLB)
Call Trace: [__lock_page+149/192] [lock_page+26/32] [filemap_nopage+305/560]
  [do_no_page+137/480] [do_page_fault+376/1246] [handle_mm_fault+119/256]
  [do_page_fault+376/1246] [rb_insert_color+210/240] [do_page_fault+0/1246]
  [error_code+52/60] [clear_user+51/80] [do_page_fault+0/1246]
  [error_code+52/60] [clear_user+51/80] [padzero+40/48]
  [load_elf_binary+1179/2848] [__lock_page+175/192] [file_read_actor+0/160]
  [load_elf_binary+0/2848] [search_binary_handler+269/400]
  [copy_strings+440/560] [do_execve+365/544] [sys_execve+66/128]
  [system_call+51/56]

Regards,
Carl-Daniel
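Several of these traces (and manish's original report) end in __lock_page. As I understand the 2.4 page cache, a page under read I/O stays locked until the read completes and the page is unlocked; every other task that needs that page sleeps in __lock_page until then, so a request that never completes leaves all of them stuck. The sketch below is a hypothetical user-space model of that lock bit and its wait queue, not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of one page's lock bit and the tasks waiting on it. */
struct page_model {
    pthread_mutex_t mu;
    pthread_cond_t  unlocked;
    bool            locked;            /* stands in for PG_locked */
};

#define PAGE_MODEL_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Like lock_page(): sleep until the bit is clear, then set it. */
void model_lock_page(struct page_model *p)
{
    assert(p != NULL);
    pthread_mutex_lock(&p->mu);
    while (p->locked)                  /* re-check after every wakeup */
        pthread_cond_wait(&p->unlocked, &p->mu);
    p->locked = true;
    pthread_mutex_unlock(&p->mu);
}

/* Non-blocking variant: take the bit only if it is free right now. */
bool model_trylock_page(struct page_model *p)
{
    bool got;

    pthread_mutex_lock(&p->mu);
    got = !p->locked;
    if (got)
        p->locked = true;
    pthread_mutex_unlock(&p->mu);
    return got;
}

/* Like unlock_page(), which for a read is reached from I/O completion. */
void model_unlock_page(struct page_model *p)
{
    pthread_mutex_lock(&p->mu);
    p->locked = false;
    pthread_cond_broadcast(&p->unlocked);   /* wake every sleeper */
    pthread_mutex_unlock(&p->mu);
}

/* A reader that needs the page: blocks for as long as the page stays locked. */
void *model_reader(void *page)
{
    model_lock_page(page);
    model_unlock_page(page);
    return NULL;
}
```

If model_unlock_page() is never called (the model's equivalent of a lost I/O completion), every model_reader() thread sleeps forever, which matches the symptom of many unrelated processes piling up in __lock_page.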
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: William Lee Irwin III @ 2003-05-27 14:28 UTC
To: Carl-Daniel Hailfinger
Cc: Marcelo Tosatti, manish, linux-kernel, Christian Klose, Marc-Christian Petersen

On Tue, May 27, 2003 at 04:14:51PM +0200, Carl-Daniel Hailfinger wrote:
> Christian,
> this looks supiciously like the problem you are experiencing since
> 2.4.19-pre. Maybe we can fix this for good.

The most I know of this is that someone made it go away by backing out
some ll_rw_blk.c cset.

-- wli
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marcelo Tosatti @ 2003-05-27 17:27 UTC
To: Carl-Daniel Hailfinger
Cc: manish, linux-kernel, Christian Klose, Marc-Christian Petersen, William Lee Irwin III

On Tue, 27 May 2003, Carl-Daniel Hailfinger wrote:
> Following is SysRq-T output for stuck processes during such a pause from
> Christian Klose. Only processes in D state are listed for brevity.
> Especially the last two call traces are interesting.

A "pause" is perfectly fine (to some extent, of course); a hang is not.
Is this backtrace from a hung, unusable kernel, or not?
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marc-Christian Petersen @ 2003-05-27 17:36 UTC
To: linux-kernel, Marcelo Tosatti, Carl-Daniel Hailfinger
Cc: manish, Christian Klose, William Lee Irwin III

On Tuesday 27 May 2003 19:27, Marcelo Tosatti wrote:

Hi Marcelo,

> A "pause" is perfectly fine (to some extent, of course), now a hang is
> not. Is this backtrace from a hanged, unusable kernel or ?

A pause is _not_ perfectly fine, not even to some extent. The pause we are
discussing is a pause of the _whole_ machine, not just a disk I/O pause.
The mouse stops, the keyboard stops, everything stops, who knows wtf.

That behaviour is absolutely bullshit for desktop users. For server usage
you may not notice it to this degree (mostly no X, so no mouse), but it
may be very bad in a server environment too.

ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marcelo Tosatti @ 2003-05-27 17:47 UTC
To: Marc-Christian Petersen
Cc: linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III

On Tue, 27 May 2003, Marc-Christian Petersen wrote:
> A pause is _not_ perfectly fine, even not to some extent. That pause we are
> discussing about is a pause of the _whole_ machine, not just disk i/o pauses.
> Mouse stops, keyboard stops, everything stops, who knows wtf.

Do you also notice them?

> That behaviour is absolutely bullshit for desktop users. For serverusage you
> may not notice it in this dimension (mostly no X so no mouse), but also for a
> server environment this may be very bad.

Agreed.
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marc-Christian Petersen @ 2003-05-27 17:52 UTC
To: Marcelo Tosatti
Cc: linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III

On Tuesday 27 May 2003 19:47, Marcelo Tosatti wrote:

Hi Marcelo,

> > A pause is _not_ perfectly fine, even not to some extent. That pause we
> > are discussing about is a pause of the _whole_ machine, not just disk i/o
> > pauses. Mouse stops, keyboard stops, everything stops, who knows wtf.
> Do you also notice them?

I do, and so do people I know; just the ones _I_ know number about 30.
I reported this problem over a year ago, during the 2.4.19-pre series.

> > That behaviour is absolutely bullshit for desktop users. For serverusage
> > you may not notice it in this dimension (mostly no X so no mouse), but
> > also for a server environment this may be very bad.
> Agreed.

thanks =)

ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marcelo Tosatti @ 2003-05-27 17:57 UTC
To: Marc-Christian Petersen
Cc: linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III

On Tue, 27 May 2003, Marc-Christian Petersen wrote:
> I do, people I know do also, numbers of those people only _I_ know are about
> ~30. I've reported this problem over a year ago while 2.4.19-pre time.

Can you please try to reproduce it with -aa?
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marc-Christian Petersen @ 2003-05-27 18:08 UTC
To: Marcelo Tosatti, Andrea Arcangeli
Cc: linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III

On Tuesday 27 May 2003 19:57, Marcelo Tosatti wrote:

Hi Marcelo,

> > I do, people I know do also, numbers of those people only _I_ know are
> > about ~30. I've reported this problem over a year ago while 2.4.19-pre
> > time.
> Can you please try to reproduce it with -aa?

Not again ;)

I've tried almost every known kernel tree around, and every kernel shows
the same effect. I even tried SuSE and Red Hat kernels.

I've 'wasted' tons of time just trying to find a solution for it.

To address _exactly_ this problem (pauses, stops, dead mouse, etc.),
Andrea introduced his lowlatency elevator. Side effect: it decreases I/O
throughput, and the "pauses/stops" are still there. Much less, but not
gone.

The _only_ workaround so far (known to the public) is to change
nr_requests in drivers/block/ll_rw_blk.c from 128 to 4, which costs about
40% in performance (not acceptable in any way).

.oO( I am quite sure I've mailed you all this stuff privately in response
to your private mail to me ;) )Oo.

ciao, Marc
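One way to picture why nr_requests matters: the request pool bounds how many requests can be queued ahead of a new one, and a writer that finds the pool empty must sleep, as kjournald does in __get_request_wait in the SysRq-T output above. The model below is a hypothetical simplification (no elevator sorting, no merging, no batching), not the ll_rw_blk.c code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Fixed pool of request slots; its size plays the role of nr_requests. */
struct req_pool {
    pthread_mutex_t mu;
    pthread_cond_t  freed;
    int             free_slots;
};

void pool_init(struct req_pool *p, int nr_requests)
{
    assert(nr_requests > 0);
    pthread_mutex_init(&p->mu, NULL);
    pthread_cond_init(&p->freed, NULL);
    p->free_slots = nr_requests;
}

/* Take a slot, sleeping while none is free (cf. __get_request_wait). */
void pool_get(struct req_pool *p)
{
    pthread_mutex_lock(&p->mu);
    while (p->free_slots == 0)
        pthread_cond_wait(&p->freed, &p->mu);
    p->free_slots--;
    pthread_mutex_unlock(&p->mu);
}

/* Take a slot only if one is free right now. */
bool pool_tryget(struct req_pool *p)
{
    bool got;

    pthread_mutex_lock(&p->mu);
    got = p->free_slots > 0;
    if (got)
        p->free_slots--;
    pthread_mutex_unlock(&p->mu);
    return got;
}

/* Completion path: return a slot and wake one sleeper. */
void pool_put(struct req_pool *p)
{
    pthread_mutex_lock(&p->mu);
    p->free_slots++;
    pthread_cond_signal(&p->freed);
    pthread_mutex_unlock(&p->mu);
}
```

With a pool of 4, as in Marc's workaround, a fifth submitter sleeps until a completion calls pool_put(). A smaller pool shortens the queue a latency-sensitive read can end up behind, but it also gives the elevator fewer requests to merge and sort, which would be consistent with the throughput loss Marc reports.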
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Andrea Arcangeli @ 2003-05-27 18:25 UTC
To: Marc-Christian Petersen
Cc: Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III

On Tue, May 27, 2003 at 08:08:43PM +0200, Marc-Christian Petersen wrote:
> Andrea introduced, to address _exact_ this problem (pauses, stops, mouse is
> dead etc.), his lowlatency elevator. Side effect: decreases i/o throughput,

It doesn't exactly decrease I/O throughput: the latest I/O benchmarks I
saw from Randy (dbench/tiotest/bonnie/etc.) were still the fastest, and
that kernel included the lowlatency elevator patch. So even if it may not
help latency, it doesn't hurt the numbers, at least not on the high end
(which, in theory, is what needs an overkill-length I/O queue the most).

However, it definitely helps latency for me, and I have had a number of
positive reports.

Also make sure you run elvtune -r 0 -w 0 /dev/hda. Journaling may also
affect latency, so you can try plain ext2 to be sure it's not a
filesystem issue.

The lowlatency elevator patch may not be perfect, but it definitely seems
to work better here. Especially since there's no apparent throughput
loss, it makes a lot of sense to keep it applied; otherwise we would
waste lots of RAM for apparently no gain.

Andrea
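elvtune -r and -w set the read and write latency values of the 2.4 elevator. Roughly, those values bound how many later requests may be sorted in front of a request that is already queued. The sketch below is a deliberately simplified, hypothetical model of that idea (no merging, no separate read/write handling, not the kernel's elevator code) showing why a latency of 0 turns the queue into plain FIFO.

```c
#include <assert.h>

#define MAXQ 32

/* A queued request: its start sector and how many more newcomers
 * may still be inserted ahead of it. */
struct req { long sector; int budget; };
struct queue { struct req r[MAXQ]; int n; };

/* Insert a request for `sector`, giving it `latency` as its own budget.
 * Walk back from the tail, passing requests with a higher sector while
 * their budget allows; each passed request loses one unit of budget.
 * Returns the insert position, or -1 if the queue is full. */
int elv_insert(struct queue *q, long sector, int latency)
{
    int pos;

    assert(latency >= 0);
    if (q->n == MAXQ)
        return -1;
    pos = q->n;
    while (pos > 0 && q->r[pos - 1].budget > 0 && q->r[pos - 1].sector > sector)
        pos--;
    for (int i = q->n; i > pos; i--) {   /* shift the passed requests back */
        q->r[i] = q->r[i - 1];
        q->r[i].budget--;
    }
    q->r[pos].sector = sector;
    q->r[pos].budget = latency;
    q->n++;
    return pos;
}
```

With latency 0 nothing may be passed, so requests are served strictly in arrival order; a large latency lets the queue approach sector order (better throughput) at the cost of letting old requests be overtaken many times.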
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marcelo Tosatti @ 2003-05-27 18:33 UTC
To: Andrea Arcangeli
Cc: Marc-Christian Petersen, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III

On Tue, 27 May 2003, Andrea Arcangeli wrote:
> the lowlatency elevator patch may not be perfect but it definitely seems
> to work better here. especially since there's no apparent throughput
> loss, it makes lots of sense to keep it applied, or it would waste lots
> of ram for apparently no gain.

Andrea,

It seems your "fix-pausing" patch fixes a potential missed wakeup, right?
(I looked through it quickly.) Could you explain the problem it is trying
to fix, and how it fixes it?

It's too late to fix that in 2.4.21 (rc5 is going out in hours).
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marc-Christian Petersen @ 2003-05-27 18:39 UTC
To: Marcelo Tosatti, Andrea Arcangeli
Cc: linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III

On Tuesday 27 May 2003 20:33, Marcelo Tosatti wrote:

Hi Marcelo,

> It seems your "fix-pausing" patch is fixing a potential wakeup
> miss, right? (I looked quickly throught it). Could you explain me the
> problem its trying to fix and how?

Please also have a look here:

http://hypermail.idiosynkrasia.net/linux-kernel/archived/2002/week45/0305.html

ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: manish @ 2003-05-27 19:00 UTC
To: Marc-Christian Petersen
Cc: Marcelo Tosatti, Andrea Arcangeli, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

Marc-Christian Petersen wrote:

>Please have also a look here:
>
>http://hypermail.idiosynkrasia.net/linux-kernel/archived/2002/week45/0305.html
>
>ciao, Marc

Hello!

I applied the fix-pausing-2 patch to the 2.4.20 kernel. This time, the stack trace is:

sys_write
generic_file_write
ext2_get_group_desc
bread
__wait_on_buffer
schedule
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marcelo Tosatti @ 2003-05-27 19:01 UTC
To: manish
Cc: Marc-Christian Petersen, Andrea Arcangeli, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

On Tue, 27 May 2003, manish wrote:

> I applied the fix-pausing-2 patch to the 2.4.20 kernel. This time on,
> the stack trace:
>
> sys_write
> generic_file_write
> ext2_get_group_desc
> bread
> __wait_on_buffer
> schedule

Huh? You mean bonnie still deadlocks, or...?
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: manish @ 2003-05-27 19:09 UTC
To: Marcelo Tosatti
Cc: Marc-Christian Petersen, Andrea Arcangeli, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

Marcelo Tosatti wrote:

>Huh? You mean bonnie still deadlocks or ?

Well, that trace is from the kernel that has the io_request_lock removed. The stock kernel (with the fix-pausing-2 patch) is running fine up to now. However, we will have to give it a few hours of runtime.

Thanks
Manish
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: manish @ 2003-05-27 19:12 UTC
To: Marcelo Tosatti
Cc: Marc-Christian Petersen, Andrea Arcangeli, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

Marcelo Tosatti wrote:

>Huh? You mean bonnie still deadlocks or ?

At the time the processes get stuck:

[root@dyn-10-123-130-235 vm]# more /proc/meminfo
        total:     used:    free:  shared:  buffers:    cached:
Mem:  3709870080 3699126272 10743808       0 18313216 3531255808
Swap: 1077501952          0 1077501952
MemTotal:      3622920 kB
MemFree:         10492 kB
MemShared:           0 kB
Buffers:         17884 kB
Cached:        3448492 kB
SwapCached:          0 kB
Active:          25252 kB
Inactive:      3445344 kB
HighTotal:     2752512 kB
HighFree:         2120 kB
LowTotal:       870408 kB
LowFree:          8372 kB
SwapTotal:     1052248 kB
SwapFree:      1052248 kB
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marcelo Tosatti @ 2003-05-27 19:28 UTC
To: manish
Cc: Marc-Christian Petersen, Andrea Arcangeli, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

On Tue, 27 May 2003, manish wrote:

> Marcelo Tosatti wrote:
>
> >Huh? You mean bonnie still deadlocks or ?
> >
> At the time the processes get stuck:

Ok, so just to confirm: you're still getting pauses with Andrea's patches, but no hangs anymore?

Correct?
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: manish @ 2003-05-27 19:34 UTC
To: Marcelo Tosatti
Cc: Marc-Christian Petersen, Andrea Arcangeli, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

Marcelo Tosatti wrote:

>Ok, so just to confirm: You're still getting pauses with Andrea's patches
>but no hangs anymore?
>
>Correct?

Hi Marcelo,

I have applied Andrea's patch to two kernels:

1. Stock 2.4.20
2. 2.4.20 with the io_request_lock removed.

The tests on the first one are still going. The tests on the second one showed processes getting stuck for long times (> 5 minutes), not just pausing.

Thanks
Manish
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Andrea Arcangeli @ 2003-05-27 20:20 UTC
To: manish
Cc: Marcelo Tosatti, Marc-Christian Petersen, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

On Tue, May 27, 2003 at 12:34:38PM -0700, manish wrote:
> I have applied Andrea's patch to two kernels:
>
> 1. Stock 2.4.20
> 2. 2.4.20 with the io_request_lock removed.
>
> The tests on the first one are still going. The tests on the second one
> showed processes getting stuck for long times (> 5 minutes) and not
> paused ...

Sorry if it's a dumb question, but what is the "io_request_lock removed" thing? I hope you didn't delete any io_request_lock; if you did, you can get worse things than crashes (i.e. mm/fs corruption). The pausing bug was a genuine race (quite innocent: if you could trigger a disk unplug, you could recover from it).

Andrea
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marc-Christian Petersen @ 2003-05-27 20:25 UTC
To: Andrea Arcangeli, manish
Cc: Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

On Tuesday 27 May 2003 22:20, Andrea Arcangeli wrote:

Hi Andrea,

> > 1. Stock 2.4.20
> > 2. 2.4.20 with the io_request_lock removed.
> > The tests on the first one are still going. The tests on the second one
> > showed processes getting stuck for long times (> 5 minutes) and not
> > paused ...
> sorry if it's a dumb question but what is the "io_request_lock removed"
> thing? Hope you didn't delete any io_request_lock, if you did you can
> get worse things than crashes (i.e. mm/fs corruption). the pausing bug
> was a genuine race (quite innocent, if you could trigger a disk unplug
> you could recover from it)
>
> Andrea

Funny, I asked him the same ;)

See his response:

-----------------------------------------------------------------------
>what is this io_request_lock patch you are talking about?
>
>ciao, Marc
We made some changes to the 2.4.20 kernel to remove the io_request_lock
and replace with queue_lock and host_lock.
-----------------------------------------------------------------------

ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: manish @ 2003-05-27 20:42 UTC
To: Marc-Christian Petersen
Cc: Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

Marc-Christian Petersen wrote:

>On Tuesday 27 May 2003 22:20, Andrea Arcangeli wrote:
>
>>sorry if it's a dumb question but what is the "io_request_lock removed"
>>thing? Hope you didn't delete any io_request_lock, if you did you can
>>get worse things than crashes (i.e. mm/fs corruption). the pausing bug
>>was a genuine race (quite innocent, if you could trigger a disk unplug
>>you could recover from it)

We made a change in the 2.4.20 kernel to remove the io_request_lock and replace it with the host_lock and the queue_lock. Probably not the right thing to do.

Thanks
Manish
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Andrea Arcangeli @ 2003-05-27 20:47 UTC
To: manish
Cc: Marc-Christian Petersen, Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

On Tue, May 27, 2003 at 01:42:32PM -0700, manish wrote:
> We made a change in the 2.4.20 kernel to remove the io_request_lock and
> replace with the host_lock and the queue_lock. Probably, not a right
> thing to do

Right you are, but never mind; just remember to e2fsck the fs before booting the box, so you don't risk fs corruption later with the solid kernels.

Andrea
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: manish @ 2003-05-27 20:50 UTC
To: Andrea Arcangeli
Cc: Marc-Christian Petersen, Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

Andrea Arcangeli wrote:

>right you are, but never mind, only remeber e2fsck the fs before
>booting the box so you don't risk fs corruption later with the solid
>kernels.
>
>Andrea

So, does it imply that we cannot remove the io_request_lock in 2.4 at all?

Thanks
Manish
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Andrea Arcangeli @ 2003-05-27 21:05 UTC
To: manish
Cc: Marc-Christian Petersen, Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III

On Tue, May 27, 2003 at 01:50:55PM -0700, manish wrote:
> So, does it imply that we cannot remove the io_request_lock in 2.4 at all?

The io_request_lock can at most be made per-device in 2.4; this is the case in my tree, for instance. Locks are there for a reason: unless you redesign the code to be more scalable, you can't just drop them and expect things to work. But the io_request_lock has nothing to do with either the hangs or the delays; it only hurts scalability if you have lots of devices and lots of CPUs.

Andrea
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Andrea Arcangeli @ 2003-05-27 20:03 UTC
To: Marcelo Tosatti
Cc: Marc-Christian Petersen, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III

[-- Attachment #1: Type: text/plain, Size: 2662 bytes --]

On Tue, May 27, 2003 at 03:33:14PM -0300, Marcelo Tosatti wrote:
> Andrea,
>
> It seems your "fix-pausing" patch is fixing a potential wakeup
> miss, right? (I looked quickly throught it). Could you explain me the

Yes, and not just one but several of them, all similar. Lots of boxes were hanging in a weird manner until I found and fixed this glitch.

> problem its trying to fix and how?

I'm attaching the old email; it should have all the explanations. But don't use that old patch (it was the first revision and it missed one last race in wait_for_request noticed by Chris or Andrew [or both?]); use this one instead (it seems to be just the second revision, which should be that one plus the last race fix):

http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21rc2aa1/9980_fix-pausing-2

thanks,

>
> Its too late to fix that in 2.4.21 (rc5 is going out in hours).

Andrea

[-- Attachment #2: Type: message/rfc822, Size: 18863 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 7288 bytes --]

I recently found and fixed a mysterious hang that could leave the kernel with tasks stuck in D state while the disk was idle.
We could reproduce very long hangs (several hours) with tasks in D state on reiserfs after some hours of intensive load (not cerberus; see below for why). It wasn't a reiserfs-specific problem, though. Reiserfs just happens to take the lock_super while doing the fsync_buffer_list, and this leaves kupdate stuck in lock_super waiting for a wait_on_buffer to return. With kupdate stuck, the background run_task_queue() no longer runs every 5 seconds, so if there's a missing unplug somewhere, the machine hangs in wait_on_buffer for an indefinite/infinite time (a kind of deadlock, unless somebody triggers a readpage or something else that unplugs the disk queue; usually logging in with ssh fixed the problem). Significantly increasing the kupdate interval could potentially lead to the same indefinite hang on ext2 while running fsync.

For some time I didn't even consider the possibility of wait_on_buffer being the problem: there are over 700 patches applied in the kernel where we could reproduce it, so for a while I was looking at everything but the buggy place. After ruling out various other bits (scheduler fixes/compiler/fsync corruption fixes etc.), I realized the problem is a longstanding locking design problem in wait_on_buffer. Then I found the same problem in wait_on_page, and yesterday Chris found a similar problem in get_request_wait too (the get_request_wait one is not exactly the same issue, but it's quite similar and could lead to exactly the same hangs).

Probably nobody noticed this yet because with ext2/ext3 these hangs happen on all machines, but they resolve themselves after a mean disk-idle time of 2.5 seconds and only happen once in a while. People would normally see mean delays of 2.5 seconds coming from the datacenter and think it's normal I/O congestion, or the elevator, or something during the fsync on ext2.
Furthermore, as Chris pointed out, under very intensive load bdflush would usually be running in the background; this race can trigger only with mid-level writepage loads, when bdflush/pdflush has no reason to run. We also have the lowlatency fixes (they're fixes) inside submit_bh, so we probably opened a larger window for the race than mainline. Chris also double-checked that the bug we were facing really was this race, by introducing a delay in submit_bh to make it reproducible in a reasonable amount of time.

The race looks like this:

        CPU0                            CPU1
        -----------------               ------------------------
        reiserfs_writepage
        lock_buffer()
                                        fsync_buffers_list() under lock_super()
                                        wait_on_buffer()
                                        run_task_queue(&tq_disk) -> noop
                                        schedule() <- hang with lock_super acquired
        submit_bh()
        /* don't unplug here */

This example is reiserfs-specific, but any wait_on_buffer can hang indefinitely against any concurrent ll_rw_block or submit_bh (even on UP, since submit_bh is a blocking operation, particularly with the lowlat fixes). There's no big kernel lock serializing wait_on_buffer/ll_rw_block anymore. This locking design problem was introduced when the BKL was removed from wait_on_buffer/wait_on_page/ll_rw_blk during one of the 2.3 scalability efforts, so every 2.4 kernel out there is affected by this race.

In short, the problem is that wait_on_"something" has no clue whether the locked "something" has already been inserted into the I/O queue and is visible to the device, so it has no clue whether run_task_queue will be a noop or will actually affect the "something". And the writer side that executes submit_bh rightfully won't unplug the queue (to allow merging and boost performance until somebody actually asks for the I/O to complete ASAP). I fixed the race by simply doing a wakeup of any waiter after any submit_bh/submit_bio that left stuff pending in the I/O queue.
So if the race triggers now, the wait_on_something will get a wakeup, which in turn triggers the unplug again and closes the window for the race. This fixes the problem in practice and seems the best fix, at least for 2.4. I don't see any potential performance regression, so I don't feel the need for anything more complicated; the race triggers only once every several hours under some special workloads. You could try to avoid loading the waitqueue head cacheline during submit_bh, but at least for 2.4 I don't think it's worth the complexity, and it's an I/O path anyway, so it's certainly not critical.

The problem Chris noticed with get_request_wait is similar: the unplug was run before adding the task to the waitqueue, so the unplug could free the requests and somebody else could allocate the freed requests without unplugging the queue afterwards. I fixed it simply by unplugging the queue just before schedule(). That was a more genuine bug than the other, subtler ones. With get_request_wait the fix is this simple because we deal with entities that are always guaranteed to be affected by the queue unplug (the I/O requests themselves). This isn't the case with the locked bh or locked/writeback pages, and that wrong assumption is what allowed the other races to trigger in the first place.
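To make the interleaving above concrete, here is a toy, single-threaded C replay of it. This is not kernel code: struct race_state, unplug(), waiter_pass() and replay() are hypothetical names, unplug() stands in for run_task_queue(&tq_disk), and I/O completion is folded into unplug() for brevity. It only tracks the three facts the race depends on: whether the buffer is locked, whether it is already visible in the plugged queue, and whether the waiter is asleep.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy, single-threaded replay of the wait_on_buffer()/submit_bh() race.
 * Not kernel code: the struct and helpers are hypothetical stand-ins that
 * track only the state the race depends on.
 */
struct race_state {
	bool buffer_locked; /* writer did lock_buffer() */
	bool in_queue;      /* buffer is visible in the (plugged) I/O queue */
	bool waiter_asleep; /* waiter is blocked in schedule() */
};

/*
 * Stand-in for run_task_queue(&tq_disk). It can only start I/O that is
 * already queued, so it is a no-op while the buffer is locked but not yet
 * submitted. I/O completion (unlock_buffer()) is folded in for brevity.
 */
static void unplug(struct race_state *s)
{
	assert(!s->in_queue || s->buffer_locked); /* queued buffers are locked */
	if (s->in_queue) {
		s->in_queue = false;
		s->buffer_locked = false;
	}
}

/* One pass of the wait_on_buffer() loop: unplug, then sleep if still locked. */
static void waiter_pass(struct race_state *s)
{
	unplug(s);
	s->waiter_asleep = s->buffer_locked;
}

/*
 * Replays the CPU0/CPU1 interleaving; returns true if the waiter leaves
 * wait_on_buffer() without some unrelated unplug coming to the rescue.
 */
static bool replay(bool wake_after_submit)
{
	struct race_state s = { false, false, false };

	s.buffer_locked = true; /* CPU0: lock_buffer() */
	waiter_pass(&s);        /* CPU1: wait_on_buffer(); unplug is a no-op */
	s.in_queue = true;      /* CPU0: submit_bh(); queue deliberately stays plugged */

	if (wake_after_submit && s.waiter_asleep)
		waiter_pass(&s);    /* fix: wake_up(&bh->b_wait); waiter unplugs again */

	return !s.waiter_asleep;
}
```

replay(false) leaves the waiter asleep with nothing left to unplug the queue, while replay(true) lets the woken waiter run the unplug a second time, which now finds the buffer queued. That mirrors the before/after effect of the wake_up(&bh->b_wait) that the patch adds to submit_bh().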
While doing these fixes I noticed various other bits:

1) in general, blk_run_queues()/run_task_queue()/sync_page should always run just before schedule(); it's pointless to unplug anything if we don't end up calling schedule() (an ultra-micro-optimization)
2) in 2.4, the TQ_ACTIVE read done by run_task_queue() in wait_on_buffer could effectively execute inside the add_wait_queue critical section, since spin_unlock has inclusive semantics (reads can literally pass the spin_unlock speculatively, even on x86/x86-64)
3) __blk_put_request/blkdev_release_request was increasing the count and then reading the waitqueue contents without even a barrier() for the asm layer; it needs an smp_mb() in between to serialize against get_request_wait, which runs locklessly

I did two patches, one for 2.4.20rc1 and one for 2.5.47 (sorry, no bk tree here; I will try to make bitdropper.py available shortly so I can access the new info encoded in proprietary form too), that address all these races. However, 2.5 should be analysed further: I didn't search too hard for all the possible places that could have this race in 2.5. I searched hard in 2.4 and only addressed the same problems in 2.5. The only bit I think could still be problematic in 2.4 is the nfs speculative I/O, which is the reason nfs implements a sync_page in the first place. It may have the same race; in fact I heard of some report of nfs hanging in wait_on_page, and I wonder if this could explain it too. I assume the fs maintainers will take care of checking their fs for missing wakeups of page waiters in 2.4 and 2.5 now that the problem is well known.

http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.20rc1/fix-pausing-1
http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.5/2.5.47/fix-pausing-1

They're attached to this email too, since they're tiny.
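Item 3 is the classic store-buffering pattern: each side writes one variable and then reads the other, so each needs a full barrier between its write and its read. Below is a hedged userspace analogue using C11 atomics; `struct rq_list`, `rq_list_init()`, `release_request()` and `waiter_must_sleep()` are hypothetical names, `has_waiter` stands in for waitqueue_active(), and `atomic_thread_fence(memory_order_seq_cst)` plays the role of smp_mb() (on the waiter side, the barrier implied by set_current_state()). With both fences in place, the outcome where the releaser sees no waiter while the waiter still sees a zero count is forbidden.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of the request-release/wait pairing. */
struct rq_list {
    atomic_int  count;       /* free requests */
    atomic_bool has_waiter;  /* stands in for waitqueue_active() */
};

static void rq_list_init(struct rq_list *rl)
{
    atomic_init(&rl->count, 0);
    atomic_init(&rl->has_waiter, false);
}

/* Releaser: publish the freed request, full fence, then look for waiters.
 * Returns true when the caller should call wake_up(). */
static bool release_request(struct rq_list *rl)
{
    atomic_fetch_add_explicit(&rl->count, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);            /* smp_mb() */
    return atomic_load_explicit(&rl->has_waiter, memory_order_relaxed);
}

/* Waiter: register, full fence, then recheck the count before sleeping.
 * Returns true when the caller should schedule(). */
static bool waiter_must_sleep(struct rq_list *rl)
{
    atomic_store_explicit(&rl->has_waiter, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);            /* set_current_state() */
    return atomic_load_explicit(&rl->count, memory_order_relaxed) == 0;
}
```

Dropping either fence would allow both relaxed loads to return stale values, which is exactly the missed wakeup that the smp_mb() in the patch closes.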
Andrea [-- Attachment #2.1.2: fix-pausing-1 --] [-- Type: text/plain, Size: 5359 bytes --] diff -urNp 2.4.20rc1/drivers/block/ll_rw_blk.c hangs-2.4/drivers/block/ll_rw_blk.c --- 2.4.20rc1/drivers/block/ll_rw_blk.c Sat Nov 2 19:45:33 2002 +++ hangs-2.4/drivers/block/ll_rw_blk.c Tue Nov 12 02:18:35 2002 @@ -590,12 +590,20 @@ static struct request *__get_request_wai register struct request *rq; DECLARE_WAITQUEUE(wait, current); - generic_unplug_device(q); add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait); do { set_current_state(TASK_UNINTERRUPTIBLE); - if (q->rq[rw].count == 0) + if (q->rq[rw].count == 0) { + /* + * All we care about is not to stall if any request + * is been released after we set TASK_UNINTERRUPTIBLE. + * This is the most efficient place to unplug the queue + * in case we hit the race and we can get the request + * without waiting. + */ + generic_unplug_device(q); schedule(); + } spin_lock_irq(&io_request_lock); rq = get_request(q, rw); spin_unlock_irq(&io_request_lock); @@ -829,9 +837,11 @@ void blkdev_release_request(struct reque */ if (q) { list_add(&req->queue, &q->rq[rw].free); - if (++q->rq[rw].count >= q->batch_requests && - waitqueue_active(&q->wait_for_requests[rw])) - wake_up(&q->wait_for_requests[rw]); + if (++q->rq[rw].count >= q->batch_requests) { + smp_mb(); + if (waitqueue_active(&q->wait_for_requests[rw])) + wake_up(&q->wait_for_requests[rw]); + } } } @@ -1200,6 +1210,11 @@ void submit_bh(int rw, struct buffer_hea generic_make_request(rw, bh); + /* fix race condition with wait_on_buffer() */ + smp_mb(); /* spin_unlock may have inclusive semantics */ + if (waitqueue_active(&bh->b_wait)) + wake_up(&bh->b_wait); + switch (rw) { case WRITE: kstat.pgpgout += count; diff -urNp 2.4.20rc1/fs/buffer.c hangs-2.4/fs/buffer.c --- 2.4.20rc1/fs/buffer.c Sat Nov 2 19:45:40 2002 +++ hangs-2.4/fs/buffer.c Tue Nov 12 02:17:56 2002 @@ -153,10 +153,23 @@ void __wait_on_buffer(struct buffer_head get_bh(bh); add_wait_queue(&bh->b_wait, &wait); do 
{ - run_task_queue(&tq_disk); set_task_state(tsk, TASK_UNINTERRUPTIBLE); if (!buffer_locked(bh)) break; + /* + * We must read tq_disk in TQ_ACTIVE after the + * add_wait_queue effect is visible to other cpus. + * We could unplug some line above it wouldn't matter + * but we can't do that right after add_wait_queue + * without an smp_mb() in between because spin_unlock + * has inclusive semantics. + * Doing it here is the most efficient place so we + * don't do a suprious unplug if we get a racy + * wakeup that make buffer_locked to return 0, and + * doing it here avoids an explicit smp_mb() we + * rely on the implicit one in set_task_state. + */ + run_task_queue(&tq_disk); schedule(); } while (buffer_locked(bh)); tsk->state = TASK_RUNNING; @@ -1508,6 +1521,9 @@ static int __block_write_full_page(struc /* Done - end_buffer_io_async will unlock */ SetPageUptodate(page); + + wakeup_page_waiters(page); + return 0; out: @@ -1539,6 +1555,7 @@ out: } while (bh != head); if (need_unlock) UnlockPage(page); + wakeup_page_waiters(page); return err; } @@ -1755,6 +1772,8 @@ int block_read_full_page(struct page *pa else submit_bh(READ, bh); } + + wakeup_page_waiters(page); return 0; } @@ -2368,6 +2387,7 @@ int brw_page(int rw, struct page *page, submit_bh(rw, bh); bh = next; } while (bh != head); + wakeup_page_waiters(page); return 0; } diff -urNp 2.4.20rc1/fs/reiserfs/inode.c hangs-2.4/fs/reiserfs/inode.c --- 2.4.20rc1/fs/reiserfs/inode.c Sat Nov 2 19:45:46 2002 +++ hangs-2.4/fs/reiserfs/inode.c Tue Nov 12 02:17:56 2002 @@ -1993,6 +1993,7 @@ static int reiserfs_write_full_page(stru */ if (nr) { submit_bh_for_writepage(arr, nr) ; + wakeup_page_waiters(page); } else { UnlockPage(page) ; } diff -urNp 2.4.20rc1/include/linux/pagemap.h hangs-2.4/include/linux/pagemap.h --- 2.4.20rc1/include/linux/pagemap.h Sat Nov 2 19:45:48 2002 +++ hangs-2.4/include/linux/pagemap.h Tue Nov 12 04:35:52 2002 @@ -97,6 +97,8 @@ static inline void wait_on_page(struct p ___wait_on_page(page); } +extern 
void wakeup_page_waiters(struct page * page); + /* * Returns locked page at given index in given cache, creating it if needed. */ diff -urNp 2.4.20rc1/kernel/ksyms.c hangs-2.4/kernel/ksyms.c --- 2.4.20rc1/kernel/ksyms.c Sat Nov 2 19:45:48 2002 +++ hangs-2.4/kernel/ksyms.c Tue Nov 12 04:36:25 2002 @@ -293,6 +293,7 @@ EXPORT_SYMBOL(filemap_fdatasync); EXPORT_SYMBOL(filemap_fdatawait); EXPORT_SYMBOL(lock_page); EXPORT_SYMBOL(unlock_page); +EXPORT_SYMBOL(wakeup_page_waiters); /* device registration */ EXPORT_SYMBOL(register_chrdev); diff -urNp 2.4.20rc1/mm/filemap.c hangs-2.4/mm/filemap.c --- 2.4.20rc1/mm/filemap.c Sat Nov 2 19:45:48 2002 +++ hangs-2.4/mm/filemap.c Tue Nov 12 04:35:40 2002 @@ -909,6 +909,20 @@ void lock_page(struct page *page) } /* + * This must be called after every submit_bh with end_io + * callbacks that would result into the blkdev layer waking + * up the page after a queue unplug. + */ +void wakeup_page_waiters(struct page * page) +{ + wait_queue_head_t * head; + + head = page_waitqueue(page); + if (waitqueue_active(head)) + wake_up(head); +} + +/* * a rather lightweight function, finding and getting a reference to a * hashed page atomically. 
*/ [-- Attachment #2.1.3: fix-pausing-1 --] [-- Type: text/plain, Size: 5331 bytes --] diff -urNp 2.5.47/drivers/block/ll_rw_blk.c hangs-2.5/drivers/block/ll_rw_blk.c --- 2.5.47/drivers/block/ll_rw_blk.c Tue Nov 12 01:59:41 2002 +++ hangs-2.5/drivers/block/ll_rw_blk.c Tue Nov 12 02:37:42 2002 @@ -1281,12 +1281,13 @@ static struct request *get_request_wait( spin_lock_prefetch(q->queue_lock); - generic_unplug_device(q); do { prepare_to_wait_exclusive(&rl->wait, &wait, TASK_UNINTERRUPTIBLE); - if (!rl->count) + if (!rl->count){ + generic_unplug_device(q); io_schedule(); + } finish_wait(&rl->wait, &wait); spin_lock_irq(q->queue_lock); rq = get_request(q, rw); @@ -1487,8 +1488,11 @@ void __blk_put_request(request_queue_t * rl->count++; if (rl->count >= queue_congestion_off_threshold()) clear_queue_congested(q, rw); - if (rl->count >= batch_requests && waitqueue_active(&rl->wait)) - wake_up(&rl->wait); + if (rl->count >= batch_requests) { + smp_mb(); + if (waitqueue_active(&rl->wait)) + wake_up(&rl->wait); + } } } diff -urNp 2.5.47/fs/buffer.c hangs-2.5/fs/buffer.c --- 2.5.47/fs/buffer.c Tue Nov 12 01:59:42 2002 +++ hangs-2.5/fs/buffer.c Tue Nov 12 02:47:46 2002 @@ -135,9 +135,10 @@ void __wait_on_buffer(struct buffer_head get_bh(bh); do { prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); - blk_run_queues(); - if (buffer_locked(bh)) + if (buffer_locked(bh)) { + blk_run_queues(); schedule(); + } } while (buffer_locked(bh)); put_bh(bh); finish_wait(wqh, &wait); @@ -1727,7 +1728,8 @@ done: if (uptodate) SetPageUptodate(page); end_page_writeback(page); - } + } else + wakeup_page_waiters(page); if (err == 0) return ret; return err; @@ -2011,6 +2013,7 @@ int block_read_full_page(struct page *pa else submit_bh(READ, bh); } + wakeup_page_waiters(page); return 0; } @@ -2315,6 +2318,8 @@ static int end_bio_bh_io_sync(struct bio int submit_bh(int rw, struct buffer_head * bh) { struct bio *bio; + int ret; + wait_queue_head_t *wqh = bh_waitq_head(bh); BUG_ON(!buffer_locked(bh)); 
BUG_ON(!buffer_mapped(bh)); @@ -2348,7 +2353,13 @@ int submit_bh(int rw, struct buffer_head bio->bi_end_io = end_bio_bh_io_sync; bio->bi_private = bh; - return submit_bio(rw, bio); + ret = submit_bio(rw, bio); + + smp_mb(); /* spin_unlock may have inclusive semantics */ + if (waitqueue_active(wqh)) + wake_up(wqh); + + return ret; } /** diff -urNp 2.5.47/fs/reiserfs/inode.c hangs-2.5/fs/reiserfs/inode.c --- 2.5.47/fs/reiserfs/inode.c Thu Oct 31 01:42:25 2002 +++ hangs-2.5/fs/reiserfs/inode.c Tue Nov 12 02:50:47 2002 @@ -1987,6 +1987,7 @@ static int reiserfs_write_full_page(stru */ if (nr) { submit_bh_for_writepage(arr, nr) ; + wakeup_page_waiters(page); } else { end_page_writeback(page) ; } diff -urNp 2.5.47/include/linux/pagemap.h hangs-2.5/include/linux/pagemap.h --- 2.5.47/include/linux/pagemap.h Tue Nov 12 01:59:43 2002 +++ hangs-2.5/include/linux/pagemap.h Tue Nov 12 02:45:27 2002 @@ -122,4 +122,7 @@ static inline void wait_on_page_writebac } extern void end_page_writeback(struct page *page); + +extern void wakeup_page_waiters(struct page * page); + #endif /* _LINUX_PAGEMAP_H */ diff -urNp 2.5.47/mm/filemap.c hangs-2.5/mm/filemap.c --- 2.5.47/mm/filemap.c Tue Nov 12 01:59:43 2002 +++ hangs-2.5/mm/filemap.c Tue Nov 12 02:44:59 2002 @@ -272,9 +272,10 @@ void wait_on_page_bit(struct page *page, do { prepare_to_wait(waitqueue, &wait, TASK_UNINTERRUPTIBLE); - sync_page(page); - if (test_bit(bit_nr, &page->flags)) + if (test_bit(bit_nr, &page->flags)) { + sync_page(page); io_schedule(); + } } while (test_bit(bit_nr, &page->flags)); finish_wait(waitqueue, &wait); } @@ -336,15 +337,30 @@ void __lock_page(struct page *page) while (TestSetPageLocked(page)) { prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); - sync_page(page); - if (PageLocked(page)) + if (PageLocked(page)) { + sync_page(page); io_schedule(); + } } finish_wait(wqh, &wait); } EXPORT_SYMBOL(__lock_page); /* + * This must be called after every submit_bh with end_io + * callbacks that would result into the 
blkdev layer waking + * up the page after a queue unplug. + */ +void wakeup_page_waiters(struct page * page) +{ + wait_queue_head_t * wqh; + + wqh = page_waitqueue(page); + if (waitqueue_active(wqh)) + wake_up(wqh); +} + +/* * a rather lightweight function, finding and getting a reference to a * hashed page atomically. */ diff -urNp 2.5.47/mm/page_io.c hangs-2.5/mm/page_io.c --- 2.5.47/mm/page_io.c Thu Oct 31 01:41:56 2002 +++ hangs-2.5/mm/page_io.c Tue Nov 12 02:50:12 2002 @@ -104,6 +104,7 @@ int swap_writepage(struct page *page) SetPageWriteback(page); unlock_page(page); submit_bio(WRITE, bio); + wakeup_page_waiters(page); out: return ret; } @@ -121,6 +122,7 @@ int swap_readpage(struct file *file, str } inc_page_state(pswpin); submit_bio(READ, bio); + wakeup_page_waiters(page); out: return ret; } --- hangs-2.5/kernel/ksyms.c.~1~ Tue Nov 12 01:59:43 2002 +++ hangs-2.5/kernel/ksyms.c Tue Nov 12 04:36:37 2002 @@ -336,6 +336,7 @@ EXPORT_SYMBOL(filemap_fdatawrite); EXPORT_SYMBOL(filemap_fdatawait); EXPORT_SYMBOL(lock_page); EXPORT_SYMBOL(unlock_page); +EXPORT_SYMBOL(wakeup_page_waiters); /* device registration */ EXPORT_SYMBOL(register_chrdev); ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 20:03 ` Andrea Arcangeli @ 2003-05-27 20:08 ` Marcelo Tosatti 2003-05-27 20:25 ` Andrea Arcangeli 0 siblings, 1 reply; 142+ messages in thread From: Marcelo Tosatti @ 2003-05-27 20:08 UTC (permalink / raw) To: Andrea Arcangeli Cc: Marc-Christian Petersen, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III On Tue, 27 May 2003, Andrea Arcangeli wrote: > > It seems your "fix-pausing" patch is fixing a potential wakeup > > miss, right? (I looked quickly throught it). Could you explain me the > > yes, not just one but multiple of them, all similar. lots of boxes were > hanging in a weird manner until I found and fixed this glitch. > > > problem its trying to fix and how? > > I'm attaching the old email, it should have all the explanataions. > > but don't use that old patch (that was the first revision and it missed > one last race in wait_for_request noticed by Chris or Andrew [or > both?]), use this one instead (seems just the second revision, should be > that one plus that last race fix): > > http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21rc2aa1/9980_fix-pausing-2 I wonder if the additional wakeups result in performance degradation (not that it matters much in case there is no other way to fix the problem). But anyway I would like to have some numbers with/without the patch. Do you have them ? ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 20:08 ` Marcelo Tosatti @ 2003-05-27 20:25 ` Andrea Arcangeli 2003-05-27 22:18 ` Andrew Morton 0 siblings, 1 reply; 142+ messages in thread From: Andrea Arcangeli @ 2003-05-27 20:25 UTC (permalink / raw) To: Marcelo Tosatti Cc: Marc-Christian Petersen, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III On Tue, May 27, 2003 at 05:08:38PM -0300, Marcelo Tosatti wrote: > > > On Tue, 27 May 2003, Andrea Arcangeli wrote: > > > > It seems your "fix-pausing" patch is fixing a potential wakeup > > > miss, right? (I looked quickly throught it). Could you explain me the > > > > yes, not just one but multiple of them, all similar. lots of boxes were > > hanging in a weird manner until I found and fixed this glitch. > > > > > problem its trying to fix and how? > > > > I'm attaching the old email, it should have all the explanataions. > > > > but don't use that old patch (that was the first revision and it missed > > one last race in wait_for_request noticed by Chris or Andrew [or > > both?]), use this one instead (seems just the second revision, should be > > that one plus that last race fix): > > > > http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21rc2aa1/9980_fix-pausing-2 > > I wonder if the additional wakeups result in performance degradation (not > that it matters much in case there is no other way to fix the problem). in theory yes. > > But anyway I would like to have some numbers with/without the patch. > > Do you have them ? Hmm, in bigbox.html we should find the difference of the timings before/after, and I recall it wasn't measurable. I can search for it on Thu if you want the exact numbers. However the last numbers from Randy showed my tree going faster than 2.5 with bonnie and tiotest so I think we don't need to worry and I would probably not fix it in a different way in 2.4 even if it would mean a 1% degradation. 
When it was shipped there was no time to measure any degradation, but the problem it fixes is so severe that we never had any doubt about including it ;). Andrea ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 20:25 ` Andrea Arcangeli @ 2003-05-27 22:18 ` Andrew Morton 2003-05-27 22:38 ` Andrea Arcangeli 0 siblings, 1 reply; 142+ messages in thread From: Andrew Morton @ 2003-05-27 22:18 UTC (permalink / raw) To: Andrea Arcangeli Cc: marcelo, m.c.p, linux-kernel, c-d.hailfinger.kernel.2003, manish, christian.klose, wli Andrea Arcangeli <andrea@suse.de> wrote: > > However the last numbers from Randy showed my tree going faster than 2.5 > with bonnie and tiotest so I think we don't need to worry and I would > probably not fix it in a different way in 2.4 even if it would mean a 1% > degradation. That could be because -aa quadruples the size of the VM readahead window. Changes such as that should be removed when assessing the performance impact of this particular patch. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 22:18 ` Andrew Morton @ 2003-05-27 22:38 ` Andrea Arcangeli 2003-05-27 22:40 ` Andrew Morton 0 siblings, 1 reply; 142+ messages in thread From: Andrea Arcangeli @ 2003-05-27 22:38 UTC (permalink / raw) To: Andrew Morton Cc: marcelo, m.c.p, linux-kernel, c-d.hailfinger.kernel.2003, manish, christian.klose, wli On Tue, May 27, 2003 at 03:18:30PM -0700, Andrew Morton wrote: > Andrea Arcangeli <andrea@suse.de> wrote: > > > > However the last numbers from Randy showed my tree going faster than 2.5 > > with bonnie and tiotest so I think we don't need to worry and I would > > probably not fix it in a different way in 2.4 even if it would mean a 1% > > degradation. > > That could be because -aa quadruples the size of the VM readahead window. > > Changes such as that should be removed when assessing the performance > impact of this particular patch. I understand that was a generic benchmark against 2.5, not meant to evaluate the effect of the fixed readahead (see the name of the patch "readahead-got-broken-somehwere"). I don't see any good reason why Randy should cripple my tree before benchmarking it against 2.5. If anything, it's OK to apply some of my patches to 2.5, that's great; the other way around, not IMHO. Andrea ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 22:38 ` Andrea Arcangeli @ 2003-05-27 22:40 ` Andrew Morton 2003-05-27 22:58 ` Andrea Arcangeli 0 siblings, 1 reply; 142+ messages in thread From: Andrew Morton @ 2003-05-27 22:40 UTC (permalink / raw) To: Andrea Arcangeli Cc: marcelo, m.c.p, linux-kernel, c-d.hailfinger.kernel.2003, manish, christian.klose, wli Andrea Arcangeli <andrea@suse.de> wrote: > > On Tue, May 27, 2003 at 03:18:30PM -0700, Andrew Morton wrote: > > Andrea Arcangeli <andrea@suse.de> wrote: > > > > > > However the last numbers from Randy showed my tree going faster than 2.5 > > > with bonnie and tiotest so I think we don't need to worry and I would > > > probably not fix it in a different way in 2.4 even if it would mean a 1% > > > degradation. > > > > That could be because -aa quadruples the size of the VM readahead window. > > > > Changes such as that should be removed when assessing the performance > > impact of this particular patch. > > I understand that was a generic benchmark against 2.5, not meant to > evaluate the effect of the fixed readahead (see the name of the patch > "readahead-got-broken-somehwere"). I don't see any good reason why > should Randy cripple down my tree before benchmarking against 2.5? if > something it's ok to apply some of my patches to 2.5, that's great, the > other way around not IMHO. > No. What I am saying is that evaluation of the effect of an IO scheduler change cannot be performed when there is a 4:1 change in the readahead window present in the same tree. ie: we cannot conclude anything about the effect of the IO scheduler change from Randy's numbers. Too many variables. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 22:40 ` Andrew Morton @ 2003-05-27 22:58 ` Andrea Arcangeli 0 siblings, 0 replies; 142+ messages in thread From: Andrea Arcangeli @ 2003-05-27 22:58 UTC (permalink / raw) To: Andrew Morton Cc: marcelo, m.c.p, linux-kernel, c-d.hailfinger.kernel.2003, manish, christian.klose, wli On Tue, May 27, 2003 at 03:40:49PM -0700, Andrew Morton wrote: > Andrea Arcangeli <andrea@suse.de> wrote: > > > > On Tue, May 27, 2003 at 03:18:30PM -0700, Andrew Morton wrote: > > > Andrea Arcangeli <andrea@suse.de> wrote: > > > > > > > > However the last numbers from Randy showed my tree going faster than 2.5 > > > > with bonnie and tiotest so I think we don't need to worry and I would > > > > probably not fix it in a different way in 2.4 even if it would mean a 1% > > > > degradation. > > > > > > That could be because -aa quadruples the size of the VM readahead window. > > > > > > Changes such as that should be removed when assessing the performance > > > impact of this particular patch. > > > > I understand that was a generic benchmark against 2.5, not meant to > > evaluate the effect of the fixed readahead (see the name of the patch > > "readahead-got-broken-somehwere"). I don't see any good reason why > > should Randy cripple down my tree before benchmarking against 2.5? if > > something it's ok to apply some of my patches to 2.5, that's great, the > > other way around not IMHO. > > > > No. > > What I am saying is that evaluation of the effect of an IO scheduler change > cannot be performed when there is a 4:1 change in the readhead window present > in the same tree. > > ie: we cannot conclude anything about the effect of the IO scheduler change > from Randy's numbers. Too many variables. an accurate evaluation can't be made from such comparison, but I never claimed that to be an accurate evaluation, I just said we don't need to worry, == "can't be too bad". I just said it can't be too bad. 
And this is true: you even admit that a readahead change certainly has more impact than whatever change fix-pausing introduced. That's all I meant: it can't be too bad. The fact that mainline doesn't do readahead properly is a much worse thing than whatever slowdown fix-pausing can generate. Furthermore, I said we can deduce the accurate numbers from bigbox.html, which compares trees with very minor changes (not 2.4 vs 2.5), and as far as I can tell it also shows that the fix for the deadlock isn't measurable. Andrea ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 18:33 ` Marcelo Tosatti 2003-05-27 18:39 ` Marc-Christian Petersen 2003-05-27 20:03 ` Andrea Arcangeli @ 2003-05-27 20:08 ` Chris Mason 2 siblings, 0 replies; 142+ messages in thread From: Chris Mason @ 2003-05-27 20:08 UTC (permalink / raw) To: Marcelo Tosatti Cc: Andrea Arcangeli, Marc-Christian Petersen, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III On Tue, 2003-05-27 at 14:33, Marcelo Tosatti wrote: > Andrea, > > It seems your "fix-pausing" patch is fixing a potential wakeup > miss, right? (I looked quickly throught it). Could you explain me the > problem its trying to fix and how? > > Its too late to fix that in 2.4.21 (rc5 is going out in hours). The bug report seems to be on ext2, on a box with 3.5GB of RAM and 4G of dirty data. So I don't think he is hitting the fix-pausing bug, which needs just the right set of conditions to miss unplugs:

1) bdflush can't be awake, so the percentage of dirty buffers has to be somewhat low. Otherwise bdflush will trigger unplugs.
2) kupdate needs to be stuck waiting on the super lock, otherwise kupdate would be triggering unplugs.
2a) Some process needs to be calling wait_on_buffer() with the super lock held. This makes it pretty much impossible to trigger on ext2 without using O_SYNC mode.
3) You've got to race in __wait_on_buffer (cut n' paste from an old mail from Andrea):

        CPU0                            CPU1
        -----------------               ------------------------
        reiserfs_writepage
        lock_buffer()
                                        fsync_buffers_list() under lock_super()
                                        wait_on_buffer()
                                        run_task_queue(&tq_disk) -> noop
                                        schedule() <- hang with lock_super acquired
        submit_bh()
        /* don't unplug here */

With ext3 you can trigger it with two procs, and it gets much easier if you toss a schedule() into submit_bh(), right before generic_make_request. reiserfs + the data logging patches is easier to trigger and produces longer pauses.
For ext3:

A: while(1) sync
B: while(1) write(fd, 8k); fsync(fd); ftruncate(fd, 0);

The idea behind proc B is to increase the chances that the sync and the fsync are trying to write and wait on the same buffer. ext3 hangs on a metadata block while it tries to get write access to the block before logging it. In proc B this ends up calling wait_on_buffer with the super lock held, while proc A is in sync flushing the metadata block. I triggered the hang in ext3 during block allocation, so the ftruncate makes sure ext3 is constantly allocating blocks (and always dirtying the same bitmap/direct block). It isn't a perfect reproduction of the hang, because in ext3 kjournald wakes up every once in a while (~30 seconds or more) and kicks the transaction. But with more procs running, someone could be waiting with the journal lock held, which would keep kjournald from fixing things. ^ permalink raw reply [flat|nested] 142+ messages in thread
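Chris's two-process recipe can be turned into a small userspace program. This is a sketch, not his actual test: `fsync_truncate_loop()` and `sync_loop()` are hypothetical helpers, both loops are bounded by an iteration count instead of `while(1)`, and the `lseek()` back to offset 0 on each pass is our own assumption (ftruncate() does not move the file offset, so without it every pass would write further into a sparse file).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Proc B: write 8k, fsync, truncate to 0, repeat. Returns 0 on success. */
static int fsync_truncate_loop(const char *path, int iterations)
{
    char buf[8192];
    memset(buf, 'x', sizeof(buf));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        /* Assumption: rewind so every pass reallocates the same blocks. */
        if (lseek(fd, 0, SEEK_SET) < 0 ||
            write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) ||
            fsync(fd) < 0 ||
            ftruncate(fd, 0) < 0) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}

/* Proc A: sync in a loop (bounded here instead of while(1)). */
static void sync_loop(int iterations)
{
    for (int i = 0; i < iterations; i++)
        sync();
}
```

To approximate the recipe, one would run `sync_loop()` in one process and `fsync_truncate_loop()` in another, both on the same ext3 filesystem and with large iteration counts; whether that actually reproduces the hang on a given kernel is not something this sketch can promise.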
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 18:25 ` Andrea Arcangeli 2003-05-27 18:33 ` Marcelo Tosatti @ 2003-05-27 18:35 ` Marc-Christian Petersen 2003-05-27 20:10 ` Andrea Arcangeli 1 sibling, 1 reply; 142+ messages in thread From: Marc-Christian Petersen @ 2003-05-27 18:35 UTC (permalink / raw) To: Andrea Arcangeli Cc: Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III On Tuesday 27 May 2003 20:25, Andrea Arcangeli wrote: Hi Andrea, > not exactly decreases I/O throughput, the latest I/O benchmarks I seen it decreases performance. I've seen this, and Con saw it too (well, it's better than the 'nr_requests = 4' change ;) but mouse stops are still there. > from Randy (dbench/tiotest/bonnie/etc..) were still the fastest and it > included the lowlatency elevator patch. So it may not help latency but > it doesn't hurt in the numbers, at least not in the high end (that in > theory is the one that needs the overkill length in the I/O queue most). I agree with the last sentence, in theory, but practice showed something different (about a 10% to 15% performance decrease). But I am quite sure that this depends on your machine/hardware, using IDE instead of SCSI for example. > However it definitely helps latency for me and I had a number of > positive reports. It helps, but it's not as good as stock 2.4.18. > Also make sure that you elvtune -r 0 -w 0 /dev/hda, also the journaling I also tried that. > may affect the latency so you can try with plain ext2 to be sure it's > not a fs issue. Sure, I did this too. It's FS independent, though ReiserFS is still the best for this scenario, with fewer pauses than any other FS (ext2, ext3, ...). But for desktop usage: not acceptable! No way, no go! > the lowlatency elevator patch may not be perfect but it definitely seems > to work better here.
especially since there's no apparent throughput > loss, it makes lots of sense to keep it applied, or it would waste lots > of ram for apparently no gain. hehe, well, wasting RAM for no gain is the next item on my todo list ;) (cache everything even if there is no RAM, for example; but that's not the point of this thread) ciao, Marc ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 18:35 ` Marc-Christian Petersen @ 2003-05-27 20:10 ` Andrea Arcangeli 2003-05-27 20:24 ` Marc-Christian Petersen 0 siblings, 1 reply; 142+ messages in thread From: Andrea Arcangeli @ 2003-05-27 20:10 UTC (permalink / raw) To: Marc-Christian Petersen Cc: Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III Hi, On Tue, May 27, 2003 at 08:35:33PM +0200, Marc-Christian Petersen wrote: > On Tuesday 27 May 2003 20:25, Andrea Arcangeli wrote: > > Hi Andrea, > > > not exactly decreases I/O throughput, the latest I/O benchmarks I seen > it decreases performance. I've seen this, Con also saw this (well it's better > than the 'nr_requests = 4' change ;) but mouse stops are still there. > > > from Randy (dbench/tiotest/bonnie/etc..) were still the fastest and it > > included the lowlatency elevator patch. So it may not help latency but > > it doesn't hurt in the numbers, at least not in the high end (that in > > theory is the one that needs the overkill length in the I/O queue most). > I agree with the last sentence, in theory, but practice showed something > different (about 10% to 15% performance decrease) > > But I am quite sure that this depends on your machine/hardware. Using IDE > instead of SCSI for example. A 10-15% performance drop doesn't sound good, no matter what hardware ;). However, in contest I recall there was quite an improvement in latency at least (I mean, it had some positive effect too). Getting the best throughput and latency at the same time is normally not possible; however, evaluating whether it's losing excessive throughput for a given latency improvement is difficult. > > > However it definitely helps latency for me and I had a number of > > positive reports. > It helps but it's not as good as 2.4.18 stock. I'll try to find the precise reason for the interactivity drop with the 2.4.18->2.4.19 blkdev changes on Thu.
I think I briefly looked into it once, but there was no definitive answer, and anyway going back to the 2.4.18 code didn't appeal or make much sense. However, I suspect this responsiveness issue could be storage-hardware dependent. The remark by Linus in the last few days, while talking with Jens, about storage that reorders stuff and starves requests at the two ends of the platter was very scary; maybe you're really bitten by something like that. Linux does the right thing, but your hardware keeps posting stuff under the OS and mine doesn't. > > > Also make sure that you elvtune -r 0 -w 0 /dev/hda, also the journaling > I also tried that. > > > may affect the latency so you can try with plain ext2 to be sure it's > > not a fs issue. > Sure, I did this too. FS independent, where ReiserFS is still the best for > this scenario with the most few pauses than any other FS (ext2, ext3, ...) > > But for desktop usage: not acceptable! No way, No go! > > > the lowlatency elevator patch may not be perfect but it definitely seems > > to work better here. especially since there's no apparent throughput > > loss, it makes lots of sense to keep it applied, or it would waste lots > > of ram for apparently no gain. > hehe, well wasting RAM for no gain is my next part on my todo ;) (cache > everything even if there is no RAM for example, well but this is not the > point in this thread) > > ciao, Marc > Andrea ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 20:10 ` Andrea Arcangeli @ 2003-05-27 20:24 ` Marc-Christian Petersen 2003-05-27 20:45 ` Andrea Arcangeli 2003-05-27 20:55 ` Jens Axboe 0 siblings, 2 replies; 142+ messages in thread From: Marc-Christian Petersen @ 2003-05-27 20:24 UTC (permalink / raw) To: Andrea Arcangeli Cc: Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III On Tuesday 27 May 2003 22:10, Andrea Arcangeli wrote: Hi Andrea, > 10/15 performance drop doesn't sound good, no matter what hardware ;). lol, well. YES ;) > However in contest I recall there was quite an improvement in latency at > least (I mean, it had some positive effect too) Yeah, but latency != throughput ;) > Getting the best throughput and latency at the same time is normally not > possible, however evaluating if it's losing excessive throughput given a > certain latency improvement is difficult. It is possible. I now use 2.5 (preferably the -mm tree) more than any 2.4*. I use the AS (Anticipatory I/O Scheduler) which AKPM included in his tree. This scheduler is kicking ass. Everything is rock fast; I can thrash my HD as much as I want and I still get no mouse stops, keyboard stops or anything like that. Even starting up multiple programs is possible while thrashing the HD. Sure, it takes longer, but it works :) I have been trying to backport BIO and then AS for over 2 weeks now, but it seems, at least for me, that it's an impossible mission ;( > I'll try to find what's the precise reason of the interactivity drop Cool. Thanks. > with the 2.4.18->2.4.19 blkdev changes on Thu. I think I shortly looked > into it once but there was no definitive answer, or anyways going back > to the 2.4.18 code didn't appeal or make much sense. Yeah, that's not an option. Throughput increased in 2.4.19 compared to 2.4.18. > However I suspect this responsiveness issue could be storage hardware > dependent. Hmm, I am quite sure that it isn't.
I have tons of very different hardware in my company, plus test machines for WOLK at freenet.de (the biggest was a quad Xeon 1GHz with 16GB memory and hardware RAID, a Compaq ML570 to be exact, f*cking nice machine btw. ;) and I hit it even on that machine. Friends of mine with hardware different from mine are also hitting that bug. _If_ it's a storage hardware problem, then a lot of storage hardware is affected ;) > The sentence by Linus in the last few days while talking with Jens, > about storage that reorders stuff and starve requests at the two ends of > the platter was very scary, maybe you're really bitten by something like > that. Linux does the right thing but your hardware keeps posting stuff > under the os and mine doesn't. Oh, did I miss something on lkml or was it private? ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 20:24 ` Marc-Christian Petersen @ 2003-05-27 20:45 ` Andrea Arcangeli 2003-05-27 20:53 ` Marc-Christian Petersen 2003-05-27 20:55 ` Jens Axboe 1 sibling, 1 reply; 142+ messages in thread From: Andrea Arcangeli @ 2003-05-27 20:45 UTC (permalink / raw) To: Marc-Christian Petersen Cc: Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III On Tue, May 27, 2003 at 10:24:22PM +0200, Marc-Christian Petersen wrote: > I try to backport BIO and then AS for quite over 2 weeks now, but it seems, at > least for me, that it's an impossible mission ;( bio breaks all drivers, not a good idea to backport ;) note that the anticipatory scheduler generates very bad results with the winmark. it certainly has merits but it has large downsides too. I would be also curious if you could compare anticipatory with CFQ. The CFQ was designed to provide the highest possible degree of fariness. > > I'll try to find what's the precise reason of the interactivity drop > cool. Thanks. > > > with the 2.4.18->2.4.19 blkdev changes on Thu. I think I shortly looked > > into it once but there was no definitive answer, or anyways going back > > to the 2.4.18 code didn't appeal or make much sense. > Yeah, that's not an option. The throughput has been increased in 2.4.19 > compared to 2.4.18. agreed. > > > However I suspect this responsiveness issue could be storage hardware > > dependent. > Hmm, I am quite sure that it isn't. I have ton's of mostly totally different > hardware in my company, also test machines for WOLK at freenet.de (the > biggest I had was a QUAD Xeon 1GHz with 16GB memory and hardware RAID (Compaq > ML570 to be exact (f*cking nice machine btw. ;) and I even hit it on that > machine. Friends of mine having also different hardware then me, also hitting > that bug. 
_If_ it's the case of storage hardware, then many storage hardware > is affected ;) ;) > > The sentence by Linus in the last few days while talking with Jens, > > about storage that reorders stuff and starve requests at the two ends of > > the platter was very scary, maybe you're really bitten by something like > > that. Linux does the right thing but your hardware keeps posting stuff > > under the os and mine doesn't. > Oh, did I miss something at lkml or was it privately? I read it on l-k a few days ago; search for emails from Linus with Jens in CC and you should find it. Andrea
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 20:45 ` Andrea Arcangeli @ 2003-05-27 20:53 ` Marc-Christian Petersen 2003-05-27 21:00 ` Jens Axboe 0 siblings, 1 reply; 142+ messages in thread From: Marc-Christian Petersen @ 2003-05-27 20:53 UTC (permalink / raw) To: Andrea Arcangeli Cc: Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III On Tuesday 27 May 2003 22:45, Andrea Arcangeli wrote: Hi Andrea, > > I try to backport BIO and then AS for quite over 2 weeks now, but it > > seems, at least for me, that it's an impossible mission ;( > bio breaks all drivers, not a good idea to backport ;) HAHAHAH. Another wasted 2 weeks in my life ;-) But why does it break all drivers? Could you please elaborate a bit? > note that the anticipatory scheduler generates very bad results with the > winmark. it certainly has merits but it has large downsides too. hmm, I _was_ not aware of that until now. > I would be also curious if you could compare anticipatory with CFQ. The > CFQ was designed to provide the highest possible degree of fariness. I can bench it, sure. I used CFQ before I switched to AS out of curiosity; since I saw no real difference in latency but AS gave me more throughput, I have stayed with AS. > I read it on l-k yesterday a few days ago, search emails from Linus with > Jens somewhere in CC and you should find it. Already found it :) thank you. ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 20:53 ` Marc-Christian Petersen @ 2003-05-27 21:00 ` Jens Axboe 2003-05-27 21:11 ` Marc-Christian Petersen 0 siblings, 1 reply; 142+ messages in thread From: Jens Axboe @ 2003-05-27 21:00 UTC (permalink / raw) To: Marc-Christian Petersen Cc: Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III On Tue, May 27 2003, Marc-Christian Petersen wrote: > On Tuesday 27 May 2003 22:45, Andrea Arcangeli wrote: > > Hi Andrea, > > > > I try to backport BIO and then AS for quite over 2 weeks now, but it > > > seems, at least for me, that it's an impossible mission ;( > > bio breaks all drivers, not a good idea to backport ;) > HAHAHAH. Another wasted 2 weeks in my life ;-) > > But why does it brake all drivers? Could you please elaborate a bit? Are you serious? Please tell me you haven't spend two weeks on the project not realising this? I think the problem here is that you are saying 'bio' when you really mean something else. bio is the 2.5 io structure. What _exactly_ do you mean with 'backporting bio'? I don't think you have the slightest idea of the nastiness involved with doing something like that. -- Jens Axboe ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 21:00 ` Jens Axboe @ 2003-05-27 21:11 ` Marc-Christian Petersen 2003-05-27 21:19 ` Jens Axboe 0 siblings, 1 reply; 142+ messages in thread From: Marc-Christian Petersen @ 2003-05-27 21:11 UTC (permalink / raw) To: Jens Axboe Cc: Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III On Tuesday 27 May 2003 23:00, Jens Axboe wrote: Hi Jens, > Are you serious? Please tell me you haven't spend two weeks on the > project not realising this? Well, those 2 weeks add up to no more than 5 or 6 hours, spread over many days. And it was mostly a way to dig deeper into the code, not a real attempt to backport it. NM. ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 21:11 ` Marc-Christian Petersen @ 2003-05-27 21:19 ` Jens Axboe 0 siblings, 0 replies; 142+ messages in thread From: Jens Axboe @ 2003-05-27 21:19 UTC (permalink / raw) To: Marc-Christian Petersen Cc: Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III On Tue, May 27 2003, Marc-Christian Petersen wrote: > On Tuesday 27 May 2003 23:00, Jens Axboe wrote: > > Hi Jens, > > > Are you serious? Please tell me you haven't spend two weeks on the > > project not realising this? > Well, 2 weeks means in hours not more than 5 or 6 just delayed over many days. > > And it was further just to go deeper into the code, not a real attempt to > backport it. NM. A bigger analysis of the problem before starting mindless (and useless) porting would have brought you a lot farther :) If you're just looking to port some io schedulers, the explanation I left you in the previous mail should be plenty to get you started. -- Jens Axboe ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 20:24 ` Marc-Christian Petersen 2003-05-27 20:45 ` Andrea Arcangeli @ 2003-05-27 20:55 ` Jens Axboe 2003-05-27 21:05 ` William Lee Irwin III 1 sibling, 1 reply; 142+ messages in thread From: Jens Axboe @ 2003-05-27 20:55 UTC (permalink / raw) To: Marc-Christian Petersen Cc: Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III On Tue, May 27 2003, Marc-Christian Petersen wrote: > I try to backport BIO and then AS for quite over 2 weeks now, but it > seems, at least for me, that it's an impossible mission ;( You're nuts, that's not only incredibly silly, it's not even needed for what you want. What you want is the proper io scheduler abstraction interface. With that in place, you can port the 2.5 io schedulers without too much trouble. They have very few dependencies on bio itself ('bio' has become one of the most abused terms in 2.5. I use it only to describe the io structure). You basically need to pin down all users that manipulate the queue directly to extract/insert requests. So step one is doing elv_add_request(), elv_next_request(), and elv_remove_request(). That is a 1:1 mapping to what 2.4 has right now, so you should be able to accomplish this change without changing how the code works. But still, why on earth waste your time with something like this now when we are so close to 2.6? 2.4 is a stable code base, it should stay that way. I'm really not interested in more esoteric 2.4 backports, the vendor kernels are bad enough as it is. -- Jens Axboe
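Jens's point is that callers must stop touching the request list directly and go through a few elevator entry points, so the scheduling policy behind them can be swapped. A minimal Python model of that idea follows. It is not kernel code: `Request`, `Elevator`, `FifoElevator`, `SweepElevator` and `drain` are hypothetical names, and the three methods only mirror the roles of elv_add_request(), elv_next_request() and elv_remove_request() named in the mail.

```python
from __future__ import annotations

import bisect
from abc import ABC, abstractmethod
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    sector: int   # starting sector; a toy stand-in for a real request
    write: bool


class Elevator(ABC):
    """The only entry points callers may use to touch the queue."""

    @abstractmethod
    def add_request(self, rq: Request) -> None:
        """Queue a new request (role of elv_add_request())."""

    @abstractmethod
    def next_request(self) -> Optional[Request]:
        """Peek at the request to dispatch next (role of elv_next_request())."""

    @abstractmethod
    def remove_request(self, rq: Request) -> None:
        """Take a dispatched request off the queue (role of elv_remove_request())."""


class FifoElevator(Elevator):
    """Dispatch strictly in arrival order."""

    def __init__(self) -> None:
        self._queue: deque[Request] = deque()

    def add_request(self, rq: Request) -> None:
        self._queue.append(rq)

    def next_request(self) -> Optional[Request]:
        return self._queue[0] if self._queue else None

    def remove_request(self, rq: Request) -> None:
        self._queue.remove(rq)


class SweepElevator(Elevator):
    """One-way sweep: lowest sector at or past the last dispatched one,
    wrapping around to the lowest sector when nothing is left ahead."""

    def __init__(self) -> None:
        self._queue: list[tuple[int, int, Request]] = []  # kept sorted
        self._seq = 0    # tie-breaker so Request objects are never compared
        self._head = 0   # sector of the last dispatched request

    def add_request(self, rq: Request) -> None:
        bisect.insort(self._queue, (rq.sector, self._seq, rq))
        self._seq += 1

    def next_request(self) -> Optional[Request]:
        if not self._queue:
            return None
        i = bisect.bisect_left(self._queue, (self._head, -1))
        return self._queue[i % len(self._queue)][2]   # wrap past the end

    def remove_request(self, rq: Request) -> None:
        for i, (_, _, queued) in enumerate(self._queue):
            if queued is rq:
                del self._queue[i]
                self._head = rq.sector
                return
        raise ValueError("request is not queued")


def drain(elv: Elevator) -> list[int]:
    """Driver-side loop: it reaches the queue only through the elevator."""
    order = []
    while (rq := elv.next_request()) is not None:
        elv.remove_request(rq)
        order.append(rq.sector)
    return order


if __name__ == "__main__":
    for elv in (FifoElevator(), SweepElevator()):
        for sector in (500, 100, 900, 300):
            elv.add_request(Request(sector, write=True))
        print(type(elv).__name__, drain(elv))
```

Because `drain()` never looks inside the queue, swapping FIFO order for sector order changes the dispatch sequence ([500, 100, 900, 300] versus [100, 300, 500, 900]) without touching the caller, which is the kind of decoupling that would let a scheduler move between trees without dragging the bio rework along.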
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 20:55 ` Jens Axboe @ 2003-05-27 21:05 ` William Lee Irwin III 2003-05-27 21:18 ` Jens Axboe 2003-05-27 21:33 ` Andrea Arcangeli 0 siblings, 2 replies; 142+ messages in thread From: William Lee Irwin III @ 2003-05-27 21:05 UTC (permalink / raw) To: Jens Axboe Cc: Marc-Christian Petersen, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose On Tue, May 27, 2003 at 10:55:16PM +0200, Jens Axboe wrote: > But still, why on earth waste your time with something like this now > when we are so close to 2.6? 2.4 is a stable code base, it should stay > that way. I'm really not interested in more esoteric 2.4 backports, the > vendor kernels are bad enough as it is. They've backported everything else, so I guess it stood to reason it'd happen eventually. I, for one, got a good laugh out of it. =) Makes me wonder if the 2.4 distro backport trees' diffs are bigger than 2.4 itself yet. -- wli ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 21:05 ` William Lee Irwin III @ 2003-05-27 21:18 ` Jens Axboe 2003-05-27 21:33 ` Andrea Arcangeli 1 sibling, 0 replies; 142+ messages in thread From: Jens Axboe @ 2003-05-27 21:18 UTC (permalink / raw) To: William Lee Irwin III, Marc-Christian Petersen, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose On Tue, May 27 2003, William Lee Irwin III wrote: > I, for one, got a good laugh out of it. =) Makes me wonder if the 2.4 > distro backport trees' diffs are bigger than 2.4 itself yet. Heh, well they're open for inspection, it's probably not far off :) -- Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 21:05 ` William Lee Irwin III 2003-05-27 21:18 ` Jens Axboe @ 2003-05-27 21:33 ` Andrea Arcangeli 1 sibling, 0 replies; 142+ messages in thread From: Andrea Arcangeli @ 2003-05-27 21:33 UTC (permalink / raw) To: William Lee Irwin III, Jens Axboe, Marc-Christian Petersen, Marcelo Tosatti, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose On Tue, May 27, 2003 at 02:05:18PM -0700, William Lee Irwin III wrote: > They've backported everything else, so I guess it stood to reason it'd > happen eventually. you probably forgot we have varyio in 2.4 due to the lack of bio ;) Andrea
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 17:57 ` Marcelo Tosatti 2003-05-27 18:08 ` Marc-Christian Petersen @ 2003-05-27 18:09 ` manish 1 sibling, 0 replies; 142+ messages in thread From: manish @ 2003-05-27 18:09 UTC (permalink / raw) To: Marcelo Tosatti Cc: Marc-Christian Petersen, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III Marcelo Tosatti wrote: > >On Tue, 27 May 2003, Marc-Christian Petersen wrote: > >>On Tuesday 27 May 2003 19:47, Marcelo Tosatti wrote: >> >>Hi Marcelo, >> >>>>A pause is _not_ perfectly fine, even not to some extent. That pause we >>>>are discussing about is a pause of the _whole_ machine, not just disk i/o >>>>pauses. Mouse stops, keyboard stops, everything stops, who knows wtf. >>>> >>>Do you also notice them? >>> >>I do, people I know do also, numbers of those people only _I_ know are about >>~30. I've reported this problem over a year ago while 2.4.19-pre time. >> > >Can you please try to reproduce it with -aa? > >>>>That behaviour is absolutely bullshit for desktop users. For serverusage >>>>you may not notice it in this dimension (mostly no X so no mouse), but >>>>also for a server environment this may be very bad. >>>> >>>Agreed. >>> Hello ! After several tests, I have noticed that I can reproduce this problem easily when my bdflush settings are: 30 50 32 100 50 300 60 0 0 and that it occurs much less frequently when my settings are: 2 50 32 100 50 300 1 0 0 Right now, I noticed the following stack trace for one such stuck process: sys_read generic_file_read do_generic_file_read page_cache_read __alloc_pages balance_classzone try_to_free_pages shrink_caches shrink_cache try_to_release_page try_to_free_buffer sync_page_buffers wait_on_buffer __wait_on_buffer schedule Thanks -Manish
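Manish's two bdflush settings differ in only two of the nine positions. A small helper makes that explicit. The field names are an assumption taken from the 2.4-era bdflush description in Documentation/sysctl/vm.txt (nfract, ndirty, two unused slots, interval, age_buffer, nfract_sync, nfract_stop_bdflush, one unused slot); the exact layout changed across 2.4 releases, so check the documentation of the kernel actually in use.

```python
from __future__ import annotations

# ASSUMPTION: field order from 2.4-era Documentation/sysctl/vm.txt; verify
# it against the running kernel before trusting these names.
BDFLUSH_FIELDS = (
    "nfract",               # % of buffer cache dirty that wakes bdflush
    "ndirty",               # max dirty blocks written per bdflush wake-up
    "unused2",              # historically "nrefill"
    "unused3",
    "interval",             # jiffies between kupdate flushes
    "age_buffer",           # jiffies a dirty buffer ages before being flushed
    "nfract_sync",          # % dirty at which writers must flush synchronously
    "nfract_stop_bdflush",  # % dirty below which bdflush stops
    "unused8",
)


def parse_bdflush(text: str) -> dict[str, int]:
    """Map the nine integers read from /proc/sys/vm/bdflush to field names."""
    values = [int(v) for v in text.split()]
    if len(values) != len(BDFLUSH_FIELDS):
        raise ValueError(f"expected {len(BDFLUSH_FIELDS)} values, got {len(values)}")
    return dict(zip(BDFLUSH_FIELDS, values))


def diff_bdflush(a: str, b: str) -> dict[str, tuple[int, int]]:
    """Return only the fields whose values differ, as (a, b) pairs."""
    pa, pb = parse_bdflush(a), parse_bdflush(b)
    return {name: (pa[name], pb[name])
            for name in BDFLUSH_FIELDS if pa[name] != pb[name]}


if __name__ == "__main__":
    hangs_easily = "30 50 32 100 50 300 60 0 0"
    hangs_rarely = "2 50 32 100 50 300 1 0 0"
    print(diff_bdflush(hangs_easily, hangs_rarely))
    # prints {'nfract': (30, 2), 'nfract_sync': (60, 1)}
```

Under that assumed mapping, the setting that rarely hangs wakes bdflush at 2% dirty buffers instead of 30% and forces writers to flush synchronously at 1% instead of 60%, so far less dirty data can pile up. That is consistent with a writeback or throttling problem but does not by itself locate the bug.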
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 17:47 ` Marcelo Tosatti 2003-05-27 17:52 ` Marc-Christian Petersen @ 2003-05-27 17:53 ` manish 2003-05-27 18:01 ` Marc-Christian Petersen 2003-05-27 18:12 ` Matthias Mueller 2 siblings, 1 reply; 142+ messages in thread From: manish @ 2003-05-27 17:53 UTC (permalink / raw) To: Marcelo Tosatti Cc: Marc-Christian Petersen, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III Marcelo Tosatti wrote: >On Tue, 27 May 2003, Marc-Christian Petersen wrote: > >>On Tuesday 27 May 2003 19:27, Marcelo Tosatti wrote: >> >>Hi Marcelo, >> >>>>Following is SysRq-T output for stuck processes during such a pause from >>>>Christian Klose. Only processes in D state are listed for brevity. >>>>Especially the last two call traces are interesting. >>>> >>>A "pause" is perfectly fine (to some extent, of course), now a hang is >>>not. Is this backtrace from a hanged, unusable kernel or ? >>> >>A pause is _not_ perfectly fine, even not to some extent. That pause we are >>discussing about is a pause of the _whole_ machine, not just disk i/o pauses. >>Mouse stops, keyboard stops, everything stops, who knows wtf. >> > >Do you also notice them? > > >>That behaviour is absolutely bullshit for desktop users. For serverusage you >>may not notice it in this dimension (mostly no X so no mouse), but also for a >>server environment this may be very bad. >> > >Agreed. > Hi Marc, With respect to the hangs that you noticed, did the processes complete after a "pause" or did they stay hung (deadlocked)? Thanks Manish ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 17:53 ` manish @ 2003-05-27 18:01 ` Marc-Christian Petersen 2003-05-27 18:16 ` Marcelo Tosatti 0 siblings, 1 reply; 142+ messages in thread From: Marc-Christian Petersen @ 2003-05-27 18:01 UTC (permalink / raw) To: manish, Marcelo Tosatti Cc: linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III On Tuesday 27 May 2003 19:53, manish wrote: Hi Manish, > With respect to the hangs that you noticed, did the processes complete > after a "pause" or did they stay hung (deadlocked)? Yes, they complete; no processes ever get deadlocked, nor anything else in this area. The whole system just does _nothing_ for a while (1-15 seconds, it depends). _Sometimes_ (not always) even a ping stops for as long as the machine is pausing. It is also not a hardware problem; I ruled that out before reporting this bug. I tested tons of different hardware, different drivers for the network card etc. I repeat this now for the $high_number'th time ;): - 2.4.18 worked perfectly - 2.4.19-pre does not ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 18:01 ` Marc-Christian Petersen @ 2003-05-27 18:16 ` Marcelo Tosatti 2003-05-27 18:25 ` Marc-Christian Petersen 0 siblings, 1 reply; 142+ messages in thread From: Marcelo Tosatti @ 2003-05-27 18:16 UTC (permalink / raw) To: Marc-Christian Petersen Cc: manish, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III On Tue, 27 May 2003, Marc-Christian Petersen wrote: > On Tuesday 27 May 2003 19:53, manish wrote: > > Hi Manish, > > > With respect to the hangs that you noticed, did the processes complete > > after a "pause" or did they stay hung (deadlocked)? > yes, no processes get ever deadlocked nor anything else in this area. The > whole system just does _nothing_ for an amount of time (1-15 seconds, > depends). _Sometimes_ (not always) even a ping is stoped for the amount of > time the machine does nothing but pausing. > > Also not a hardware problem. I made this clear before reporting this bug. > Tested tons of different hardware, different drivers for the network card > etc. > > I repeat this now for the $high_number'th time ;): > - 2.4.18 worked perfect > - 2.4.19-pre not That's very useful information. Can you track down which -pre introduced the hangs? Thanks!
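Tracking down which -pre introduced a regression is a bisection over an ordered list of releases: with one known-good and one known-bad kernel, each test halves the remaining range. A minimal sketch follows; `first_bad` is a hypothetical helper, and in practice `is_bad` stands for booting a kernel and running the pause test by hand.

```python
from __future__ import annotations

from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered release list for the first bad release.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    once a release is bad every later one is bad too. The endpoints are
    never tested, so n releases need about log2(n) test runs."""
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid
        else:
            good = mid
    return versions[bad]


if __name__ == "__main__":
    # Illustrative subset of releases, not the complete -pre series.
    releases = ["2.4.18"] + [f"2.4.19-pre{n}" for n in range(1, 6)] + ["2.4.19"]

    # Stand-in for a manual boot-and-test run; it simply encodes Con's
    # report later in the thread that the pauses start with 2.4.19-pre1.
    def pauses(version: str) -> bool:
        return releases.index(version) >= releases.index("2.4.19-pre1")

    print(first_bad(releases, pauses))   # prints 2.4.19-pre1
```

Once the -pre is known, the same search can continue over the individual changesets merged in that -pre.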
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 18:16 ` Marcelo Tosatti @ 2003-05-27 18:25 ` Marc-Christian Petersen 0 siblings, 0 replies; 142+ messages in thread From: Marc-Christian Petersen @ 2003-05-27 18:25 UTC (permalink / raw) To: Marcelo Tosatti Cc: manish, linux-kernel, Carl-Daniel Hailfinger, Christian Klose, William Lee Irwin III On Tuesday 27 May 2003 20:16, Marcelo Tosatti wrote: Hi Marcelo, > > I repeat this now for the $high_number'th time ;): > > - 2.4.18 worked perfect > > - 2.4.19-pre not > Thats very useful information. Can you track down which -pre introduced > the hangs? If I am not on drugs and my last test was not done on drugs, the culprit is this patch: http://linux.bkbits.net:8080/linux-2.4/diffs/drivers/block/ll_rw_blk.c@1.29?nav=index.html|ChangeSet@-2y|cset@1.160|hist/drivers/block/ll_rw_blk.c ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 17:47 ` Marcelo Tosatti 2003-05-27 17:52 ` Marc-Christian Petersen 2003-05-27 17:53 ` manish @ 2003-05-27 18:12 ` Matthias Mueller 2 siblings, 0 replies; 142+ messages in thread From: Matthias Mueller @ 2003-05-27 18:12 UTC (permalink / raw) To: Marcelo Tosatti Cc: Marc-Christian Petersen, linux-kernel, Carl-Daniel Hailfinger, manish, Christian Klose, William Lee Irwin III Hi, On Tue, May 27, 2003 at 02:47:24PM -0300, Marcelo Tosatti wrote: > On Tue, 27 May 2003, Marc-Christian Petersen wrote: > > > A "pause" is perfectly fine (to some extent, of course), now a hang is > > > not. Is this backtrace from a hanged, unusable kernel or ? > > A pause is _not_ perfectly fine, even not to some extent. That pause we are > > discussing about is a pause of the _whole_ machine, not just disk i/o pauses. > > Mouse stops, keyboard stops, everything stops, who knows wtf. > > Do you also notice them? Since 2.4.19 I have noticed a lot of pauses during interactive work (desktop usage). If I copy a big file over the network or on a local disk, some of my desktop machines simply stop responding to user requests (e.g. if I start copying a large file over NFS to local disk and then start mozilla, mozilla won't start until the copy has finished). My current testcase is: dd if=/dev/zero of=blubber bs=4096 count=65000 while moving the mouse during this operation. With 2.4.18 everything is ok, the mouse moves smoothly the whole time. With 2.4.19 and later I get mouse hangs: it won't move for a second, sometimes longer. WOLK reduces this problem, but doesn't solve it. On my servers (mostly IBM xSeries 345 and 335) a vanilla kernel is ok, but there is no interactive work there, mostly routing or network monitoring. I hope I can run a vanilla 2.4 kernel on my machines again; at the moment that isn't possible.
Bye, Matthias -- Matthias.Mueller@rz.uni-karlsruhe.de Rechenzentrum Universitaet Karlsruhe Abteilung Netze
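Matthias's mouse test is subjective. A crude user-space probe can put a number on it: sleep in short ticks while the dd runs and record how late each wake-up is. This is a sketch of a generic technique, not something posted in the thread; `measure_stalls` is a hypothetical name, and the 10 ms tick is an arbitrary choice.

```python
from __future__ import annotations

import time


def measure_stalls(duration_s: float = 10.0, tick_s: float = 0.01) -> tuple[int, float]:
    """Sleep in short ticks and record how late each wake-up is.

    Returns (ticks, worst_delay_s). A wake-up that arrives hundreds of
    milliseconds late means this process was not run again in time,
    a rough user-space proxy for a frozen mouse pointer."""
    worst = 0.0
    ticks = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        before = time.monotonic()
        time.sleep(tick_s)
        late = time.monotonic() - before - tick_s   # oversleep for this tick
        worst = max(worst, late)
        ticks += 1
    return ticks, worst
```

To use it, start `dd if=/dev/zero of=blubber bs=4096 count=65000` in one terminal, call `measure_stalls(30)` from a Python prompt in another, and compare the worst delay on 2.4.18 with 2.4.19 and later. The probe only sees stalls that delay a sleeping, CPU-idle process. A pause that blocks only disk I/O, such as mozilla failing to start, will not show up here, so it complements the mouse test rather than replacing it.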
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 17:27 ` Marcelo Tosatti 2003-05-27 17:36 ` Marc-Christian Petersen @ 2003-05-27 17:36 ` William Lee Irwin III 2003-05-27 17:38 ` Carl-Daniel Hailfinger 2 siblings, 0 replies; 142+ messages in thread From: William Lee Irwin III @ 2003-05-27 17:36 UTC (permalink / raw) To: Marcelo Tosatti Cc: Carl-Daniel Hailfinger, manish, linux-kernel, Christian Klose, Marc-Christian Petersen On Tue, 27 May 2003, Carl-Daniel Hailfinger wrote: >> Following is SysRq-T output for stuck processes during such a pause from >> Christian Klose. Only processes in D state are listed for brevity. >> Especially the last two call traces are interesting. On Tue, May 27, 2003 at 02:27:00PM -0300, Marcelo Tosatti wrote: > A "pause" is perfectly fine (to some extent, of course), now a hang is > not. Is this backtrace from a hanged, unusable kernel or ? This sounds like deadlocked processes, but not a whole-system hang. Sounds like a correctness issue, not a performance issue. -- wli
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 17:27 ` Marcelo Tosatti 2003-05-27 17:36 ` Marc-Christian Petersen 2003-05-27 17:36 ` William Lee Irwin III @ 2003-05-27 17:38 ` Carl-Daniel Hailfinger 2003-05-27 17:50 ` manish 2 siblings, 1 reply; 142+ messages in thread From: Carl-Daniel Hailfinger @ 2003-05-27 17:38 UTC (permalink / raw) To: Marcelo Tosatti Cc: manish, linux-kernel, Christian Klose, Marc-Christian Petersen, William Lee Irwin III Marcelo Tosatti wrote: > > On Tue, 27 May 2003, Carl-Daniel Hailfinger wrote: > >>Marcelo Tosatti wrote: >> >>>On Mon, 26 May 2003, manish wrote: >>>>All the bonnie process and any other process (like df, ps -ef etc.) are >>>>hung in __lock_page. Breaking into kdb, I observe the following for one >> >>Following is SysRq-T output for stuck processes during such a pause from >>Christian Klose. Only processes in D state are listed for brevity. >>Especially the last two call traces are interesting. > > A "pause" is perfectly fine (to some extent, of course), now a hang is > not. Is this backtrace from a hanged, unusable kernel or ? AFAIK, the kernel is not unusable, but a 20 second pause with no disk access at all is not nice either. Regards, Carl-Daniel ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 17:38 ` Carl-Daniel Hailfinger @ 2003-05-27 17:50 ` manish 2003-05-27 18:04 ` Marc-Christian Petersen 0 siblings, 1 reply; 142+ messages in thread From: manish @ 2003-05-27 17:50 UTC (permalink / raw) To: Carl-Daniel Hailfinger Cc: Marcelo Tosatti, linux-kernel, Christian Klose, Marc-Christian Petersen, William Lee Irwin III Carl-Daniel Hailfinger wrote: >Marcelo Tosatti wrote: > >>On Tue, 27 May 2003, Carl-Daniel Hailfinger wrote: >> >>>Marcelo Tosatti wrote: >>> >>>>On Mon, 26 May 2003, manish wrote: >>>> >>>>>All the bonnie process and any other process (like df, ps -ef etc.) are >>>>>hung in __lock_page. Breaking into kdb, I observe the following for one >>>>> >>>Following is SysRq-T output for stuck processes during such a pause from >>>Christian Klose. Only processes in D state are listed for brevity. >>>Especially the last two call traces are interesting. >>> >>A "pause" is perfectly fine (to some extent, of course), now a hang is >>not. Is this backtrace from a hanged, unusable kernel or ? >> > >AFAIK, the kernel is not unusable, but a 20 second pause with no disk >access at all is not nice either. > > >Regards, >Carl-Daniel > Hello ! It is not a system hang but the processes hang showing the same stack trace. This is certainly not a pause since the bonnie processes that were hung (or deadlocked) never completed after several hrs. The stack trace was the same. Thanks Manish ^ permalink raw reply [flat|nested] 142+ messages in thread
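Whether a process is merely paused or truly stuck is exactly what Manish and Marc disagree about. A hypothetical user-space helper, not something posted in the thread, can watch for processes that stay in D (uninterruptible sleep) across repeated samples of /proc, as a complement to a SysRq-T dump. The /proc/<pid>/stat layout used here (pid, command name in parentheses, then the one-letter state) is the standard Linux format.

```python
from __future__ import annotations

import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace."""
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]          # single state letter after ") "
    return pid, comm, state


def d_state_processes(proc: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue                       # process exited while we looked
        if state == "D":
            found.append((pid, comm))
    return found


def persistent_d_state(samples: list[list[tuple[int, str]]]) -> set[int]:
    """PIDs that were in D state in every sample."""
    pid_sets = [{pid for pid, _ in sample} for sample in samples]
    return set.intersection(*pid_sets) if pid_sets else set()
```

Call `d_state_processes()` every few seconds for several minutes and pass the samples to `persistent_d_state()`. A PID present in every sample is a hang candidate, like Manish's bonnie runs, while pauses of the 1 to 15 seconds Marc describes drop out. A process doing heavy I/O can also be caught in D at every sample without being stuck, so confirm a candidate with SysRq-T or by checking that it has stopped making progress.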
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 17:50 ` manish @ 2003-05-27 18:04 ` Marc-Christian Petersen 2003-05-27 23:06 ` Georg Nikodym ` (3 more replies) 0 siblings, 4 replies; 142+ messages in thread From: Marc-Christian Petersen @ 2003-05-27 18:04 UTC (permalink / raw) To: manish, Carl-Daniel Hailfinger, Andrea Arcangeli Cc: Marcelo Tosatti, linux-kernel, Christian Klose, William Lee Irwin III On Tuesday 27 May 2003 19:50, manish wrote: Hi Manish, > It is not a system hang but the processes hang showing the same stack > trace. This is certainly not a pause since the bonnie processes that > were hung (or deadlocked) never completed after several hrs. The stack > trace was the same. then you are hitting a different bug, or one related to the issues Christian Klose, I and $tons of others have been complaining about. The bug you are hitting might be the "process stuck in D state" problem that Andrea Arcangeli fixed, let me guess, over half a year ago or so. If you want to address your issue, you might try the patch you can find here: http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21rc2aa1/9980_fix-pausing-2 ALL: Anyone who has this kind of pauses/stops/mouse is dead/keyboard is dead/: speak _NOW_ please, doesn't matter who you are! I've added Andrea to CC. ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 18:04 ` Marc-Christian Petersen @ 2003-05-27 23:06 ` Georg Nikodym 2003-05-27 23:26 ` Christopher S. Aker 2003-05-28 5:33 ` Con Kolivas ` (2 subsequent siblings) 3 siblings, 1 reply; 142+ messages in thread From: Georg Nikodym @ 2003-05-27 23:06 UTC (permalink / raw) To: Marc-Christian Petersen Cc: manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Christian Klose, William Lee Irwin III On Tue, 27 May 2003 20:04:49 +0200 Marc-Christian Petersen <m.c.p@wolk-project.de> wrote: > ALL: Anyone who has this kind of pauses/stops/mouse is dead/keyboard > is dead/: > speak _NOW_ please, doesn't matter who you are! Uh, ok. These pauses have kept me from using anything newer than riel's 2.4.19-rmap15a -g
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 23:06 ` Georg Nikodym @ 2003-05-27 23:26 ` Christopher S. Aker 0 siblings, 0 replies; 142+ messages in thread From: Christopher S. Aker @ 2003-05-27 23:26 UTC (permalink / raw) To: Marc-Christian Petersen Cc: linux-kernel, manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, Christian Klose, William Lee Irwin III, Georg Nikodym > ALL: Anyone who has this kind of pauses/stops/mouse is dead/keyboard > is dead/: speak _NOW_ please, doesn't matter who you are! I've been able to reproduce the pauses on two different machines (different motherboards and processors), although each machine has >= 2.5GB RAM. I can reproduce this in 2.4.19, 2.4.20, and 2.4.21-rc1/rc2/rc3. After the machine un-pauses, everything completes and returns to normal; I don't experience deadlocked processes. Both my machines are IDE, using UDMA, with hdparm settings maxed out; tweaking bdflush or elvtune doesn't make any difference. Limiting the RAM on the machines didn't help. Pauses have lasted anywhere from a few seconds to a few minutes. Anything later than 2.4.18 is unusable for me because of this. -Chris
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-27 18:04 ` Marc-Christian Petersen 2003-05-27 23:06 ` Georg Nikodym @ 2003-05-28 5:33 ` Con Kolivas 2003-05-28 6:04 ` Jens Axboe 2003-05-28 7:16 ` Marc Wilson 2003-05-28 9:36 ` Ragnar Hojland Espinosa 3 siblings, 1 reply; 142+ messages in thread From: Con Kolivas @ 2003-05-28 5:33 UTC (permalink / raw) To: Marc-Christian Petersen, manish, Carl-Daniel Hailfinger, Andrea Arcangeli Cc: Marcelo Tosatti, linux-kernel, Christian Klose, William Lee Irwin III On Wed, 28 May 2003 04:04, Marc-Christian Petersen wrote: > On Tuesday 27 May 2003 19:50, manish wrote: > > Hi Manish, > > > It is not a system hang but the processes hang showing the same stack > > trace. This is certainly not a pause since the bonnie processes that > > were hung (or deadlocked) never completed after several hrs. The stack > > trace was the same. > > then you are hitting a different bug or a bug related to the issues > Christian Klose and me and $tons of others were complaining. > > The bug you are hitting might be the problem with "process stuck in D > state" Andrea Arcangeli fixed, let me guess, over half a year ago or so. > > In case you have a good mind to try to address your issue, you might want > to try out the patch you can find here: > > http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21rc2 >aa1/9980_fix-pausing-2 > > ALL: Anyone who has this kind of pauses/stops/mouse is dead/keyboard is > dead/: speak _NOW_ please, doesn't matter who you are! Yo! I'll throw my babushka into the ring too. I think it's obvious from MCP's comments that I've been involved in testing this problem. I've spent hours, possibly days trying to find a way to fix the pauses introduced since 2.4.19pre1. I agree with what MCP describes that the machine can come to a standstill under any sort of disk i/o and is unusable for a variable length of time. 
I've been playing with all sorts of numbers in my patchset to try and limit it with only mild success. The best results I've had without a major decrease in throughput was using akpm's read latency 2 patch but by significantly reducing the nr_requests. It was changing the number of requests that I discovered dropping them to 4 fixed the problem but destroyed write throughput. I was pleased to see AA give the problem recognition after my contest results on his kernel but disappointed that the problem only was reduced, not fixed. I have seen it on every piece of hardware on which I have used a 2.4.19+ kernel as a desktop. I have no idea what the real problem is, but I firmly agree with MCP that it is the biggest flaw in 2.4 on the desktop (no idea what it does to servers). We've tried over and over again, fiddling with the numbers and patches, and only going back to a kernel older than 2.4.19 fixes it completely. Con
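Con's finding that shrinking the request queue to 4 entries removed the pauses, at a heavy cost in write throughput, matches a simple queueing intuition: behind a full FIFO queue, a new read waits for roughly everything already queued. The toy model below is a sketch under loudly stated assumptions: strict FIFO dispatch (the real 2.4 elevator sorts and merges), a writer that always keeps the queue full, and a fixed per-request service time. It does not model batch_requests or any actual kernel code path; `read_latency_ms` is a hypothetical name.

```python
from __future__ import annotations

from collections import deque


def read_latency_ms(nr_requests: int, service_ms: float) -> float:
    """Toy FIFO model of one request queue (no merging, no sorting).

    A streaming writer keeps all nr_requests slots busy. At t=0 a single
    read arrives, takes the first slot that frees up, and then waits
    behind everything already queued. Returns the read's completion time."""
    if nr_requests < 1:
        raise ValueError("nr_requests must be at least 1")
    queue = deque(["W"] * nr_requests)   # queue starts full of writes
    now = 0.0
    read_queued = False
    while True:
        done = queue.popleft()           # device completes the head request
        now += service_ms
        if done == "R":
            return now
        if not read_queued:
            queue.append("R")            # freed slot goes to the waiting read
            read_queued = True
        else:
            queue.append("W")            # writer refills the slot at once


if __name__ == "__main__":
    for n in (4, 32, 128):
        print(n, read_latency_ms(n, service_ms=10.0))
    # prints 4 50.0, then 32 330.0, then 128 1290.0
```

In this model the read's latency is (nr_requests + 1) × service time, so it grows linearly with queue depth. The throughput side is deliberately left out: a shorter queue gives the elevator fewer requests to merge and sort, which is a plausible source of the write-throughput loss Con saw.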
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 5:33 ` Con Kolivas @ 2003-05-28 6:04 ` Jens Axboe 2003-05-28 7:13 ` Con Kolivas 0 siblings, 1 reply; 142+ messages in thread From: Jens Axboe @ 2003-05-28 6:04 UTC (permalink / raw) To: Con Kolivas Cc: Marc-Christian Petersen, manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Christian Klose, William Lee Irwin III On Wed, May 28 2003, Con Kolivas wrote: > On Wed, 28 May 2003 04:04, Marc-Christian Petersen wrote: > > On Tuesday 27 May 2003 19:50, manish wrote: > > > > Hi Manish, > > > > > It is not a system hang but the processes hang showing the same stack > > > trace. This is certainly not a pause since the bonnie processes that > > > were hung (or deadlocked) never completed after several hrs. The stack > > > trace was the same. > > > > then you are hitting a different bug or a bug related to the issues > > Christian Klose and me and $tons of others were complaining. > > > > The bug you are hitting might be the problem with "process stuck in D > > state" Andrea Arcangeli fixed, let me guess, over half a year ago or so. > > > > In case you have a good mind to try to address your issue, you might want > > to try out the patch you can find here: > > > > http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21rc2 > >aa1/9980_fix-pausing-2 > > > > ALL: Anyone who has this kind of pauses/stops/mouse is dead/keyboard is > > dead/: speak _NOW_ please, doesn't matter who you are! > > Yo! > > I'll throw my babushka into the ring too. I think it's obvious from MCP's > comments that I've been involved in testing this problem. I've spent hours, > possibly days trying to find a way to fix the pauses introduced since > 2.4.19pre1. I agree with what MCP describes that the machine can come to a > standstill under any sort of disk i/o and is unusable for a variable length > of time. 
I've been playing with all sorts of numbers in my patchset to try > and limit it with only mild success. The best results I've had without a > major decrease in throughput was using akpm's read latency 2 patch but by > significantly reducing the nr_requests. It was changing the number of > requests that I discovered dropping them to 4 fixed the problem but destroyed > write throughput. I was pleased to see AA give the problem recognition after > my contest results on his kernel but disappointed that the problem only was > reduced, not fixed. Does the problem change at all if you force batch_requests to 0? -- Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Con Kolivas @ 2003-05-28 7:13 UTC To: Jens Axboe Cc: Marc-Christian Petersen, manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Christian Klose, William Lee Irwin III On Wed, 28 May 2003 16:04, Jens Axboe wrote: > On Wed, May 28 2003, Con Kolivas wrote: > > On Wed, 28 May 2003 04:04, Marc-Christian Petersen wrote: > > > On Tuesday 27 May 2003 19:50, manish wrote: > > > > > > Hi Manish, > > > > > > > It is not a system hang but the processes hang showing the same stack > > > > trace. This is certainly not a pause since the bonnie processes that > > > > were hung (or deadlocked) never completed after several hrs. The > > > > stack trace was the same. > > > > > > then you are hitting a different bug or a bug related to the issues > > > Christian Klose and me and $tons of others were complaining. > > > > > > The bug you are hitting might be the problem with "process stuck in D > > > state" Andrea Arcangeli fixed, let me guess, over half a year ago or > > > so. > > > > > > In case you have a good mind to try to address your issue, you might > > > want to try out the patch you can find here: > > > > > > http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21rc2aa1/9980_fix-pausing-2 > > > > > > ALL: Anyone who has this kind of pauses/stops/mouse is dead/keyboard is > > > dead/: speak _NOW_ please, doesn't matter who you are! > > > > Yo! > > > > I'll throw my babushka into the ring too. I think it's obvious from MCP's > > comments that I've been involved in testing this problem. I've spent > > hours, possibly days trying to find a way to fix the pauses introduced > > since 2.4.19pre1.
I agree with what MCP describes that the machine can > > come to a standstill under any sort of disk i/o and is unusable for a > > variable length of time. I've been playing with all sorts of numbers in > > my patchset to try and limit it with only mild success. The best results > > I've had without a major decrease in throughput was using akpm's read > > latency 2 patch but by significantly reducing the nr_requests. It was > > changing the number of requests that I discovered dropping them to 4 > > fixed the problem but destroyed write throughput. I was pleased to see AA > > give the problem recognition after my contest results on his kernel but > > disappointed that the problem only was reduced, not fixed. > > Does the problem change at all if you force batch_requests to 0? I've tried batch_requests to 1 by itself (without changing the nr_request) and that didn't fix it, but recall dropping nr_requests to 2 (which would make batch requests==0) made the machine fail to boot so I haven't tried batch requests 0 by itself. Should it boot with it == 0? Con
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Jens Axboe @ 2003-05-28 7:13 UTC To: Con Kolivas Cc: Marc-Christian Petersen, manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Christian Klose, William Lee Irwin III On Wed, May 28 2003, Con Kolivas wrote: > On Wed, 28 May 2003 16:04, Jens Axboe wrote: > > On Wed, May 28 2003, Con Kolivas wrote: > > > On Wed, 28 May 2003 04:04, Marc-Christian Petersen wrote: > > > > On Tuesday 27 May 2003 19:50, manish wrote: > > > > > > > > Hi Manish, > > > > > > > > > It is not a system hang but the processes hang showing the same stack > > > > > trace. This is certainly not a pause since the bonnie processes that > > > > > were hung (or deadlocked) never completed after several hrs. The > > > > > stack trace was the same. > > > > > > > > then you are hitting a different bug or a bug related to the issues > > > > Christian Klose and me and $tons of others were complaining. > > > > > > > > The bug you are hitting might be the problem with "process stuck in D > > > > state" Andrea Arcangeli fixed, let me guess, over half a year ago or > > > > so. > > > > > > > > In case you have a good mind to try to address your issue, you might > > > > want to try out the patch you can find here: > > > > > > > > http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21rc2aa1/9980_fix-pausing-2 > > > > > > > > ALL: Anyone who has this kind of pauses/stops/mouse is dead/keyboard is > > > > dead/: speak _NOW_ please, doesn't matter who you are! > > > > > > Yo! > > > > > > I'll throw my babushka into the ring too. I think it's obvious from MCP's > > > comments that I've been involved in testing this problem. I've spent > > > hours, possibly days trying to find a way to fix the pauses introduced > > > since 2.4.19pre1.
I agree with what MCP describes that the machine can > > > come to a standstill under any sort of disk i/o and is unusable for a > > > variable length of time. I've been playing with all sorts of numbers in > > > my patchset to try and limit it with only mild success. The best results > > > I've had without a major decrease in throughput was using akpm's read > > > latency 2 patch but by significantly reducing the nr_requests. It was > > > changing the number of requests that I discovered dropping them to 4 > > > fixed the problem but destroyed write throughput. I was pleased to see AA > > > give the problem recognition after my contest results on his kernel but > > > disappointed that the problem only was reduced, not fixed. > > > > Does the problem change at all if you force batch_requests to 0? > > I've tried batch_requests to 1 by itself (without changing the > nr_request) and that didn't fix it, but recall dropping nr_requests to > 2 (which would make batch requests==0) made the machine fail to boot > so I haven't tried batch requests 0 by itself. Should it boot with it > == 0? If you leave nr_requests as it is, I don't see why it should not boot with batch_requests == 0. I can't see in all of these mails whether backing out akpm's starvation patch makes the problem go away. Does it? -- Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Marc-Christian Petersen @ 2003-05-28 7:32 UTC To: Jens Axboe, Con Kolivas Cc: manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Christian Klose, William Lee Irwin III On Wednesday 28 May 2003 09:13, Jens Axboe wrote: Hi Jens, > If you leave nr_requests as it is, I don't see why it should not boot > with batch_requests == 0. > I can't see in all of these mails whether backing out akpm's starvation > patch makes the problem go away. Does it? If you mean "http://linux.bkbits.net:8080/linux-2.4/diffs/drivers/block/ll_rw_blk.c@1.29?nav=index.html|ChangeSet@-2y|cset@1.160|hist/drivers/block/ll_rw_blk.c" that one, the answer is YES. ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Jens Axboe @ 2003-05-28 7:35 UTC To: Marc-Christian Petersen Cc: Con Kolivas, manish, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Andrew Morton On Wed, May 28 2003, Marc-Christian Petersen wrote: > On Wednesday 28 May 2003 09:13, Jens Axboe wrote: > > Hi Jens, > > > If you leave nr_requests as it is, I don't see why it should not boot > > with batch_requests == 0. > > I can't see in all of these mails whether backing out akpm's starvation > > patch makes the problem go away. Does it? > If you mean > "http://linux.bkbits.net:8080/linux-2.4/diffs/drivers/block/ll_rw_blk.c@1.29?nav=index.html|ChangeSet@-2y|cset@1.160|hist/drivers/block/ll_rw_blk.c" > > that one, the answer is YES. That's the one, yes. Andrew, looks like your patch brought out some really bad behaviour. -- Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Andrew Morton @ 2003-05-28 7:51 UTC To: Jens Axboe; +Cc: m.c.p, kernel, manish, andrea, marcelo, linux-kernel Jens Axboe <axboe@suse.de> wrote: > > > that one, the answer is YES. > > That's the one, yes. Andrew, looks like your patch brought out some > really bad behaviour. Yes, but why? It'd be interesting if any of these changes make a difference.

 drivers/block/ll_rw_blk.c |    7
 fs/buffer.c               | 3030 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 3033 insertions(+), 4 deletions(-)

diff -puN drivers/block/ll_rw_blk.c~a drivers/block/ll_rw_blk.c
--- 24/drivers/block/ll_rw_blk.c~a 2003-05-28 00:48:09.000000000 -0700
+++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 00:50:02.000000000 -0700
@@ -590,10 +590,10 @@ static struct request *__get_request_wai
 	register struct request *rq;
 	DECLARE_WAITQUEUE(wait, current);
 
-	generic_unplug_device(q);
-	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
+	add_wait_queue(&q->wait_for_requests[rw], &wait);
 	do {
 		set_current_state(TASK_UNINTERRUPTIBLE);
+		generic_unplug_device(q);
 		if (q->rq[rw].count == 0)
 			schedule();
 		spin_lock_irq(&io_request_lock);
@@ -829,8 +829,7 @@ void blkdev_release_request(struct reque
 	 */
 	if (q) {
 		list_add(&req->queue, &q->rq[rw].free);
-		if (++q->rq[rw].count >= q->batch_requests &&
-				waitqueue_active(&q->wait_for_requests[rw]))
+		if (++q->rq[rw].count >= q->batch_requests)
 			wake_up(&q->wait_for_requests[rw]);
 	}
 }

_
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Jens Axboe @ 2003-05-28 8:30 UTC To: Andrew Morton; +Cc: m.c.p, kernel, manish, andrea, marcelo, linux-kernel On Wed, May 28 2003, Andrew Morton wrote: > Jens Axboe <axboe@suse.de> wrote: > > > > > that one, the answer is YES. > > > > That's the one, yes. Andrew, looks like your patch brought out some > > really bad behaviour. > > Yes, but why? > > It'd be interesting if any of these changes make a difference.
>
>  drivers/block/ll_rw_blk.c |    7
>  fs/buffer.c               | 3030 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 3033 insertions(+), 4 deletions(-)
>
> diff -puN drivers/block/ll_rw_blk.c~a drivers/block/ll_rw_blk.c
> --- 24/drivers/block/ll_rw_blk.c~a 2003-05-28 00:48:09.000000000 -0700
> +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 00:50:02.000000000 -0700
> @@ -590,10 +590,10 @@ static struct request *__get_request_wai
>  	register struct request *rq;
>  	DECLARE_WAITQUEUE(wait, current);
>  
> -	generic_unplug_device(q);
> -	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
> +	add_wait_queue(&q->wait_for_requests[rw], &wait);
>  	do {
>  		set_current_state(TASK_UNINTERRUPTIBLE);
> +		generic_unplug_device(q);
>  		if (q->rq[rw].count == 0)
>  			schedule();
>  		spin_lock_irq(&io_request_lock);
> @@ -829,8 +829,7 @@ void blkdev_release_request(struct reque
>  	 */
>  	if (q) {
>  		list_add(&req->queue, &q->rq[rw].free);
> -		if (++q->rq[rw].count >= q->batch_requests &&
> -				waitqueue_active(&q->wait_for_requests[rw]))
> +		if (++q->rq[rw].count >= q->batch_requests)
>  			wake_up(&q->wait_for_requests[rw]);
>  	}
>  }

The unplug() move could be the key, in theory we could end up having to unplug the queue again.
Question to the ones seeing the stalls - does a sysrq-s make things go again? -- Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Marc-Christian Petersen @ 2003-05-28 8:43 UTC To: Jens Axboe, Andrew Morton; +Cc: kernel, manish, andrea, marcelo, linux-kernel On Wednesday 28 May 2003 10:30, Jens Axboe wrote: Hi Jens, > The unplug() move could be the key, in theory we could end up having to > unplug the queue again. Hmm, afaik fix-pausing-2 patch does it similar, moving unplug_device() to the same place. > Question to the ones seeing the stalls - does a sysrq-s make things go > again? no (at least not for me) ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Marc-Christian Petersen @ 2003-05-28 8:40 UTC To: Andrew Morton, Jens Axboe; +Cc: kernel, manish, andrea, marcelo, linux-kernel On Wednesday 28 May 2003 09:51, Andrew Morton wrote: Hi Andrew, > Yes, but why? I don't know :( > It'd be interesting if any of these changes make a difference. I'll check it this evening! Many thanks. ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Matthias Mueller @ 2003-05-28 10:13 UTC To: Andrew Morton Cc: Jens Axboe, m.c.p, kernel, manish, andrea, marcelo, linux-kernel On Wed, May 28, 2003 at 12:51:56AM -0700, Andrew Morton wrote: > It'd be interesting if any of these changes make a difference.
>
>  drivers/block/ll_rw_blk.c |    7
>  fs/buffer.c               | 3030 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 3033 insertions(+), 4 deletions(-)
>
> diff -puN drivers/block/ll_rw_blk.c~a drivers/block/ll_rw_blk.c
> --- 24/drivers/block/ll_rw_blk.c~a 2003-05-28 00:48:09.000000000 -0700
> +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 00:50:02.000000000 -0700
> @@ -590,10 +590,10 @@ static struct request *__get_request_wai
>  	register struct request *rq;
>  	DECLARE_WAITQUEUE(wait, current);
>  
> -	generic_unplug_device(q);
> -	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
> +	add_wait_queue(&q->wait_for_requests[rw], &wait);
>  	do {
>  		set_current_state(TASK_UNINTERRUPTIBLE);
> +		generic_unplug_device(q);
>  		if (q->rq[rw].count == 0)
>  			schedule();
>  		spin_lock_irq(&io_request_lock);
> @@ -829,8 +829,7 @@ void blkdev_release_request(struct reque
>  	 */
>  	if (q) {
>  		list_add(&req->queue, &q->rq[rw].free);
> -		if (++q->rq[rw].count >= q->batch_requests &&
> -				waitqueue_active(&q->wait_for_requests[rw]))
> +		if (++q->rq[rw].count >= q->batch_requests)
>  			wake_up(&q->wait_for_requests[rw]);
>  	}
>  }
>

Works fine on my notebook. Good throughput and no mouse hangs anymore. Thanks, Matthias -- Matthias.Mueller@rz.uni-karlsruhe.de Rechenzentrum Universitaet Karlsruhe Abteilung Netze
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Jens Axboe @ 2003-05-28 10:18 UTC To: Andrew Morton, m.c.p, kernel, manish, andrea, marcelo, linux-kernel On Wed, May 28 2003, Matthias Mueller wrote: > On Wed, May 28, 2003 at 12:51:56AM -0700, Andrew Morton wrote: > > It'd be interesting if any of these changes make a difference.
> >
> >  drivers/block/ll_rw_blk.c |    7
> >  fs/buffer.c               | 3030 ++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 3033 insertions(+), 4 deletions(-)
> >
> > diff -puN drivers/block/ll_rw_blk.c~a drivers/block/ll_rw_blk.c
> > --- 24/drivers/block/ll_rw_blk.c~a 2003-05-28 00:48:09.000000000 -0700
> > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 00:50:02.000000000 -0700
> > @@ -590,10 +590,10 @@ static struct request *__get_request_wai
> >  	register struct request *rq;
> >  	DECLARE_WAITQUEUE(wait, current);
> >  
> > -	generic_unplug_device(q);
> > -	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
> > +	add_wait_queue(&q->wait_for_requests[rw], &wait);
> >  	do {
> >  		set_current_state(TASK_UNINTERRUPTIBLE);
> > +		generic_unplug_device(q);
> >  		if (q->rq[rw].count == 0)
> >  			schedule();
> >  		spin_lock_irq(&io_request_lock);
> > @@ -829,8 +829,7 @@ void blkdev_release_request(struct reque
> >  	 */
> >  	if (q) {
> >  		list_add(&req->queue, &q->rq[rw].free);
> > -		if (++q->rq[rw].count >= q->batch_requests &&
> > -				waitqueue_active(&q->wait_for_requests[rw]))
> > +		if (++q->rq[rw].count >= q->batch_requests)
> >  			wake_up(&q->wait_for_requests[rw]);
> >  	}
> >  }
> >
>
> Works fine on my notebook. Good throughput and no mouse hangs anymore.

Could you possibly try just the last hunk of the patch, then? Ie just remove the waitqueue_active(&q->wait_for_requests[rw]) check, leave the rest as-is.
-- Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Andrew Morton @ 2003-05-28 10:23 UTC To: Matthias Mueller Cc: axboe, m.c.p, kernel, manish, andrea, marcelo, linux-kernel Matthias Mueller <matthias.mueller@rz.uni-karlsruhe.de> wrote: > > Works fine on my notebook. Good throughput and no mouse hangs anymore. Interesting. Could you please work out which change caused it? Go back to stock 2.4 and then apply this:

diff -puN drivers/block/ll_rw_blk.c~1 drivers/block/ll_rw_blk.c
--- 24/drivers/block/ll_rw_blk.c~1 2003-05-28 03:20:42.000000000 -0700
+++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:20:57.000000000 -0700
@@ -590,10 +590,10 @@ static struct request *__get_request_wai
 	register struct request *rq;
 	DECLARE_WAITQUEUE(wait, current);
 
-	generic_unplug_device(q);
 	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
 	do {
 		set_current_state(TASK_UNINTERRUPTIBLE);
+		generic_unplug_device(q);
 		if (q->rq[rw].count == 0)
 			schedule();
 		spin_lock_irq(&io_request_lock);

then this:

diff -puN drivers/block/ll_rw_blk.c~2 drivers/block/ll_rw_blk.c
--- 24/drivers/block/ll_rw_blk.c~2 2003-05-28 03:21:03.000000000 -0700
+++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:09.000000000 -0700
@@ -590,7 +590,7 @@ static struct request *__get_request_wai
 	register struct request *rq;
 	DECLARE_WAITQUEUE(wait, current);
 
-	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
+	add_wait_queue(&q->wait_for_requests[rw], &wait);
 	do {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		generic_unplug_device(q);

Then this (totally unlikely, don't bother):

diff -puN drivers/block/ll_rw_blk.c~3 drivers/block/ll_rw_blk.c
--- 24/drivers/block/ll_rw_blk.c~3 2003-05-28 03:21:15.000000000 -0700
+++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:39.000000000 -0700
@@ -829,8 +829,7 @@ void blkdev_release_request(struct reque
 	 */
 	if (q) {
 		list_add(&req->queue, &q->rq[rw].free);
-		if (++q->rq[rw].count >= q->batch_requests &&
-				waitqueue_active(&q->wait_for_requests[rw]))
+		if (++q->rq[rw].count >= q->batch_requests)
 			wake_up(&q->wait_for_requests[rw]);
 	}
 }

_
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Jens Axboe @ 2003-05-28 10:25 UTC To: Andrew Morton Cc: Matthias Mueller, m.c.p, kernel, manish, andrea, marcelo, linux-kernel On Wed, May 28 2003, Andrew Morton wrote: > Matthias Mueller <matthias.mueller@rz.uni-karlsruhe.de> wrote: > > > > Works fine on my notebook. Good throughput and no mouse hangs anymore. > > Interesting. > > Could you please work out which change caused it? Go back to stock 2.4 and > then apply this:
>
> diff -puN drivers/block/ll_rw_blk.c~1 drivers/block/ll_rw_blk.c
> --- 24/drivers/block/ll_rw_blk.c~1 2003-05-28 03:20:42.000000000 -0700
> +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:20:57.000000000 -0700
> @@ -590,10 +590,10 @@ static struct request *__get_request_wai
>  	register struct request *rq;
>  	DECLARE_WAITQUEUE(wait, current);
>  
> -	generic_unplug_device(q);
>  	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
>  	do {
>  		set_current_state(TASK_UNINTERRUPTIBLE);
> +		generic_unplug_device(q);
>  		if (q->rq[rw].count == 0)
>  			schedule();
>  		spin_lock_irq(&io_request_lock);

I think it was already established that this wasn't the reason. Was my first suspect too, though...
> then this:
>
> diff -puN drivers/block/ll_rw_blk.c~2 drivers/block/ll_rw_blk.c
> --- 24/drivers/block/ll_rw_blk.c~2 2003-05-28 03:21:03.000000000 -0700
> +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:09.000000000 -0700
> @@ -590,7 +590,7 @@ static struct request *__get_request_wai
>  	register struct request *rq;
>  	DECLARE_WAITQUEUE(wait, current);
>  
> -	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
> +	add_wait_queue(&q->wait_for_requests[rw], &wait);
>  	do {
>  		set_current_state(TASK_UNINTERRUPTIBLE);
>  		generic_unplug_device(q);

Since we do a general wake_up(), only the order of wakeups matter here right (lifo vs fifo). Given that, the _exclusive() should be more fair possibly at the cost of a bit of throughput.

> Then this (totally unlikely, don't bother):
>
> diff -puN drivers/block/ll_rw_blk.c~3 drivers/block/ll_rw_blk.c
> --- 24/drivers/block/ll_rw_blk.c~3 2003-05-28 03:21:15.000000000 -0700
> +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:39.000000000 -0700
> @@ -829,8 +829,7 @@ void blkdev_release_request(struct reque
>  	 */
>  	if (q) {
>  		list_add(&req->queue, &q->rq[rw].free);
> -		if (++q->rq[rw].count >= q->batch_requests &&
> -				waitqueue_active(&q->wait_for_requests[rw]))
> +		if (++q->rq[rw].count >= q->batch_requests)
>  			wake_up(&q->wait_for_requests[rw]);
>  	}
>  }

Well it's the only one left :). But you are right, try one of them at the time, establishing the effect of each of them. -- Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Con Kolivas @ 2003-05-28 10:48 UTC To: Jens Axboe, Andrew Morton Cc: Matthias Mueller, m.c.p, manish, andrea, marcelo, linux-kernel On Wed, 28 May 2003 20:25, Jens Axboe wrote: > On Wed, May 28 2003, Andrew Morton wrote: > > Matthias Mueller <matthias.mueller@rz.uni-karlsruhe.de> wrote: > > > Works fine on my notebook. Good throughput and no mouse hangs anymore. > > > > Interesting. > > > > Could you please work out which change caused it? Go back to stock 2.4 > > and then apply this:
> >
> > diff -puN drivers/block/ll_rw_blk.c~1 drivers/block/ll_rw_blk.c
> > --- 24/drivers/block/ll_rw_blk.c~1 2003-05-28 03:20:42.000000000 -0700
> > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:20:57.000000000 -0700
> > @@ -590,10 +590,10 @@ static struct request *__get_request_wai
> >  	register struct request *rq;
> >  	DECLARE_WAITQUEUE(wait, current);
> >  
> > -	generic_unplug_device(q);
> >  	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
> >  	do {
> >  		set_current_state(TASK_UNINTERRUPTIBLE);
> > +		generic_unplug_device(q);
> >  		if (q->rq[rw].count == 0)
> >  			schedule();
> >  		spin_lock_irq(&io_request_lock);
>
> I think it was already established that this wasn't the reason. Was my > first suspect too, though...
> > > then this:
> >
> > diff -puN drivers/block/ll_rw_blk.c~2 drivers/block/ll_rw_blk.c
> > --- 24/drivers/block/ll_rw_blk.c~2 2003-05-28 03:21:03.000000000 -0700
> > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:09.000000000 -0700
> > @@ -590,7 +590,7 @@ static struct request *__get_request_wai
> >  	register struct request *rq;
> >  	DECLARE_WAITQUEUE(wait, current);
> >  
> > -	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
> > +	add_wait_queue(&q->wait_for_requests[rw], &wait);
> >  	do {
> >  		set_current_state(TASK_UNINTERRUPTIBLE);
> >  		generic_unplug_device(q);
>
> Since we do a general wake_up(), only the order of wakeups matter here > right (lifo vs fifo). Given that, the _exclusive() should be more fair > possibly at the cost of a bit of throughput. > > > Then this (totally unlikely, don't bother):
> >
> > diff -puN drivers/block/ll_rw_blk.c~3 drivers/block/ll_rw_blk.c
> > --- 24/drivers/block/ll_rw_blk.c~3 2003-05-28 03:21:15.000000000 -0700
> > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:39.000000000 -0700
> > @@ -829,8 +829,7 @@ void blkdev_release_request(struct reque
> >  	 */
> >  	if (q) {
> >  		list_add(&req->queue, &q->rq[rw].free);
> > -		if (++q->rq[rw].count >= q->batch_requests &&
> > -				waitqueue_active(&q->wait_for_requests[rw]))
> > +		if (++q->rq[rw].count >= q->batch_requests)
> >  			wake_up(&q->wait_for_requests[rw]);
> >  	}
> >  }
>
> Well it's the only one left :). But you are right, try one of them at > the time, establishing the effect of each of them.

THIS IS IT! The last one. No pauses writing a 2Gb file now unless I do a read midstream. Con
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Jens Axboe @ 2003-05-28 10:50 UTC To: Con Kolivas Cc: Andrew Morton, Matthias Mueller, m.c.p, manish, andrea, marcelo, linux-kernel On Wed, May 28 2003, Con Kolivas wrote: > On Wed, 28 May 2003 20:25, Jens Axboe wrote: > > On Wed, May 28 2003, Andrew Morton wrote: > > > Matthias Mueller <matthias.mueller@rz.uni-karlsruhe.de> wrote: > > > > Works fine on my notebook. Good throughput and no mouse hangs anymore. > > > > > > Interesting. > > > > > > Could you please work out which change caused it? Go back to stock 2.4 > > > and then apply this:
> > >
> > > diff -puN drivers/block/ll_rw_blk.c~1 drivers/block/ll_rw_blk.c
> > > --- 24/drivers/block/ll_rw_blk.c~1 2003-05-28 03:20:42.000000000 -0700
> > > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:20:57.000000000 -0700
> > > @@ -590,10 +590,10 @@ static struct request *__get_request_wai
> > >  	register struct request *rq;
> > >  	DECLARE_WAITQUEUE(wait, current);
> > >  
> > > -	generic_unplug_device(q);
> > >  	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
> > >  	do {
> > >  		set_current_state(TASK_UNINTERRUPTIBLE);
> > > +		generic_unplug_device(q);
> > >  		if (q->rq[rw].count == 0)
> > >  			schedule();
> > >  		spin_lock_irq(&io_request_lock);
> >
> > I think it was already established that this wasn't the reason. Was my > > first suspect too, though...
> > > > > then this:
> > >
> > > diff -puN drivers/block/ll_rw_blk.c~2 drivers/block/ll_rw_blk.c
> > > --- 24/drivers/block/ll_rw_blk.c~2 2003-05-28 03:21:03.000000000 -0700
> > > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:09.000000000 -0700
> > > @@ -590,7 +590,7 @@ static struct request *__get_request_wai
> > >  	register struct request *rq;
> > >  	DECLARE_WAITQUEUE(wait, current);
> > >  
> > > -	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
> > > +	add_wait_queue(&q->wait_for_requests[rw], &wait);
> > >  	do {
> > >  		set_current_state(TASK_UNINTERRUPTIBLE);
> > >  		generic_unplug_device(q);
> >
> > Since we do a general wake_up(), only the order of wakeups matter here > > right (lifo vs fifo). Given that, the _exclusive() should be more fair > > possibly at the cost of a bit of throughput. > > > > > Then this (totally unlikely, don't bother):
> > >
> > > diff -puN drivers/block/ll_rw_blk.c~3 drivers/block/ll_rw_blk.c
> > > --- 24/drivers/block/ll_rw_blk.c~3 2003-05-28 03:21:15.000000000 -0700
> > > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:39.000000000 -0700
> > > @@ -829,8 +829,7 @@ void blkdev_release_request(struct reque
> > >  	 */
> > >  	if (q) {
> > >  		list_add(&req->queue, &q->rq[rw].free);
> > > -		if (++q->rq[rw].count >= q->batch_requests &&
> > > -				waitqueue_active(&q->wait_for_requests[rw]))
> > > +		if (++q->rq[rw].count >= q->batch_requests)
> > >  			wake_up(&q->wait_for_requests[rw]);
> > >  	}
> > >  }
> >
> > Well it's the only one left :). But you are right, try one of them at > > the time, establishing the effect of each of them. > > THIS IS IT! The last one. No pauses writing a 2Gb file now unless I do a read > midstream.

Cool, especially since we can easily apply this to -rc5 without any worries. Marcelo, if you please...?
===== drivers/block/ll_rw_blk.c 1.44 vs edited =====
--- 1.44/drivers/block/ll_rw_blk.c Mon Apr 14 12:53:03 2003
+++ edited/drivers/block/ll_rw_blk.c Wed May 28 12:49:30 2003
@@ -829,8 +829,7 @@
 	 */
 	if (q) {
 		list_add(&req->queue, &q->rq[rw].free);
-		if (++q->rq[rw].count >= q->batch_requests &&
-				waitqueue_active(&q->wait_for_requests[rw]))
+		if (++q->rq[rw].count >= q->batch_requests)
 			wake_up(&q->wait_for_requests[rw]);
 	}
 }

-- Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Andrew Morton @ 2003-05-28 10:59 UTC To: Jens Axboe Cc: kernel, matthias.mueller, m.c.p, manish, andrea, marcelo, linux-kernel Jens Axboe <axboe@suse.de> wrote: > > > THIS IS IT! The last one. No pauses writing a 2Gb file now unless I do a read > > midstream. > > Cool, especially since we can easily apply this to -rc5 without any > worries. Marcelo, if you please...?
>
> ===== drivers/block/ll_rw_blk.c 1.44 vs edited =====
> --- 1.44/drivers/block/ll_rw_blk.c Mon Apr 14 12:53:03 2003
> +++ edited/drivers/block/ll_rw_blk.c Wed May 28 12:49:30 2003
> @@ -829,8 +829,7 @@
>  	 */
>  	if (q) {
>  		list_add(&req->queue, &q->rq[rw].free);
> -		if (++q->rq[rw].count >= q->batch_requests &&
> -				waitqueue_active(&q->wait_for_requests[rw]))
> +		if (++q->rq[rw].count >= q->batch_requests)
>  			wake_up(&q->wait_for_requests[rw]);
>  	}
>  }

umm, I'd like confirmation of that. The waitqueue_active() test is wrong because of a missing barrier, but only on SMP. And if it does make a mistake it will surely correct itself when the next request is put back. (That's why I left it there...) More testing, please.
* Re: 2.4.20: Proccess stuck in __lock_page ... From: Marc-Christian Petersen @ 2003-05-28 11:17 UTC To: Andrew Morton, Jens Axboe Cc: kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel [-- Attachment #1: Type: text/plain, Size: 396 bytes --] On Wednesday 28 May 2003 12:59, Andrew Morton wrote: Hi Andrew, > umm, I'd like confirmation of that. > > The waitqueue_active() test is wrong because of a missing barrier, but only > on SMP. And if it does make a mistake it will surely correct itself when > the next request is put back. (That's why I left it there...) > More testing, please. Does the attached one make sense? ciao, Marc [-- Attachment #2: llrwblk.patch --] [-- Type: text/x-diff, Size: 478 bytes --]

--- old/drivers/block/ll_rw_blk.c 2003-05-14 23:11:08.000000000 +0200
+++ new/drivers/block/ll_rw_blk.c 2003-05-28 13:04:34.000000000 +0200
@@ -829,9 +829,10 @@ void blkdev_release_request(struct reque
 	 */
 	if (q) {
 		list_add(&req->queue, &q->rq[rw].free);
-		if (++q->rq[rw].count >= q->batch_requests &&
-				waitqueue_active(&q->wait_for_requests[rw]))
+		if (++q->rq[rw].count >= q->batch_requests) {
+			smp_mb();
 			wake_up(&q->wait_for_requests[rw]);
+		}
 	}
 }
 
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Andrew Morton @ 2003-05-28 11:27 UTC
To: Marc-Christian Petersen
Cc: axboe, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

Marc-Christian Petersen <m.c.p@wolk-project.de> wrote:
>
> Does the attached one make sense?

Nope.

Guys, you're the ones who can reproduce this. Please spend more time
working out which chunk (or combination thereof) actually fixes the
problem. If indeed any of them do.

I'm suspecting that Con's fingers slipped.
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marc-Christian Petersen @ 2003-05-28 11:31 UTC
To: Andrew Morton
Cc: axboe, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

On Wednesday 28 May 2003 13:27, Andrew Morton wrote:

Hi Akpm,

> > Does the attached one make sense?
> Nope.
nm.

> Guys, you're the ones who can reproduce this. Please spend more time
> working out which chunk (or combination thereof) actually fixes the
> problem. If indeed any of them do.
As I said, I will test it this evening. ATM I don't have time to
recompile and reboot. This evening I will test extensively, even on
SMP, SCSI, IDE and so on.

ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Jens Axboe @ 2003-05-28 12:53 UTC
To: Marc-Christian Petersen
Cc: Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

On Wed, May 28 2003, Marc-Christian Petersen wrote:
> On Wednesday 28 May 2003 13:27, Andrew Morton wrote:
>
> Hi Akpm,
>
> > > Does the attached one make sense?
> > Nope.
> nm.
>
> > Guys, you're the ones who can reproduce this. Please spend more time
> > working out which chunk (or combination thereof) actually fixes the
> > problem. If indeed any of them do.
> As I said, I will test it this evening. ATM I don't have time to
> recompile and reboot. This evening I will test extensively, even on
> SMP, SCSI, IDE and so on.

May I ask how you are reproducing the bad results? I'm trying in vain
here...

--
Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Matthias Mueller @ 2003-05-28 12:58 UTC
To: Jens Axboe
Cc: Marc-Christian Petersen, Andrew Morton, kernel, manish, andrea, marcelo, linux-kernel

On Wed, May 28, 2003 at 02:53:12PM +0200, Jens Axboe wrote:
> May I ask how you are reproducing the bad results? I'm trying in vain
> here...

I can reproduce it with

dd if=/dev/zero of=trash bs=4096 count=65000

on my notebook (probably a slower hard disk makes it easier to see the
mouse hangs).

Matthias
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Carl-Daniel Hailfinger @ 2003-05-28 13:07 UTC
To: Jens Axboe
Cc: Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

Jens Axboe wrote:
> On Wed, May 28 2003, Marc-Christian Petersen wrote:
>
>>On Wednesday 28 May 2003 13:27, Andrew Morton wrote:
>>
>>>Guys, you're the ones who can reproduce this. Please spend more time
>>>working out which chunk (or combination thereof) actually fixes the
>>>problem. If indeed any of them do.
>>
>>As I said, I will test it this evening. ATM I don't have time to
>>recompile and reboot. This evening I will test extensively, even on
>>SMP, SCSI, IDE and so on.
>
> May I ask how you are reproducing the bad results? I'm trying in vain
> here...

Quoting Con Kolivas:

dd if=/dev/zero of=dump bs=4096 count=512000

HTH,
Carl-Daniel
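To put numbers on the stalls instead of watching the mouse, the dd reproducer can be mimicked by a small program that times each write() call. This is a sketch under assumptions: write_zero_blocks() and now_ms() are hypothetical helpers, and it assumes a stall shows up as one unusually slow write() (for example while the writer is throttled on dirty buffers), which has not been verified on 2.4.20. It measures the writer only, not interactive latency.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in milliseconds (hypothetical helper). */
static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Write `count` blocks of `bs` zero bytes to `path`, like
 * dd if=/dev/zero of=path bs=bs count=count, and store the slowest
 * single write() in *max_ms. Returns total bytes written, or -1. */
static long long write_zero_blocks(const char *path, long count,
				   size_t bs, double *max_ms)
{
	static char buf[1 << 16];	/* static storage: already all zeros */
	long long total = 0;
	int fd;

	if (bs > sizeof(buf))
		return -1;
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	*max_ms = 0.0;
	for (long i = 0; i < count; i++) {
		double t0 = now_ms();
		ssize_t n = write(fd, buf, bs);
		double dt = now_ms() - t0;

		if (n != (ssize_t)bs) {	/* short write or error: give up */
			close(fd);
			return -1;
		}
		if (dt > *max_ms)
			*max_ms = dt;
		total += n;
	}
	close(fd);
	return total;
}
```

Calling write_zero_blocks("dump", 512000, 4096, &worst) mirrors Con's command (512000 * 4096 bytes, about 2 GB); printing worst afterwards gives the longest single write() in milliseconds.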
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Jens Axboe @ 2003-05-28 13:08 UTC
To: Carl-Daniel Hailfinger
Cc: Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

On Wed, May 28 2003, Carl-Daniel Hailfinger wrote:
> Jens Axboe wrote:
> > On Wed, May 28 2003, Marc-Christian Petersen wrote:
> >
> >>On Wednesday 28 May 2003 13:27, Andrew Morton wrote:
> >>
> >>>Guys, you're the ones who can reproduce this. Please spend more time
> >>>working out which chunk (or combination thereof) actually fixes the
> >>>problem. If indeed any of them do.
> >>
> >>As I said, I will test it this evening. ATM I don't have time to
> >>recompile and reboot. This evening I will test extensively, even on
> >>SMP, SCSI, IDE and so on.
> >
> > May I ask how you are reproducing the bad results? I'm trying in vain
> > here...
>
> Quoting Con Kolivas:
>
> dd if=/dev/zero of=dump bs=4096 count=512000

already tried that, no go. on ide/scsi? what filesystem? how much ram?
anything else running? smp/up?

--
Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Matthias Mueller @ 2003-05-28 13:16 UTC
To: Jens Axboe
Cc: Carl-Daniel Hailfinger, Marc-Christian Petersen, Andrew Morton, kernel, manish, andrea, marcelo, linux-kernel

On Wed, May 28, 2003 at 03:08:39PM +0200, Jens Axboe wrote:
> > > May I ask how you are reproducing the bad results? I'm trying in vain
> > > here...
> >
> > Quoting Con Kolivas:
> >
> > dd if=/dev/zero of=dump bs=4096 count=512000
>
> already tried that, no go. on ide/scsi? what filesystem? how much ram?
> anything else running? smp/up?

IDE notebook hard drive, tested with ext2 and ext3. 256MB RAM, X11
started, idle bind9 and idle postgresql. Tested directly after a reboot,
~85MB RAM used without buffers/cache.

Matthias
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Con Kolivas @ 2003-05-28 13:21 UTC
To: Jens Axboe, Carl-Daniel Hailfinger
Cc: Marc-Christian Petersen, Andrew Morton, matthias.mueller, manish, andrea, marcelo, linux-kernel

On Wed, 28 May 2003 23:08, Jens Axboe wrote:
> On Wed, May 28 2003, Carl-Daniel Hailfinger wrote:
> > Jens Axboe wrote:
> > > On Wed, May 28 2003, Marc-Christian Petersen wrote:
> > >>On Wednesday 28 May 2003 13:27, Andrew Morton wrote:
> > >>>Guys, you're the ones who can reproduce this. Please spend more time
> > >>>working out which chunk (or combination thereof) actually fixes the
> > >>>problem. If indeed any of them do.
> > >>
> > >>As I said, I will test it this evening. ATM I don't have time to
> > >>recompile and reboot. This evening I will test extensively, even on
> > >>SMP, SCSI, IDE and so on.
> > >
> > > May I ask how you are reproducing the bad results? I'm trying in vain
> > > here...
> >
> > Quoting Con Kolivas:
> >
> > dd if=/dev/zero of=dump bs=4096 count=512000
>
> already tried that, no go. on ide/scsi? what filesystem? how much ram?
> anything else running? smp/up?

I'm using UP on IDE. I reproduce it easily on a P3 256MB laptop with a
5400rpm drive, and less easily (but it still occurs) on a P4 2.53 512MB PC
with 2x7200rpm software RAID 0 IDE drives. Even if the only thing you try
to do is move the mouse, the mouse will freeze for up to 30 secs. When you
first start the write, no disk activity happens for up to a few seconds;
then it will start writing madly and the machine will come to a standstill
for a variable length of time. Then it will come back to life for a few
seconds, only to die again for a few seconds, and so on until the write is
complete.

Still testing combinations to see which is the best, but 1+2 seems better
than 3 alone, as doing reads midstream in the write doesn't cause hangs. I
haven't seen zombie processes ever.

Con
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Carl-Daniel Hailfinger @ 2003-05-28 13:30 UTC
To: Con Kolivas
Cc: Jens Axboe, Marc-Christian Petersen, Andrew Morton, matthias.mueller, manish, andrea, marcelo, linux-kernel

Con Kolivas wrote:
> On Wed, 28 May 2003 23:08, Jens Axboe wrote:
>
>>On Wed, May 28 2003, Carl-Daniel Hailfinger wrote:
>>
>>>Jens Axboe wrote:
>>>
>>>>On Wed, May 28 2003, Marc-Christian Petersen wrote:
>>>>
>>>>>On Wednesday 28 May 2003 13:27, Andrew Morton wrote:
>>>>>
>>>>>>Guys, you're the ones who can reproduce this. Please spend more time
>>>>>>working out which chunk (or combination thereof) actually fixes the
>>>>>>problem. If indeed any of them do.
>>>>>
>>>>>As I said, I will test it this evening. ATM I don't have time to
>>>>>recompile and reboot. This evening I will test extensively, even on
>>>>>SMP, SCSI, IDE and so on.
>>>>
>>>>May I ask how you are reproducing the bad results? I'm trying in vain
>>>>here...
>>>
>>>Quoting Con Kolivas:
>>>
>>>dd if=/dev/zero of=dump bs=4096 count=512000
>>
>>already tried that, no go. on ide/scsi? what filesystem? how much ram?
>>anything else running? smp/up?
>
>
> I'm using UP on IDE. I reproduce it easily on a P3 256Mb laptop with 5400rpm
> drive, and less easily but still occurs on a P4 2.53 512Mb pc with 2x7200rpm
> software raid 0 IDE drives. Even if the only thing you try to do is move the
> mouse, the mouse will freeze for up to 30secs. When you first start the write
> no disk activity happens for up to a few seconds, then it will start writing
> madly and the machine will come to a standstill for a variable length of
> time. Then it will come back to life for a few seconds only to die again for
> a few seconds and so on till the write is complete.
>
> Still testing combinations to see which is the best, but 1+2 seems better than
> 3 alone as doing reads midstream in the write don't cause hangs. I haven't
> seen zombie processes ever.

Just curious - which compiler did you use?

Carl-Daniel
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Con Kolivas @ 2003-05-28 13:33 UTC
To: Carl-Daniel Hailfinger
Cc: Jens Axboe, Marc-Christian Petersen, Andrew Morton, matthias.mueller, manish, andrea, marcelo, linux-kernel

On Wed, 28 May 2003 23:30, Carl-Daniel Hailfinger wrote:
> Con Kolivas wrote:
> > On Wed, 28 May 2003 23:08, Jens Axboe wrote:
> >>On Wed, May 28 2003, Carl-Daniel Hailfinger wrote:
> >>>Jens Axboe wrote:
> >>>>On Wed, May 28 2003, Marc-Christian Petersen wrote:
> >>>>>On Wednesday 28 May 2003 13:27, Andrew Morton wrote:
> >>>>>>Guys, you're the ones who can reproduce this. Please spend more time
> >>>>>>working out which chunk (or combination thereof) actually fixes the
> >>>>>>problem. If indeed any of them do.
> >>>>>
> >>>>>As I said, I will test it this evening. ATM I don't have time to
> >>>>>recompile and reboot. This evening I will test extensively, even on
> >>>>>SMP, SCSI, IDE and so on.
> >>>>
> >>>>May I ask how you are reproducing the bad results? I'm trying in vain
> >>>>here...
> >>>
> >>>Quoting Con Kolivas:
> >>>
> >>>dd if=/dev/zero of=dump bs=4096 count=512000
> >>
> >>already tried that, no go. on ide/scsi? what filesystem? how much ram?
> >>anything else running? smp/up?
> >
> > I'm using UP on IDE. I reproduce it easily on a P3 256Mb laptop with
> > 5400rpm drive, and less easily but still occurs on a P4 2.53 512Mb pc
> > with 2x7200rpm software raid 0 IDE drives. Even if the only thing you try
> > to do is move the mouse, the mouse will freeze for up to 30secs. When you
> > first start the write no disk activity happens for up to a few seconds,
> > then it will start writing madly and the machine will come to a
> > standstill for a variable length of time. Then it will come back to life
> > for a few seconds only to die again for a few seconds and so on till the
> > write is complete.
> >
> > Still testing combinations to see which is the best, but 1+2 seems better
> > than 3 alone as doing reads midstream in the write don't cause hangs. I
> > haven't seen zombie processes ever.
>
> Just curious - which compiler did you use?

For this latest testing, gcc 3.2.2. The hangs predate this, back to a time
when I was using 2.95.3 and getting the hangs.

Con
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Stefan Foerster @ 2003-05-28 13:27 UTC
To: Jens Axboe
Cc: Carl-Daniel Hailfinger, Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

* Jens Axboe <axboe@suse.de> wrote:
> On Wed, May 28 2003, Carl-Daniel Hailfinger wrote:
>> dd if=/dev/zero of=dump bs=4096 count=512000
>
> already tried that, no go. on ide/scsi? what filesystem? how much ram?
> anything else running? smp/up?

It doesn't matter whether it's IDE or SCSI; to be honest, SCSI with the old
aic7xxx from vanilla 2.4.20 is even worse than IDE.

My box is UP, and had only my window manager with some open xterms
running, nothing which should create any load.

Ciao
Stefan
--
Stefan Förster Public Key: 0xBBE2A9E9
FdI #122: Updateritis - software bulimia (Frank Klemm)
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Stefan Foerster @ 2003-05-28 13:37 UTC
To: linux-kernel

* Stefan Foerster <stefan@stefan-foerster.de> wrote:
[...]
> Doesn't matter if IDE or SCSI, to be honest, SCSI with the old aic7xxx
> from vanilla 2.4.20 is even worse than IDE.
>
> My box is up, had only my window manager with some open xterms
> running, nothing which should create any load.

Oh, silly me, I forgot to include that info: I have 512MB of RAM and an
Athlon XP. Filesystems didn't seem to matter much in my tests; I got hangs
with ext2, ext3 and XFS.

Ciao
Stefan
--
Stefan Förster Public Key: 0xBBE2A9E9
FdI #44: Hidden defect - Siemens helped develop it. (Jörg Pechau)
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Chris Mason @ 2003-05-28 14:28 UTC
To: Jens Axboe
Cc: Carl-Daniel Hailfinger, Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

On Wed, 2003-05-28 at 09:08, Jens Axboe wrote:
> > > May I ask how you are reproducing the bad results? I'm trying in vain
> > > here...
> >
> > Quoting Con Kolivas:
> >
> > dd if=/dev/zero of=dump bs=4096 count=512000
>
> already tried that, no go. on ide/scsi? what filesystem? how much ram?
> anything else running? smp/up?

I think we've got a few different problems. On SMP boxes, you need to
have the fix-pausing patch from Andrea applied to catch all the corner
cases.

On UP boxes it's possible the requests are starving in the drive; SCSI
users should try with the max tags set down to something sensible,
between 8 and 32.

IDE people can try lowering the max_kb_per_request parameter in
/proc/ide/<drive>/settings, but this should only affect starvation with
the writeback cache on.

I made a patch a while ago that timed how long people spent waiting in
__get_request_wait; it might help us figure out where the starvation is
really happening.

-chris
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Jens Axboe @ 2003-05-28 14:33 UTC
To: Chris Mason
Cc: Carl-Daniel Hailfinger, Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

On Wed, May 28 2003, Chris Mason wrote:
> On Wed, 2003-05-28 at 09:08, Jens Axboe wrote:
> > > > May I ask how you are reproducing the bad results? I'm trying in vain
> > > > here...
> > >
> > > Quoting Con Kolivas:
> > >
> > > dd if=/dev/zero of=dump bs=4096 count=512000
> >
> > already tried that, no go. on ide/scsi? what filesystem? how much ram?
> > anything else running? smp/up?
>
> I think we've got a few different problems. On SMP boxes, you need to
> have the fix-pausing patch from andrea applied to catch all the corner
> cases.

Agree

>
> On UP boxes it's possible the requests are starving in the drive, SCSI
> users should try with the max tags set down to something sensible,
> between 8 and 32.
>
> IDE people can try lowering the max_kb_per_request paramater in
> /proc/ide/<drive>/settings, but this should only affect starvation with
> the writeback cache on.
>
> I made a patch a while ago that timed how long people spent waiting in
> __get_request_wait, it might help us figure out where the starvation is
> really happening.

But this seems totally unrelated to the reported problems; we are
talking about complete stalls of the mouse. No amount of io starvation
should provoke something like that.

--
Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Chris Mason @ 2003-05-28 14:58 UTC
To: Jens Axboe
Cc: Carl-Daniel Hailfinger, Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

On Wed, 2003-05-28 at 10:33, Jens Axboe wrote:
> > On UP boxes it's possible the requests are starving in the drive, SCSI
> > users should try with the max tags set down to something sensible,
> > between 8 and 32.
> >
> > IDE people can try lowering the max_kb_per_request paramater in
> > /proc/ide/<drive>/settings, but this should only affect starvation with
> > the writeback cache on.
> >
> > I made a patch a while ago that timed how long people spent waiting in
> > __get_request_wait, it might help us figure out where the starvation is
> > really happening.
>
> But this seems totally unrelated to the reported problems, we are
> talking about complete stalls of the mouse. No amount of io starvation
> should provoke something like that.

Well, if it wasn't I/O-related starvation, Andrew's batch requests patch
wouldn't change things. I'm hoping the stats patch will get us some
numbers to go along with the perceived stalls; I'm almost done merging it.

-chris
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Jens Axboe @ 2003-05-28 15:39 UTC
To: Chris Mason
Cc: Carl-Daniel Hailfinger, Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

On Wed, May 28 2003, Chris Mason wrote:
> On Wed, 2003-05-28 at 10:33, Jens Axboe wrote:
> >
> > > On UP boxes it's possible the requests are starving in the drive, SCSI
> > > users should try with the max tags set down to something sensible,
> > > between 8 and 32.
> > >
> > > IDE people can try lowering the max_kb_per_request paramater in
> > > /proc/ide/<drive>/settings, but this should only affect starvation with
> > > the writeback cache on.
> > >
> > > I made a patch a while ago that timed how long people spent waiting in
> > > __get_request_wait, it might help us figure out where the starvation is
> > > really happening.
> >
> > But this seems totally unrelated to the reported problems, we are
> > talking about complete stalls of the mouse. No amount of io starvation
> > should provoke something like that.
>
> Well, if it wasn't io related starvation, andrew's batch requests patch
> wouldn't change things. I'm hoping the stats patch will get us some
> numbers to go along with the perceived stalls, almost done merging.

Correction then: it doesn't appear to be starvation in the usual sense.
But you are right, pulling some stats out of the situation would be
nice. I still can't reproduce it here.

--
Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Chris Mason @ 2003-05-28 23:38 UTC
To: Jens Axboe
Cc: Carl-Daniel Hailfinger, Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1578 bytes --]

On Wed, 2003-05-28 at 11:39, Jens Axboe wrote:

> Correction then, it doesn't appear to be starvation in the usual sense.
> But you are right, pulling some stats out of the situation would be
> nice. I still can't reproduce here.

Well, it's not pretty, but it gets some numbers out there. This patch only
calculates the time spent waiting in __get_request_wait; it isn't
interested in any other metrics.

Stats are per-queue and are reset when you mount the FS. You get a printout
either when you unmount the FS or when you run elvtune /dev/xxx (no other
args, just enough to trigger the read ioctl).

The output looks like this (after a dbench 50 run on 2.4.21-rc6):

device 03:04: num_req 12248, total jiffies waited 26729
	417 forced to wait
	1 min wait, 432 max wait
	64 average wait
	314 < 100, 62 < 200, 20 < 300, 20 < 400, 1 < 500
	0 waits longer than 500 jiffies

It tells us there were 12248 total requests (merges don't count), and that
we spent 26,729 jiffies waiting in __get_request_wait. We had to wait 417
times; the minimum was 1 and the max was 432 jiffies.

The line with the < signs is a simple way to get the deviations: 314
requests waited < 100 jiffies, 62 requests waited between 100 and 200
jiffies, etc.

People who see stalls on UP machines and have seen improvements by playing
with code in drivers/block/ll_rw_blk.c are encouraged to try getting
numbers with this patch applied. It will make it easier to figure things
out.

I haven't tried Andrea's fix-pausing on top of this yet; any rejects should
be minor.

-chris

[-- Attachment #2: lat-stat-3.diff --]
[-- Type: text/plain, Size: 4770 bytes --]

===== drivers/block/blkpg.c 1.9 vs edited =====
--- 1.9/drivers/block/blkpg.c	Sat Mar 30 06:58:05 2002
+++ edited/drivers/block/blkpg.c	Wed May 28 19:33:16 2003
@@ -261,6 +261,7 @@
 			return blkpg_ioctl(dev, (struct blkpg_ioctl_arg *) arg);
 
 		case BLKELVGET:
+			blk_print_stats(dev);
 			return blkelvget_ioctl(&blk_get_queue(dev)->elevator,
 					       (blkelv_ioctl_arg_t *) arg);
 		case BLKELVSET:
===== drivers/block/ll_rw_blk.c 1.44 vs edited =====
--- 1.44/drivers/block/ll_rw_blk.c	Mon Apr 14 06:53:03 2003
+++ edited/drivers/block/ll_rw_blk.c	Wed May 28 19:34:10 2003
@@ -442,6 +442,56 @@
 	spin_lock_init(&q->queue_lock);
 }
 
+void blk_print_stats(kdev_t dev)
+{
+	request_queue_t *q;
+	unsigned long avg_wait;
+	unsigned long min_wait;
+	unsigned long high_wait;
+	unsigned long *d;
+
+	q = blk_get_queue(dev);
+	if (!q)
+		return;
+
+	min_wait = q->min_wait;
+	if (min_wait == ~0UL)
+		min_wait = 0;
+	if (q->num_wait)
+		avg_wait = q->total_wait / q->num_wait;
+	else
+		avg_wait = 0;
+	printk("device %s: num_req %lu, total jiffies waited %lu\n",
+	       kdevname(dev), q->num_req, q->total_wait);
+	printk("\t%lu forced to wait\n", q->num_wait);
+	printk("\t%lu min wait, %lu max wait\n", min_wait, q->max_wait);
+	printk("\t%lu average wait\n", avg_wait);
+	d = q->deviation;
+	printk("\t%lu < 100, %lu < 200, %lu < 300, %lu < 400, %lu < 500\n",
+	       d[0], d[1], d[2], d[3], d[4]);
+	high_wait = d[0] + d[1] + d[2] + d[3] + d[4];
+	high_wait = q->num_wait - high_wait;
+	printk("\t%lu waits longer than 500 jiffies\n", high_wait);
+}
+
+static void reset_stats(request_queue_t *q)
+{
+	q->max_wait = 0;
+	q->min_wait = ~0UL;
+	q->total_wait = 0;
+	q->num_req = 0;
+	q->num_wait = 0;
+	memset(q->deviation, 0, sizeof(q->deviation));
+}
+void blk_reset_stats(kdev_t dev)
+{
+	request_queue_t *q;
+	q = blk_get_queue(dev);
+	if (!q)
+		return;
+	printk("reset latency stats on device %s\n", kdevname(dev));
+	reset_stats(q);
+}
 static int
 __make_request(request_queue_t * q, int rw, struct buffer_head * bh);
 /**
@@ -491,6 +541,9 @@
 	q->plug_tq.routine = &generic_unplug_device;
 	q->plug_tq.data = q;
 	q->plugged = 0;
+
+	reset_stats(q);
+
 	/*
 	 * These booleans describe the queue properties. We set the
 	 * default (and most common) values here. Other drivers can
@@ -588,6 +641,8 @@
 static struct request *__get_request_wait(request_queue_t *q, int rw)
 {
 	register struct request *rq;
+	unsigned long wait_start = jiffies;
+	unsigned long time_waited;
 	DECLARE_WAITQUEUE(wait, current);
 
 	generic_unplug_device(q);
@@ -602,6 +657,18 @@
 	} while (rq == NULL);
 	remove_wait_queue(&q->wait_for_requests[rw], &wait);
 	current->state = TASK_RUNNING;
+
+	time_waited = jiffies - wait_start;
+	if (time_waited > q->max_wait)
+		q->max_wait = time_waited;
+	if (time_waited && time_waited < q->min_wait)
+		q->min_wait = time_waited;
+	q->total_wait += time_waited;
+	q->num_wait++;
+	if (time_waited < 500) {
+		q->deviation[time_waited/100]++;
+	}
+
 	return rq;
 }
 
@@ -1064,6 +1131,7 @@
 	req->rq_dev = bh->b_rdev;
 	req->start_time = jiffies;
 	req_new_io(req, 0, count);
+	q->num_req++;
 	blk_started_io(count);
 	add_request(q, req, insert_here);
 out:
===== fs/super.c 1.49 vs edited =====
--- 1.49/fs/super.c	Wed Dec 18 21:34:24 2002
+++ edited/fs/super.c	Wed May 28 19:29:26 2003
@@ -404,6 +404,7 @@
 	up_write(&s->s_umount);
 	put_super(s);
 	put_filesystem(fs);
+	blk_print_stats(dev);
 	if (bdev)
 		blkdev_put(bdev, BDEV_FS);
 	else
@@ -726,6 +727,7 @@
 	if (!fs_type->read_super(s, data, flags & MS_VERBOSE ? 1 : 0))
 		goto Einval;
 	s->s_flags |= MS_ACTIVE;
+	blk_reset_stats(dev);
 	path_release(&nd);
 
 	return s;
===== include/linux/blkdev.h 1.23 vs edited =====
--- 1.23/include/linux/blkdev.h	Fri Nov 29 17:03:01 2002
+++ edited/include/linux/blkdev.h	Wed May 28 19:27:18 2003
@@ -138,8 +138,17 @@
 	 * Tasks wait here for free read and write requests
 	 */
 	wait_queue_head_t wait_for_requests[2];
+	unsigned long max_wait;
+	unsigned long min_wait;
+	unsigned long total_wait;
+	unsigned long num_req;
+	unsigned long num_wait;
+	unsigned long deviation[5];
 };
 
+void blk_reset_stats(kdev_t dev);
+void blk_print_stats(kdev_t dev);
+
 #define blk_queue_plugged(q) (q)->plugged
 #define blk_fs_request(rq) ((rq)->cmd == READ || (rq)->cmd == WRITE)
 #define blk_queue_empty(q) list_empty(&(q)->queue_head)
@@ -217,6 +226,7 @@
 extern void generic_make_request(int rw, struct buffer_head * bh);
 extern inline request_queue_t *blk_get_queue(kdev_t dev);
 extern void blkdev_release_request(struct request *);
+extern void blk_print_stats(kdev_t dev);
 
 /*
  * Access functions for manipulating queue properties
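The bookkeeping that the lat-stat patch adds to __get_request_wait() can be exercised outside the kernel. The sketch below copies the field names from the patch into a hypothetical userspace struct lat_stats; lat_reset(), lat_record(), lat_average() and lat_long_waits() are illustrative names, not kernel functions. It makes two details of the printout explicit: a zero-jiffy wait is counted in num_wait and in the "< 100" bucket but never sets the minimum, and the "waits longer than 500 jiffies" figure is derived as num_wait minus the five buckets, so it really counts waits of 500 jiffies or more. Chris's sample output is self-consistent: 314 + 62 + 20 + 20 + 1 = 417 forced waits, and 26729 / 417 = 64 in integer arithmetic. Assuming HZ=100 (the i386 default in 2.4), the 432-jiffy maximum is about 4.3 seconds.

```c
#include <assert.h>
#include <string.h>

/* Userspace mirror of the counters the patch adds to request_queue_t. */
struct lat_stats {
	unsigned long max_wait, min_wait, total_wait;
	unsigned long num_wait;
	unsigned long deviation[5];	/* waits < 100, < 200, ... < 500 jiffies */
};

static void lat_reset(struct lat_stats *s)
{
	memset(s, 0, sizeof(*s));
	s->min_wait = ~0UL;		/* "no nonzero wait seen yet" */
}

/* Same bookkeeping as the tail of the patched __get_request_wait(). */
static void lat_record(struct lat_stats *s, unsigned long time_waited)
{
	if (time_waited > s->max_wait)
		s->max_wait = time_waited;
	if (time_waited && time_waited < s->min_wait)
		s->min_wait = time_waited;	/* zero waits never set min */
	s->total_wait += time_waited;
	s->num_wait++;
	if (time_waited < 500)
		s->deviation[time_waited / 100]++;
}

/* Integer average, as printed by blk_print_stats(). */
static unsigned long lat_average(const struct lat_stats *s)
{
	return s->num_wait ? s->total_wait / s->num_wait : 0;
}

/* Waits of 500 jiffies or more are not bucketed; derive them. */
static unsigned long lat_long_waits(const struct lat_stats *s)
{
	const unsigned long *d = s->deviation;

	return s->num_wait - (d[0] + d[1] + d[2] + d[3] + d[4]);
}
```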
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Stefan Foerster @ 2003-05-28 13:25 UTC
To: Jens Axboe
Cc: Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

* Jens Axboe <axboe@suse.de> wrote:
> On Wed, May 28 2003, Marc-Christian Petersen wrote:
>>> Guys, you're the ones who can reproduce this. Please spend more time
>>> working out which chunk (or combination thereof) actually fixes the
>>> problem. If indeed any of them do.
>> As I said, I will test it this evening. ATM I don't have time to
>> recompile and reboot. This evening I will test extensively, even on
>> SMP, SCSI, IDE and so on.
>
> May I ask how you are reproducing the bad results? I'm trying in vain
> here...

It is easily reproducible by using dd with an appropriate block size,
reading from /dev/zero. With chunk #3 from Andrew, I do not get pauses,
but I noticed text scrolling in an xterm stopping for about a second. I
did not get any zombie processes.

Ciao
Stefan
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Zwane Mwaikambo @ 2003-05-28 18:19 UTC
To: Jens Axboe
Cc: Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

On Wed, 28 May 2003, Jens Axboe wrote:

> > > Guys, you're the ones who can reproduce this. Please spend more time
> > > working out which chunk (or combination thereof) actually fixes the
> > > problem. If indeed any of them do.
> > As I said, I will test it this evening. ATM I don't have time to
> > recompile and reboot. This evening I will test extensively, even on
> > SMP, SCSI, IDE and so on.
>
> May I ask how you are reproducing the bad results? I'm trying in vain
> here...

I can reproduce it across spindles by cvs import'ing a kernel tree. Make
sure you're running X11 and try to do things in it, e.g. scrolling
windows, dragging etc.

	Zwane
--
function.linuxpower.ca
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Zwane Mwaikambo @ 2003-05-28 18:32 UTC
To: Jens Axboe
Cc: Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel

On Wed, 28 May 2003, Zwane Mwaikambo wrote:

> I can reproduce across spindles with cvs import'ing a kernel tree,
> make sure you're running X11 and try and do things in it, e.g. scrolling
> windows, dragging etc.

Forgot to mention: 2x 400MHz/512MB RAM; the read is from a UW2/7200rpm
drive, the write goes to a UDMA33/5400rpm drive (w/ 2MB cache).

	Zwane
--
function.linuxpower.ca
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 12:53 ` Jens Axboe ` (3 preceding siblings ...) 2003-05-28 18:19 ` Zwane Mwaikambo @ 2003-05-28 18:47 ` Elladan 2003-05-28 23:03 ` Con Kolivas 4 siblings, 1 reply; 142+ messages in thread From: Elladan @ 2003-05-28 18:47 UTC (permalink / raw) To: Jens Axboe Cc: Marc-Christian Petersen, Andrew Morton, kernel, matthias.mueller, manish, andrea, marcelo, linux-kernel On Wed, May 28, 2003 at 02:53:12PM +0200, Jens Axboe wrote: > On Wed, May 28 2003, Marc-Christian Petersen wrote: > > On Wednesday 28 May 2003 13:27, Andrew Morton wrote: > > > > Hi Akpm, > > > > > > Does the attached one make sense? > > > Nope. > > nm. > > > > > Guys, you're the ones who can reproduce this. Please spend more time > > > working out which chunk (or combination thereof) actually fixes the > > > problem. If indeed any of them do. > > As I said, I will test it this evening. ATM I don't have time to > > recompile and reboot. This evening I will test extensively, even on > > SMP, SCSI, IDE and so on. > > May I ask how you are reproducing the bad results? I'm trying in vain > here... It might be useful to check what video hardware and X servers people are using here. If the behavior is just mouse freezups, the "silken mouse" feature of XFree might have some effect, since it involves XFree binding a signal to mouse device events. -J ^ permalink raw reply [flat|nested] 142+ messages in thread
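Elladan's point about the "silken mouse" concerns signal-driven input. The sketch below shows the generic Linux mechanism such a feature can be built on: F_SETOWN plus O_ASYNC asks the kernel to send SIGIO when a descriptor has new input. It is an illustration under assumptions, not XFree86's code; a pipe stands in for the mouse device, and enable_sigio() and sigio_seen() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set from the signal handler; sig_atomic_t keeps the access async-signal-safe. */
static volatile sig_atomic_t input_ready = 0;

static void on_sigio(int signo)
{
    (void)signo;
    /* A real input handler would read the device here; we only record the event. */
    input_ready = 1;
}

/* Ask the kernel to send SIGIO to this process whenever fd has new input.
 * The handler is installed first, because SIGIO's default action is to
 * terminate the process. Returns 0 on success, -1 on failure. */
int enable_sigio(int fd)
{
    assert(fd >= 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) == -1)
        return -1;

    /* Direct the signal at this process. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    /* Turn on asynchronous notification for the descriptor. */
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}

/* Nonzero once at least one SIGIO has been delivered. */
int sigio_seen(void)
{
    return input_ready;
}
```

Usage: create a pipe, call enable_sigio() on its read end, write a byte to the write end, and sigio_seen() should become nonzero once the signal has been delivered.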
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 18:47 ` Elladan @ 2003-05-28 23:03 ` Con Kolivas 2003-05-29 13:09 ` Andrea Arcangeli 0 siblings, 1 reply; 142+ messages in thread From: Con Kolivas @ 2003-05-28 23:03 UTC (permalink / raw) To: Elladan, Jens Axboe Cc: Marc-Christian Petersen, Andrew Morton, matthias.mueller, manish, andrea, marcelo, linux-kernel On Thu, 29 May 2003 04:47, Elladan wrote: > On Wed, May 28, 2003 at 02:53:12PM +0200, Jens Axboe wrote: > > On Wed, May 28 2003, Marc-Christian Petersen wrote: > > > On Wednesday 28 May 2003 13:27, Andrew Morton wrote: > > > > > > Hi Akpm, > > > > > > > > Does the attached one make sense? > > > > > > > > Nope. > > > > > > nm. > > > > > > > Guys, you're the ones who can reproduce this. Please spend more time > > > > working out which chunk (or combination thereof) actually fixes the > > > > problem. If indeed any of them do. > > > > > > As I said, I will test it this evening. ATM I don't have time to > > > recompile and reboot. This evening I will test extensively, even on > > > SMP, SCSI, IDE and so on. > > > > May I ask how you are reproducing the bad results? I'm trying in vain > > here... > > It might be useful to check what video hardware and X servers people are > using here. If the behavior is just mouse freezups, the "silken mouse" > feature of XFree might have some effect, since it involves XFree binding > a signal to mouse device events. Xfree 3.3.6, 4.2,4.3 Drivers nvidia, nv, sis, sisfb, vesa, vesafb are the drivers on the machines where I've seen it happen so far - ie without discrimination. Con ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 23:03 ` Con Kolivas @ 2003-05-29 13:09 ` Andrea Arcangeli 2003-05-29 15:04 ` Con Kolivas 0 siblings, 1 reply; 142+ messages in thread From: Andrea Arcangeli @ 2003-05-29 13:09 UTC (permalink / raw) To: Con Kolivas Cc: Elladan, Jens Axboe, Marc-Christian Petersen, Andrew Morton, matthias.mueller, manish, marcelo, linux-kernel On Thu, May 29, 2003 at 09:03:42AM +1000, Con Kolivas wrote: > On Thu, 29 May 2003 04:47, Elladan wrote: > > On Wed, May 28, 2003 at 02:53:12PM +0200, Jens Axboe wrote: > > > On Wed, May 28 2003, Marc-Christian Petersen wrote: > > > > On Wednesday 28 May 2003 13:27, Andrew Morton wrote: > > > > > > > > Hi Akpm, > > > > > > > > > > Does the attached one make sense? > > > > > > > > > > Nope. > > > > > > > > nm. > > > > > > > > > Guys, you're the ones who can reproduce this. Please spend more time > > > > > working out which chunk (or combination thereof) actually fixes the > > > > > problem. If indeed any of them do. > > > > > > > > As I said, I will test it this evening. ATM I don't have time to > > > > recompile and reboot. This evening I will test extensively, even on > > > > SMP, SCSI, IDE and so on. > > > > > > May I ask how you are reproducing the bad results? I'm trying in vain > > > here... > > > > It might be useful to check what video hardware and X servers people are > > using here. If the behavior is just mouse freezups, the "silken mouse" > > feature of XFree might have some effect, since it involves XFree binding > > a signal to mouse device events. > > Xfree 3.3.6, 4.2,4.3 > Drivers nvidia, nv, sis, sisfb, vesa, vesafb > > are the drivers on the machines where I've seen it happen so far - ie without > discrimination. what about the window manager? do you use focus follow mouse? Just trying to find a pattern. For the record KDE 3.1 + focus follow mouse and X 4.3.0 here, I guess Jens uses the same software combination. 
The mouse is always perfectly fluid for me, no matter how fast and how long I write, and even if I don't touch the mouse for minutes; the same goes for ALT+TAB. I can't reproduce the mouse stalls in any way (I'm using cp /dev/zero . on an ext3 fs in ordered mode). Hardware is 1G of RAM, SMP, a single IDE spindle on the primary master, Matrox GS450. If I only increased the xmms buffer, I could hardly notice the background write flood at all (in fact I thought it had stopped writing for a dozen seconds because it ran out of space, when it was still writing). (kernel is 2.4.21rc4aa1 of course) Andrea ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-29 13:09 ` Andrea Arcangeli @ 2003-05-29 15:04 ` Con Kolivas 0 siblings, 0 replies; 142+ messages in thread From: Con Kolivas @ 2003-05-29 15:04 UTC (permalink / raw) To: Andrea Arcangeli Cc: Elladan, Jens Axboe, Marc-Christian Petersen, Andrew Morton, matthias.mueller, manish, marcelo, linux-kernel On Thu, 29 May 2003 23:09, Andrea Arcangeli wrote: > On Thu, May 29, 2003 at 09:03:42AM +1000, Con Kolivas wrote: > > On Thu, 29 May 2003 04:47, Elladan wrote: > > > On Wed, May 28, 2003 at 02:53:12PM +0200, Jens Axboe wrote: > > > > On Wed, May 28 2003, Marc-Christian Petersen wrote: > > > > > On Wednesday 28 May 2003 13:27, Andrew Morton wrote: > > > > > > > > > > Hi Akpm, > > > > > > > > > > > > Does the attached one make sense? > > > > > > > > > > > > Nope. > > > > > > > > > > nm. > > > > > > > > > > > Guys, you're the ones who can reproduce this. Please spend more > > > > > > time working out which chunk (or combination thereof) actually > > > > > > fixes the problem. If indeed any of them do. > > > > > > > > > > As I said, I will test it this evening. ATM I don't have time to > > > > > recompile and reboot. This evening I will test extensively, even on > > > > > SMP, SCSI, IDE and so on. > > > > > > > > May I ask how you are reproducing the bad results? I'm trying in vain > > > > here... > > > > > > It might be useful to check what video hardware and X servers people > > > are using here. If the behavior is just mouse freezups, the "silken > > > mouse" feature of XFree might have some effect, since it involves XFree > > > binding a signal to mouse device events. > > > > Xfree 3.3.6, 4.2,4.3 > > Drivers nvidia, nv, sis, sisfb, vesa, vesafb > > > > are the drivers on the machines where I've seen it happen so far - ie > > without discrimination. > > what about the window manager? do you use focus follow mouse? Just > trying to find a pattern. 
For the record KDE 3.1 + focus follow mouse > and X 4.3.0 here, I guess Jens uses the same software combination. the > mouse for me is always perfectly fluid no matter how fast and how long I > write, no matter if I don't touch the mouse for minutes, ALT+TAB as > well. I definitely can't reproduce in any way the mouse stalls (I'm > using cp /dev/zero . on a ext3 fs in ordered mode). hardware is 1G of > ram smp IDE single spindle primary master matrox GS450. I almost > couldn't notice the background write flood if I only would increase the > xmms buffer (infact I thought it stopped writing for a dozen seconds out > of space, and instead it was still writing). (kernel is 2.4.21rc4aa1 of > course) Why should it matter what wm I use if the pauses were there before and not there now? Con ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 11:31 ` Marc-Christian Petersen 2003-05-28 12:53 ` Jens Axboe @ 2003-05-29 16:23 ` Marc-Christian Petersen 1 sibling, 0 replies; 142+ messages in thread From: Marc-Christian Petersen @ 2003-05-29 16:23 UTC (permalink / raw) To: Andrew Morton Cc: axboe, kernel, matthias.mueller, andrea, marcelo, linux-kernel On Wednesday 28 May 2003 13:31, Marc-Christian Petersen wrote: Hi Andrew, > > Guys, you're the ones who can reproduce this. Please spend more time > > working out which chunk (or combination thereof) actually fixes the > > problem. If indeed any of them do. > As I said, I will test it this evening. ATM I don't have time to recompile > and reboot. This evening I will test extensively, even on SMP, SCSI, IDE > and so on. Sorry, haven't had any time yesterday. So my 10¢ comment for the patches (like the ones in -rc6). 1. Braindead pausings are GONE (mouse is not sticky as w/o the patch). 2. Mouse sticks are still there rarely (short ones, max. 1 second) (If one can say 1 second is short ...). 3. all three patches are needed. No side effects yet tho. Works with SCSI, IDE and SMP. ciao, Marc ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 11:27 ` Andrew Morton 2003-05-28 11:31 ` Marc-Christian Petersen @ 2003-05-28 11:41 ` Con Kolivas 1 sibling, 0 replies; 142+ messages in thread From: Con Kolivas @ 2003-05-28 11:41 UTC (permalink / raw) To: Andrew Morton, Marc-Christian Petersen Cc: axboe, matthias.mueller, manish, andrea, marcelo, linux-kernel On Wed, 28 May 2003 21:27, Andrew Morton wrote: > Marc-Christian Petersen <m.c.p@wolk-project.de> wrote: > > Does the attached one make sense? > > Nope. > > Guys, you're the ones who can reproduce this. Please spend more time > working out which chunk (or combination thereof) actually fixes the > problem. If indeed any of them do. > > I'm suspecting that Con's fingers slipped. I've been known to be email trigger happy in the past but a serious thrashing with just this one change made massive improvements. However - One test case does not a fix give. Others please test this. It's extremely important. If you're interested the best test for me is: dd if=/dev/zero of=dump bs=4096 count=512000 Con ^ permalink raw reply [flat|nested] 142+ messages in thread
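Con's dd reproducer can also be driven from a small C program, which makes it easy to record how long individual write() calls stall. This is a minimal sketch assuming a POSIX system. write_zeros() is a hypothetical helper, not part of any tool mentioned in the thread, and for simplicity it treats a short write as an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Current monotonic time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Write `count` blocks of `bs` zero bytes to `path`, roughly like
 *   dd if=/dev/zero of=path bs=bs count=count
 * Returns the total number of bytes written, or -1 on error.
 * *worst_ms receives the longest single write() call. */
long long write_zeros(const char *path, size_t bs, long count, double *worst_ms)
{
    assert(path != NULL && bs > 0 && count >= 0 && worst_ms != NULL);

    char *buf = calloc(1, bs);          /* one zero-filled block */
    if (buf == NULL)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        free(buf);
        return -1;
    }

    long long total = 0;
    *worst_ms = 0.0;
    for (long i = 0; i < count; i++) {
        double t0 = now_ms();
        ssize_t n = write(fd, buf, bs);
        double dt = now_ms() - t0;
        if (n != (ssize_t)bs) {         /* error or short write */
            total = -1;
            break;
        }
        total += n;
        if (dt > *worst_ms)
            *worst_ms = dt;
    }

    if (close(fd) == -1)
        total = -1;
    free(buf);
    return total;
}
```

Calling write_zeros("dump", 4096, 512000, &worst) writes the same amount of data as the dd command above (about 2 GB). Note that it measures the writer's own stalls; the mouse and xterm pauses reported in this thread are a separate, interactive symptom.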
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 11:17 ` Marc-Christian Petersen 2003-05-28 11:27 ` Andrew Morton @ 2003-05-29 12:52 ` Andrea Arcangeli 1 sibling, 0 replies; 142+ messages in thread From: Andrea Arcangeli @ 2003-05-29 12:52 UTC (permalink / raw) To: Marc-Christian Petersen Cc: Andrew Morton, Jens Axboe, kernel, matthias.mueller, manish, marcelo, linux-kernel On Wed, May 28, 2003 at 01:17:59PM +0200, Marc-Christian Petersen wrote: > On Wednesday 28 May 2003 12:59, Andrew Morton wrote: > > Hi Andrew, > > > umm, I'd like confirmation of that. > > > > The waitqueue_active() test is wrong because of a missing barrier, but only > > on SMP. And if it does make a mistake it will surely correct itself when > > the next request is put back. (That's why I left it there...) > > More testing, please. > Does the attached one make sense? btw, I already fixed this race in my tree: void blkdev_release_request(struct request *req) { request_queue_t *q = req->q; req->rq_status = RQ_INACTIVE; req->q = NULL; /* * Request may not have originated from ll_rw_blk. if not, * assume it has free buffers and check waiters */ if (q) { list_add(&req->queue, &q->rq.free); if (++q->rq.count >= q->batch_requests && !blk_oversized_queue_batch(q)) { smp_mb(); if (waitqueue_active(&q->wait_for_requests)) wake_up(&q->wait_for_requests); so if this race were the cause, my tree wouldn't exhibit the problem (and it would trigger on SMP only). > > ciao, Marc > > Andrea ^ permalink raw reply [flat|nested] 142+ messages in thread
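The race Andrew and Andrea describe is the classic lost-wakeup pattern. The releaser must publish the new free count before it checks for sleepers, and the sleeper must register itself before it re-checks the count, with a full barrier on each side. The following is a userspace model of that pairing using C11 atomics, in which atomic_thread_fence() plays the role of smp_mb(). All names are hypothetical, and the model leaves out the actual sleeping and waking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a request free list; not kernel code. */
struct req_queue {
    atomic_int  free_count;   /* stands in for q->rq.count */
    atomic_bool has_waiter;   /* stands in for waitqueue_active() */
    int batch_requests;       /* wake sleepers only once this many are free */
};

void queue_init(struct req_queue *q, int batch_requests)
{
    assert(q != NULL && batch_requests > 0);
    atomic_init(&q->free_count, 0);
    atomic_init(&q->has_waiter, false);
    q->batch_requests = batch_requests;
}

/* Release side: publish the freed request, then look for sleepers.
 * The full fence orders the count update before the waiter check.
 * Returns true if the caller should issue a wake-up. */
bool release_request(struct req_queue *q)
{
    int now = atomic_fetch_add_explicit(&q->free_count, 1,
                                        memory_order_relaxed) + 1;
    if (now < q->batch_requests)
        return false;
    atomic_thread_fence(memory_order_seq_cst);   /* like smp_mb() */
    return atomic_load_explicit(&q->has_waiter, memory_order_relaxed);
}

/* Sleep side: announce the waiter, fence, then re-check the count.
 * Returns true if the caller really has to sleep. */
bool prepare_to_sleep(struct req_queue *q)
{
    atomic_store_explicit(&q->has_waiter, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* pairs with the fence above */
    return atomic_load_explicit(&q->free_count, memory_order_relaxed) == 0;
}
```

Because each side stores first, fences, and then loads the other side's variable, at least one of them is guaranteed to see the other's store. Without the release-side fence, the waiter check could effectively be performed before the count update, so neither side would notice the other. This matches Andrew's remark that the bug is SMP-only.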
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 10:48 ` Con Kolivas 2003-05-28 10:50 ` Jens Axboe @ 2003-05-28 11:03 ` Nick Piggin 1 sibling, 0 replies; 142+ messages in thread From: Nick Piggin @ 2003-05-28 11:03 UTC (permalink / raw) To: Con Kolivas, Jens Axboe, Andrew Morton Cc: Matthias Mueller, m.c.p, manish, andrea, marcelo, linux-kernel Con Kolivas wrote: >On Wed, 28 May 2003 20:25, Jens Axboe wrote: > >>On Wed, May 28 2003, Andrew Morton wrote: >> >>>Then this (totally unlikely, don't bother): >>> >>>diff -puN drivers/block/ll_rw_blk.c~3 drivers/block/ll_rw_blk.c >>>--- 24/drivers/block/ll_rw_blk.c~3 2003-05-28 03:21:15.000000000 -0700 >>>+++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:39.000000000 -0700 >>>@@ -829,8 +829,7 @@ void blkdev_release_request(struct reque >>> */ >>> if (q) { >>> list_add(&req->queue, &q->rq[rw].free); >>>- if (++q->rq[rw].count >= q->batch_requests && >>>- waitqueue_active(&q->wait_for_requests[rw])) >>>+ if (++q->rq[rw].count >= q->batch_requests) >>> wake_up(&q->wait_for_requests[rw]); >>> } >>> } >>> >>Well it's the only one left :). But you are right, try one of them at >>the time, establishing the effect of each of them. >> > >THIS IS IT! The last one. No pauses writing a 2Gb file now unless I do a read >midstream. > > OK, I can't see how this would make a difference, but there is similar (batch_requests) code in the mm tree, so it would be nice if someone would work out what is going on. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 10:23 ` Andrew Morton 2003-05-28 10:25 ` Jens Axboe @ 2003-05-28 10:29 ` Con Kolivas 2003-05-28 10:29 ` Marc-Christian Petersen 2003-05-28 12:10 ` Matthias Mueller ` (2 subsequent siblings) 4 siblings, 1 reply; 142+ messages in thread From: Con Kolivas @ 2003-05-28 10:29 UTC (permalink / raw) To: Andrew Morton, Matthias Mueller Cc: axboe, m.c.p, manish, andrea, marcelo, linux-kernel On Wed, 28 May 2003 20:23, Andrew Morton wrote: > Matthias Mueller <matthias.mueller@rz.uni-karlsruhe.de> wrote: > > Works fine on my notebook. Good throughput and no mouse hangs anymore. > > Interesting. > > Could you please work out which change caused it? Go back to stock 2.4 and > then apply this: > > > diff -puN drivers/block/ll_rw_blk.c~1 drivers/block/ll_rw_blk.c > --- 24/drivers/block/ll_rw_blk.c~1 2003-05-28 03:20:42.000000000 -0700 > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:20:57.000000000 -0700 > @@ -590,10 +590,10 @@ static struct request *__get_request_wai > register struct request *rq; > DECLARE_WAITQUEUE(wait, current); > > - generic_unplug_device(q); > add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait); > do { > set_current_state(TASK_UNINTERRUPTIBLE); > + generic_unplug_device(q); > if (q->rq[rw].count == 0) > schedule(); > spin_lock_irq(&io_request_lock); It's not this because this is the layout in my -ck* and it still exhibits the pauses. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 10:29 ` Con Kolivas @ 2003-05-28 10:29 ` Marc-Christian Petersen 0 siblings, 0 replies; 142+ messages in thread From: Marc-Christian Petersen @ 2003-05-28 10:29 UTC (permalink / raw) To: Con Kolivas, Andrew Morton, Matthias Mueller Cc: axboe, manish, andrea, marcelo, linux-kernel On Wednesday 28 May 2003 12:29, Con Kolivas wrote: Hi Con, AKPM, Jens, > > diff -puN drivers/block/ll_rw_blk.c~1 drivers/block/ll_rw_blk.c > > --- 24/drivers/block/ll_rw_blk.c~1 2003-05-28 03:20:42.000000000 -0700 > > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:20:57.000000000 -0700 > > @@ -590,10 +590,10 @@ static struct request *__get_request_wai > > register struct request *rq; > > DECLARE_WAITQUEUE(wait, current); > > > > - generic_unplug_device(q); > > add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait); > > do { > > set_current_state(TASK_UNINTERRUPTIBLE); > > + generic_unplug_device(q); > > if (q->rq[rw].count == 0) > > schedule(); > > spin_lock_irq(&io_request_lock); > It's not this because this is the layout in my -ck* and it still exhibits > the pauses. Same for -WOLK* ciao, Marc ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 10:23 ` Andrew Morton 2003-05-28 10:25 ` Jens Axboe 2003-05-28 10:29 ` Con Kolivas @ 2003-05-28 12:10 ` Matthias Mueller 2003-05-28 12:14 ` Matthias Mueller 2003-05-29 13:19 ` Andrea Arcangeli 2003-05-28 14:00 ` Con Kolivas 2003-05-29 1:32 ` manish 4 siblings, 2 replies; 142+ messages in thread From: Matthias Mueller @ 2003-05-28 12:10 UTC (permalink / raw) To: Andrew Morton; +Cc: axboe, m.c.p, kernel, manish, andrea, marcelo, linux-kernel On Wed, May 28, 2003 at 03:23:15AM -0700, Andrew Morton wrote: > Could you please work out which change caused it? Go back to stock 2.4 and > then apply this: > > > diff -puN drivers/block/ll_rw_blk.c~1 drivers/block/ll_rw_blk.c > --- 24/drivers/block/ll_rw_blk.c~1 2003-05-28 03:20:42.000000000 -0700 > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:20:57.000000000 -0700 > @@ -590,10 +590,10 @@ static struct request *__get_request_wai > register struct request *rq; > DECLARE_WAITQUEUE(wait, current); > > - generic_unplug_device(q); > add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait); > do { > set_current_state(TASK_UNINTERRUPTIBLE); > + generic_unplug_device(q); > if (q->rq[rw].count == 0) > schedule(); > spin_lock_irq(&io_request_lock); > > > > then this: > > diff -puN drivers/block/ll_rw_blk.c~2 drivers/block/ll_rw_blk.c > --- 24/drivers/block/ll_rw_blk.c~2 2003-05-28 03:21:03.000000000 -0700 > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:09.000000000 -0700 > @@ -590,7 +590,7 @@ static struct request *__get_request_wai > register struct request *rq; > DECLARE_WAITQUEUE(wait, current); > > - add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait); > + add_wait_queue(&q->wait_for_requests[rw], &wait); > do { > set_current_state(TASK_UNINTERRUPTIBLE); > generic_unplug_device(q); > > > Then this (totally unlikely, don't bother): > > diff -puN drivers/block/ll_rw_blk.c~3 drivers/block/ll_rw_blk.c > --- 24/drivers/block/ll_rw_blk.c~3 2003-05-28 
03:21:15.000000000 -0700 > +++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:39.000000000 -0700 > @@ -829,8 +829,7 @@ void blkdev_release_request(struct reque > */ > if (q) { > list_add(&req->queue, &q->rq[rw].free); > - if (++q->rq[rw].count >= q->batch_requests && > - waitqueue_active(&q->wait_for_requests[rw])) > + if (++q->rq[rw].count >= q->batch_requests) > wake_up(&q->wait_for_requests[rw]); > } > } > > _ Tested all of them and some combinations: patch 1 alone: still mouse hangs patch 2 alone: still mouse hangs patch 3 alone: no hangs, but I get some zombie process (starting a lot of xterms results in zombie xterms, not noticed with vanilla and the other patches) patch 1+2: no mouse hangs patch 1+2+3: no mouse hangs, no zombies Bye, Matthias -- Matthias.Mueller@rz.uni-karlsruhe.de Rechenzentrum Universitaet Karlsruhe Abteilung Netze ^ permalink raw reply [flat|nested] 142+ messages in thread
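Patch 2 replaces add_wait_queue_exclusive() with add_wait_queue(). In 2.4, wake_up() passes nr_exclusive = 1, so a single call wakes every non-exclusive waiter but stops after the first exclusive one. The simplified model below ignores task state and other details of __wake_up_common(), but it shows the difference. struct waiter and model_wake_up() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry on a simplified wait queue. */
struct waiter {
    bool exclusive;   /* added with add_wait_queue_exclusive() */
    bool woken;
};

/* Visit waiters in list order: non-exclusive waiters are always woken,
 * and the scan stops once nr_exclusive exclusive waiters have been woken.
 * Returns the number of waiters woken. */
size_t model_wake_up(struct waiter *list, size_t n, int nr_exclusive)
{
    assert(list != NULL || n == 0);
    size_t woken = 0;
    for (size_t i = 0; i < n; i++) {
        list[i].woken = true;
        woken++;
        if (list[i].exclusive && --nr_exclusive == 0)
            break;
    }
    return woken;
}
```

If every sleeper is exclusive, a single wake_up() after a batch of requests has been freed rouses only one task, whereas non-exclusive sleepers are all woken to race for the free requests. Whether this explains the 1+2 results above is not settled in the thread.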
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 12:10 ` Matthias Mueller @ 2003-05-28 12:14 ` Matthias Mueller 2003-05-28 12:21 ` Carl-Daniel Hailfinger 2003-05-29 13:19 ` Andrea Arcangeli 1 sibling, 1 reply; 142+ messages in thread From: Matthias Mueller @ 2003-05-28 12:14 UTC (permalink / raw) To: Andrew Morton, axboe, m.c.p, kernel, manish, andrea, marcelo, linux-kernel On Wed, May 28, 2003 at 02:10:40PM +0200, Matthias Mueller wrote: > Tested all of them and some combinations: > patch 1 alone: still mouse hangs > patch 2 alone: still mouse hangs > patch 3 alone: no hangs, but I get some zombie process (starting a lot of > xterms results in zombie xterms, not noticed with vanilla > and the other patches) > patch 1+2: no mouse hangs > patch 1+2+3: no mouse hangs, no zombies Forgot to mention: no zombies with patch 1 or 2 Matthias -- Matthias.Mueller@rz.uni-karlsruhe.de Rechenzentrum Universitaet Karlsruhe Abteilung Netze ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 12:14 ` Matthias Mueller @ 2003-05-28 12:21 ` Carl-Daniel Hailfinger 2003-05-28 12:23 ` Matthias Mueller 0 siblings, 1 reply; 142+ messages in thread From: Carl-Daniel Hailfinger @ 2003-05-28 12:21 UTC (permalink / raw) To: Matthias Mueller Cc: Andrew Morton, axboe, m.c.p, kernel, manish, andrea, marcelo, linux-kernel Matthias Mueller wrote: > On Wed, May 28, 2003 at 02:10:40PM +0200, Matthias Mueller wrote: > >>Tested all of them and some combinations: >>patch 1 alone: still mouse hangs >>patch 2 alone: still mouse hangs >>patch 3 alone: no hangs, but I get some zombie process (starting a lot of >> xterms results in zombie xterms, not noticed with vanilla >> and the other patches) >>patch 1+2: no mouse hangs >>patch 1+2+3: no mouse hangs, no zombies > > > Forgot to mention: no zombies with patch 1 or 2 So 1+2 gives you zombies? Carl-Daniel ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 12:21 ` Carl-Daniel Hailfinger @ 2003-05-28 12:23 ` Matthias Mueller 2003-05-28 12:28 ` Carl-Daniel Hailfinger 0 siblings, 1 reply; 142+ messages in thread From: Matthias Mueller @ 2003-05-28 12:23 UTC (permalink / raw) To: Carl-Daniel Hailfinger Cc: Andrew Morton, axboe, m.c.p, kernel, manish, andrea, marcelo, linux-kernel On Wed, May 28, 2003 at 02:21:08PM +0200, Carl-Daniel Hailfinger wrote: > Matthias Mueller wrote: > > On Wed, May 28, 2003 at 02:10:40PM +0200, Matthias Mueller wrote: > > > >>Tested all of them and some combinations: > >>patch 1 alone: still mouse hangs > >>patch 2 alone: still mouse hangs > >>patch 3 alone: no hangs, but I get some zombie process (starting a lot of > >> xterms results in zombie xterms, not noticed with vanilla > >> and the other patches) > >>patch 1+2: no mouse hangs > >>patch 1+2+3: no mouse hangs, no zombies > > > > > > Forgot to mention: no zombies with patch 1 or 2 > > So 1+2 gives you zombies? No, work ok, just forgot to mention that, too. I think I should go to sleep... Matthias -- Matthias.Mueller@rz.uni-karlsruhe.de Rechenzentrum Universitaet Karlsruhe Abteilung Netze ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 12:23 ` Matthias Mueller @ 2003-05-28 12:28 ` Carl-Daniel Hailfinger 2003-05-28 12:38 ` Matthias Mueller 0 siblings, 1 reply; 142+ messages in thread From: Carl-Daniel Hailfinger @ 2003-05-28 12:28 UTC (permalink / raw) To: Matthias Mueller Cc: Andrew Morton, axboe, m.c.p, kernel, manish, andrea, marcelo, linux-kernel Matthias Mueller wrote: > On Wed, May 28, 2003 at 02:21:08PM +0200, Carl-Daniel Hailfinger wrote: > >>Matthias Mueller wrote: >> >>>On Wed, May 28, 2003 at 02:10:40PM +0200, Matthias Mueller wrote: >>> >>> >>>>Tested all of them and some combinations: >>>>patch 1 alone: hangs, no zombies >>>>patch 2 alone: hangs, no zombies >>>>patch 3 alone: no hangs, zombies >>>>patch 1+2: no hangs, no zombies >>>>patch 1+2+3: no hangs, no zombies Right? ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 12:28 ` Carl-Daniel Hailfinger @ 2003-05-28 12:38 ` Matthias Mueller 0 siblings, 0 replies; 142+ messages in thread From: Matthias Mueller @ 2003-05-28 12:38 UTC (permalink / raw) To: Carl-Daniel Hailfinger Cc: Andrew Morton, axboe, m.c.p, kernel, manish, andrea, marcelo, linux-kernel On Wed, May 28, 2003 at 02:28:10PM +0200, Carl-Daniel Hailfinger wrote: > Matthias Mueller wrote: > > On Wed, May 28, 2003 at 02:21:08PM +0200, Carl-Daniel Hailfinger wrote: > > > >>Matthias Mueller wrote: > >> > >>>On Wed, May 28, 2003 at 02:10:40PM +0200, Matthias Mueller wrote: > >>> > >>> > >>>>Tested all of them and some combinations: > >>>>patch 1 alone: hangs, no zombies > >>>>patch 2 alone: hangs, no zombies > >>>>patch 3 alone: no hangs, zombies > >>>>patch 1+2: no hangs, no zombies > >>>>patch 1+2+3: no hangs, no zombies > > Right? Yes. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 12:10 ` Matthias Mueller 2003-05-28 12:14 ` Matthias Mueller @ 2003-05-29 13:19 ` Andrea Arcangeli 2003-05-29 14:10 ` Matthias Mueller 1 sibling, 1 reply; 142+ messages in thread From: Andrea Arcangeli @ 2003-05-29 13:19 UTC (permalink / raw) To: Andrew Morton, axboe, m.c.p, kernel, manish, marcelo, linux-kernel On Wed, May 28, 2003 at 02:10:40PM +0200, Matthias Mueller wrote: > Tested all of them and some combinations: > patch 1 alone: still mouse hangs > patch 2 alone: still mouse hangs > patch 3 alone: no hangs, but I get some zombie process (starting a lot of > xterms results in zombie xterms, not noticed with vanilla > and the other patches) > patch 1+2: no mouse hangs > patch 1+2+3: no mouse hangs, no zombies I can't make sense of the zombie thing: how can you generate zombies at all from xterms? That sounds like your userspace is terribly broken and may have race conditions or whatever. There is no way those patches can cause or prevent zombies from xterms. I have never seen a zombie xterm in my whole Linux experience. Either that, or the GUI is intentionally trying to reduce the number of wait4 syscalls to the minimum by coalescing them, but that would be a very bad design for GUI software, since you're not going to start an xterm (or any other window) every millisecond; it would be pointless and confusing, and I certainly wouldn't like it. (I don't love the wait4 trick even in servers, where it might be accepted as a micro-optimization.) It's impossible to trust the rest of the report while hearing about such a fundamental breakage in the core of your GUI; the mouse hangs could be just a userspace bug that triggers when some timing changes in the presence of writes, or whatever. So please install a userspace that never generates zombie xterms, and see if you can still reproduce it. Andrea ^ permalink raw reply [flat|nested] 142+ messages in thread
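A zombie is simply a child that has exited but has not yet been collected by its parent through wait4()/waitpid(). This is why Andrea looks to userspace rather than to the block-layer patches. The following is a minimal sketch of non-blocking reaping, assuming a POSIX system. reap_exited_children() is a hypothetical helper, not the code of any terminal or wrapper script mentioned here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Collect every child that has already exited, without blocking.
 * A parent that never does this leaves exited children as zombies
 * (state Z in ps) until the parent itself exits.
 * Returns the number of children reaped. */
int reap_exited_children(void)
{
    int reaped = 0;
    for (;;) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid > 0) {
            reaped++;           /* one zombie collected; look for more */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;           /* interrupted by a signal; retry */
        break;                  /* 0: children still running; -1/ECHILD: none left */
    }
    return reaped;
}
```

A parent would typically call this from its main loop, or after a SIGCHLD, so that short-lived children never linger in the process table.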
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-29 13:19 ` Andrea Arcangeli @ 2003-05-29 14:10 ` Matthias Mueller 2003-05-29 16:22 ` Andrea Arcangeli 0 siblings, 1 reply; 142+ messages in thread From: Matthias Mueller @ 2003-05-29 14:10 UTC (permalink / raw) To: Andrea Arcangeli Cc: Andrew Morton, axboe, m.c.p, kernel, manish, marcelo, linux-kernel On Thu, May 29, 2003 at 03:19:37PM +0200, Andrea Arcangeli wrote: > On Wed, May 28, 2003 at 02:10:40PM +0200, Matthias Mueller wrote: > > Tested all of them and some combinations: > > patch 1 alone: still mouse hangs > > patch 2 alone: still mouse hangs > > patch 3 alone: no hangs, but I get some zombie process (starting a lot of > > xterms results in zombie xterms, not noticed with vanilla > > and the other patches) > > patch 1+2: no mouse hangs > > patch 1+2+3: no mouse hangs, no zombies > > I can't find a sense in the zombie thing, how can you generate zombie at > all from xterms? That sounds like your userspace is terribly broken and > it may have race conditions or whatever. In no way those patches can > generate or not-generate zombies from xterms. I never ever seen a zombie > xterm in my whole linux experience. I rechecked everything and noticed that it wasn't an xterm but a wrapper script that executed rxvt. I changed that to plain xterm and the zombies were gone, so I think a bug in rxvt was probably being triggered there. After that I redid the tests, with the same results (and no zombies). I can't feel any difference between 1+2 and 1+2+3. Matthias -- Matthias.Mueller@rz.uni-karlsruhe.de Rechenzentrum Universitaet Karlsruhe Abteilung Netze ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-29 14:10 ` Matthias Mueller @ 2003-05-29 16:22 ` Andrea Arcangeli 0 siblings, 0 replies; 142+ messages in thread From: Andrea Arcangeli @ 2003-05-29 16:22 UTC (permalink / raw) To: Andrew Morton, axboe, m.c.p, kernel, manish, marcelo, linux-kernel On Thu, May 29, 2003 at 04:10:34PM +0200, Matthias Mueller wrote: > On Thu, May 29, 2003 at 03:19:37PM +0200, Andrea Arcangeli wrote: > > On Wed, May 28, 2003 at 02:10:40PM +0200, Matthias Mueller wrote: > > > Tested all of them and some combinations: > > > patch 1 alone: still mouse hangs > > > patch 2 alone: still mouse hangs > > > patch 3 alone: no hangs, but I get some zombie process (starting a lot of > > > xterms results in zombie xterms, not noticed with vanilla > > > and the other patches) > > > patch 1+2: no mouse hangs > > > patch 1+2+3: no mouse hangs, no zombies > > > > I can't find a sense in the zombie thing, how can you generate zombie at > > all from xterms? That sounds like your userspace is terribly broken and > > it may have race conditions or whatever. In no way those patches can > > generate or not-generate zombies from xterms. I never ever seen a zombie > > xterm in my whole linux experience. > > I rechecked everything an noticed, that it wasn't a xterm, but a wrapper > script, that executed rxvt. I changed that to plain xterm and the zombies > were gone. So I think there was probably a bug in rxvt triggered there. > After that I redid the tests, with the same result (and no zombies). > I can feel no difference between 1+2 or 1+2+3. this sounds very sane now thanks for fixing the issues with the zombies! it also makes sense to me that 1+2 is the same as 1+2+3, because I'd be very surprised if the (purely smp) race condition in 3 made a whole lot of difference for interactivity of a large write. Andrea ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 10:23 ` Andrew Morton ` (2 preceding siblings ...) 2003-05-28 12:10 ` Matthias Mueller @ 2003-05-28 14:00 ` Con Kolivas 2003-05-29 13:24 ` Andrea Arcangeli 2003-05-29 1:32 ` manish 4 siblings, 1 reply; 142+ messages in thread From: Con Kolivas @ 2003-05-28 14:00 UTC (permalink / raw) To: Andrew Morton, Matthias Mueller Cc: axboe, m.c.p, manish, andrea, marcelo, linux-kernel On Wed, 28 May 2003 20:23, Andrew Morton wrote: > Could you please work out which change caused it? Go back to stock 2.4 and > then apply this: > [snip] 1 > then this: [snip] 2 > Then this (totally unlikely, don't bother): [snip] 3 Ok patch combination final score for me is as follows in the presence of a large continuous write: 1 No change 2 No change 3 improvement++; minor hangs with reads 1+2 improvement+++; minor pauses with switching applications 1+2+3 improvement++++; no pauses Applications may start up slowly that's fine. The mouse cursor keeps spinning and responding at all times though with 1+2+3 which it hasn't done in 2.4 for a year or so. Con ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.4.20: Proccess stuck in __lock_page ... 2003-05-28 14:00 ` Con Kolivas @ 2003-05-29 13:24 ` Andrea Arcangeli 2003-05-29 13:55 ` Willy Tarreau 0 siblings, 1 reply; 142+ messages in thread From: Andrea Arcangeli @ 2003-05-29 13:24 UTC (permalink / raw) To: Con Kolivas Cc: Andrew Morton, Matthias Mueller, axboe, m.c.p, manish, marcelo, linux-kernel On Thu, May 29, 2003 at 12:00:11AM +1000, Con Kolivas wrote: > On Wed, 28 May 2003 20:23, Andrew Morton wrote: > > Could you please work out which change caused it? Go back to stock 2.4 and > > then apply this: > > > [snip] 1 > > > then this: > [snip] 2 > > > Then this (totally unlikely, don't bother): > [snip] 3 > > Ok patch combination final score for me is as follows in the presence of a > large continuous write: > 1 No change > 2 No change > 3 improvement++; minor hangs with reads > 1+2 improvement+++; minor pauses with switching applications > 1+2+3 improvement++++; no pauses Then please try 1+2 alone too (i.e. without 3), because it's not obvious to me that you're really hitting the race in 3 with a single write (I spotted and fixed such a race in my tree some months ago, but thought it was only theoretical, at least on x86). The improvement++ might be just an emotional feeling if you didn't generate numbers to measure it (I know from my own experience that when you try a new patch, everything can seem faster until you really measure it ;). > Applications may start up slowly that's fine. The mouse cursor keeps spinning > and responding at all times though with 1+2+3 which it hasn't done in 2.4 for The mouse cursor has always worked and still works fine for me (and I was running with 3 applied, just to get the theoretical bit correct). > a year or so. > > Con Andrea ^ permalink raw reply [flat|nested] 142+ messages in thread
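One way to put numbers on "improvement++" is a latency probe that repeatedly sleeps for a fixed interval and records how late it wakes up, run while the large write is in progress. The following is a minimal sketch under that assumption. worst_oversleep_ms() is hypothetical, and it measures only how late this small process gets scheduled, which is not the same as the paging delays a large application such as X may run into.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Milliseconds elapsed from *a to *b. */
static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000.0 + (b->tv_nsec - a->tv_nsec) / 1e6;
}

/* Sleep `iterations` times for `interval_ms` each and return the worst
 * oversleep in milliseconds (0 if every wake-up was on time). */
double worst_oversleep_ms(int iterations, long interval_ms)
{
    assert(iterations > 0 && interval_ms > 0 && interval_ms < 1000);
    struct timespec req = { 0, interval_ms * 1000000L };
    double worst = 0.0;

    for (int i = 0; i < iterations; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        nanosleep(&req, NULL);      /* an early return on a signal just shows up as negative lateness */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double late = elapsed_ms(&t0, &t1) - (double)interval_ms;
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Running, for example, worst_oversleep_ms(6000, 10) (about a minute of 10 ms sleeps) once on an idle system and once during the dd write gives two numbers that can be compared across patch combinations.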
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Willy Tarreau @ 2003-05-29 13:55 UTC
To: Andrea Arcangeli
Cc: Con Kolivas, Andrew Morton, Matthias Mueller, axboe, m.c.p, manish, marcelo, linux-kernel

Hello!

I've done a few tests with -rc6 on my dev machine (dual XP 1.5G, 512 MB, SCSI). It's the *FIRST* time I have ever seen my mouse cursor hang (just a little bit, however, and totally acceptable)! My kernels usually include the -aa VM and lowlat patches, and I've never encountered this behaviour on this machine with such a configuration. However, with the stock kernel, I admit that during the 2 minutes it takes to write the 2G file, I see the mouse stick two or three times for about 1 second each, which is quite acceptable IMHO. Opening an xterm may take 10s to get to the prompt (more annoying). Same for launching 'ps'.

I use a fairly simple window manager (ctwm), which doesn't access the disk once it's launched. It never gets stuck during the whole operation if I disable swap. If I enable swap, it sometimes takes one or two seconds to draw a menu. Swap usage goes up to about 4 MB.

I then tried -rc6 with ll_rw_blk from -rc5, and it's worse, even with swap disabled. The hangs happen more often, but last about as long. So I confirm that -rc6 is better here than -rc5.

I retried with rc4aa1, and everything went very smoothly again; it takes at most 1 second to get an xterm with the prompt ready, and ps responds immediately. So I think there are two things here:
- those who experience very long hangs may use a heavy window manager which accesses the disk continuously (I mean it accesses the disk for any simple operation);
- a hungry WM may also get swapped out during such operations, rendering it totally unusable, particularly if the swap is on the same physical disk as the file being written to.

So, could the people who report long hangs retry with swap disabled?
Can we limit the amount of memory consumed by the cache during such a write?

Regards,
Willy
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Con Kolivas @ 2003-05-29 14:09 UTC
To: Willy Tarreau, Andrea Arcangeli
Cc: Andrew Morton, Matthias Mueller, axboe, m.c.p, manish, marcelo, linux-kernel

On Thu, 29 May 2003 23:55, Willy Tarreau wrote:
> [snip]
> So, could the people who report long hangs retry with swap disabled ?
> Can we limit the amount of memory consummed by the cache during such a
> write ?

I still get hangs with rc6 during massive writeouts to swap. The problem was that I was getting hangs without writeouts to swap from 2.4.19pre1 through 2.4.21pre5. I didn't expect the patch backout to suddenly make writing to swap free (although that would be nice).

Con
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Matthias Mueller @ 2003-05-29 14:38 UTC
To: Willy Tarreau
Cc: Andrea Arcangeli, Con Kolivas, Andrew Morton, axboe, m.c.p, manish, marcelo, linux-kernel

On Thu, May 29, 2003 at 03:55:08PM +0200, Willy Tarreau wrote:
> [snip]
> I use a fairly simple window manager (ctwm), which doesn't access the disk once
> it's launched. It never gets stuck during all the operation if I disable the
> swap. If I enable the swap, it sometimes takes one or two seconds to draw a
> menu. The swap is used up to about 4 MB.
> [snip]
> So, could the people who report long hangs retry with swap disabled ?
> Can we limit the amount of memory consummed by the cache during such a write ?

I run fluxbox, which is not a very heavy window manager, but I installed ctwm and tried again with vanilla 2.4.20. If I disable swap, the short hangs (1s) are gone, but the long mouse hangs (10s) are still there.

Matthias
-- 
Matthias.Mueller@rz.uni-karlsruhe.de
Rechenzentrum Universitaet Karlsruhe
Abteilung Netze
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Willy TARREAU @ 2003-05-29 16:10 UTC
To: Willy Tarreau, Andrea Arcangeli, Con Kolivas, Andrew Morton, axboe, m.c.p, manish, marcelo, linux-kernel

On Thu, May 29, 2003 at 04:38:28PM +0200, Matthias Mueller wrote:
> I run fluxbox, not a very heavy window manager, but I installed ctwm and
> tried again with vanilla 2.4.20. If I disabled swap the short hangs (1s) are
> gone, but the long mouse hangs (10s) are still there.

Thanks for the test, but I find it really amazing that the mouse hangs when it has nothing to do with any block device at all!

Cheers,
Willy
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marc-Christian Petersen @ 2003-05-29 14:45 UTC
To: Willy Tarreau
Cc: Andrea Arcangeli, Con Kolivas, Andrew Morton, Matthias Mueller, axboe, marcelo, linux-kernel

On Thursday 29 May 2003 15:55, Willy Tarreau wrote:

Hi Willy,

> I've done a few tests with -rc6 on my dev machine (dual xp 1.5G, 512 MB,
> scsi). It's the *FIRST* time I have ever seen my mouse cursor hang (just a
> little bit however, and totally acceptable) ! Usually, my kernel include -aa
> VM and lowlat patches, and I've never encountered this behaviour on this
> machine with such a configuration. However, with stock kernel, I admit that
> during the 2 minutes it takes to write the 2G file, I see the mouse stick
> two or three times during about 1 second, which is quite acceptable IMHO.

WRONG. A mouse stick is not acceptable in _any_ way. Other OSes handle this pretty well, and if Linux has problems with the mouse sticking, this has to be fixed! Either in kernel space or in userspace (XFree86).

> Opening an xterm may take 10s to get to the prompt (more annoying). Same to
> launch 'ps'.

ACK!

> I retried with rc4aa1, and everything went very smooth again ; it takes at
> most 1 second to get an xterm with the prompt ready, and ps responds
> immediately. So I think that there are two things here:
> - those who experience very long hangs may use a heavy window manager
> which does continuous disk accesses (I mean it accesses the disk for
> any simple operation).
> - a hungry WM may also be swapped during such operations, rendering it
> totally unusable, particularly if the swap is on the same physical disk
> as the file being written to.

Well, sorry, but: no! The pauses/stops occur no matter which window manager is used (KDE2/3, WindowMaker, fvwm, GNOME etc. foobar). The reason you are not seeing such things with -aa is his lowlatency elevator, the lowlatency fixes and some important fixes which are not in the stock kernel yet.

I reproduced the mouse sticking and the keyboard not accepting anything for $seconds with _every_ kernel based on 2.4.19/2.4.20/2.4.21*. This also includes -AA (well, not as braindead bad as mainline was before the fix, but that's because of the lowlat elevator from Andrea). And as I said yesterday (or 2 days ago? dunno), the lowlat elevator drops throughput (Andrea, it _does_ ;).

It's not only mouse hangs (as I've reported tons of times); the keyboard also does not accept any input (the delay varies between 1 and 15 seconds), and this also applies if you don't run X at all. Another fine example:

- Start a screen session, not running X at all.
- Thrash your HD with tons of writes.
- Press Ctrl-A-C for a new screen window.

You will see that it takes as long as starting up an xterm or calling ps, as you wrote above. It does _not_ happen with 2.4.18!

> So, could the people who report long hangs retry with swap disabled ?

It's somewhat better, but not acceptable.

> Can we limit the amount of memory consummed by the cache during such a
> write ?

I've been asking for such a feature for years ;)

Well, my summary: the bug is there, and has been for over 15 months (I won't mention again that I reported it 15 months ago ;-) ... It _may_ be some very obscure hardware problem, but as this thread shows, there are tons of people who can reproduce it on different hardware, starting with 2.4.19-pre1.

ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Willy TARREAU @ 2003-05-29 16:06 UTC
To: Marc-Christian Petersen
Cc: Willy Tarreau, Andrea Arcangeli, Con Kolivas, Andrew Morton, Matthias Mueller, axboe, marcelo, linux-kernel

On Thu, May 29, 2003 at 04:45:26PM +0200, Marc-Christian Petersen wrote:
> > machine with such a configuration. However, with stock kernel, I admit that
> > during the 2 minutes it takes to write the 2G file, I see the mouse stick
> > two or three times during about 1 second, which is quite acceptable IMHO.
> WRONG. A mouse stick is not acceptable in _any_ way. Other OS' can handle this

Excuse me, Marc, I didn't mean it was acceptable in general, only quite acceptable compared to what other people report.

> > Opening an xterm may take 10s to get to the prompt (more annoying). Same to
> > launch 'ps'.
> ACK!

The problem is specifically due to the cache, and only related to I/O, not to other subsystems: if I start 50 xterms during that write, they take the same time to respond as when there's only one. And they all respond simultaneously, showing that they were all waiting for their files to be read from the disk. But I cannot hang anything that doesn't need disk access. Perhaps some people have their X server swapped out!

> The pauses/stops occurs no matter of what WindowManager (KDE2/3, WindowMaker,
> fvwm, gnome etc. foobar). The point why you are not seeing such things with
> -aa is his Lowlatency Elevator and lowlatency-fixes and some important fixes
> which are not in stock kernel yet.

Do you agree that if the WM does no disk access and the mouse/keyboard freezes, it means that X and/or the WM are being swapped? And if that's not the case, then it's related to something else, and I don't see how playing with elevators can help!

> I reproduced mouse sticks and keyboard does not accept anything problems for
> $seconds with _every_ kernel which is based on 2.4.19/2.4.20/2.4.21*. This
> also includes -AA (well, not that braindead bad like mainline did before the
> fix) but this is because of lowlat elevator from Andrea. And as I told
> yesterday (or 2 days ago? dunno) lowlat elevator drops throughput (Andrea, it
> _does_ ;).

I also confirm it does; it takes 122 seconds to write this file with -rc6, and 142 seconds with -aa. But I don't think desktop users would notice anyway.

> It's not just only mouse hangs (as I've reported tons of times) but also
> keyboard does not accept any input (delay varies between 1 to 15 seconds) and
> this also applies if you don't run X at all.

In fact, we don't know whether the keyboard doesn't accept input or the process bound to the TTY is stuck! If Alt-SysRq replies immediately, the problem is on the user-process side.

> - Start a screen session, not running X at all.
> - Trash your HD with tons of writes.
> - Press Ctrl-A-C for a new screen session.
>
> You will see, it takes as long as, you wrote above, with starting up an Xterm
> or calling ps. It does _not_ happen with 2.4.18!

I think that for this, screen needs to allocate some memory, which may take some time under these conditions. I don't have screen right here, so I won't try it, but I suspect that a program which uses pre-allocated memory will have no problem at all.

> > So, could the people who report long hangs retry with swap disabled ?
> It's somewhat better but not acceptable.

OK.

> > Can we limit the amount of memory consummed by the cache during such a
> > write ?
> I ask for such a feature since years ;)

Another solution would be the ability to specify that a process should use pre-allocated memory.

Cheers,
Willy
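Willy's "pre-allocated memory" idea can be approximated from user space. A minimal sketch, assuming POSIX `mlock()` is available and permitted (it needs root or a large enough RLIMIT_MEMLOCK): allocate the buffer up front, write to every page so it is actually backed by RAM, then try to pin it so it cannot be swapped out during a write flood. The helper name is hypothetical; `mlockall(MCL_CURRENT | MCL_FUTURE)` is the whole-process variant. That this would remove the screen stall Marc describes is only Willy's hypothesis, not something shown in the thread.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate 'size' bytes now, touch every page so the
 * kernel has to back it with RAM immediately, then try to pin it with
 * mlock(). *locked is set to 1 if mlock() succeeded, 0 otherwise (the
 * buffer is still usable, it just may be swapped out later).
 * Returns NULL if the allocation fails. Free with munlock() + free(). */
void *prealloc_pinned(size_t size, int *locked)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *buf;

    *locked = 0;
    if (page <= 0 || size == 0)
        return NULL;
    buf = malloc(size);
    if (buf == NULL)
        return NULL;
    /* One write per page is enough to fault each page in now. */
    for (size_t off = 0; off < size; off += (size_t)page)
        buf[off] = 0;
    if (mlock(buf, size) == 0)
        *locked = 1;
    return buf;
}
```

Touching the pages alone only moves the allocation cost to startup; without the successful `mlock()` those pages can still be reclaimed under the memory pressure described above.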
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Andrea Arcangeli @ 2003-05-29 16:49 UTC
To: Willy TARREAU
Cc: Marc-Christian Petersen, Con Kolivas, Andrew Morton, Matthias Mueller, axboe, marcelo, linux-kernel

On Thu, May 29, 2003 at 06:06:04PM +0200, Willy TARREAU wrote:
> I also confirm it does ; it takes 122 seconds to write this file in -rc6, and
> 142 seconds in -aa. But I don't think that desktop people would notice anyway.

btw, were you running parallel reads or writes at the same time (i.e. launching xterms or ps etc. in parallel)? I ask because if xterm starts up quickly, it's because the write workload is getting more seeks in its way.

I'd be very interested if you could measure a bonnie performance change in contiguous reads and writes on an otherwise completely idle machine; the queue has to be big enough to keep the I/O pipeline full during contiguous writes at full speed. Saying that throughput decreases is not enough on its own to evaluate the reason for the drop.

You can also try with:

echo 20 500 0 0 500 3000 30 10 >/proc/sys/vm/bdflush

just in case.

Andrea
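Andrea's bdflush line is easier to reason about with the fields named. Assuming the nine-field layout of the 2.4.19/2.4.20-era `bdf_prm` (nfract, ndirty, two unused slots, interval, age_buffer, nfract_sync, nfract_stop_bdflush, one unused slot), his suggestion wakes bdflush at 20% dirty buffers, runs kupdate every 500 jiffies (5 s at HZ=100) with a 3000-jiffy (30 s) buffer age, and puts the synchronous-flush and stop thresholds at 30% and 10%; those are the "last 2 values" Willy varies in his reply. The struct and functions are a hypothetical decoder, not a kernel or procps interface; verify the field order against Documentation/sysctl/vm.txt in your own tree.

```c
#include <stdio.h>

/* Field order ASSUMED from the 2.4-era bdf_prm; check your tree. */
struct bdflush_params {
    int nfract;              /* % of buffers dirty that wakes bdflush */
    int ndirty;              /* max buffers bdflush writes per pass */
    int dummy2, dummy3;      /* unused */
    int interval;            /* jiffies between kupdate runs */
    int age_buffer;          /* jiffies before a dirty buffer is written */
    int nfract_sync;         /* % dirty at which writers flush synchronously */
    int nfract_stop_bdflush; /* % dirty at which bdflush stops writing */
    int dummy5;              /* unused */
};

/* Parse a /proc/sys/vm/bdflush line. Missing trailing fields stay 0.
 * Returns the number of fields actually read (0..9). */
int parse_bdflush(const char *line, struct bdflush_params *p)
{
    int v[9] = {0};
    int n = sscanf(line, "%d %d %d %d %d %d %d %d %d",
                   &v[0], &v[1], &v[2], &v[3], &v[4],
                   &v[5], &v[6], &v[7], &v[8]);
    if (n < 0)                   /* EOF: empty input */
        n = 0;
    p->nfract = v[0];
    p->ndirty = v[1];
    p->dummy2 = v[2];
    p->dummy3 = v[3];
    p->interval = v[4];
    p->age_buffer = v[5];
    p->nfract_sync = v[6];
    p->nfract_stop_bdflush = v[7];
    p->dummy5 = v[8];
    return n;
}

/* Convert jiffies to milliseconds for a given HZ (100 on 2.4/x86). */
long jiffies_to_ms(long jiffies, long hz)
{
    return jiffies * 1000L / hz;
}
```

Under the same assumption, the original poster's `2 50 32 100 50 300 1 0 0` decodes to a very aggressive setup: bdflush wakes at 2% dirty and writers are forced to flush synchronously at 1%.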
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Willy Tarreau @ 2003-05-29 17:46 UTC
To: Andrea Arcangeli
Cc: Willy TARREAU, Marc-Christian Petersen, Con Kolivas, Andrew Morton, Matthias Mueller, axboe, marcelo, linux-kernel

On Thu, May 29, 2003 at 06:49:40PM +0200, Andrea Arcangeli wrote:
> btw, were you running parallel reads or writes at the same time? (i.e.
> launching xterms or ps etc.. in parallel?) I ask because if xterm
> startups quick is because the write workload is getting more seeks in
> its way.

Well, you're right, I was starting some xterms, but not that many, perhaps ten during the whole test.

> I'd be very interested if you can measure a bonnie performance change in
> contigous reads and writes on a otherwise completely idle machine, the
> size of the queue has to be big enough to keep the I/O pipeline full
> during contigous writes at full speed.

For this I'll have to install bonnie; I won't do it right now.

> you can also try with:
>
> echo 20 500 0 0 500 3000 30 10 >/proc/sys/vm/bdflush

Interestingly, it seems that the lower the last 2 values, the longer it takes. I retried without opening any xterm, and it took 130 seconds. With the above changes to bdflush, 135 s. With '80 50', 118 s.

vmstat also shows me that the test begins at a sustained 16-19 MB/s write throughput for about the first minute. Then it starts to show regular drops to 5-7 MB/s for 6-7 s, and goes back to full speed. Since this is on reiserfs, I wonder whether this activity is related to the journal. Moreover, the disk keeps writing for about 10 s after the end of the dd, so I don't think measuring the time dd takes to complete is a good indicator of anything (or I should try with a final sync).

If I write simultaneously to two 1G files, wait a while and then read from them while still writing, I begin to wait a few seconds for xterm to give me the prompt. But when the writes finish and there are only concurrent reads, everything gets smooth again, even though the disk makes a terrible seek sound!

Cheers,
Willy
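Willy's remark that the disk keeps writing after dd exits suggests timing the write phase and the flush separately, rather than relying on dd's own run time. A hypothetical helper (its name and the 64 KiB chunk size are arbitrary choices for this sketch) can time the `write()` calls and then the `fsync()` that waits for the data to reach the disk; timing dd followed by `sync`, as Willy proposes, measures the same total.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>

/* Seconds on the monotonic clock. */
static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Write 'mb' MiB of zeros to fd in 64 KiB chunks. *t_write receives the
 * time until the last write() returned (roughly what dd reports), and
 * *t_total the time until fsync() returned. Returns 0 or -1 on error. */
int timed_write_fsync(int fd, int mb, double *t_write, double *t_total)
{
    static char chunk[64 * 1024];   /* static, so zero-initialised */
    long chunks = (long)mb * 16;    /* 16 x 64 KiB = 1 MiB */
    double start = now_s();

    for (long i = 0; i < chunks; i++) {
        size_t done = 0;
        while (done < sizeof chunk) {   /* handle short writes */
            ssize_t n = write(fd, chunk + done, sizeof chunk - done);
            if (n < 0)
                return -1;
            done += (size_t)n;
        }
    }
    *t_write = now_s() - start;
    if (fsync(fd) != 0)
        return -1;
    *t_total = now_s() - start;
    return 0;
}
```

Calling `timed_write_fsync(fd, 2048, &tw, &tt)` on a fresh file would mirror the 2 GB test; the gap between `tw` and `tt` is the tail of writeback that dd's timing misses.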
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Andrea Arcangeli @ 2003-05-29 16:19 UTC
To: Willy Tarreau
Cc: Con Kolivas, Andrew Morton, Matthias Mueller, axboe, m.c.p, manish, marcelo, linux-kernel

On Thu, May 29, 2003 at 03:55:08PM +0200, Willy Tarreau wrote:
> So, could the people who report long hangs retry with swap disabled ?
> Can we limit the amount of memory consummed by the cache during such a write ?

The VM should be (i.e. is supposed to be) smart enough not to unmap anything significant just because of large writes. I'm sure it's not swapping anything on my desktop during a write flood (and certainly not the mouse pointer), but checking with swapoff is certainly a good way to be sure.

Andrea
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: manish @ 2003-05-29 1:32 UTC
To: Andrew Morton
Cc: Matthias Mueller, axboe, m.c.p, kernel, andrea, marcelo, linux-kernel

Andrew Morton wrote:

>Matthias Mueller <matthias.mueller@rz.uni-karlsruhe.de> wrote:
>
>>Works fine on my notebook. Good throughput and no mouse hangs anymore.
>>
>
>Interesting.
>
>Could you please work out which change caused it? Go back to stock 2.4 and
>then apply this:
>
>diff -puN drivers/block/ll_rw_blk.c~1 drivers/block/ll_rw_blk.c
>--- 24/drivers/block/ll_rw_blk.c~1 2003-05-28 03:20:42.000000000 -0700
>+++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:20:57.000000000 -0700
>@@ -590,10 +590,10 @@ static struct request *__get_request_wai
> 	register struct request *rq;
> 	DECLARE_WAITQUEUE(wait, current);
> 
>-	generic_unplug_device(q);
> 	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
> 	do {
> 		set_current_state(TASK_UNINTERRUPTIBLE);
>+		generic_unplug_device(q);
> 		if (q->rq[rw].count == 0)
> 			schedule();
> 		spin_lock_irq(&io_request_lock);
>
>then this:
>
>diff -puN drivers/block/ll_rw_blk.c~2 drivers/block/ll_rw_blk.c
>--- 24/drivers/block/ll_rw_blk.c~2 2003-05-28 03:21:03.000000000 -0700
>+++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:09.000000000 -0700
>@@ -590,7 +590,7 @@ static struct request *__get_request_wai
> 	register struct request *rq;
> 	DECLARE_WAITQUEUE(wait, current);
> 
>-	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
>+	add_wait_queue(&q->wait_for_requests[rw], &wait);
> 	do {
> 		set_current_state(TASK_UNINTERRUPTIBLE);
> 		generic_unplug_device(q);
>
>Then this (totally unlikely, don't bother):
>
>diff -puN drivers/block/ll_rw_blk.c~3 drivers/block/ll_rw_blk.c
>--- 24/drivers/block/ll_rw_blk.c~3 2003-05-28 03:21:15.000000000 -0700
>+++ 24-akpm/drivers/block/ll_rw_blk.c 2003-05-28 03:21:39.000000000 -0700
>@@ -829,8 +829,7 @@ void blkdev_release_request(struct reque
> 	 */
> 	if (q) {
> 		list_add(&req->queue, &q->rq[rw].free);
>-		if (++q->rq[rw].count >= q->batch_requests &&
>-				waitqueue_active(&q->wait_for_requests[rw]))
>+		if (++q->rq[rw].count >= q->batch_requests)
> 			wake_up(&q->wait_for_requests[rw]);
> 	}
> }
>
>_
>

Hello!

I have applied patches 1+2+3, and they seem to have solved the stalls/pauses I was seeing with the stock kernel, after long hours of testing with bonnie.

Thanks much
Manish
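Patch 2 swaps `add_wait_queue_exclusive()` for `add_wait_queue()`: an exclusive sleeper is woken one at a time by `wake_up()`, while non-exclusive sleepers are all woken. The following is a deliberately simplified, single-threaded toy model of that policy only. It is not kernel code and makes no claim about why the hangs occur: `free` stands in for `q->rq[rw].count`, `batch` for `q->batch_requests`, the struct and function names are made up for illustration, and it ignores timing, unplugging (patch 1) and the `waitqueue_active()` race (patch 3).

```c
/* Toy model of the wake-up policy in patch 2; NOT kernel code. */
struct toy_queue {
    int free;       /* free request slots (cf. q->rq[rw].count) */
    int sleeping;   /* tasks waiting for a slot */
    int batch;      /* wake threshold (cf. q->batch_requests) */
    int exclusive;  /* 1: add_wait_queue_exclusive, 0: add_wait_queue */
};

/* Release one request, as in blkdev_release_request(), and return how
 * many sleepers obtained a slot as a result. */
static int toy_release(struct toy_queue *q)
{
    int woken, served = 0;

    q->free++;
    if (q->free < q->batch || q->sleeping == 0)
        return 0;                   /* below threshold: nobody is woken */

    /* wake_up(): one exclusive waiter, or every non-exclusive waiter. */
    woken = q->exclusive ? 1 : q->sleeping;

    /* Each woken task grabs a slot if one is left; the others find the
     * pool empty and go back to sleep. */
    while (woken-- > 0 && q->free > 0) {
        q->free--;
        q->sleeping--;
        served++;
    }
    return served;
}
```

Starting from 8 sleepers, an empty pool and a batch of 4, eight releases serve 5 sleepers in the exclusive case (one per release once the pool reaches the batch size, leaving 3 requests idle while 3 tasks still sleep) and all 8 in the wake-all case. Whether that effect matters on a real queue is what Andrew's patch-by-patch testing was meant to find out.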
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marc-Christian Petersen @ 2003-05-28 10:24 UTC
To: Matthias Mueller, Andrew Morton
Cc: Jens Axboe, kernel, manish, andrea, marcelo, linux-kernel

On Wednesday 28 May 2003 12:13, Matthias Mueller wrote:

Hi Matthias, Andrew,

> > It'd be interesting if any of these changes make a difference.
> Works fine on my notebook. Good throughput and no mouse hangs anymore.

Damn, I *KNEW* Andrew would be able to fix this. I've known that for over a year!! ;)

ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marc Wilson @ 2003-05-28 7:16 UTC
To: linux-kernel

On Tue, May 27, 2003 at 08:04:49PM +0200, Marc-Christian Petersen wrote:
> ALL: Anyone who has this kind of pauses/stops/mouse is dead/keyboard is dead/:
> speak _NOW_ please, doesn't matter who you are!

OK, add my box to the list. A variety of post-2.4.18 kernels (-ac's, -rc's, etc.) all demonstrate it to one degree or another.

Lately it's gotten REALLY bad.

Currently I'm using 21-rc2-ac2, and it regularly freezes for upwards of 15 seconds when I'm exercising the HD (three simultaneous brag threads downloading from various newsgroups). The mouse moves, but other than that, X is entirely unresponsive. An xterm with continually scrolling text, for example, will appear to stop scrolling until the kernel comes back.

The HD light is on solid the whole time.

21-rc2 does it too. I haven't tried anything later than that yet. Well, I tried 20-ck7, but it ate my RAID0 due to a DMA-ism, and I've not tested anything else since. :(

-- 
Marc Wilson | Nothing in life is to be feared. It is only to
msw@cox.net | be understood.
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: David Ford @ 2003-05-28 19:53 UTC
To: linux-kernel

Hmm, odd. I see similar dead time in 2.5.x; it is annoying, but I haven't had any time to track it down. I'm currently on .69 and planning on putting .70 on this evening.

David

Marc Wilson wrote:

>On Tue, May 27, 2003 at 08:04:49PM +0200, Marc-Christian Petersen wrote:
>[snip]
>Ok, add my box to the list. Variety of post 2.4.18 kernels, -ac's, -rc's,
>etc... all demonstrate it to one degree or another.
>[snip]
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Ragnar Hojland Espinosa @ 2003-05-28 9:36 UTC
To: Marc-Christian Petersen
Cc: manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Christian Klose, William Lee Irwin III

On Tue, May 27, 2003 at 08:04:49PM +0200, Marc-Christian Petersen wrote:
>
> ALL: Anyone who has this kind of pauses/stops/mouse is dead/keyboard is dead/:
> speak _NOW_ please, doesn't matter who you are!

FWIW, me too.

Actually, it only happens in the fixating stage when burning prebuilt ISO images from the hard disk (same IDE channel as the burner, 2.4.20). Having a completely frozen machine under X was quite panic inducing ;)

A friend told me they also get regular "pauses" when quitting from VMware.

-- 
Ragnar Hojland - Project Manager
Linalco "Especialistas Linux y en Software Libre"
http://www.linalco.com Tel: +34-91-5970074 Fax: +34-91-5970083
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Jens Axboe @ 2003-05-28 9:45 UTC
To: Ragnar Hojland Espinosa
Cc: Marc-Christian Petersen, manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Christian Klose, William Lee Irwin III

On Wed, May 28 2003, Ragnar Hojland Espinosa wrote:
> [snip]
> FWIW, me too.
>
> Actually it just happens in the fixing stage when burning prebuilt iso
> images from the hard disk (same IDE channel as the burner, 2.4.20)
> Having a completely frozen machine under X was quite panic inducing ;)
>
> A friend told me they also get regular "pauses" when quitting from
> vmware.

Let me guess, hard drive on the same channel as the burner? There's nothing we can do about that; it's a hardware limitation. The reason you see it during fixation is that it's one long single command, and we cannot preempt the channel and service requests while that is going on.

-- 
Jens Axboe
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Marc-Christian Petersen @ 2003-05-28 9:53 UTC
To: Ragnar Hojland Espinosa
Cc: manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Christian Klose, William Lee Irwin III

On Wednesday 28 May 2003 11:36, Ragnar Hojland Espinosa wrote:

Hi Ragnar,

> Actually it just happens in the fixing stage when burning prebuilt iso
> images from the hard disk (same IDE channel as the burner, 2.4.20)
> Having a completely frozen machine under X was quite panic inducing ;)

That's a problem of IDE itself. I still say IDE is broken by design ;-)

> A friend told me they also get regular "pauses" when quitting from
> vmware.

Yep, that also occurs on my machines.

ciao, Marc
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Jens Axboe @ 2003-05-28 10:01 UTC
To: Marc-Christian Petersen
Cc: Ragnar Hojland Espinosa, manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, linux-kernel, Christian Klose, William Lee Irwin III

On Wed, May 28 2003, Marc-Christian Petersen wrote:
> On Wednesday 28 May 2003 11:36, Ragnar Hojland Espinosa wrote:
>
> Hi Ragnar,
>
> > Actually it just happens in the fixing stage when burning prebuilt iso
> > images from the hard disk (same IDE channel as the burner, 2.4.20)
> > Having a completely frozen machine under X was quite panic inducing ;)
> That's a problem of IDE itself. I still say IDE is broken by design ;-)

It is actually possible to use the IMMED bit of the CLOSE_TRACK command to get around this. In that case the CD-R will report the command as completed, and the drive on the same channel can service requests.

-- 
Jens Axboe
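Jens's workaround at the byte level. In the MMC command set the close operation is CLOSE TRACK/SESSION (opcode 5Bh, a 10-byte CDB); the layout assumed here, from the MMC specification, puts the IMMED flag in bit 0 of byte 1, the close function in byte 2 and the track number in bytes 4-5, so check it against the MMC revision your drive implements. With IMMED set, the drive returns status straight away and the host has to poll for completion (for example with TEST UNIT READY) before sending the burner further commands, while the other device on the channel can be serviced in the meantime. The macro and function names are hypothetical, and actually sending the CDB to the drive is not shown.

```c
#include <stdint.h>
#include <string.h>

#define MMC_CLOSE_TRACK_SESSION 0x5B  /* CLOSE TRACK/SESSION opcode */
#define CLOSE_FN_TRACK          0x01  /* close function: close a track */
#define CLOSE_FN_SESSION        0x02  /* close function: close the session */

/* Fill a 10-byte CLOSE TRACK/SESSION CDB (layout assumed from MMC).
 * immed != 0 sets the IMMED bit, asking the drive to complete the
 * command at once and finish the close in the background. */
void build_close_cdb(uint8_t cdb[10], int immed, uint8_t close_fn,
                     uint16_t track)
{
    memset(cdb, 0, 10);
    cdb[0] = MMC_CLOSE_TRACK_SESSION;
    cdb[1] = immed ? 0x01 : 0x00;       /* IMMED: byte 1, bit 0 */
    cdb[2] = close_fn & 0x07;           /* close function field */
    cdb[4] = (uint8_t)(track >> 8);     /* track number, big-endian */
    cdb[5] = (uint8_t)(track & 0xFF);
}
```

Without IMMED the same command keeps the shared IDE channel busy until the close finishes, which is the stall Ragnar and Alan describe.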
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Alan Cox @ 2003-05-28 10:58 UTC
To: Ragnar Hojland Espinosa
Cc: Marc-Christian Petersen, manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, Linux Kernel Mailing List, Christian Klose, William Lee Irwin III

On Wed, 2003-05-28 at 10:36, Ragnar Hojland Espinosa wrote:
> Actually it just happens in the fixing stage when burning prebuilt iso
> images from the hard disk (same IDE channel as the burner, 2.4.20)
> Having a completely frozen machine under X was quite panic inducing ;)

If you have a disk and the burner on the same channel, this is quite normal. The fixate is a single ATAPI command and, like all ATA commands, locks the bus to both master and slave for the duration of its execution.

It's an IDE limitation.
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Ragnar Hojland Espinosa @ 2003-05-29 8:34 UTC
To: Alan Cox
Cc: Marc-Christian Petersen, manish, Carl-Daniel Hailfinger, Andrea Arcangeli, Marcelo Tosatti, Linux Kernel Mailing List, Christian Klose, William Lee Irwin III

On Wed, May 28, 2003 at 11:58:43AM +0100, Alan Cox wrote:
> On Wed, 2003-05-28 at 10:36, Ragnar Hojland Espinosa wrote:
> > Actually it just happens in the fixing stage when burning prebuilt iso
> > images from the hard disk (same IDE channel as the burner, 2.4.20)
> > Having a completely frozen machine under X was quite panic inducing ;)
>
> If you have a disk and the burner on the same channel this is quite
> normal. The fixate is a single ATAPI command and like all ATA commands
> locks the bus to both master/slave for its duration of execution.
>
> It's an IDE limitation.

That's what you get for cheap hardware ;)

Anyway, I do have two questions regarding pauses when fixating, in case
someone knows:

- Why doesn't the freeze always happen (I think it doesn't)?
- Why doesn't the complete computer freeze always happen?

-- 
Ragnar Hojland - Project Manager
Linalco "Linux and Free Software Specialists"
http://www.linalco.com
Tel: +34-91-5970074 Fax: +34-91-5970083
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Thomas Tonino @ 2003-05-28 18:55 UTC
To: linux-kernel

Jens Axboe wrote:

> Lemme guess, hard drive on the same channel as the burner? There's
> nothing we can do about that, hardware limitation.

hmmm... most drives these days have a command to read free buffer capacity,
so there is no need to send more than the drive can swallow - and no need
to tie up the channel.

> The reason you see it
> during fixation is because that's one long single command, and we cannot
> preempt the channel and service requests while that is going on.

But this may be the exception that breaks the rule. Bah.

Thomas
* Re: 2.4.20: Proccess stuck in __lock_page ...
From: Jens Axboe @ 2003-06-02 10:43 UTC
To: Thomas Tonino
Cc: linux-kernel

On Wed, May 28 2003, Thomas Tonino wrote:
> Jens Axboe wrote:
>
> >Lemme guess, hard drive on the same channel as the burner? There's
> >nothing we can do about that, hardware limitation.
>
> hmmm... most drives these days have a command to read free buffer capacity,
> so there is no need to send more than the drive can swallow - and no need
> to tie up the channel.

As we cannot do more than 128KB in a single request (cdrecord uses 63KB
for writing), there's no problem there. I think you are misunderstanding
me. This is not a problem with the IDE layer starving the hard drive by
continually sending writes to the cd-r; it's a problem with not being
able to preempt the channel for the duration of a single command.

> >The reason you see it
> >during fixation is because that's one long single command, and we cannot
> >preempt the channel and service requests while that is going on.
>
> But this may be the exception that breaks the rule. Bah.

No, that is the entire problem.

-- 
Jens Axboe