* 2.4.22pre6aa1
@ 2003-07-17 10:28 Andrea Arcangeli
  2003-07-17 10:42 ` 2.4.22pre6aa1 ooyama eiichi
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Andrea Arcangeli @ 2003-07-17 10:28 UTC (permalink / raw)
  To: linux-kernel

URL:

        http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.22pre6aa1.gz

changelog diff between 2.4.21rc8aa1 and 2.4.22pre6aa1:

Only in 2.4.21rc8aa1: 00_01_cciss-1
Only in 2.4.21rc8aa1: 00_02_cciss-1
Only in 2.4.21rc8aa1: 00_03_cciss-1

        Updates are in mainline.

Only in 2.4.21rc8aa1: 00_backout-irda-trivial-1

        Somebody acknowledged and fixed the breakage properly.
        I hadn't a chance to test it myself yet on my cellphone,
        but I will shortly.

Only in 2.4.21rc8aa1: 00_binfmt-elf-checks-1
Only in 2.4.22pre6aa1: 00_binfmt-elf-checks-2
Only in 2.4.21rc8aa1: 00_dirty-inode-1
Only in 2.4.22pre6aa1: 00_dirty-inode-3
Only in 2.4.21rc8aa1: 00_drop-inetpeer-cache-4.gz
Only in 2.4.22pre6aa1: 00_drop-inetpeer-cache-5.gz
Only in 2.4.21rc8aa1: 00_ext3-register-filesystem-lifo-1
Only in 2.4.22pre6aa1: 00_ext3-register-filesystem-lifo-2
Only in 2.4.21rc8aa1: 00_extraversion-24
Only in 2.4.22pre6aa1: 00_extraversion-26
Only in 2.4.21rc8aa1: 00_generic_file_write_nolock-1
Only in 2.4.22pre6aa1: 00_generic_file_write_nolock-3
Only in 2.4.21rc8aa1: 00_module-locking-fix-2
Only in 2.4.22pre6aa1: 00_module-locking-fix-3
Only in 2.4.21rc8aa1: 00_netconsole-2.4.10-C2-3.gz
Only in 2.4.22pre6aa1: 00_netconsole-2.4.10-C2-4.gz
Only in 2.4.21rc8aa1: 00_rwsem-fair-36
Only in 2.4.21rc8aa1: 00_rwsem-fair-36-recursive-8
Only in 2.4.22pre6aa1: 00_rwsem-fair-38
Only in 2.4.22pre6aa1: 00_rwsem-fair-38-recursive-8
Only in 2.4.21rc8aa1: 00_setfl-race-fix-2
Only in 2.4.22pre6aa1: 00_setfl-race-fix-3
Only in 2.4.21rc8aa1: 00_vm-cleanups-2
Only in 2.4.22pre6aa1: 00_vm-cleanups-3
Only in 2.4.21rc8aa1: 05_vm_20_cleanups-2
Only in 2.4.22pre6aa1: 05_vm_20_cleanups-3
Only in 2.4.21rc8aa1: 07_qlogicfc-4.gz
Only in 2.4.22pre6aa1: 07_qlogicfc-5.gz
Only in 2.4.21rc8aa1: 10_rawio-vary-io-18
Only in 2.4.22pre6aa1: 10_rawio-vary-io-21
Only in 2.4.21rc8aa1: 20_rcu-poll-8
Only in 2.4.22pre6aa1: 20_rcu-poll-9
Only in 2.4.21rc8aa1: 20_sched-o1-fixes-8
Only in 2.4.22pre6aa1: 20_sched-o1-fixes-9
Only in 2.4.21rc8aa1: 50_uml-patch-2.4.20-5-1.gz
Only in 2.4.22pre6aa1: 50_uml-patch-2.4.20-5-2.gz
Only in 2.4.21rc8aa1: 60_atomic-lookup-5
Only in 2.4.22pre6aa1: 60_atomic-lookup-6
Only in 2.4.21rc8aa1: 60_tux-exports-6
Only in 2.4.22pre6aa1: 60_tux-exports-7
Only in 2.4.21rc8aa1: 70_delalloc-2
Only in 2.4.22pre6aa1: 70_delalloc-3
Only in 2.4.21rc8aa1: 96_inode_read_write-atomic-6
Only in 2.4.22pre6aa1: 96_inode_read_write-atomic-8
Only in 2.4.21rc8aa1: 97_i_size-corruption-fixes-2
Only in 2.4.22pre6aa1: 97_i_size-corruption-fixes-4
Only in 2.4.21rc8aa1: 9900_aio-20.gz
Only in 2.4.22pre6aa1: 9900_aio-21.gz
Only in 2.4.22pre6aa1: 9920_kgdb-10.gz
Only in 2.4.21rc8aa1: 9920_kgdb-8.gz
Only in 2.4.21rc8aa1: 9925_kmsgdump-0.4.4-2.gz
Only in 2.4.22pre6aa1: 9925_kmsgdump-0.4.4-3.gz
Only in 2.4.21rc8aa1: 9930_io_request_scale-5
Only in 2.4.22pre6aa1: 9930_io_request_scale-6
Only in 2.4.22pre6aa1: 9985_blk-atomic-12
Only in 2.4.21rc8aa1: 9985_blk-atomic-9
Only in 2.4.21rc8aa1: 9996_kiobuf-slab-1
Only in 2.4.22pre6aa1: 9996_kiobuf-slab-2
Only in 2.4.21rc8aa1: 9998_lowlatency-fixes-12
Only in 2.4.22pre6aa1: 9998_lowlatency-fixes-13
Only in 2.4.21rc8aa1: 9999_dm-1
Only in 2.4.22pre6aa1: 9999_dm-2
Only in 2.4.21rc8aa1: 9999_gcc-3.3-6
Only in 2.4.22pre6aa1: 9999_gcc-3.3-7
Only in 2.4.21rc8aa1: 9999_sched_yield_scale-2
Only in 2.4.22pre6aa1: 9999_sched_yield_scale-5

        Rediffed.

Only in 2.4.22pre6aa1: 00_copy-namespace-1

        Fix copy-namespace.

Only in 2.4.21rc8aa1: 00_cpufreq-1

        Dropped (would better go in mainline than in -aa, I already
        tried it and it doesn't do what I need, and now it's rejecting
        in multiple ways).

Only in 2.4.22pre6aa1: 00_crc-makefile-clean-1

        Remember to delete the autogenerated files to generate clean diffs.

Only in 2.4.21rc8aa1: 00_cs46xx-u32-1
Only in 2.4.21rc8aa1: 00_floppy-smp-race-and-queuesize-1
Only in 2.4.21rc8aa1: 00_ipv6-route-fix-1
Only in 2.4.21rc8aa1: 00_o_direct-b_page-null-1
Only in 2.4.21rc8aa1: 00_ppp-ioctl-memleak-1
Only in 2.4.21rc8aa1: 00_tcp-tw-death-2
Only in 2.4.21rc8aa1: 00_usbnet-zaurus-c700-1
Only in 2.4.21rc8aa1: 00_wait_kio-cleanup-1
Only in 2.4.21rc8aa1: 10_tlb-state-3
Only in 2.4.21rc8aa1: 30_02_call-reserve1-1
Only in 2.4.21rc8aa1: 30_03_call-reserve2-2
Only in 2.4.21rc8aa1: 30_04_noac-1
Only in 2.4.21rc8aa1: 30_09_o_direct-3
Only in 2.4.21rc8aa1: 30_10-lockd1-1
Only in 2.4.21rc8aa1: 30_11-lockd2-1
Only in 2.4.21rc8aa1: 30_13-lockd4-1
Only in 2.4.21rc8aa1: 30_15-xprt_fixes-1
Only in 2.4.21rc8aa1: 70_quota-backport-3
Only in 2.4.21rc8aa1: 9999901_O_DIRECT-1

        Merged in mainline.

Only in 2.4.22pre6aa1: 00_elevator-lowlatency-1

        Reduced the number of requests during seeks (the latency times
        increased slightly during seeks with pre5/pre6).

Only in 2.4.22pre6aa1: 00_elevator-read-reservation-axboe-2l-1

        Incremental patch from Jens that reserves some spare requests
        for reads. This has been measured to avoid some waiting for
        reads and it's beneficial in the common case.

Only in 2.4.22pre6aa1: 00_fdatasync-cleanup-1

        Avoid a compile time warning.

Only in 2.4.21rc8aa1: 00_ksoftirqd-max-loop-networking-1
Only in 2.4.22pre6aa1: 00_ksoftirqd-max-loop-networking-2

        Merged a fix from Philip Craig to be sure to make the anti-DoS
        logic effective. He wrote and verified the code. It makes
        perfect sense so it's applied. Normal usages shouldn't notice
        the difference, especially with the max-loop logic.

Only in 2.4.22pre6aa1: 00_parport-multi-io-pci-1

        Multi-io cards depend on config-pci, from Matthew Bell.

Only in 2.4.21rc8aa1: 00_radeon-3

        This started to reject and the mainline code seems slightly
        different now. Should be rechecked later.

Only in 2.4.21rc8aa1: 00_sched-O1-aa-2.4.19rc3-12.gz
Only in 2.4.22pre6aa1: 00_sched-O1-aa-2.4.19rc3-14.gz

        Avoid losing half a timeslice in signal delivery delay if the
        signal was sent while the task was under wakeup. Fix from
        Ingo Molnar.

Only in 2.4.21rc8aa1: 00_semop-timeout-2
Only in 2.4.22pre6aa1: 00_semop-timeout-3

        Most of it merged in mainline, except the ia64 entry in the
        syscall table. Interestingly the syscall now allocated for
        ia64 is different than the one in 21rc8aa1.

Only in 2.4.21rc8aa1: 00_smp-timers-not-deadlocking-3
Only in 2.4.22pre6aa1: 00_smp-timers-not-deadlocking-5

        Merged an anti-deadlock fix from lcm, 2.5 probably needs it
        too. In short the theory that mod_timer is the only thing
        that can run in parallel was wrong, add_timer and
        del_timer/del_timer_sync can too. Having already fixed
        mod_timer in a backwards compatible way before merging the
        smp-timers in -aa made it easy to fix those further windows
        too.

Only in 2.4.21rc8aa1: 00_usb_get_string-len-1

        Dropped, was the wrong fix and it could break stuff.

Only in 2.4.22pre6aa1: 05_vm_25_try_to_free_buffers-invariant-1

        Minor cleanup from Daniele Bellucci.

Only in 2.4.21rc8aa1: 10_o_direct-open-check-3
Only in 2.4.22pre6aa1: 10_o_direct-open-check-4

        Updated to handle the double API.

Only in 2.4.21rc8aa1: 10_try-cciss-only-4G-1

        Dropped, new code in mainline.

Only in 2.4.22pre6aa1: 21_ppc64-aa-2

        Was used to fix ppc64 around pre3, but pre3-pre6 may have
        broken stuff again, I didn't check.

Only in 2.4.21rc8aa1: 70_xfs-1.2-3.gz
Only in 2.4.22pre6aa1: 70_xfs-1.3-2.gz
Only in 2.4.21rc8aa1: 70_xfs-config-stuff-3
Only in 2.4.22pre6aa1: 70_xfs-config-stuff-4
Only in 2.4.21rc8aa1: 70_xfs-exports-1
Only in 2.4.22pre6aa1: 70_xfs-exports-2
Only in 2.4.21rc8aa1: 70_xfs-sysctl-2
Only in 2.4.22pre6aa1: 70_xfs-sysctl-3
Only in 2.4.21rc8aa1: 71_posix_acl-2
Only in 2.4.22pre6aa1: 71_posix_acl-3
Only in 2.4.22pre6aa1: 71_xfs-VM_IO-1
Only in 2.4.21rc8aa1: 71_xfs-aa-2
Only in 2.4.22pre6aa1: 71_xfs-aa-4
Only in 2.4.22pre6aa1: 71_xfs-fixup-1
Only in 2.4.22pre6aa1: 71_xfs-infrastructure-1
Only in 2.4.22pre6aa1: 71_xfs-tuning-1

        Upgraded XFS from 1.2 to 1.3.

Only in 2.4.21rc8aa1: 80_x86_64-common-code-6
Only in 2.4.21rc8aa1: 82_x86_64-suse-12
Only in 2.4.21rc8aa1: 84_x86-64-arch-3
Only in 2.4.21rc8aa1: 85_x86-64-includes-2

        Dropped, mainline is more up to date, though like ia64 it
        won't compile.

Only in 2.4.21rc8aa1: 93_NUMAQ-10
Only in 2.4.22pre6aa1: 93_NUMAQ-13

        Merged latest numa code for x440.

Only in 2.4.22pre6aa1: 9900_aio-21-ppc-1

        ppc aio code.

Only in 2.4.22pre6aa1: 9901_aio-blkdev-1

        Allow aio on blkdevices too (dunno who wrote this).

Only in 2.4.21rc8aa1: 9910_shm-largepage-13.gz
Only in 2.4.22pre6aa1: 9910_shm-largepage-16.gz

        Thanks to Hugh for the help in porting the bigpages to the
        rewritten shmfs layer in 22pre. No idea at the moment if it
        works or if it only compiles.

Only in 2.4.21rc8aa1: 9940_ocfs-2.gz
Only in 2.4.22pre6aa1: 9940_ocfs-3.gz
Only in 2.4.21rc8aa1: 9941_ocfs-20021012.gz
Only in 2.4.22pre6aa1: 9941_ocfs-direct-1
Only in 2.4.22pre6aa1: 9941_ocfs-warnings-1
Only in 2.4.21rc8aa1: 9942_ocfs-compile-2
Only in 2.4.22pre6aa1: 9942_ocfs-o_direct-API-1

        Upgraded to a more recent ocfs version (merged by Andi Kleen).

Only in 2.4.21rc8aa1: 9980_fix-pausing-5
Only in 2.4.22pre6aa1: 9980_fix-pausing-6
Only in 2.4.21rc8aa1: 9981_elevator-lowlatency-5

        Fix pausing and elevator lowlatency are now in 2.4.22pre.
        Unplugging the queue may avoid a reschedule.

Only in 2.4.21rc8aa1: 9986_elevator-merge-fast-path-1
Only in 2.4.22pre6aa1: 9986_elevator-merge-fast-path-2

        Enabled for headactive devices (i.e. IDE) too. Idea and
        original patch from Daniele Bellucci, final patch from Jens
        Axboe.

Only in 2.4.22pre6aa1: 9998_lowlatency-reiserfs-1

        Added an appealing reschedule hook (should be double checked).

Only in 2.4.22pre6aa1: 9999900_desktop-2

        Added a desktop mode that guarantees a higher degree of
        fairness in the scheduler.

Only in 2.4.22pre6aa1: 9999900_drm-4.3-1.gz

        drm updates. Merged by Chip Salzenberg.

Only in 2.4.22pre6aa1: 9999900_ecc-20020904-1.gz

        ecc timer poller latest code. Merged by Chip Salzenberg.

Only in 2.4.21rc8aa1: 9999900_ikd-1
Only in 2.4.22pre6aa1: 9999900_ikd-2.gz

        Initialize it at boot so it will have a chance to work.

Only in 2.4.22pre6aa1: 9999900_x86-movsl-copy-user-1

        Boost the copy-user asm.

Only in 2.4.21rc8aa1: 9999_truncate-nopage-race-1
Only in 2.4.22pre6aa1: 9999_truncate-nopage-race-3

        Take advantage of the i_alloc_sem in read mode to serialize
        only against truncates, to avoid possible spurious
        reschedules.

Only in 2.4.21rc8aa1: 10_ext3-o_direct-2
Only in 2.4.22pre6aa1: 10_ext3-o_direct-3
Only in 2.4.21rc8aa1: 40_o_direct-reiserfs-2

        Update to new API.

Andrea

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-17 10:28 2.4.22pre6aa1 Andrea Arcangeli @ 2003-07-17 10:42 ` ooyama eiichi 2003-07-17 10:52 ` 2.4.22pre6aa1 Marc-Christian Petersen 2003-07-17 10:53 ` 2.4.22pre6aa1 ooyama eiichi 2003-07-17 15:42 ` 2.4.22pre6aa1 Dave Jones ` (2 subsequent siblings) 3 siblings, 2 replies; 37+ messages in thread From: ooyama eiichi @ 2003-07-17 10:42 UTC (permalink / raw) To: linux-kernel Hi Andrea. I am sorry, I couldn't find this file. Maybe, I have to wait ? From: Andrea Arcangeli <andrea@suse.de> Subject: 2.4.22pre6aa1 Date: Thu, 17 Jul 2003 12:28:57 +0200 > URL: > > http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.22pre6aa1.gz > > changelog diff between 2.4.21rc8aa1 and 2.4.22pre6aa1: > > Only in 2.4.21rc8aa1: 00_01_cciss-1 > Only in 2.4.21rc8aa1: 00_02_cciss-1 > Only in 2.4.21rc8aa1: 00_03_cciss-1 > > Updates are in mainline. > > Only in 2.4.21rc8aa1: 00_backout-irda-trivial-1 > > Somebody acknowledged and fixed the breakage properly. > I hadn't a chance to test it myself yet on my cellphone, > but I will shortly. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-17 10:42 ` 2.4.22pre6aa1 ooyama eiichi @ 2003-07-17 10:52 ` Marc-Christian Petersen 2003-07-17 10:53 ` 2.4.22pre6aa1 ooyama eiichi 1 sibling, 0 replies; 37+ messages in thread From: Marc-Christian Petersen @ 2003-07-17 10:52 UTC (permalink / raw) To: ooyama eiichi, linux-kernel On Thursday 17 July 2003 12:42, ooyama eiichi wrote: Hi Ooyama, > I am sorry, I couldn't find this file. > Maybe, I have to wait ? use another mirror, e.g: http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.22pre6aa1.gz ciao, Marc ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-17 10:42 ` 2.4.22pre6aa1 ooyama eiichi 2003-07-17 10:52 ` 2.4.22pre6aa1 Marc-Christian Petersen @ 2003-07-17 10:53 ` ooyama eiichi 1 sibling, 0 replies; 37+ messages in thread From: ooyama eiichi @ 2003-07-17 10:53 UTC (permalink / raw) To: linux-kernel Hi,Andrea. I can get the file from this URL: (without "us") http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.22pre6aa1.gz > Hi Andrea. > > I am sorry, I couldn't find this file. > Maybe, I have to wait ? > > From: Andrea Arcangeli <andrea@suse.de> > Subject: 2.4.22pre6aa1 > Date: Thu, 17 Jul 2003 12:28:57 +0200 > > > URL: > > > > http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.22pre6aa1.gz > > > > changelog diff between 2.4.21rc8aa1 and 2.4.22pre6aa1: > > > > Only in 2.4.21rc8aa1: 00_01_cciss-1 > > Only in 2.4.21rc8aa1: 00_02_cciss-1 > > Only in 2.4.21rc8aa1: 00_03_cciss-1 > > > > Updates are in mainline. > > > > Only in 2.4.21rc8aa1: 00_backout-irda-trivial-1 > > > > Somebody acknowledged and fixed the breakage properly. > > I hadn't a chance to test it myself yet on my cellphone, > > but I will shortly. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-17 10:28 2.4.22pre6aa1 Andrea Arcangeli 2003-07-17 10:42 ` 2.4.22pre6aa1 ooyama eiichi @ 2003-07-17 15:42 ` Dave Jones 2003-07-17 20:31 ` 2.4.22pre6aa1 Andrea Arcangeli 2003-07-17 22:13 ` 2.4.22pre6aa1 Marc-Christian Petersen 2003-07-18 18:18 ` 2.4.22pre6aa1 Christoph Hellwig 3 siblings, 1 reply; 37+ messages in thread From: Dave Jones @ 2003-07-17 15:42 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel On Thu, Jul 17, 2003 at 12:28:57PM +0200, Andrea Arcangeli wrote: > Only in 2.4.21rc8aa1: 00_cpufreq-1 > > Dropped (would better go in mainline than in -aa Proposed for 2.4.23. Marcelo doesn't seem to have any objections. > I already tried it and it doesn't do what I need You know where to report bugs... Dave ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-17 15:42 ` 2.4.22pre6aa1 Dave Jones @ 2003-07-17 20:31 ` Andrea Arcangeli 0 siblings, 0 replies; 37+ messages in thread From: Andrea Arcangeli @ 2003-07-17 20:31 UTC (permalink / raw) To: Dave Jones, linux-kernel On Thu, Jul 17, 2003 at 04:42:12PM +0100, Dave Jones wrote: > On Thu, Jul 17, 2003 at 12:28:57PM +0200, Andrea Arcangeli wrote: > > I already tried it and it doesn't do what I need > > You know where to report bugs... Hmm I thought it was a feature not a bug or I would have already reported something ;) What I need is to set the frequency to around 400mhz when on battery, but that's not any of the speedstep frequencies, the speedstep frequencies are too fast (750/1200mhz) or too slow (250mhz). Is it supposed to work that way? thanks, Andrea ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-17 10:28 2.4.22pre6aa1 Andrea Arcangeli 2003-07-17 10:42 ` 2.4.22pre6aa1 ooyama eiichi 2003-07-17 15:42 ` 2.4.22pre6aa1 Dave Jones @ 2003-07-17 22:13 ` Marc-Christian Petersen 2003-07-17 22:26 ` 2.4.22pre6aa1 Andrea Arcangeli ` (2 more replies) 2003-07-18 18:18 ` 2.4.22pre6aa1 Christoph Hellwig 3 siblings, 3 replies; 37+ messages in thread From: Marc-Christian Petersen @ 2003-07-17 22:13 UTC (permalink / raw) To: Andrea Arcangeli, linux-kernel, Chris Mason On Thursday 17 July 2003 12:28, Andrea Arcangeli wrote: Hi Andrea, > Only in 2.4.22pre6aa1: 00_elevator-lowlatency-1 > Only in 2.4.22pre6aa1: 00_elevator-read-reservation-axboe-2l-1 Hmm, this is now my first day testing out .22-pre6 and .22-pre6aa1 with the new I/O stall fixes. At a first look & feel it's very good, but I've noticed a side effect (if it can be called so): VMware4 Workstation ------------------- 2.4.22-pre[6|6aa1]: ~ 1 minute 02 seconds from: Start this virtual machine ... 2.4.22-pre2 : ~ 30 seconds from: Start this virtual machine ... ... to start up Windows 2000 Professional completely. Well, personally I don't care about the slowdown of vmware startup with a VM but there may be many other slowdows?! ciao, Marc ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1
  2003-07-17 22:13 ` 2.4.22pre6aa1 Marc-Christian Petersen
@ 2003-07-17 22:26   ` Andrea Arcangeli
  2003-07-17 22:27   ` 2.4.22pre6aa1 Mike Fedyk
  2003-07-17 22:30   ` 2.4.22pre6aa1 Marc-Christian Petersen
  2 siblings, 0 replies; 37+ messages in thread
From: Andrea Arcangeli @ 2003-07-17 22:26 UTC (permalink / raw)
  To: Marc-Christian Petersen; +Cc: linux-kernel, Chris Mason

On Fri, Jul 18, 2003 at 12:13:38AM +0200, Marc-Christian Petersen wrote:
> 2.4.22-pre[6|6aa1]: ~ 1 minute 02 seconds from: Start this virtual machine ...
> 2.4.22-pre2 : ~ 30 seconds from: Start this virtual machine ...
>
> ... to start up Windows 2000 Professional completely.

can you check what it's doing? reading or writing? I guess it's a kind of workload that would seek all over the place. However throughput should be better with seeks now since I could grow the queue (if anything only latency would be worse, but the above is a throughput thing only, latency doesn't matter).

Can you retry one more time with pre2 vs pre6 to be 100% sure it's reproducible?

thanks,

Andrea

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-17 22:13 ` 2.4.22pre6aa1 Marc-Christian Petersen 2003-07-17 22:26 ` 2.4.22pre6aa1 Andrea Arcangeli @ 2003-07-17 22:27 ` Mike Fedyk 2003-07-17 22:32 ` 2.4.22pre6aa1 Marc-Christian Petersen 2003-07-17 22:30 ` 2.4.22pre6aa1 Marc-Christian Petersen 2 siblings, 1 reply; 37+ messages in thread From: Mike Fedyk @ 2003-07-17 22:27 UTC (permalink / raw) To: Marc-Christian Petersen; +Cc: Andrea Arcangeli, linux-kernel, Chris Mason On Fri, Jul 18, 2003 at 12:13:38AM +0200, Marc-Christian Petersen wrote: > VMware4 Workstation > ------------------- > > 2.4.22-pre[6|6aa1]: ~ 1 minute 02 seconds from: Start this virtual machine ... > 2.4.22-pre2 : ~ 30 seconds from: Start this virtual machine ... > > ... to start up Windows 2000 Professional completely. > > Well, personally I don't care about the slowdown of vmware startup with a VM > but there may be many other slowdows?! Can you try a stock -pre kernel, say pre[256], and see where the additional time starts? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-17 22:27 ` 2.4.22pre6aa1 Mike Fedyk @ 2003-07-17 22:32 ` Marc-Christian Petersen 0 siblings, 0 replies; 37+ messages in thread From: Marc-Christian Petersen @ 2003-07-17 22:32 UTC (permalink / raw) To: Mike Fedyk; +Cc: Andrea Arcangeli, linux-kernel, Chris Mason On Friday 18 July 2003 00:27, Mike Fedyk wrote: Hi Mike, > Can you try a stock -pre kernel, say pre[256], and see where the additional > time starts? Sure. Well, but I expect the behaviour starts with -pre3. Anyway, I'll test. ciao, Marc ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-17 22:13 ` 2.4.22pre6aa1 Marc-Christian Petersen 2003-07-17 22:26 ` 2.4.22pre6aa1 Andrea Arcangeli 2003-07-17 22:27 ` 2.4.22pre6aa1 Mike Fedyk @ 2003-07-17 22:30 ` Marc-Christian Petersen 2003-07-17 22:50 ` 2.4.22pre6aa1 Andrea Arcangeli 2 siblings, 1 reply; 37+ messages in thread From: Marc-Christian Petersen @ 2003-07-17 22:30 UTC (permalink / raw) To: Andrea Arcangeli, linux-kernel, Chris Mason On Friday 18 July 2003 00:13, Marc-Christian Petersen wrote: > On Thursday 17 July 2003 12:28, Andrea Arcangeli wrote: > > Hi Andrea, > > > Only in 2.4.22pre6aa1: 00_elevator-lowlatency-1 > > Only in 2.4.22pre6aa1: 00_elevator-read-reservation-axboe-2l-1 > > Hmm, this is now my first day testing out .22-pre6 and .22-pre6aa1 with the > new I/O stall fixes. At a first look & feel it's very good, but I've > noticed a side effect (if it can be called so): > > VMware4 Workstation > ------------------- > > 2.4.22-pre[6|6aa1]: ~ 1 minute 02 seconds from: Start this virtual machine > ... 2.4.22-pre2 : ~ 30 seconds from: Start this virtual > machine ... > > ... to start up Windows 2000 Professional completely. > > Well, personally I don't care about the slowdown of vmware startup with a > VM but there may be many other slowdows?! hmmm: 2.4.22-pre[6|6aa1]: ------------------- root@codeman:[/] # dd if=/dev/zero of=/home/largefile bs=16384 count=131072 131072+0 records in 131072+0 records out 2147483648 bytes transferred in 128.765686 seconds (16677453 bytes/sec) 2.4.22-pre2: ------------ root@codeman:[/] # dd if=/dev/zero of=/home/largefile bs=16384 count=131072 131072+0 records in 131072+0 records out 2147483648 bytes transferred in 98.489331 seconds (21804226 bytes/sec) both kernels freshly rebooted. Machine: -------- Celeron 1,3GHz 512MB RAM 2x IDE (UDMA100) 60/40 GB 1GB SWAP, 512MB on each disk (same priority) ext3fs (data=ordered) XFree 4.3 WindowMaker 0.82-CVS ciao, Marc ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-17 22:30 ` 2.4.22pre6aa1 Marc-Christian Petersen @ 2003-07-17 22:50 ` Andrea Arcangeli 2003-07-18 0:30 ` 2.4.22pre6aa1 Chris Mason ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Andrea Arcangeli @ 2003-07-17 22:50 UTC (permalink / raw) To: Marc-Christian Petersen; +Cc: linux-kernel, Chris Mason On Fri, Jul 18, 2003 at 12:30:45AM +0200, Marc-Christian Petersen wrote: > On Friday 18 July 2003 00:13, Marc-Christian Petersen wrote: > > > On Thursday 17 July 2003 12:28, Andrea Arcangeli wrote: > > > > Hi Andrea, > > > > > Only in 2.4.22pre6aa1: 00_elevator-lowlatency-1 > > > Only in 2.4.22pre6aa1: 00_elevator-read-reservation-axboe-2l-1 > > > > Hmm, this is now my first day testing out .22-pre6 and .22-pre6aa1 with the > > new I/O stall fixes. At a first look & feel it's very good, but I've > > noticed a side effect (if it can be called so): > > > > VMware4 Workstation > > ------------------- > > > > 2.4.22-pre[6|6aa1]: ~ 1 minute 02 seconds from: Start this virtual machine > > ... 2.4.22-pre2 : ~ 30 seconds from: Start this virtual > > machine ... > > > > ... to start up Windows 2000 Professional completely. > > > > Well, personally I don't care about the slowdown of vmware startup with a > > VM but there may be many other slowdows?! > hmmm: > > 2.4.22-pre[6|6aa1]: > ------------------- > root@codeman:[/] # dd if=/dev/zero of=/home/largefile bs=16384 count=131072 > 131072+0 records in > 131072+0 records out > 2147483648 bytes transferred in 128.765686 seconds (16677453 bytes/sec) > > 2.4.22-pre2: > ------------ > root@codeman:[/] # dd if=/dev/zero of=/home/largefile bs=16384 count=131072 > 131072+0 records in > 131072+0 records out > 2147483648 bytes transferred in 98.489331 seconds (21804226 bytes/sec) > > both kernels freshly rebooted. this explains it. 
Can you try to change include/linux/blkdev.h like this:

-#define MAX_QUEUE_SECTORS (4 << (20 - 9)) /* 4 mbytes when full sized */
+#define MAX_QUEUE_SECTORS (16 << (20 - 9)) /* 4 mbytes when full sized */

This will raise the queue from 4 to 16M. That is the first (and only) thing that can explain a drop in performance while doing contiguous I/O. However I didn't expect it to make a difference, or at least not such a relevant one.

If this doesn't help at all, it might not be an elevator/blkdev thing. At least on my machines the contiguous I/O is still at the same speed.

You also were the only one reporting a loss of performance with elevator-lowlatency, it could still be the same problem that you've seen at that time.

Last but not least, if it's an elevator/blkdev thing, you must be able to measure it with reads too, not only with writes. Can you try to read that file back? (careful about the cache effects if you read it multiple times and you interrupt it, it's best to benchmark reads after a mount to be sure)

> ext3fs (data=ordered)

can you try with data=writeback (or ext2) or hdparm -W1 and see if you can still see the same delta between the two kernels? (careful with -W1 as it invalidates journaling)

thanks,

Andrea

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-17 22:50 ` 2.4.22pre6aa1 Andrea Arcangeli @ 2003-07-18 0:30 ` Chris Mason 2003-07-22 12:28 ` 2.4.22pre6aa1 Marc-Christian Petersen 2003-07-18 5:47 ` 2.4.22pre6aa1 Andrea Arcangeli 2003-07-22 13:34 ` 2.4.22pre6aa1 Marc-Christian Petersen 2 siblings, 1 reply; 37+ messages in thread From: Chris Mason @ 2003-07-18 0:30 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Marc-Christian Petersen, linux-kernel On Thu, 2003-07-17 at 18:50, Andrea Arcangeli wrote: > On Fri, Jul 18, 2003 at 12:30:45AM +0200, Marc-Christian Petersen wrote: > > 2.4.22-pre[6|6aa1]: > > ------------------- > > root@codeman:[/] # dd if=/dev/zero of=/home/largefile bs=16384 count=131072 > > 131072+0 records in > > 131072+0 records out > > 2147483648 bytes transferred in 128.765686 seconds (16677453 bytes/sec) > > > > 2.4.22-pre2: > > ------------ > > root@codeman:[/] # dd if=/dev/zero of=/home/largefile bs=16384 count=131072 > > 131072+0 records in > > 131072+0 records out > > 2147483648 bytes transferred in 98.489331 seconds (21804226 bytes/sec) > > > > both kernels freshly rebooted. > > this explains it. > > Can you try to change include/linux/blkdev.h like this: > > -#define MAX_QUEUE_SECTORS (4 << (20 - 9)) /* 4 mbytes when full sized */ > +#define MAX_QUEUE_SECTORS (16 << (20 - 9)) /* 4 mbytes when full sized */ > > This will raise the queue from 4 to 16M. That is the first(/only) thing > that can explain a drop in performnace while doing contigous I/O. > However I didn't expect it to make a difference, or at least not so > relevant. > > If this doesn't help at all, it might not be an elevator/blkdev thing. > At least on my machines the contigous I/O still at the same speed. > Especially with just one writer, you really shouldn't be able to see a difference in pre6. Did you measure this change on both pre6 and pre6aa1. Your message indicated that but I wanted to double check to make sure. -chris ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-18 0:30 ` 2.4.22pre6aa1 Chris Mason @ 2003-07-22 12:28 ` Marc-Christian Petersen 2003-07-22 14:04 ` 2.4.22pre6aa1 Andrea Arcangeli 0 siblings, 1 reply; 37+ messages in thread From: Marc-Christian Petersen @ 2003-07-22 12:28 UTC (permalink / raw) To: Chris Mason, Andrea Arcangeli; +Cc: linux-kernel On Friday 18 July 2003 02:30, Chris Mason wrote: Hi Chris, > > If this doesn't help at all, it might not be an elevator/blkdev thing. > > At least on my machines the contigous I/O still at the same speed. > Especially with just one writer, you really shouldn't be able to see a > difference in pre6. Did you measure this change on both pre6 and > pre6aa1. Your message indicated that but I wanted to double check to > make sure. Yes, I measured it with pre6 and pre6aa1. There is no noticable difference. ciao, Marc ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1 2003-07-22 12:28 ` 2.4.22pre6aa1 Marc-Christian Petersen @ 2003-07-22 14:04 ` Andrea Arcangeli 0 siblings, 0 replies; 37+ messages in thread From: Andrea Arcangeli @ 2003-07-22 14:04 UTC (permalink / raw) To: Marc-Christian Petersen; +Cc: Chris Mason, linux-kernel On Tue, Jul 22, 2003 at 02:28:03PM +0200, Marc-Christian Petersen wrote: > Yes, I measured it with pre6 and pre6aa1. There is no noticable difference. this makes sense, thanks for double checking. Andrea ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1
  2003-07-17 22:50 ` 2.4.22pre6aa1 Andrea Arcangeli
  2003-07-18  0:30   ` 2.4.22pre6aa1 Chris Mason
@ 2003-07-18  5:47   ` Andrea Arcangeli
  2003-07-22 13:34   ` 2.4.22pre6aa1 Marc-Christian Petersen
  2 siblings, 0 replies; 37+ messages in thread
From: Andrea Arcangeli @ 2003-07-18 5:47 UTC (permalink / raw)
  To: Marc-Christian Petersen; +Cc: linux-kernel, Chris Mason

On Fri, Jul 18, 2003 at 12:50:02AM +0200, Andrea Arcangeli wrote:
> At least on my machines the contigous I/O still at the same speed.

Just to be 100% sure I ran an accurate benchmark myself too. I had the numbers for the pre-2.4.22pre levels, but I hadn't yet benchmarked the final code in mainline, which had some cosmetic differences. These are the results for the contiguous I/O with vanilla 2.4.21 against vanilla 2.4.22-pre6 against 2.4.22pre6aa2 (and aa2 is completely equal to aa1 in terms of blkdev/IO). BTW, pre6aa2 is configured as desktop so it has some additional overhead (not significant in pure I/O bound computations).

aic7xxx booted with mem=128m ext3 data=ordered

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Kernel     MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU   /sec %CPU
2.4.21    100 11052 77.3 21683 15.5 16401  8.8 20347 82.1 32765  6.1  865.6  2.8
2.4.21    100 13533 94.5 21236 13.9 15904  7.9 21182 82.3 35019  5.1 1254.3  2.2
2.4.21    100 12402 86.5 22453 14.9 16165  5.8 20270 82.1 34754  6.4 1398.8  3.8
22pre6    100 13070 91.4 23314 15.3 15402  6.5 21202 81.8 33167  8.4  959.9  2.2
22pre6    100 13181 92.2 18556 12.5 16506  6.9 20562 78.1 33394  4.9 1271.9  1.9
22pre6    100 14082 98.5 23170 16.1 16199  5.7 21045 81.2 34124  7.3 1450.6  4.0
22pre6aa2 100 12703 90.5 23245 16.3 15533  6.8 19730 79.6 37072  8.0  775.9  1.4
22pre6aa2 100 13241 94.0 20602 14.4 15562  7.0 19675 79.6 37102  7.7  843.5  1.8
22pre6aa2 100 12948 93.0 21566 15.0 15970  7.6 19460 81.7 36599  7.2  740.6  1.7

as you can see for contiguous I/O I can't measure any regression at all, the minor variations across the three runs are likely for the largest part influenced by the ext3 block allocation that can change for every run.

Andrea

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1
  2003-07-17 22:50 ` 2.4.22pre6aa1 Andrea Arcangeli
  2003-07-18  0:30   ` 2.4.22pre6aa1 Chris Mason
  2003-07-18  5:47   ` 2.4.22pre6aa1 Andrea Arcangeli
@ 2003-07-22 13:34   ` Marc-Christian Petersen
  2003-07-22 13:59     ` 2.4.22pre6aa1 Andrea Arcangeli
  2 siblings, 1 reply; 37+ messages in thread
From: Marc-Christian Petersen @ 2003-07-22 13:34 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, Chris Mason

On Friday 18 July 2003 00:50, Andrea Arcangeli wrote:

Hi Andrea,

> Can you try to change include/linux/blkdev.h like this:
> -#define MAX_QUEUE_SECTORS (4 << (20 - 9)) /* 4 mbytes when full sized */
> +#define MAX_QUEUE_SECTORS (16 << (20 - 9)) /* 4 mbytes when full sized */
> This will raise the queue from 4 to 16M. That is the first(/only) thing
> that can explain a drop in performnace while doing contigous I/O.
> However I didn't expect it to make a difference, or at least not so
> relevant.
> If this doesn't help at all, it might not be an elevator/blkdev thing.
> At least on my machines the contigous I/O still at the same speed.
well, it doesn't help at all. I/O gets even worse with that change (8mb/s less). How can this happen? *wondering*

> You also where the only one reporting a loss of performance with
> elevator-lowlatency, it could be still the same problem that you've
> seen at that time.
The only one? Surely not. Con also tested your elevator-lowlatency and we both saw performance degradation :)

> can you try with data=writeback (or ext2) or hdparm -W1 and see if you
> can still see the same delta between the two kernels? (careful with -W1
> as it invalidates journaling)
Yes, I'll do it later today.

Sorry for my late reply. I've been very busy.

ciao, Marc

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: 2.4.22pre6aa1
2003-07-22 13:34 ` 2.4.22pre6aa1 Marc-Christian Petersen
@ 2003-07-22 13:59 ` Andrea Arcangeli
2003-07-24 12:27 ` 2.4.22pre6aa1 Marc-Christian Petersen
0 siblings, 1 reply; 37+ messages in thread
From: Andrea Arcangeli @ 2003-07-22 13:59 UTC (permalink / raw)
To: Marc-Christian Petersen; +Cc: linux-kernel, Chris Mason

On Tue, Jul 22, 2003 at 03:34:16PM +0200, Marc-Christian Petersen wrote:
> On Friday 18 July 2003 00:50, Andrea Arcangeli wrote:
>
> Hi Andrea,
>
> > Can you try to change include/linux/blkdev.h like this:
> > -#define MAX_QUEUE_SECTORS (4 << (20 - 9)) /* 4 mbytes when full sized */
> > +#define MAX_QUEUE_SECTORS (16 << (20 - 9)) /* 4 mbytes when full sized */
> > This will raise the queue from 4 to 16M. That is the first(/only) thing
> > that can explain a drop in performance while doing contiguous I/O.
> > However I didn't expect it to make a difference, or at least not so
> > relevant.
> > If this doesn't help at all, it might not be an elevator/blkdev thing.
> > At least on my machines the contiguous I/O is still at the same speed.
> well, it doesn't help at all. I/O gets even worse with that change (8mb/s
> less). How can this happen? *wondering*
>
> > You also were the only one reporting a loss of performance with
> > elevator-lowlatency, it could be still the same problem that you've
> > seen at that time.
> The only one? Surely not. Con also tested your elevator-lowlatency and we
> both saw performance degradation :)

Performance degradation when? Note that we're only talking about
contiguous I/O here, not contest. I can't measure any performance
degradation during contiguous I/O, and if anything it could be explained
by the now shorter queue, but you tried enlarging it and it went even
slower (this was good btw, confirming a larger queue was completely
worthless and only hurts the VM without providing any I/O bandwidth
pipelining benefit). The elevator-lowlatency should have no effect
other than a shorter queue during pure contiguous I/O.

> > can you try with data=writeback (or ext2) or hdparm -W1 and see if you
> > can still see the same delta between the two kernels? (careful with -W1
> > as it invalidates journaling)
> Yes, I'll do it later today.

Please try plain ext2; this sounds like some fs effect of some sort. The
fs must throttle on the shorter queue or seek differently somehow.

> Sorry for my late reply. I've been very busy.

No problem ;)

Andrea
* Re: 2.4.22pre6aa1
2003-07-22 13:59 ` 2.4.22pre6aa1 Andrea Arcangeli
@ 2003-07-24 12:27 ` Marc-Christian Petersen
2003-07-24 14:14 ` 2.4.22pre6aa1 Chris Mason
0 siblings, 1 reply; 37+ messages in thread
From: Marc-Christian Petersen @ 2003-07-24 12:27 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, Chris Mason, Nick Piggin

On Tuesday 22 July 2003 15:59, Andrea Arcangeli wrote:

Hi Andrea,

> Performance degradation when? Note that we're only talking about
> contiguous I/O here, not contest. I can't measure any performance
> degradation during contiguous I/O, and if anything it could be explained
> by the now shorter queue, but you tried enlarging it and it went even
> slower (this was good btw, confirming a larger queue was completely
> worthless and only hurts the VM without providing any I/O bandwidth
> pipelining benefit). The elevator-lowlatency should have no effect
> other than a shorter queue during pure contiguous I/O.
Well, contiguous I/O isn't a big problem, though I did see performance
degradation in contiguous I/O. The problem is that I still see mouse stops
during heavy I/O, I still see keyboard stops during heavy I/O, and X is dog
slow during heavy I/O (renicing X to -20 doesn't really help). I really miss
the 2.4.18 days when this wasn't a problem at all!

Contest was not the reason. An easily reproducible scenario is:

    dd if=/dev/zero of=/home/largefile bs=16384 count=131072

This will kill your mouse, keyboard and X. The only "workaround" to avoid
mouse stops, keyboard stops and a dog-slow X was decreasing nr_requests from
128 to 4. Anything higher resulted in pauses (e.g. 8 for nr_requests).
Maybe SCSI behaves totally differently, dunno. ATM I don't have SCSI around
to test it, only IDE (ATA100/ATA133).

I've tested this for .22-pre7 too, changing "MAX_NR_REQUESTS 1024" to "4".
And now the big surprise: still mouse stops and keyboard stops during, e.g.,
the above dd command, but with, for sure, very low throughput. So the drop in
throughput is not the problem here at all. I have very, very low throughput
but still pauses/stops. How is this possible? I am very confused by the
code :-(

> > > can you try with data=writeback (or ext2) or hdparm -W1 and see if you
> > > can still see the same delta between the two kernels? (careful with -W1
> > > as it invalidates journaling)
> > Yes, I'll do it later today.
> Please try plain ext2; this sounds like some fs effect of some sort. The
> fs must throttle on the shorter queue or seek differently somehow.
Well, ext2 does not make any difference :-(

I thought trying out q->full from Chris would make a difference. I am quite
sure that it must be a merge error on my side, otherwise I cannot explain why
q->full kills my X windows for tons of seconds. During a "make -j16 bzImage
modules" I also get stops of the whole system for some seconds, every 30
seconds or so. Ripping out q->full (not just disabling it via elvtune -b 0)
fixed at least that behaviour.

Another funny thing, not dependent on q->full, is that VMware needs over
1 minute to start up with a Windows 2000 in it, where w/o the lowlat elevator
it needs ~30 seconds or less to start up completely. VMware's reads/writes
during startup now peak at about 500kb/s. Before, they went up to 10mb/s.

Now we should decide if it's either a bug in the kernel or a bug in
VMware ;))

> > Sorry for my late reply. I've been very busy.
> No problem ;)
ok :) thnx. Sorry again for the delay, but I wanted to be sure about the
reports so I had to test many things out first.

Hmm, I am a bit afraid that no one else has noticed this yet. This reminds me
of over a year ago, when I reported I/O stalls/pauses/stops with
2.4.19-pre's and no one noticed them but you, after some time. A 'real' fix
for that came up over one year later, and some days before that we had a big
discussion about it with many people involved who noticed it too.

Don't get me wrong, Andrea and Chris :) .. but I am quite disappointed with
current Linux for the Desktop. 2.4 has I/O problems, 2.6 has Scheduler
problems, 2 things I cannot live with on my Desktop. Maybe Linus is right
when he said Linux may be Desktop ready in 2006.

Any suggestions what I can do to help fix that silly behaviour? I really,
really want a usable 2.4 tree again (read: 2.4.22 final) :)

P.S.: I've CC'ed Nick.

ciao, Marc
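The dd command in the report above writes 16384 × 131072 bytes, i.e. 2 GiB of zeroes. To put numbers on the pauses it causes, one option is a small userspace probe run alongside it. The following is only a rough sketch under stated assumptions, not a tool used by anyone in this thread: it sleeps for a fixed interval in a loop and records how late the process wakes up. The function names are hypothetical, and wake-up delay of a sleeping process is at best a proxy for the mouse/keyboard/X stalls described.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <time.h>

/* Microseconds elapsed between two CLOCK_MONOTONIC samples. */
static long long elapsed_us(struct timespec start, struct timespec end)
{
    return (long long)(end.tv_sec - start.tv_sec) * 1000000LL
         + (end.tv_nsec - start.tv_nsec) / 1000;
}

/* Hypothetical probe: sleep interval_us microseconds, `iterations` times,
 * and return the largest observed wake-up delay in microseconds. */
static long long worst_stall_us(int iterations, long interval_us)
{
    struct timespec req;
    long long worst = 0;
    int i;

    req.tv_sec = interval_us / 1000000;
    req.tv_nsec = (interval_us % 1000000) * 1000;

    for (i = 0; i < iterations; i++) {
        struct timespec before, after;
        long long late;

        clock_gettime(CLOCK_MONOTONIC, &before);
        nanosleep(&req, NULL);
        clock_gettime(CLOCK_MONOTONIC, &after);

        /* How much longer than requested did the sleep take? */
        late = elapsed_us(before, after) - interval_us;
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

A caller could, for example, print `worst_stall_us(1000, 10000)` once with the system idle and once while the dd is running, and compare the two figures. On older glibc versions `clock_gettime` may additionally require linking with `-lrt`.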
* Re: 2.4.22pre6aa1
2003-07-24 12:27 ` 2.4.22pre6aa1 Marc-Christian Petersen
@ 2003-07-24 14:14 ` Chris Mason
0 siblings, 0 replies; 37+ messages in thread
From: Chris Mason @ 2003-07-24 14:14 UTC (permalink / raw)
To: Marc-Christian Petersen; +Cc: Andrea Arcangeli, linux-kernel, Nick Piggin

On Thu, 2003-07-24 at 08:27, Marc-Christian Petersen wrote:
> On Tuesday 22 July 2003 15:59, Andrea Arcangeli wrote:
>
> Hi Andrea,
>
> > Performance degradation when? Note that we're only talking about
> > contiguous I/O here, not contest. I can't measure any performance
> > degradation during contiguous I/O, and if anything it could be explained
> > by the now shorter queue, but you tried enlarging it and it went even
> > slower (this was good btw, confirming a larger queue was completely
> > worthless and only hurts the VM without providing any I/O bandwidth
> > pipelining benefit). The elevator-lowlatency should have no effect
> > other than a shorter queue during pure contiguous I/O.
> Well, contiguous I/O isn't a big problem, though I did see performance
> degradation in contiguous I/O. The problem is that I still see mouse stops
> during heavy I/O, I still see keyboard stops during heavy I/O, and X is dog
> slow during heavy I/O (renicing X to -20 doesn't really help). I really miss
> the 2.4.18 days when this wasn't a problem at all!
> Contest was not the reason. An easily reproducible scenario is:
>
> dd if=/dev/zero of=/home/largefile bs=16384 count=131072
>
> This will kill your mouse, keyboard and X. The only "workaround" to avoid
> mouse stops, keyboard stops and a dog-slow X was decreasing nr_requests from
> 128 to 4. Anything higher resulted in pauses (e.g. 8 for nr_requests).
> Maybe SCSI behaves totally differently, dunno. ATM I don't have SCSI around
> to test it, only IDE (ATA100/ATA133).

Ok, there's something fundamental we're missing here; the IDE boxes I
test on don't show this ;-)

Can you set up a serial console and capture sysrq-t during the pause? Or
better yet, set up kgdb. What kind of keyboard/mouse do you have?

I'll give you an updated q->full patch on Monday, including the
__get_request_wait latency stats.

-chris
* Re: 2.4.22pre6aa1
2003-07-17 10:28 2.4.22pre6aa1 Andrea Arcangeli
` (2 preceding siblings ...)
2003-07-17 22:13 ` 2.4.22pre6aa1 Marc-Christian Petersen
@ 2003-07-18 18:18 ` Christoph Hellwig
2003-07-18 22:27 ` 2.4.22pre6aa1 Andrea Arcangeli
3 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2003-07-18 18:18 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel

On Thu, Jul 17, 2003 at 12:28:57PM +0200, Andrea Arcangeli wrote:
> Only in 2.4.21rc8aa1: 9910_shm-largepage-13.gz
> Only in 2.4.22pre6aa1: 9910_shm-largepage-16.gz
>
> Thanks to Hugh for the help in porting the bigpages
> to the rewritten shmfs layer in 22pre. No idea at the moment if it
> works or if it only compiles.

Any reason you don't use a backport of hugetlbfs like the IA64 or
the RH AS3 tree?
* Re: 2.4.22pre6aa1
2003-07-18 18:18 ` 2.4.22pre6aa1 Christoph Hellwig
@ 2003-07-18 22:27 ` Andrea Arcangeli
2003-07-18 22:48 ` 2.4.22pre6aa1 William Lee Irwin III
0 siblings, 1 reply; 37+ messages in thread
From: Andrea Arcangeli @ 2003-07-18 22:27 UTC (permalink / raw)
To: Christoph Hellwig, linux-kernel

On Fri, Jul 18, 2003 at 07:18:53PM +0100, Christoph Hellwig wrote:
> On Thu, Jul 17, 2003 at 12:28:57PM +0200, Andrea Arcangeli wrote:
> > Only in 2.4.21rc8aa1: 9910_shm-largepage-13.gz
> > Only in 2.4.22pre6aa1: 9910_shm-largepage-16.gz
> >
> > Thanks to Hugh for the help in porting the bigpages
> > to the rewritten shmfs layer in 22pre. No idea at the moment if it
> > works or if it only compiles.
>
> Any reason you don't use a backport of hugetlbfs like the IA64 or
> the RH AS3 tree?

bigpages= is a documented API that has to be used in production, so I
can easily add the hugetlbfs API but I guess I have to keep this one
anyway. I would also need to verify the performance of hugetlbfs before
suggesting migrating to it; for example I don't want
preallocation/prefaulting (IIRC hugetlbfs preallocates everything). I
also like the single huge array of page pointers, which is very hardwired
but optimal for those workloads.

Andrea
* Re: 2.4.22pre6aa1
2003-07-18 22:27 ` 2.4.22pre6aa1 Andrea Arcangeli
@ 2003-07-18 22:48 ` William Lee Irwin III
2003-07-18 22:53 ` 2.4.22pre6aa1 Andrea Arcangeli
0 siblings, 1 reply; 37+ messages in thread
From: William Lee Irwin III @ 2003-07-18 22:48 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Christoph Hellwig, linux-kernel

On Sat, Jul 19, 2003 at 12:27:50AM +0200, Andrea Arcangeli wrote:
> bigpages= is a documented API that has to be used in production, so I
> can easily add the hugetlbfs API but I guess I've to keep this one
> anyways. I also would need to verify the performance of hugetlbfs before
> suggesting migrating to it, for example I don't want
> preallocation/prefaulting (IIRC hugetlbfs preallocates everything). I
> also like the single huge array of page pointers, that is very hardwired
> but optimal for those workloads.

Most of the complaints I've gotten are about lack of support for mixed
PSE and non-PSE mappings, not preallocation or performance (generally
its usage doesn't involve creation/destruction cycle performance
requirements, and most of the time they intend to use 100% of the memory).

It's basically too stupid and operating on too small a data set to
screw up performance-wise apart from creation/destruction, which is not
intended to be performant (and will never be; it blits oversized areas).

I wouldn't mind hearing of what you believe is missing, so long as it's
within the constraints of what's mergeable. =(

-- wli
* Re: 2.4.22pre6aa1
2003-07-18 22:48 ` 2.4.22pre6aa1 William Lee Irwin III
@ 2003-07-18 22:53 ` Andrea Arcangeli
2003-07-18 23:04 ` 2.4.22pre6aa1 William Lee Irwin III
0 siblings, 1 reply; 37+ messages in thread
From: Andrea Arcangeli @ 2003-07-18 22:53 UTC (permalink / raw)
To: William Lee Irwin III, Christoph Hellwig, linux-kernel

On Fri, Jul 18, 2003 at 03:48:24PM -0700, William Lee Irwin III wrote:
> On Sat, Jul 19, 2003 at 12:27:50AM +0200, Andrea Arcangeli wrote:
> > bigpages= is a documented API that has to be used in production, so I
> > can easily add the hugetlbfs API but I guess I've to keep this one
> > anyways. I also would need to verify the performance of hugetlbfs before
> > suggesting migrating to it, for example I don't want
> > preallocation/prefaulting (IIRC hugetlbfs preallocates everything). I
> > also like the single huge array of page pointers, that is very hardwired
> > but optimal for those workloads.
>
> Most of the complaints I've gotten are about lack of support for mixed
> PSE and non-PSE mappings, not preallocation or performance (generally
> its usage doesn't involve creation/destruction cycle performance
> requirements, and most of the time they intend to use 100% of the memory).
>
> It's basically too stupid and operating on too small a data set to
> screw up performance-wise apart from creation/destruction, which is not
> intended to be performant (and will never be; it blits oversized areas).
>
> I wouldn't mind hearing of what you believe is missing, so long as it's
> within the constraints of what's mergeable. =(

I tend to think the creation/destruction will be the most noticeable
performance difference in practice. Allocating 42G in a single block
will take a bit of time ;). It's not necessarily worse or unacceptable,
but it's different. And I feel I have to retain the bigpages= API (as an
API, not as an implementation) anyway. Furthermore I'm unsure if hugetlbfs
is as relaxed as the shm-largepage patch is. I mean, it should be
possible to mmap the stuff with 4k granularity too, or stuff could break
due to that change of API too.

Andrea
* Re: 2.4.22pre6aa1
2003-07-18 22:53 ` 2.4.22pre6aa1 Andrea Arcangeli
@ 2003-07-18 23:04 ` William Lee Irwin III
2003-07-18 23:12 ` 2.4.22pre6aa1 Andrea Arcangeli
0 siblings, 1 reply; 37+ messages in thread
From: William Lee Irwin III @ 2003-07-18 23:04 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Christoph Hellwig, linux-kernel

On Sat, Jul 19, 2003 at 12:53:28AM +0200, Andrea Arcangeli wrote:
> I tend to think the creation/destruction will be the most noticeable
> performance difference in practice. Allocating 42G in a single block
> will take a bit of time ;). It's not necessarily worse or unacceptable,
> but it's different. And I feel I have to retain the bigpages= API (as an
> API, not as an implementation) anyway. Furthermore I'm unsure if hugetlbfs
> is as relaxed as the shm-largepage patch is. I mean, it should be
> possible to mmap the stuff with 4k granularity too, or stuff could break
> due to that change of API too.

I've just not gotten feedback about creation and destruction; I get the
impression it's an uncommon operation.

The alignment etc. considerations are bits I probably can't get merged. =(

Most of the work I did was trying to get the preexisting semantics into
more standard-looking API's, e.g. vfs ops and standard-ish sysv shm.

-- wli
* Re: 2.4.22pre6aa1
2003-07-18 23:04 ` 2.4.22pre6aa1 William Lee Irwin III
@ 2003-07-18 23:12 ` Andrea Arcangeli
2003-07-18 23:53 ` 2.4.22pre6aa1 William Lee Irwin III
0 siblings, 1 reply; 37+ messages in thread
From: Andrea Arcangeli @ 2003-07-18 23:12 UTC (permalink / raw)
To: William Lee Irwin III, Christoph Hellwig, linux-kernel

On Fri, Jul 18, 2003 at 04:04:31PM -0700, William Lee Irwin III wrote:
> On Sat, Jul 19, 2003 at 12:53:28AM +0200, Andrea Arcangeli wrote:
> > I tend to think the creation/destruction will be the most noticeable
> > performance difference in practice. Allocating 42G in a single block
> > will take a bit of time ;). It's not necessarily worse or unacceptable,
> > but it's different. And I feel I have to retain the bigpages= API (as an
> > API, not as an implementation) anyway. Furthermore I'm unsure if hugetlbfs
> > is as relaxed as the shm-largepage patch is. I mean, it should be
> > possible to mmap the stuff with 4k granularity too, or stuff could break
> > due to that change of API too.
>
> I've just not gotten feedback about creation and destruction; I get the
> impression it's an uncommon operation.

It's uncommon of course. A 42G allocation all at once may take a while,
and 48G works flawlessly at peak performance w/o 4:4. I support as much as
64G all in a single shmfs file backed by bigpages (and it won't run out
of memory with a 64G box either, even with the 3:1 mapping).

> The alignment etc. considerations are bits I probably can't get merged. =(

So the apps will need changes and a kernel API to learn the hardware
page size provided by hugetlbfs (though they could probe for it with
many tries).

> Most of the work I did was trying to get the preexisting semantics into
> more standard-looking API's, e.g. vfs ops and standard-ish sysv shm.

yes.

Andrea
* Re: 2.4.22pre6aa1
2003-07-18 23:12 ` 2.4.22pre6aa1 Andrea Arcangeli
@ 2003-07-18 23:53 ` William Lee Irwin III
2003-07-19 0:04 ` 2.4.22pre6aa1 Andrea Arcangeli
0 siblings, 1 reply; 37+ messages in thread
From: William Lee Irwin III @ 2003-07-18 23:53 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Christoph Hellwig, linux-kernel

On Sat, Jul 19, 2003 at 01:12:30AM +0200, Andrea Arcangeli wrote:
> so the apps will need changes and a kernel API way to know the hardware
> page size provided by hugetlbfs (though they could probe for it with
> many tries).

The hugepage size is exported in /proc/meminfo for the time being.

I think 2.7 will see something we both like better.

-- wli
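As an illustration of how an application might pick up the hugepage size that wli says is exported in /proc/meminfo, here is a minimal sketch. It assumes the field is labelled `Hugepagesize:` and reported in kB, as in mainline 2.6 kernels; whether a given 2.4 backport uses the same label is an assumption, and both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Scan meminfo-style text for a line of the form "Hugepagesize: <n> kB".
 * Returns the size in kB, or -1 if no such line is found.
 * Assumption: the field label matches mainline 2.6 /proc/meminfo. */
static long parse_hugepagesize_kb(const char *meminfo)
{
    const char *p = meminfo;

    while (p && *p) {
        long kb;

        /* sscanf only succeeds if this line starts with the label. */
        if (sscanf(p, "Hugepagesize: %ld kB", &kb) == 1)
            return kb;

        /* Advance to the start of the next line, if any. */
        p = strchr(p, '\n');
        if (p)
            p++;
    }
    return -1;
}

/* Read the given file (normally /proc/meminfo) line by line and return
 * the hugepage size in kB, or -1 if it is missing or unreadable. */
static long read_hugepagesize_kb(const char *path)
{
    char line[256];
    long kb = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        kb = parse_hugepagesize_kb(line);
        if (kb >= 0)
            break;
    }
    fclose(f);
    return kb;
}
```

A caller would use `read_hugepagesize_kb("/proc/meminfo")` and fall back to the base page size when the result is -1, which covers kernels that do not export the field at all.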
* Re: 2.4.22pre6aa1
2003-07-18 23:53 ` 2.4.22pre6aa1 William Lee Irwin III
@ 2003-07-19 0:04 ` Andrea Arcangeli
0 siblings, 0 replies; 37+ messages in thread
From: Andrea Arcangeli @ 2003-07-19 0:04 UTC (permalink / raw)
To: William Lee Irwin III, Christoph Hellwig, linux-kernel

On Fri, Jul 18, 2003 at 04:53:09PM -0700, William Lee Irwin III wrote:
> On Sat, Jul 19, 2003 at 01:12:30AM +0200, Andrea Arcangeli wrote:
> > so the apps will need changes and a kernel API way to know the hardware
> > page size provided by hugetlbfs (though they could probe for it with
> > many tries).
>
> The hugepage size is exported in /proc/meminfo for the time being.

ok.

> I think 2.7 will see something we both like better.

The transparency feature in the shm-largepage patch is quite nice, since
you could trivially put an app on the fs w/o any breakage that way (not
everything has to be strictly mapped with bigpages, so it would make the
code more relaxed by just changing the mountpoint). Of course a way to
know for sure whether a mapping is marked VM_LARGEPAGE would then be
needed, to be sure the app has the right pieces of vm backed with the
right page size.

Andrea
* Re: 2.4.22pre6aa1
@ 2003-07-23 11:21 Sergey S. Kostyliov
2003-07-25 5:28 ` 2.4.22pre6aa1 Andrea Arcangeli
0 siblings, 1 reply; 37+ messages in thread
From: Sergey S. Kostyliov @ 2003-07-23 11:21 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel
Hello Andrea,
This is during `swapoff -a`, on a heavily loaded box:
ksymoops 2.4.9 on i686 2.4.22-pre6aa1. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.22-pre6aa1/ (default)
-m /usr/src/linux/System.map (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Error (regular_file): read_system_map stat /usr/src/linux/System.map failed
ksymoops: No such file or directory
kernel BUG at shmem.c:490!
invalid operand: 0000 2.4.22-pre6aa1 #1 SMP Thu Jul 17 20:24:29 MSD 2003
CPU: 0
EIP: 0010:[<801424cb>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000508 ebx: 8d846a00 ecx: c7919800 edx: c79198fc
esi: c79198fc edi: 8d846b40 ebp: 97b999a0 esp: af853e34
ds: 0018 es: 0018 ss: 0018
Process oracle (pid: 23274, stackpage=af853000)
Stack: 8d846a00 8d846a00 80142460 c7919800 80161abf 8d846a00 00000000 00000000
97b999a0 8d846a00 8d846a00 8015e98e 8d846a00 97b999a0 97b999a0 97b999b8
8015ea5a 97b999a0 803349e0 8d141860 c78f6fa0 80148acb 97b999a0 9eb9f774
Call Trace: [<80142460>] [<80161abf>] [<8015e98e>] [<8015ea5a>]
[<80148acb>]
[<80132695>] [<8011c35a>] [<80121c66>] [<80128069>] [<80127f13>]
[<80128115>]
[<801073a8>] [<801236c4>] [<80123562>] [<80127546>] [<80115b50>]
[<80117b80>]
[<80107614>]
Code: 0f 0b ea 01 24 c1 27 80 eb cd 0f 0b e9 01 24 c1 27 80 eb bc
>>EIP; 801424cb <alloc_pages_node+75b/2c70> <=====
Trace; 80142460 <alloc_pages_node+6f0/2c70>
Trace; 80161abf <iput+14f/2b0>
Trace; 8015e98e <lock_may_write+21e/260>
Trace; 8015ea5a <dput+8a/150>
Trace; 80148acb <fput+db/110>
Trace; 80132695 <do_brk+3d5/700>
Trace; 8011c35a <remove_wait_queue+4fa/d10>
Trace; 80121c66 <exit_mm+4f6/770>
Trace; 80128069 <unblock_all_signals+109/150>
Trace; 80127f13 <flush_signal_handlers+103/110>
Trace; 80128115 <dequeue_signal+65/4f0>
Trace; 801073a8 <__read_lock_failed+11a8/17c0>
Trace; 801236c4 <tasklet_kill+f4/120>
Trace; 80123562 <__tasklet_hi_schedule+162/1a0>
Trace; 80127546 <del_timer_sync+a16/ca0>
Trace; 80115b50 <smp_call_function+ce0/19f0>
Trace; 80117b80 <__verify_write+230/ab0>
Trace; 80107614 <__read_lock_failed+1414/17c0>
Code; 801424cb <alloc_pages_node+75b/2c70>
00000000 <_EIP>:
Code; 801424cb <alloc_pages_node+75b/2c70> <=====
0: 0f 0b ud2a <=====
Code; 801424cd <alloc_pages_node+75d/2c70>
2: ea 01 24 c1 27 80 eb ljmp $0xeb80,$0x27c12401
Code; 801424d4 <alloc_pages_node+764/2c70>
9: cd 0f int $0xf
Code; 801424d6 <alloc_pages_node+766/2c70>
b: 0b e9 or %ecx,%ebp
Code; 801424d8 <alloc_pages_node+768/2c70>
d: 01 24 c1 add %esp,(%ecx,%eax,8)
Code; 801424db <alloc_pages_node+76b/2c70>
10: 27 daa
Code; 801424dc <alloc_pages_node+76c/2c70>
11: 80 eb bc sub $0xbc,%bl
1 warning and 1 error issued. Results may not be reliable.
--
Best regards,
Sergey S. Kostyliov <rathamahata@php4.ru>
Public PGP key: http://sysadminday.org.ru/rathamahata.asc
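A note on the `Code:` bytes in the oops above: the instructions ksymoops prints after `ud2a` (`ljmp`, `int $0xf`, ...) are most likely not real code. On i386, 2.4 kernels built with verbose BUG() appear to emit `ud2` followed by a 16-bit line number and a 32-bit pointer to the file name, and the bytes here fit that layout. The sketch below decodes them under that assumption; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the data assumed to follow "ud2" (0f 0b) in a
 * 2.4 i386 BUG(): a little-endian 16-bit line number, then a
 * little-endian 32-bit pointer to the file name string. */
struct bug_record {
    uint16_t line;
    uint32_t file_ptr;
};

/* Decode one record from raw oops "Code:" bytes.
 * Returns 0 on success, -1 if the bytes are too short or do not start
 * with the ud2 opcode. */
static int decode_bug_record(const uint8_t *code, size_t len,
                             struct bug_record *out)
{
    if (len < 8 || code[0] != 0x0f || code[1] != 0x0b)
        return -1;

    out->line = (uint16_t)(code[2] | (code[3] << 8));
    out->file_ptr = (uint32_t)code[4]
                  | ((uint32_t)code[5] << 8)
                  | ((uint32_t)code[6] << 16)
                  | ((uint32_t)code[7] << 24);
    return 0;
}
```

Feeding it the first eight bytes of the `Code:` line (`0f 0b ea 01 24 c1 27 80`) yields line 490, matching "kernel BUG at shmem.c:490!", with the file name pointer at 0x8027c124. The second `0f 0b e9 01 ...` sequence decodes to line 489 in the same file, presumably an adjacent BUG check.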
* Re: 2.4.22pre6aa1
2003-07-23 11:21 2.4.22pre6aa1 Sergey S. Kostyliov
@ 2003-07-25 5:28 ` Andrea Arcangeli
2003-07-25 11:10 ` 2.4.22pre6aa1 Sergey S. Kostyliov
0 siblings, 1 reply; 37+ messages in thread
From: Andrea Arcangeli @ 2003-07-25 5:28 UTC (permalink / raw)
To: Sergey S. Kostyliov; +Cc: linux-kernel

On Wed, Jul 23, 2003 at 03:21:15PM +0400, Sergey S. Kostyliov wrote:
> Hello Andrea,
>
> This is during `swapoff -a`, on a heavily loaded box:
>
> ksymoops 2.4.9 on i686 2.4.22-pre6aa1. Options used
> -V (default)
> -k /proc/ksyms (default)
> -l /proc/modules (default)
> -o /lib/modules/2.4.22-pre6aa1/ (default)
> -m /usr/src/linux/System.map (default)
>
> Warning: You did not tell me where to find symbol information. I will
> assume that the log matches the kernel and modules that are running
> right now and I'll use the default options above for symbol resolution.
> If the current kernel and/or modules do not match the log, you can get
> more accurate output by telling me the kernel version and where to find
> map, modules, ksyms etc. ksymoops -h explains the options.
>
> Error (regular_file): read_system_map stat /usr/src/linux/System.map failed
> ksymoops: No such file or directory
> kernel BUG at shmem.c:490!

Hmm, 2.4.22pre6aa1 was the first port of the 2.4 largepages patch to the
>=22pre shmfs backport from 2.5. It could be a bug in 2.5, or a bug present
only in the backport of the 2.5 code to 22pre, or even a bug only present
in -aa due to the largepage patch ported on top of the backport included in
22pre. I'll have a closer look at it tomorrow. The place where it
crashed is:

	BUG_ON(inode->i_blocks);

It might be only a minor accounting issue. It needs some auditing.

I'm afraid you're the first one testing the shmfs backport in 22pre +
the largepage support patch in my tree with a big app doing swapoff at
the same time.

Are you using bigpages btw?

Thank you very much for the feedback,

Andrea

PS. Should this give us significant problems in the debugging/auditing, I'll
just give you a patch to back out the backport and go back to the shmfs
code in 2.4.21rc8aa1, which is running rock solid in production with
largepages (I doubt you need the loop device on top of shmfs anyway). I
prefer not to spend much time on new 2.4 features.
* Re: 2.4.22pre6aa1
2003-07-25 5:28 ` 2.4.22pre6aa1 Andrea Arcangeli
@ 2003-07-25 11:10 ` Sergey S. Kostyliov
2003-07-25 19:02 ` 2.4.22pre6aa1 Andrea Arcangeli
0 siblings, 1 reply; 37+ messages in thread
From: Sergey S. Kostyliov @ 2003-07-25 11:10 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel

Hello Andrea,

On Friday 25 July 2003 09:28, you wrote:
> On Wed, Jul 23, 2003 at 03:21:15PM +0400, Sergey S. Kostyliov wrote:
> > Hello Andrea,

<cut>

> Hmm, 2.4.22pre6aa1 was the first port of the 2.4 largepages patch to the
> >=22pre shmfs backport from 2.5. It could be a bug in 2.5, or a bug present
> only in the backport of the 2.5 code to 22pre, or even a bug only present
> in -aa due to the largepage patch ported on top of the backport included in
> 22pre. I'll have a closer look at it tomorrow. The place where it
> crashed is:
>
> BUG_ON(inode->i_blocks);
>
> It might be only a minor accounting issue. It needs some auditing.

Thanks for your response!

Yes, it seems possible. At least it continues to run just fine after the
oops with 2.4.22pre6aa1. Btw I've managed to get a hard lockup with
2.4.22pre7aa1 in the same scenario. It just stops responding even to
Alt+SysRq+* from the keyboard.

>
> I'm afraid you're the first one testing the shmfs backport in 22pre +
> the largepage support patch in my tree with a big app doing swapoff at
> the same time.
>
> Are you using bigpages btw?

No, I'm not using bigpages.

>
> Thank you very much for the feedback,
>
> Andrea
>
> PS. Should this give us significant problems in the debugging/auditing, I'll
> just give you a patch to back out the backport and go back to the shmfs
> code in 2.4.21rc8aa1, which is running rock solid in production with
> largepages (I doubt you need the loop device on top of shmfs anyway). I
> prefer not to spend much time on new 2.4 features.

I doubt it depends on bigpages because they
are not used in my setup. But I can live with that. The rule "do not run
`swapoff -a` under load" doesn't sound impossible in my case (if this
is the only way to trigger this problem).

--
Best regards,
Sergey S. Kostyliov <rathamahata@php4.ru>
Public PGP key: http://sysadminday.org.ru/rathamahata.asc
* Re: 2.4.22pre6aa1
2003-07-25 11:10 ` 2.4.22pre6aa1 Sergey S. Kostyliov
@ 2003-07-25 19:02 ` Andrea Arcangeli
2003-08-03 17:12 ` 2.4.22pre6aa1 Sergey S. Kostyliov
0 siblings, 1 reply; 37+ messages in thread
From: Andrea Arcangeli @ 2003-07-25 19:02 UTC (permalink / raw)
To: Sergey S. Kostyliov; +Cc: linux-kernel

Hi Sergey,

On Fri, Jul 25, 2003 at 03:10:59PM +0400, Sergey S. Kostyliov wrote:
> I doubt it depends on bigpages because they
> are not used in my setup. But I can live with that. The rule "do not run
> `swapoff -a` under load" doesn't sound impossible in my case (if this
> is the only way to trigger this problem).

Can you reproduce it with 2.4.21rc8aa1? If not, then it's likely a
2.5/2.6 bug that went into 2.4 during the backport. I spoke with Hugh an
hour ago about this; he will soon look into it too.

Andrea
* Re: 2.4.22pre6aa1
2003-07-25 19:02 ` 2.4.22pre6aa1 Andrea Arcangeli
@ 2003-08-03 17:12 ` Sergey S. Kostyliov
2003-08-16 11:56 ` 2.4.22pre6aa1 Andrea Arcangeli
0 siblings, 1 reply; 37+ messages in thread
From: Sergey S. Kostyliov @ 2003-08-03 17:12 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel

Hello Andrea,

On Friday 25 July 2003 23:02, Andrea Arcangeli wrote:
> Hi Sergey,
>
> On Fri, Jul 25, 2003 at 03:10:59PM +0400, Sergey S. Kostyliov wrote:
> > I doubt it depends on bigpages because they
> > are not used in my setup. But I can live with that. The rule "do not run
> > `swapoff -a` under load" doesn't sound impossible in my case (if this
> > is the only way to trigger this problem).
>
> Can you reproduce it with 2.4.21rc8aa1? If not, then it's likely a
> 2.5/2.6 bug that went into 2.4 during the backport. I spoke with Hugh an
> hour ago about this; he will soon look into it too.

Sorry for the late response. I wasn't able to reproduce either the oops or
the lockup with 2.4.21rc8aa1.

>
> Andrea

--
Best regards,
Sergey S. Kostyliov <rathamahata@php4.ru>
Public PGP key: http://sysadminday.org.ru/rathamahata.asc
* Re: 2.4.22pre6aa1
2003-08-03 17:12 ` 2.4.22pre6aa1 Sergey S. Kostyliov
@ 2003-08-16 11:56 ` Andrea Arcangeli
2003-08-16 13:54 ` 2.4.22pre6aa1 Hugh Dickins
0 siblings, 1 reply; 37+ messages in thread
From: Andrea Arcangeli @ 2003-08-16 11:56 UTC (permalink / raw)
To: Sergey S. Kostyliov; +Cc: linux-kernel

On Sun, Aug 03, 2003 at 09:12:00PM +0400, Sergey S. Kostyliov wrote:
> Hello Andrea,
>
> On Friday 25 July 2003 23:02, Andrea Arcangeli wrote:
> > Hi Sergey,
> >
> > On Fri, Jul 25, 2003 at 03:10:59PM +0400, Sergey S. Kostyliov wrote:
> > > I doubt it depends on bigpages because they
> > > are not used in my setup. But I can live with that. The rule "do not run
> > > `swapoff -a` under load" doesn't sound impossible in my case (if this
> > > is the only way to trigger this problem).
> >
> > Can you reproduce it with 2.4.21rc8aa1? If not, then it's likely a
> > 2.5/2.6 bug that went into 2.4 during the backport. I spoke with Hugh an
> > hour ago about this; he will soon look into it too.
>
> Sorry for the late response. I wasn't able to reproduce either the oops or
> the lockup with 2.4.21rc8aa1.

Ok good. I'm betting it's the shm backport that destabilized something.
I had no time to look further into it during vacations ;), but the first
suspect thing I mentioned to Hugh during OLS was this:

	static void shmem_removepage(struct page *page)
	{
		if (!PageLaunder(page))
			shmem_free_blocks(page->mapping->host, 1);
	}

It's not exactly obvious how the accounting should change as a function of
the Launder bit. I mean, a writepage can happen even w/o the launder
bitflag set (if it's not invoked by the vm), and I don't see how a
writepage triggered by msync and one triggered by vm pressure should differ
in terms of accounting of the blocks in an inode.

Overall I need a bit more time on Monday to digest the whole backport,
to be sure of what's going on and whether the above is right after all.

Andrea
* Re: 2.4.22pre6aa1
  2003-08-16 11:56 ` 2.4.22pre6aa1 Andrea Arcangeli
@ 2003-08-16 13:54 ` Hugh Dickins
  2003-08-16 14:00   ` 2.4.22pre6aa1 Hugh Dickins
  0 siblings, 1 reply; 37+ messages in thread
From: Hugh Dickins @ 2003-08-16 13:54 UTC
To: Andrea Arcangeli; +Cc: Sergey S. Kostyliov, linux-kernel

Hi Andrea,

Welcome back. Sergey and I have been in contact over this while you
were away; I kept it private so as not to inflate your mailbox further.

Brief summary (subject to confirmation by Sergey's testing) would be:
don't worry about it, it's not an -aa problem, it's a long-standing and
rare bug, in fact much less likely to occur in current 2.6 and 2.4.22-rc
than in 2.4.21 and earlier: seems Sergey's just been doing good testing.
So I've not bothered Marcelo with fixing it for 2.4.22, will submit fix
to 2.6.0-test and 2.4.23-pre later on.

You'll immediately counter what I've said there, by pointing out that
BUG_ON(inode->i_blocks) couldn't have triggered in 2.4.21 and earlier,
since I only added it in 2.4.22-pre. True, but instead it would have
gone on to hit clear_inode's "if (inode->i_data.nrpages) BUG();"
(assuming I've identified the issue correctly).

On Sat, 16 Aug 2003, Andrea Arcangeli wrote:
> On Sun, Aug 03, 2003 at 09:12:00PM +0400, Sergey S. Kostyliov wrote:
> > On Friday 25 July 2003 23:02, Andrea Arcangeli wrote:
> > > On Fri, Jul 25, 2003 at 03:10:59PM +0400, Sergey S. Kostyliov wrote:
> > > > I doubt it depends on bigpages because they
> > > > are not used in my setup. But I can live with that. Rule: do not run
> > > > `swapoff -a` under load doesn't sound as impossible in my case (if this
> > > > is the only way to trigger this problem).

I believe the issue is that shmem_unuse_inode can swizzle a page from
swap cache back into page cache after deletion's or truncation's
truncate_inode_pages has cleaned out the page cache for that inode.
Not a great big deal in the truncation case (though it could depart
from spec: I can imagine fsx detecting inconsistency, seen before
2.4.22-pre, but not since), but dangerous in the deletion case - if
there were neither i_blocks nor nrpages BUG, then you'd end up with a
page in the cache with page->mapping pointing into freed inode.

There used to be nothing to prevent this (the info->sem I eliminated
was of no use in the swapoff case), but in 2.5 and 2.4.22-pre I added
those I_FREEING and i_size checks to shmem_unuse_inode to prevent it.
Or so I thought. But faced with explaining Sergey's BUG_ON, I
eventually realized it's not good enough (when SMP) just to check
before adding into page cache, it needs to be checked after.

Or, as in the patch Sergey is currently testing below, shmem_truncate
must be prepared to truncate_inode_pages again. That's the approach
I originally implemented in 2.5, but I grew disgusted with it every
time I thought of partial truncation trundling twice through
truncate_inode_pages (it can easily be avoided when nrpages == 0,
but that's unlikely in partial truncation).

So VM_PAGEIN flag stuff to restrict it to when it might be necessary;
extended to cover other races when reading the page at the same time
as truncating (though I think generic_file_read has a window of this
kind that we've never worried about). I expect to split the patch
into several before sending Marcelo and Andrew.

There may be another piece needed, for even rarer race: what if the
truncated page arrives at shmem_writepage after shmem_truncate has
cleaned the swap pages, before it recalls truncate_inode_pages? But
I'll return to this later, I'm attending to other stuff right now,
this is all exceedingly rare (unless Sergey shows otherwise).

If Andrew happens to be reading this, yes, these subtle races and
oft-revisited solutions do shed further doubt on the whole business
of tmpfs swapcache swizzling: perhaps 2.7 can find a safer way.

> > > can you reproduce it with 2.4.21rc8aa1? If not, then likely it's a
> > > 2.5/2.6 bug that went in 2.4 during the backport. I spoke with Hugh an
> > > hour ago about this, he will soon look into this too.
> >
> > Sorry for the late response. I wasn't able to reproduce either the oops
> > or the lockup with 2.4.21rc8aa1.

It (or rather, clear_inode's nrpages BUG) should be much easier to hit
with 2.4.21rc8aa1. I wonder whether Sergey was just (un)lucky to hit it
in his 2.4.22pre6aa1 testing: he's not mentioned whether or not he can
reproduce it at will. I've not been able to reproduce it at all. There
might be some kind of timing difference, which somehow makes it easier
to hit the narrower window in 2.4.22pre6aa1, but I don't see what that
is.

> ok good. I'm betting it's the shm backport that destabilized something.
> I had no time to look further into it during vacations ;), but the first
> suspicious thing I mentioned to Hugh during OLS was this:
>
> 	static void shmem_removepage(struct page *page)
> 	{
> 		if (!PageLaunder(page))
> 			shmem_free_blocks(page->mapping->host, 1);
> 	}
>
> It's not exactly obvious how the accounting should change as a function
> of the Launder bit. I mean, a writepage can happen even w/o the launder
> bitflag set (if it's not invoked by the vm) and I don't see how a msync
> or a vm pressure writepage trigger should be different in terms of
> accounting of the blocks in an inode.

I thought we'd settled this one then. I understand you're suspicious
of using a PageLaunder test in that way, but it has worked correctly
in the -ac tree for a year or so.

The point is, shmem_removepage gets called whenever shmem page removed
from cache, so it gets called when shmem_writepage moves page from page
to swap cache: but in that case the page must still be counted as
occupying filesystem space, we must not adjust shmem_free_blocks.
PageLaunder, set only during writepage, identifies that case.
I guess I should add a comment there.

> Overall I need a bit more time on Monday to digest the whole backport
> to be sure of what's going on and if the above is right after all.

If you have time to do so, that would be great: but I don't think it
need be your priority. Certainly nobody else has reported a problem.

Hugh
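The accounting rule Hugh describes for shmem_removepage can be pictured with a small user-space model. This is only a sketch under stated assumptions: every `toy_` name is hypothetical and none of this is kernel code. It assumes, as Hugh says, that the launder flag is set only for the duration of writepage, which is exactly the assumption Andrea questions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; the toy_ types and helpers are hypothetical. */
struct toy_inode {
	long blocks;            /* pages charged against the filesystem */
};

struct toy_page {
	struct toy_inode *host;
	bool launder;           /* stand-in for the PageLaunder bit */
	bool in_page_cache;
};

/* Allocating a page charges one block to the inode. */
struct toy_page toy_add_page(struct toy_inode *inode)
{
	struct toy_page page = { inode, false, true };

	inode->blocks++;
	return page;
}

/* Same shape as shmem_removepage(): uncharge unless we are in writepage. */
void toy_removepage(struct toy_page *page)
{
	page->in_page_cache = false;
	if (!page->launder)
		page->host->blocks--;
}

/* Writepage path: the page moves to swap but still occupies fs space. */
void toy_writepage(struct toy_page *page)
{
	page->launder = true;   /* assumed: set only while writepage runs */
	toy_removepage(page);
	page->launder = false;
}

/* Truncate or delete path: the page really goes away. */
void toy_truncate_page(struct toy_page *page)
{
	toy_removepage(page);
}
```

In this model, writing a page out to swap leaves `blocks` unchanged, while truncating a page drops the count by one. A writepage that ran without the flag set would uncharge a page that is still on swap, which is the case Andrea raises.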
* Re: 2.4.22pre6aa1
  2003-08-16 13:54 ` 2.4.22pre6aa1 Hugh Dickins
@ 2003-08-16 14:00 ` Hugh Dickins
  2003-08-16 14:50   ` 2.4.22pre6aa1 Sergey S. Kostyliov
  0 siblings, 1 reply; 37+ messages in thread
From: Hugh Dickins @ 2003-08-16 14:00 UTC
To: Andrea Arcangeli; +Cc: Sergey S. Kostyliov, linux-kernel

On Sat, 16 Aug 2003, Hugh Dickins wrote:
>
> Or, as in the patch Sergey is currently testing below, shmem_truncate
> must be prepared to truncate_inode_pages again. That's the approach
> I originally implemented in 2.5, but I grew disgusted with it every
> time I thought of partial truncation trundling twice through
> truncate_inode_pages (it can easily be avoided when nrpages == 0,
> but that's unlikely in partial truncation).
>
> So VM_PAGEIN flag stuff to restrict it to when it might be necessary;
> extended to cover other races when reading the page at the same time
> as truncating (though I think generic_file_read has a window of this
> kind that we've never worried about). I expect to split the patch
> into several before sending Marcelo and Andrew.

And here is the patch I claimed to be below.
If you apply it to anything other than 2.4.22-pre6aa1,
please be careful to check that it has applied correctly.
Originally I made a patch against 2.4.22-pre6, and then applied to
2.4.22-pre6aa1: but I have never seen patch make such a mess of it!

Hugh

--- 2.4.22-pre6aa1/mm/shmem.c	Thu Jul 31 15:23:58 2003
+++ linux/mm/shmem.c	Mon Aug 11 21:00:55 2003
@@ -92,6 +92,9 @@
 
 #define VM_ACCT(size)    (PAGE_CACHE_ALIGN(size) >> PAGE_SHIFT)
 
+/* info->flags needs a VM_flag to handle pagein/truncate race efficiently */
+#define VM_PAGEIN	 VM_READ
+
 /* Pretend that each entry is of this size in directory's i_size */
 #define BOGO_DIRENT_SIZE 20
 
@@ -435,6 +438,18 @@
 
 	BUG_ON(info->swapped > info->next_index);
 	spin_unlock(&info->lock);
+
+	if (inode->i_mapping->nrpages && (info->flags & VM_PAGEIN)) {
+		/*
+		 * Call truncate_inode_pages again: racing shmem_unuse_inode
+		 * may have swizzled a page in from swap since vmtruncate or
+		 * generic_delete_inode did it, before we lowered next_index.
+		 * Also, though shmem_getpage checks i_size before adding to
+		 * cache, no recheck after: so fix the narrow window there too.
+		 */
+		truncate_inode_pages(inode->i_mapping, inode->i_size);
+	}
+
 	if (freed)
 		shmem_free_blocks(inode, freed);
 }
@@ -459,6 +474,19 @@
 					attr->ia_size>>PAGE_CACHE_SHIFT,
 						&page, SGP_READ);
 			}
+			/*
+			 * Reset VM_PAGEIN flag so that shmem_truncate can
+			 * detect if any pages might have been added to cache
+			 * after truncate_inode_pages.  But we needn't bother
+			 * if it's being fully truncated to zero-length: the
+			 * nrpages check is efficient enough in that case.
+			 */
+			if (attr->ia_size) {
+				struct shmem_inode_info *info = SHMEM_I(inode);
+				spin_lock(&info->lock);
+				info->flags &= ~VM_PAGEIN;
+				spin_unlock(&info->lock);
+			}
 		}
 	}
 
@@ -511,7 +539,6 @@
 	struct address_space *mapping;
 	swp_entry_t *ptr;
 	unsigned long idx;
-	unsigned long limit;
 	int offset;
 
 	idx = 0;
@@ -543,13 +570,9 @@
 	inode = info->inode;
 	mapping = inode->i_mapping;
 	delete_from_swap_cache(page);
-
-	/* Racing against delete or truncate? Must leave out of page cache */
-	limit = (inode->i_state & I_FREEING)? 0:
-		(inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-
-	if (idx >= limit || add_to_page_cache_unique(page,
-			mapping, idx, page_hash(mapping, idx)) == 0) {
+	if (add_to_page_cache_unique(page,
+			mapping, idx, page_hash(mapping, idx)) == 0) {
+		info->flags |= VM_PAGEIN;
 		ptr[offset].val = 0;
 		info->swapped--;
 	} else if (add_to_swap_cache(page, entry) != 0)
@@ -634,6 +657,7 @@
 			 * Add page back to page cache, unref swap, try again.
 			 */
 			add_to_page_cache_locked(page, mapping, index);
+			info->flags |= VM_PAGEIN;
 			spin_unlock(&info->lock);
 			swap_free(swap);
 			goto getswap;
@@ -809,6 +833,7 @@
 				swap_free(swap);
 			} else if (add_to_page_cache_unique(swappage,
 				mapping, idx, page_hash(mapping, idx)) == 0) {
+				info->flags |= VM_PAGEIN;
 				entry->val = 0;
 				info->swapped--;
 				spin_unlock(&info->lock);
@@ -868,6 +893,7 @@
 					goto failed;
 				goto repeat;
 			}
+			info->flags |= VM_PAGEIN;
 		}
 
 		spin_unlock(&info->lock);
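The control flow of the patch (clear the flag before a partial truncate, set it on every pagein, and repeat truncate_inode_pages only when the flag is set again) can be sketched as a single-threaded user-space model. The following is an illustration under assumptions, not the kernel code: all `toy_` names are hypothetical, locking is omitted, and the race is simulated by calling the pagein step by hand between the two truncate steps.

```c
#include <assert.h>

#define TOY_PAGEIN 0x1u         /* hypothetical stand-in for VM_PAGEIN */

struct toy_info {
	unsigned int flags;
	unsigned long kept;     /* cached pages below the new size */
	unsigned long stale;    /* cached pages beyond the new size */
	int passes;             /* times the page-cache sweep has run */
};

/* Stand-in for truncate_inode_pages(): drop pages beyond the new size. */
void toy_truncate_inode_pages(struct toy_info *info)
{
	info->stale = 0;
	info->passes++;
}

/* Partial truncate starts: clear the flag, then do the first sweep. */
void toy_truncate_begin(struct toy_info *info)
{
	info->flags &= ~TOY_PAGEIN;
	toy_truncate_inode_pages(info);
}

/* A racing swapoff or read brings a page back and marks the inode. */
void toy_racing_pagein(struct toy_info *info)
{
	info->stale++;
	info->flags |= TOY_PAGEIN;
}

/* End of the truncate step: sweep again only if a pagein was seen. */
void toy_truncate_finish(struct toy_info *info)
{
	if ((info->kept + info->stale) && (info->flags & TOY_PAGEIN))
		toy_truncate_inode_pages(info);
}
```

With no pagein in the window the sweep runs once even though `kept` is nonzero, which is the double pass Hugh wanted to avoid in partial truncation. With a pagein in the window the flag forces a second sweep that removes the stale page.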
* Re: 2.4.22pre6aa1
  2003-08-16 14:00 ` 2.4.22pre6aa1 Hugh Dickins
@ 2003-08-16 14:50 ` Sergey S. Kostyliov
  0 siblings, 0 replies; 37+ messages in thread
From: Sergey S. Kostyliov @ 2003-08-16 14:50 UTC
To: Hugh Dickins, Andrea Arcangeli; +Cc: linux-kernel

Hi Hugh and Andrew,

On Saturday 16 August 2003 18:00, Hugh Dickins wrote:
> On Sat, 16 Aug 2003, Hugh Dickins wrote:
> > Or, as in the patch Sergey is currently testing below, shmem_truncate
> > must be prepared to truncate_inode_pages again. That's the approach
> > I originally implemented in 2.5, but I grew disgusted with it every
> > time I thought of partial truncation trundling twice through
> > truncate_inode_pages (it can easily be avoided when nrpages == 0,
> > but that's unlikely in partial truncation).
> >
> > So VM_PAGEIN flag stuff to restrict it to when it might be necessary;
> > extended to cover other races when reading the page at the same time
> > as truncating (though I think generic_file_read has a window of this
> > kind that we've never worried about). I expect to split the patch
> > into several before sending Marcelo and Andrew.
>
> And here is the patch I claimed to be below.
> If you apply it to anything other than 2.4.22-pre6aa1,
> please be careful to check that it has applied correctly.
> Originally I made a patch against 2.4.22-pre6, and then applied to
> 2.4.22-pre6aa1: but I have never seen patch make such a mess of it!

I just want to confirm that I wasn't able to repeat this problem
with the patch below (applied to 2.4.22-pre7aa1) after more than
3 days of testing. I'll let you know if any issues arise.

Thank you!

>
> Hugh
>
> --- 2.4.22-pre6aa1/mm/shmem.c	Thu Jul 31 15:23:58 2003
> +++ linux/mm/shmem.c	Mon Aug 11 21:00:55 2003
> @@ -92,6 +92,9 @@
>  
>  #define VM_ACCT(size)    (PAGE_CACHE_ALIGN(size) >> PAGE_SHIFT)
>  
> +/* info->flags needs a VM_flag to handle pagein/truncate race efficiently */
> +#define VM_PAGEIN	 VM_READ
> +
>  /* Pretend that each entry is of this size in directory's i_size */
>  #define BOGO_DIRENT_SIZE 20
>  
> @@ -435,6 +438,18 @@
>  
>  	BUG_ON(info->swapped > info->next_index);
>  	spin_unlock(&info->lock);
> +
> +	if (inode->i_mapping->nrpages && (info->flags & VM_PAGEIN)) {
> +		/*
> +		 * Call truncate_inode_pages again: racing shmem_unuse_inode
> +		 * may have swizzled a page in from swap since vmtruncate or
> +		 * generic_delete_inode did it, before we lowered next_index.
> +		 * Also, though shmem_getpage checks i_size before adding to
> +		 * cache, no recheck after: so fix the narrow window there too.
> +		 */
> +		truncate_inode_pages(inode->i_mapping, inode->i_size);
> +	}
> +
>  	if (freed)
>  		shmem_free_blocks(inode, freed);
>  }
> @@ -459,6 +474,19 @@
>  					attr->ia_size>>PAGE_CACHE_SHIFT,
>  						&page, SGP_READ);
>  			}
> +			/*
> +			 * Reset VM_PAGEIN flag so that shmem_truncate can
> +			 * detect if any pages might have been added to cache
> +			 * after truncate_inode_pages.  But we needn't bother
> +			 * if it's being fully truncated to zero-length: the
> +			 * nrpages check is efficient enough in that case.
> +			 */
> +			if (attr->ia_size) {
> +				struct shmem_inode_info *info = SHMEM_I(inode);
> +				spin_lock(&info->lock);
> +				info->flags &= ~VM_PAGEIN;
> +				spin_unlock(&info->lock);
> +			}
>  		}
>  	}
>  
> @@ -511,7 +539,6 @@
>  	struct address_space *mapping;
>  	swp_entry_t *ptr;
>  	unsigned long idx;
> -	unsigned long limit;
>  	int offset;
>  
>  	idx = 0;
> @@ -543,13 +570,9 @@
>  	inode = info->inode;
>  	mapping = inode->i_mapping;
>  	delete_from_swap_cache(page);
> -
> -	/* Racing against delete or truncate? Must leave out of page cache */
> -	limit = (inode->i_state & I_FREEING)? 0:
> -		(inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> -
> -	if (idx >= limit || add_to_page_cache_unique(page,
> -			mapping, idx, page_hash(mapping, idx)) == 0) {
> +	if (add_to_page_cache_unique(page,
> +			mapping, idx, page_hash(mapping, idx)) == 0) {
> +		info->flags |= VM_PAGEIN;
>  		ptr[offset].val = 0;
>  		info->swapped--;
>  	} else if (add_to_swap_cache(page, entry) != 0)
> @@ -634,6 +657,7 @@
>  			 * Add page back to page cache, unref swap, try again.
>  			 */
>  			add_to_page_cache_locked(page, mapping, index);
> +			info->flags |= VM_PAGEIN;
>  			spin_unlock(&info->lock);
>  			swap_free(swap);
>  			goto getswap;
> @@ -809,6 +833,7 @@
>  				swap_free(swap);
>  			} else if (add_to_page_cache_unique(swappage,
>  				mapping, idx, page_hash(mapping, idx)) == 0) {
> +				info->flags |= VM_PAGEIN;
>  				entry->val = 0;
>  				info->swapped--;
>  				spin_unlock(&info->lock);
> @@ -868,6 +893,7 @@
>  					goto failed;
>  				goto repeat;
>  			}
> +			info->flags |= VM_PAGEIN;
>  		}
>  
>  		spin_unlock(&info->lock);

-- 
Best regards,
Sergey S. Kostyliov <rathamahata@php4.ru>
Public PGP key: http://sysadminday.org.ru/rathamahata.asc