* 2.6.0-test9-mm3 @ 2003-11-13 7:30 Andrew Morton 2003-11-13 20:03 ` [PATCH] linux-2.6.0-test9-mm3_verbose-timesource-acpi-pm_A0 john stultz ` (4 more replies) 0 siblings, 5 replies; 49+ messages in thread From: Andrew Morton @ 2003-11-13 7:30 UTC (permalink / raw) To: linux-kernel, linux-mm http://www.zip.com.au/~akpm/linux/patches/2.6.0-test9-mm3.gz kernel.org is being slow. Will appear at: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test9/2.6.0-test9-mm3/ - Various new fixes; generally uncritical ones. - Significant changes to the AIO and direct-io code. This needs beating on; hopefully we're now close to a solution to the fairly complex problems in there. - Several ext2 and ext3 allocator fixes. These need serious testing on big SMP. - Anyone who has patches in here which they think should go into 2.6.0, please retest them in -mm3 and let me know, thanks. linus.patch Latest Linus tree -as-badness-warning-fix.patch -3c509-mca-fix.patch -ext2-allocation-fix.patch -ohci-locking-fix.patch -disable-ide-tcq.patch -via-quirk-fix.patch -raid1-recovery-fix.patch -journal_remove_journal_head-assertion-fix.patch -x86_64-tss-limit-fix.patch -keyboard-repeat-rate-setting-fix.patch -aio-refcounting-fix.patch Merged -RD16-rest-B6.patch Al said to drop this. 
+cramfs-use-pagecache.patch cramfs fixes -ia32-MSI-support-tweaks.patch Folded into ia32-MSI-support.patch +ia32-MSI-support-x86_64-fixes.patch x86_64 build fix -ia32-efi-asm-warning-fix.patch -ia32-efi-support-mem-equals-fix.patch -CONFIG_ACPI_EFI-defaults-off.patch -ia32-efi-support-warning-fixes.patch -ia32-efi-support-tidy.patch -ia32-efi-other-arch-fix.patch -efi-constant-sizing-fix.patch -ia32-efi-config-option.patch -ia32-efi-config-option-tweaks.patch -ia32-efi-config-help-update.patch -ia64-CONFIG_EFI-update.patch Folded into ia32-efi-support.patch +ia64-ia32-missing-compat-syscalls.patch +compat-layer-fixes.patch 32-bit compat layer fixes +compat-ioctl-for-i2c.patch compat layer for i2c (old version) +loop-bio-handling-fix.patch Loop driver fixlet -gcc-Os-if-embedded-better-help.patch Folded into gcc-Os-if-embedded.patch +as-request-poisoning-fix.patch +as-fix-all-known-bugs.patch Anticipatory scheduler fixes. +more-than-256-cpus.patch cpumask fixes for huge SMP +acpi-pm-timer.patch +acpi-pm-timer-fixes.patch Yet another timer source for ia32 +ZONE_SHIFT-from-NODES_SHIFT.patch Memory zone arith fixup +ext2_new_inode-fixes.patch +ext2_new_inode-fixes-tweaks.patch +remove-ext2_reverve_inode.patch ext2 fixes +memmove-speedup.patch Make memmove() faster. +percpu-counter-linkage-fix.patch Fix the build for when ext2 and ext3 are modular +ide-scsi-warnings.patch Print warnings when someone tries to use ide-scsi for a cdrom +pipe-readv-writev.patch pipe readv() and writev() correctness fix and speedup +ext3_new_inode-scan-fix.patch ext3 inode allocator fix +lockless-semop.patch sysv semaphore SMP speedup +percpu_counter-use-alloc_percpu.patch Fix the percpu counters for huge SMP. +i450nx-scanning-fix.patch PCI bridge fix for i450nx chipset machines +serio-pm-fix.patch Fix psmouse PM resume +find_busiest_queue-commentary.patch CPU scheduler comments +ext2-block-allocator-fixes.patch More ext2 allocator fixes. 
+SOUND_CMPCI-config-typo-fix.patch Sound driver config fix +atkbd-24-compatibility.patch Make AT keyboard userspace interface compatible with 2.4's. +init_h-needs-compiler_h.patch +init_h-needs-compiler_h-fix.patch Compile fix +cpu_sibling_map-fix.patch cpu_sibling_map is broken on summit. +tulip-hash-fix.patch Fix multicast hash generation for some tulips +context-switch-accounting-fix.patch Fix CPU scheduler beancounting with CONFIG_PREEMPT. +access-vfs_permission-fix.patch Fix access() +eicon-linkage-fix.patch ISDM build fix +kobject-docco-additions.patch Documentation additions. -O_DIRECT-race-fixes-rework-XFS-fix.patch -O_DIRECT-race-fixes-rework-XFS-fix-fix.patch Folded into O_DIRECT-race-fixes-rollup.patch +dio-aio-fixes.patch +dio-aio-fixes-fixes.patch AIO/direct-io fixes +promise-sata-id.patch Additional STAT PCI ID. All 201 patches linus.patch mm.patch add -mmN to EXTRAVERSION kgdb-ga.patch kgdb stub for ia32 (George Anzinger's one) kgdbL warning fix kgdb-buff-too-big.patch kgdb buffer overflow fix kgdb-warning-fix.patch kgdbL warning fix kgdb-build-fix.patch kgdb-spinlock-fix.patch kgdb-fix-debug-info.patch kgdb: CONFIG_DEBUG_INFO fix kgdb-cpumask_t.patch kgdb-x86_64-fixes.patch x86_64 fixes kgdb-over-ethernet.patch kgdb-over-ethernet patch kgdb-over-ethernet-fixes.patch kgdb-over-ethernet fixlets kgdb-CONFIG_NET_POLL_CONTROLLER.patch kgdb: replace CONFIG_KGDB with CONFIG_NET_RX_POLL in net drivers kgdb-handle-stopped-NICs.patch kgdb: handle netif_stopped NICs eepro100-poll-controller.patch tlan-poll_controller.patch tulip-poll_controller.patch tg3-poll_controller.patch kgdb: tg3 poll_controller 8139too-poll_controller.patch 8139too poll controller kgdb-eth-smp-fix.patch kgdb-over-ethernet: fix SMP kgdb-eth-reattach.patch kgdb-skb_reserve-fix.patch kgdb-over-ethernet: skb_reserve() fix must-fix.patch should-fix.patch must-fix-update-01.patch must fix lists update RD1-cdrom_ioctl-B6.patch RD2-ioctl-B6.patch RD2-ioctl-B6-fix.patch RD2-ioctl-B6 fixes 
RD3-cdrom_open-B6.patch RD4-open-B6.patch RD5-cdrom_release-B6.patch RD6-release-B6.patch RD7-presto_journal_close-B6.patch RD8-f_mapping-B6.patch RD9-f_mapping2-B6.patch RD10-i_sem-B6.patch RD11-f_mapping3-B6.patch RD12-generic_osync_inode-B6.patch RD13-bd_acquire-B6.patch RD14-generic_write_checks-B6.patch RD15-I_BDEV-B6.patch cramfs-use-pagecache.patch cramfs: use pagecache better invalidate_inodes-speedup.patch invalidate_inodes speedup invalidate_inodes-speedup-fixes-2.patch more invalidate_inodes speedup fixes serio-01-renaming.patch serio: rename serio_[un]register_slave_port to __serio_[un]register_port serio-02-race-fix.patch serio: possible race between port removal and kseriod serio-03-blacklist.patch Add black list to handler<->device matching serio-04-synaptics-cleanup.patch Synaptics: code cleanup serio-05-reconnect-facility.patch serio: reconnect facility serio-06-synaptics-use-reconnect.patch Synaptics: use serio_reconnect acpi_off-fix.patch fix acpi=off cfq-4.patch CFQ io scheduler CFQ fixes config_spinline.patch uninline spinlocks for profiling accuracy. 
ppc64-bar-0-fix.patch Allow PCI BARs that start at 0 ppc64-reloc_hide.patch sym-do-160.patch make the SYM driver do 160 MB/sec input-use-after-free-checks.patch input layer debug checks aic7xxx-parallel-build-fix.patch fix parallel builds for aic7xxx ramdisk-cleanup.patch intel8x0-cleanup.patch intel8x0 cleanups pdflush-diag.patch kobject-oops-fixes.patch fix oopses is kobject parent is removed before child futex-uninlinings.patch futex uninlining zap_page_range-debug.patch zap_page_range() debug call_usermodehelper-retval-fix-3.patch Make call_usermodehelper report exit status asus-L5-fix.patch Asus L5 framebuffer fix jffs-use-daemonize.patch tulip-NAPI-support.patch tulip NAPI support tulip-napi-disable.patch tulip NAPI: disable poll in close get_user_pages-handle-VM_IO.patch ia32-MSI-support.patch Updated ia32 MSI Patches ia32-MSI-support-x86_64-fixes.patch ia32-efi-support.patch EFI support for ia32 efi warning fix fix EFI for ppc64, ia64 efi: warning fixes ia32 EFI: Add CONFIG_EFI efi: Update Kconfig help efi update patch (ia64) support-zillions-of-scsi-disks.patch support many SCSI disks SGI-IOC4-IDE-chipset-support.patch Add support for SGI's IOC4 chipset sparc32-sched_clock.patch pcibios_test_irq-fix.patch Fix pcibios test IRQ handler return fixmap-in-proc-pid-maps.patch report user-readable fixmap area in /proc/PID/maps i82365-sysfs-ordering-fix.patch Fix init_i82365 sysfs ordering oops pci_set_power_state-might-sleep.patch ia64-ia32-missing-compat-syscalls.patch From: Arun Sharma <arun.sharma@intel.com> Subject: Missing compat syscalls in ia64 compat-layer-fixes.patch Minor bug fixes to the compat layer compat-ioctl-for-i2c.patch compat_ioctl for i2c compat_ioctl-cleanup.patch cleanup of compat_ioctl functions fix-sqrt.patch sqrt() fixes scale-min_free_kbytes.patch scale the initial value of min_free_kbytes cdrom-allocation-try-harder.patch Use __GFP_REPEAT for cdrom buffer sym-2.1.18f.patch CONFIG_STANDALONE-default-to-n.patch Make CONFIG_STANDALONE 
default to N extra-buffer-diags.patch nosysfs.patch constant_test_bit-doesnt-like-zwanes-gcc.patch gcc bug workaround for constant_test_bit() slab-leak-detector.patch slab leak detector early-serial-registration-fix.patch serial console registration bugfix 3c527-smp-update.patch SMP support on 3c527 net driver 3c527-race-fix.patch ext3-latency-fix.patch ext3 scheduling latency fix videobuf_waiton-race-fix.patch firmware-kernel_thread-on-demand.patch Remove workqueue usage from request_firmware_async() loop-autoloading-fix.patch Fix loop module auto loading loop-module-alias.patch loop needs MODULE_ALIAS_BLOCK loop-remove-blkdev-special-case.patch loop-highmem.patch remove useless highmem bounce from loop/cryptoloop loop-highmem-fixes.patch loop-bio-handling-fix.patch loop: BIO handling fix cmpci-set_fs-fix.patch cmpci.c: remove pointless set_fs() dentry-bloat-fix-2.patch Fix dcache and icache bloat with deep directories nls-config-fixes.patch NSL config fixes proc_pid_lookup-vs-exit-race-fix.patch Fix proc_pid_lookup vs exit race gcc-Os-if-embedded.patch Add `gcc -Os' config option aic7xxx-sleep-in-spinlock-fix.patch vm86-sysenter-fix.patch Fix sysenter disabling in vm86 mode gettimeofday-resolution-fix.patch gettimeofday resolution fix refill_counter-overflow-fix.patch vmscan: reset refill_counter after refilling the inactive list verbose-timesource.patch be verbose about the time source as-regression-fix.patch Fix IO scheduler regression as-request-poisoning.patch AS: request poisoning as-request-poisoning-fix.patch AS: request poisining fix as-fix-all-known-bugs.patch AS fixes as-new-process-estimation.patch AS: new process estimation as-cooperative-thinktime.patch AS: thinktime improvement scale-nr_requests.patch scale nr_requests with TCQ depth truncate_inode_pages-check.patch local_bh_enable-warning-fix.patch cdc-acm-softirq-rx.patch cdc-acm: move rx processing to softirq forcedeth.patch forcedeth: nForce ethernet driver reiserfs-pinned-buffer-fix.patch 
reiserfs pinned buffer fix proc-pid-maps-output-fix.patch Restore /proc/pid/maps formatting atomic_dec-debug.patch atomic_dec debug sis900-pm-support.patch Add PM support to sis900 network driver 8139too-locking-fix.patch 8139too locking fix ia32-wp-test-cleanup.patch ia32 WP test cleanup hugetlb-needs-pse.patch ia32: hugetlb needs pse powermate-payload-size-fix.patch Griffin Powermate fix more-than-256-cpus.patch Fix for more than 256 CPUs acpi-pm-timer.patch ACPI PM Timer acpi-pm-timer-fixes.patch ACPI PM-Timer fixes ZONE_SHIFT-from-NODES_SHIFT.patch Use NODES_SHIFT to calculate ZONE_SHIFT ext2_new_inode-fixes.patch Fix bugs in ext2_new_inode() ext2_new_inode-fixes-tweaks.patch ext2_new_inode: more tweaking remove-ext2_reverve_inode.patch memmove-speedup.patch optimize ia32 memmove percpu-counter-linkage-fix.patch fix percpu_counter_mod linkage problem ide-scsi-warnings.patch ide-scsi: warn when used for cdroms pipe-readv-writev.patch Fix writev atomicity on pipe/fifo ext3_new_inode-scan-fix.patch ext3_new_inode fixlet lockless-semop.patch lockless semop percpu_counter-use-alloc_percpu.patch use alloc_percpu in percpu_counters i450nx-scanning-fix.patch i450nx PCI scanning fix serio-pm-fix.patch psmouse pm resume fix find_busiest_queue-commentary.patch find_busiest_queue() commentary fix ext2-block-allocator-fixes.patch ext2 block allocator fixes SOUND_CMPCI-config-typo-fix.patch fix SOUND_CMPCI Configure help entry atkbd-24-compatibility.patch Fixes for keyboard 2.4 compatibility init_h-needs-compiler_h.patch init.h needs to include compiler.h init_h-needs-compiler_h-fix.patch compile fix for older gcc's cpu_sibling_map-fix.patch cpu_sibling_map fix tulip-hash-fix.patch tulip filter hash fix context-switch-accounting-fix.patch Fix context switch accounting access-vfs_permission-fix.patch Subject: Re: [PATCH] fix access() / vfs_permission() bug eicon-linkage-fix.patch eicon/ and hardware/eicon/ drivers using the same symbols kobject-docco-additions.patch Improve 
documentation for kobjects list_del-debug.patch list_del debug check print-build-options-on-oops.patch show_task-free-stack-fix.patch show_task() fix and cleanup oops-dump-preceding-code.patch i386 oops output: dump preceding code lockmeter.patch printk-oops-mangle-fix.patch disentangle printk's whilst oopsing on SMP 4g-2.6.0-test2-mm2-A5.patch 4G/4G split patch 4G/4G: remove debug code 4g4g: pmd fix 4g/4g: fixes from Bill 4g4g: fpu emulation fix 4g/4g usercopy atomicity fix 4G/4G: remove debug code 4g4g: pmd fix 4g/4g: fixes from Bill 4g4g: fpu emulation fix 4g/4g usercopy atomicity fix 4G/4G preempt on vstack 4G/4G: even number of kmap types 4g4g: fix __get_user in slab 4g4g: Remove extra .data.idt section definition 4g/4g linker error (overlapping sections) 4G/4G: remove debug code 4g4g: pmd fix 4g/4g: fixes from Bill 4g4g: fpu emulation fix 4g4g: show_registers() fix 4g/4g usercopy atomicity fix 4g4g: debug flags fix 4g4g: Fix wrong asm-offsets entry cyclone time fixmap fix 4G/4G preempt on vstack 4G/4G: even number of kmap types 4g4g: fix __get_user in slab 4g4g: Remove extra .data.idt section definition 4g/4g linker error (overlapping sections) 4G/4G: remove debug code 4g4g: pmd fix 4g/4g: fixes from Bill 4g4g: fpu emulation fix 4g4g: show_registers() fix 4g/4g usercopy atomicity fix 4g4g: debug flags fix 4g4g: Fix wrong asm-offsets entry cyclone time fixmap fix use direct_copy_{to,from}_user for kernel access in mm/usercopy.c 4G/4G might_sleep warning fix 4g/4g pagetable accounting fix 4g4g-athlon-prefetch-handling-fix.patch 4g4g-wp-test-fix.patch Fix 4G/4G and WP test lockup 4g4g-KERNEL_DS-usercopy-fix.patch 4G/4G KERNEL_DS usercopy again ppc-fixes.patch make mm4 compile on ppc aic7xxx_old-oops-fix.patch O_DIRECT-race-fixes-rollup.patch DIO fixes forward port and AIO-DIO fix O_DIRECT race fixes comments O_DRIECT race fixes fix fix fix DIO locking rework O_DIRECT XFS fix dio-aio-fixes.patch direct-io AIO fixes dio-aio-fixes-fixes.patch dio-aio fix fix 
readahead-multiple-fixes.patch readahead: multipole performance fixes readahead-simplification.patch readahead simplification aio-sysctl-parms.patch aio sysctl parms aio-01-retry.patch AIO: Core retry infrastructure Fix aio process hang on EINVAL AIO: flush workqueues before destroying ioctx'es AIO: hold the context lock across unuse_mm task task_lock in use_mm() 4g4g-aio-hang-fix.patch Fix AIO and 4G-4G hang aio-retry-elevated-refcount.patch aio: extra ref count during retry aio-splice-runlist.patch Splice AIO runlist for fairer handling of multiple io contexts aio-02-lockpage_wq.patch AIO: Async page wait aio-03-fs_read.patch AIO: Filesystem aio read aio-04-buffer_wq.patch AIO: Async buffer wait lock_buffer_wq fix aio-05-fs_write.patch AIO: Filesystem aio write aio-06-bread_wq.patch AIO: Async block read aio-07-ext2getblk_wq.patch AIO: Async get block for ext2 O_SYNC-speedup-2.patch speed up O_SYNC writes O_SYNC-speedup-2-f_mapping-fixes.patch aio-09-o_sync.patch aio O_SYNC AIO: fix a BUG Unify o_sync changes for aio and regular writes aio-O_SYNC-fix bits got lost aio: writev nr_segs fix More AIO O_SYNC related fixes aio-09-o_sync-f_mapping-fixes.patch gang_lookup_next.patch Change the page gang lookup API aio-gang_lookup-fix.patch AIO gang lookup fixes aio-O_SYNC-short-write-fix.patch Fix for O_SYNC short writes aio-12-readahead.patch AIO: readahead fixes aio O_DIRECT no readahead Unified page range readahead for aio and regular reads aio-12-readahead-f_mapping-fix.patch aio-readahead-speedup.patch Readahead issues and AIO read speedup promise-sata-id.patch add Promise 20376 PCI ID ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH] linux-2.6.0-test9-mm3_verbose-timesource-acpi-pm_A0
  2003-11-13  7:30 2.6.0-test9-mm3 Andrew Morton
@ 2003-11-13 20:03 ` john stultz
  2003-11-13 22:03 ` 2.6.0-test9-mm3 - AIO test results Daniel McNeil
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 49+ messages in thread
From: john stultz @ 2003-11-13 20:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm

On Wed, 2003-11-12 at 23:30, Andrew Morton wrote:
> +acpi-pm-timer.patch
> +acpi-pm-timer-fixes.patch
>
>  Yet another timer source for ia32
>
[snip]
> verbose-timesource.patch
>   be verbose about the time source

Andrew,
	I forgot that I sent you the verbose-timesource patch. The ACPI PM
time source will need this simple fix to work alongside that patch.

thanks
-john

===== arch/i386/kernel/timers/timer_pm.c 1.6 vs edited =====
--- 1.6/arch/i386/kernel/timers/timer_pm.c	Tue Nov  4 11:39:50 2003
+++ edited/arch/i386/kernel/timers/timer_pm.c	Thu Nov 13 11:12:23 2003
@@ -185,6 +185,7 @@
 
 /* acpi timer_opts struct */
 struct timer_opts timer_pmtmr = {
+	.name = "pmtmr",
 	.init = init_pmtmr,
 	.mark_offset = mark_offset_pmtmr,
 	.get_offset = get_offset_pmtmr,

^ permalink raw reply	[flat|nested] 49+ messages in thread
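For anyone reading the one-line fix above without the two patches at hand, here is a minimal userspace sketch of why a missing `.name` matters once some code prints the time source by name. Everything in it is an assumption for illustration: `struct timer_opts_sketch`, `describe_timer()` and the message text are hypothetical stand-ins, not the kernel's `struct timer_opts` or the verbose-timesource patch itself. The only C fact relied on is that members omitted from a designated initializer are zero-initialized, so an unnamed entry ends up with a NULL name pointer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-in for the kernel's struct timer_opts. */
struct timer_opts_sketch {
	const char *name;	/* label a verbose selection path would print */
	int (*init)(void);	/* returns 0 when the time source is usable */
};

static int init_pmtmr_sketch(void)
{
	return 0;
}

/* No .name given: the designated initializer leaves name == NULL. */
static struct timer_opts_sketch timer_unnamed = {
	.init = init_pmtmr_sketch,
};

/* With the fix applied: the entry carries its own label. */
static struct timer_opts_sketch timer_pmtmr_sketch = {
	.name = "pmtmr",
	.init = init_pmtmr_sketch,
};

/*
 * Format the kind of message a verbose selection path might emit.
 * The wording is an assumption. The NULL check is defensive: passing
 * a NULL pointer to %s is undefined behaviour.
 */
static int describe_timer(const struct timer_opts_sketch *t,
			  char *buf, size_t len)
{
	return snprintf(buf, len, "Using %s for high-res timesource",
			t->name ? t->name : "(unnamed)");
}
```

Calling `describe_timer()` on `timer_pmtmr_sketch` yields a message with "pmtmr" in it, while the entry without `.name` only gets the placeholder, which is roughly the situation the patch above avoids.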
* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-13  7:30 2.6.0-test9-mm3 Andrew Morton
  2003-11-13 20:03 ` [PATCH] linux-2.6.0-test9-mm3_verbose-timesource-acpi-pm_A0 john stultz
@ 2003-11-13 22:03 ` Daniel McNeil
  2003-11-17  5:25   ` Suparna Bhattacharya
  2003-11-13 22:04 ` 2.6.0-test9-mm3 (compile stats) John Cherry
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-11-13 22:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel Mailing List, linux-mm, linux-aio

Andrew,

I'm testing test9-mm3 on a 2-proc Xeon with an ext3 file system.
I tested using the test programs aiocp and aiodio_sparse.
(see http://developer.osdl.org/daniel/AIO/)

Using aiocp with i/o sizes from 1k to 512k to copy files worked
without any errors or kernel debug messages.

With 64k i/o, the aiodio_sparse program completed without any errors.
There are no kernel error messages, so that is good.

There are still problems with non-power-of-2 i/o sizes using AIO and
O_DIRECT.  It hangs with aio's that do not seem to complete.  The test
does exit when hitting ^c and there are no kernel messages.  Test output
below:

$ ./aiodio_sparse
$ ./aiodio_sparse -dd -s 1751k -r 18k -w 11k
child 1843, read loop count 0
io_submit() return 16
aiodio_sparse: 16 i/o in flight
aiodio_sparse: offset 180224 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 191488 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 202752 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 214016 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 225280 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 236544 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 247808 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 259072 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 270336 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 281600 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 292864 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 304128 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
child 1843, read loop count 10
io_submit() return 1
aiodio_sparse: offset 315392 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 326656 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 337920 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 349184 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 360448 filesize 1793024 inflight 16
child 1843, read loop count 20
child 1843, read loop count 30
child 1843, read loop count 40
child 1843, read loop count 50
child 1843, read loop count 60
child 1843, read loop count 70

$ ./aiodio_sparse -i 9 -d -s 180k -r 18k -w 18k
io_submit() return 9
aiodio_sparse: 9 i/o in flight
aiodio_sparse: offset 165888 filesize 184320 inflight 9
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 18432 res2 0
io_submit() return 1
child 2060, read loop count 0
child 2060, read loop count 10
child 2060, read loop count 20

Daniel

On Wed, 2003-11-12 at 23:30, Andrew Morton wrote:
> - Significant changes to the AIO and direct-io code.  This needs beating
>   on; hopefully we're now close to a solution to the fairly complex problems
>   in there.
>

^ permalink raw reply	[flat|nested] 49+ messages in thread
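The byte counts in the test output above can be checked by hand, which helps when reading the log: `-s 1751k` is 1793024 bytes, `-w 11k` is 11264 bytes, and with 16 writes queued back to back the next offset is 16 x 11264 = 180224, the first "offset" value printed. A small sketch of that arithmetic follows. The helper names are hypothetical, and the 4096-byte block size used in the note after the code is an assumption; the thread does not state the filesystem block size.

```c
#include <stdint.h>

#define KIB 1024ULL

/*
 * Offset of the next write once `inflight` writes of `write_kib` KiB
 * each have been queued back to back starting at offset 0.
 */
static uint64_t next_offset(unsigned inflight, unsigned write_kib)
{
	return (uint64_t)inflight * write_kib * KIB;
}

/* Nonzero when `bytes` is a whole number of `blksize`-byte blocks. */
static int is_block_multiple(uint64_t bytes, uint64_t blksize)
{
	return bytes % blksize == 0;
}
```

With these, `next_offset(16, 11)` gives 180224 and `next_offset(9, 18)` gives 165888, matching both runs. Under the assumed 4096-byte block size, `is_block_multiple(11 * KIB, 4096)` and `is_block_multiple(18 * KIB, 4096)` are both false while the 64k case is true, so the failing sizes are the ones that end part-way through a block.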
* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-13 22:03 ` 2.6.0-test9-mm3 - AIO test results Daniel McNeil
@ 2003-11-17  5:25   ` Suparna Bhattacharya
  2003-11-18  1:15     ` Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-11-17 5:25 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

On Thu, Nov 13, 2003 at 02:03:58PM -0800, Daniel McNeil wrote:
> Andrew,
>
> I'm testing test9-mm3 on a 2-proc Xeon with an ext3 file system.
> I tested using the test programs aiocp and aiodio_sparse.
> (see http://developer.osdl.org/daniel/AIO/)
>
> Using aiocp with i/o sizes from 1k to 512k to copy files worked
> without any errors or kernel debug messages.
>
> With 64k i/o, the aiodio_sparse program completed without any errors.
> There are no kernel error messages, so that is good.
>
> There are still problems with non-power-of-2 i/o sizes using AIO and
> O_DIRECT.  It hangs with aio's that do not seem to complete.  The test
> does exit when hitting ^c and there are no kernel messages.  Test output
> below:

Could you check if the following patch fixes the problem for you?

Regards
Suparna

--------------------------------------------------------------

With this patch, when the DIO code falls back to buffered i/o after
having submitted part of the i/o, then buffered i/o is issued only
for the remaining part of the request (i.e. the part not already
covered by DIO).

diff -ur pure-mm3/fs/direct-io.c linux-2.6.0-test9-mm3/fs/direct-io.c
--- pure-mm3/fs/direct-io.c	2003-11-14 09:09:06.000000000 +0530
+++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-17 09:00:47.000000000 +0530
@@ -74,6 +74,7 @@
 					   been performed at the start of a
 					   write */
 	int pages_in_io;		/* approximate total IO pages */
+	size_t size;			/* total request size (doesn't change)*/
 	sector_t block_in_file;		/* Current offset into the underlying
 					   file in dio_block units. */
 	unsigned blocks_available;	/* At block_in_file.  changes */
@@ -226,7 +227,7 @@
 			dio_complete(dio, dio->block_in_file << dio->blkbits,
 					dio->result);
 			/* Complete AIO later if falling back to buffered i/o */
-			if (dio->result != -ENOTBLK) {
+			if (dio->result >= dio->size || dio->rw == READ) {
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
 			} else {
@@ -889,6 +890,7 @@
 	dio->blkbits = blkbits;
 	dio->blkfactor = inode->i_blkbits - blkbits;
 	dio->start_zero_done = 0;
+	dio->size = 0;
 	dio->block_in_file = offset >> blkbits;
 	dio->blocks_available = 0;
 	dio->cur_page = NULL;
@@ -925,7 +927,7 @@
 
 	for (seg = 0; seg < nr_segs; seg++) {
 		user_addr = (unsigned long)iov[seg].iov_base;
-		bytes = iov[seg].iov_len;
+		dio->size += bytes = iov[seg].iov_len;
 
 		/* Index into the first page of the first block */
 		dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits;
@@ -956,6 +958,13 @@
 		}
 	} /* end iovec loop */
 
+	if (ret == -ENOTBLK && rw == WRITE) {
+		/*
+		 * The remaining part of the request will be
+		 * handled by buffered I/O when we return
+		 */
+		ret = 0;
+	}
 	/*
 	 * There may be some unwritten disk at the end of a part-written
 	 * fs-block-sized block.  Go zero that now.
@@ -986,19 +995,13 @@
 	 */
 	if (dio->is_async) {
 		if (ret == 0)
-			ret = dio->result;	/* Bytes written */
-		if (ret == -ENOTBLK) {
-			/*
-			 * The request will be reissued via buffered I/O
-			 * when we return; Any I/O already issued
-			 * effectively becomes redundant.
-			 */
-			dio->result = ret;
+			ret = dio->result;
+		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
 		}
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (ret == -ENOTBLK) {
+		if (dio->waiter) {
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
@@ -1032,7 +1035,8 @@
 		}
 		dio_complete(dio, offset, ret);
 		/* We could have also come here on an AIO file extend */
-		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
+		if (!is_sync_kiocb(iocb) && !(rw == WRITE && ret >= 0 &&
+			dio->result < dio->size))
 			aio_complete(iocb, ret, 0);
 		kfree(dio);
 	}
diff -ur pure-mm3/mm/filemap.c linux-2.6.0-test9-mm3/mm/filemap.c
--- pure-mm3/mm/filemap.c	2003-11-14 09:15:08.000000000 +0530
+++ linux-2.6.0-test9-mm3/mm/filemap.c	2003-11-15 11:11:16.000000000 +0530
@@ -1895,14 +1895,16 @@
 		 */
 		if (written >= 0 && file->f_flags & O_SYNC)
 			status = generic_osync_inode(inode, mapping, OSYNC_METADATA);
-		if (written >= 0 && !is_sync_kiocb(iocb))
+		if (written >= count && !is_sync_kiocb(iocb))
 			written = -EIOCBQUEUED;
-		if (written != -ENOTBLK)
+		if (written < 0 || written >= count)
 			goto out_status;
 		/*
 		 * direct-io write to a hole: fall through to buffered I/O
+		 * for completing the rest of the request.
 		 */
-		written = 0;
+		pos += written;
+		count -= written;
 	}
 
 	buf = iov->iov_base;

^ permalink raw reply	[flat|nested] 49+ messages in thread
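To make the intent of the mm/filemap.c hunk easier to follow outside the kernel tree, here is a minimal userspace sketch of the same decision: after a direct-I/O attempt reports how many bytes it wrote, either the request is finished (error, or everything was written) or the position and count are advanced so that buffered I/O handles only the rest. The `struct write_req` type and the `fallback_to_buffered()` helper are hypothetical names for illustration and do not exist in the kernel; only the comparison and the `pos`/`count` adjustment are taken from the patch.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical request state: where the write starts, how much is left. */
struct write_req {
	off_t pos;
	size_t count;
};

/*
 * Decide what happens after the direct-I/O path returned `written`.
 * Returns 0 when nothing is left for buffered I/O (an error occurred
 * or the whole request was written); returns 1 after advancing the
 * request past the part that direct I/O already covered.
 */
static int fallback_to_buffered(struct write_req *req, ssize_t written)
{
	if (written < 0 || (size_t)written >= req->count)
		return 0;

	req->pos += written;	/* skip what direct I/O already wrote */
	req->count -= written;	/* buffered I/O handles only the rest */
	return 1;
}
```

For example, an 11264-byte write at offset 4096 of which direct I/O completed 8192 bytes would continue as a 3072-byte buffered write at offset 12288, rather than being reissued in full as in the code the patch replaces.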
* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-17  5:25   ` Suparna Bhattacharya
@ 2003-11-18  1:15     ` Daniel McNeil
  2003-11-18  1:37       ` Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-11-18 1:15 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Suparna,

Good news and bad news.  Your patch does fix the non-power-of-two i/o
size problems where AIO previously did not complete:

$ ./aiodio_sparse -s 1751k -r 18k -w 11k
$ aiodio_sparse -i 9 -dd -s 180k -r 18k -w 18k
io_submit() return 9
aiodio_sparse: 9 i/o in flight
aiodio_sparse: offset 165888 filesize 184320 inflight 9
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 18432 res2 0
io_submit() return 1
AIO DIO write done unlinking file
dio_sparse done writing, kill children
aiodio_sparse 0 children had errors

But when testing using aiocp using O_DIRECT to copy a file to
an already allocated file, the aiocp process hangs.  I used i/o
size of 4k and that completed.  Using i/o size of 1k and 2k,
the aiocp processes hung during io_submit() and are unkillable.
Here are the stack traces:

# ps -fu daniel | grep aiocp
daniel    1920     1  0 16:45 ?        00:00:07 aiocp -b 1k -n 1 -f DIRECT glibc-2.3.2.tar ff2
daniel    2083  2037  0 17:00 pts/2    00:00:03 aiocp -dd -b 1k -n 8 -f DIRECT glibc-2.3.2.tar ff2

aiocp         D 00000001  1920      1          1902       (NOTLB)
e70abd04 00200086 c18dbc80 00000001 00000003 c02897fc 00000060 00200246
       f7cdb8b4 c16522f0 c18dbc80 0000309c 640a05eb 0000008b e6d9e660 c0289a16
       f7cdb8b4 e87e95cc c18dbc80 00000000 00000001 e70abd10 c0123712 e70aa000
Call Trace:
 [<c02897fc>] generic_unplug_device+0x50/0xbd
 [<c0289a16>] blk_run_queues+0xa9/0x15c
 [<c0123712>] io_schedule+0x26/0x30
 [<c0192242>] direct_io_worker+0x376/0x5ab
 [<c014840f>] generic_file_direct_IO+0x70/0x89
 [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c014840f>] generic_file_direct_IO+0x70/0x89
 [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
 [<c0121b70>] schedule+0x3ac/0x7ef
 [<c0145f48>] generic_file_aio_read+0x33/0x37
 [<c0194ad3>] aio_pread+0x34/0x5f
 [<c0193bec>] aio_run_iocb+0xa6/0x1ed
 [<c019316f>] __aio_get_req+0x27/0x158
 [<c0194a9f>] aio_pread+0x0/0x5f
 [<c0194f62>] io_submit_one+0x1ea/0x2b7
 [<c0195110>] sys_io_submit+0xe1/0x194
 [<c03c29a7>] syscall_call+0x7/0xb
 [<c03c007b>] rpc_depopulate+0x1aa/0x24b

aiocp         D 366EDC94  2083   2037                     (NOTLB)
e758bd04 00200082 f71ba000 366edc94 00000161 c02897fc 00000060 366edc94
       00000161 f71ba000 c18d3c80 000069a9 366f5a0e 00000161 e8d4acc0 c0289a16
       f7cdb8b4 e960465c c18d3c80 00000000 00000001 e758bd10 c0123712 e758a000
Call Trace:
 [<c02897fc>] generic_unplug_device+0x50/0xbd
 [<c0289a16>] blk_run_queues+0xa9/0x15c
 [<c0123712>] io_schedule+0x26/0x30
 [<c0192242>] direct_io_worker+0x376/0x5ab
 [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c014840f>] generic_file_direct_IO+0x70/0x89
 [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
 [<c0259d3e>] write_chan+0x165/0x21e
 [<c0145f48>] generic_file_aio_read+0x33/0x37
 [<c0194ad3>] aio_pread+0x34/0x5f
 [<c0193bec>] aio_run_iocb+0xa6/0x1ed
 [<c019316f>] __aio_get_req+0x27/0x158
 [<c0194a9f>] aio_pread+0x0/0x5f
 [<c02532ab>] tty_write+0x1e8/0x3b2
 [<c0194f62>] io_submit_one+0x1ea/0x2b7
 [<c0195110>] sys_io_submit+0xe1/0x194
 [<c03c29a7>] syscall_call+0x7/0xb
 [<c03c007b>] rpc_depopulate+0x1aa/0x24b

Daniel

On Sun, 2003-11-16 at 21:25, Suparna Bhattacharya wrote:
> On Thu, Nov 13, 2003 at 02:03:58PM -0800, Daniel McNeil wrote:
> > Andrew,
> >
> > I'm testing test9-mm3 on a 2-proc Xeon with an ext3 file system.
> > I tested using the test programs aiocp and aiodio_sparse.
> > (see http://developer.osdl.org/daniel/AIO/)
> >
> > Using aiocp with i/o sizes from 1k to 512k to copy files worked
> > without any errors or kernel debug messages.
> >
> > With 64k i/o, the aiodio_sparse program completed without any errors.
> > There are no kernel error messages, so that is good.
> >
> > There are still problems with non-power-of-2 i/o sizes using AIO and
> > O_DIRECT.  It hangs with aio's that do not seem to complete.  The test
> > does exit when hitting ^c and there are no kernel messages.  Test output
> > below:
>
> Could you check if the following patch fixes the problem for you?
>
> Regards
> Suparna
>
> --------------------------------------------------------------
>
> With this patch, when the DIO code falls back to buffered i/o after
> having submitted part of the i/o, then buffered i/o is issued only
> for the remaining part of the request (i.e. the part not already
> covered by DIO).
> > diff -ur pure-mm3/fs/direct-io.c linux-2.6.0-test9-mm3/fs/direct-io.c > --- pure-mm3/fs/direct-io.c 2003-11-14 09:09:06.000000000 +0530 > +++ linux-2.6.0-test9-mm3/fs/direct-io.c 2003-11-17 09:00:47.000000000 +0530 > @@ -74,6 +74,7 @@ > been performed at the start of a > write */ > int pages_in_io; /* approximate total IO pages */ > + size_t size; /* total request size (doesn't change)*/ > sector_t block_in_file; /* Current offset into the underlying > file in dio_block units. */ > unsigned blocks_available; /* At block_in_file. changes */ > @@ -226,7 +227,7 @@ > dio_complete(dio, dio->block_in_file << dio->blkbits, > dio->result); > /* Complete AIO later if falling back to buffered i/o */ > - if (dio->result != -ENOTBLK) { > + if (dio->result >= dio->size || dio->rw == READ) { > aio_complete(dio->iocb, dio->result, 0); > kfree(dio); > } else { > @@ -889,6 +890,7 @@ > dio->blkbits = blkbits; > dio->blkfactor = inode->i_blkbits - blkbits; > dio->start_zero_done = 0; > + dio->size = 0; > dio->block_in_file = offset >> blkbits; > dio->blocks_available = 0; > dio->cur_page = NULL; > @@ -925,7 +927,7 @@ > > for (seg = 0; seg < nr_segs; seg++) { > user_addr = (unsigned long)iov[seg].iov_base; > - bytes = iov[seg].iov_len; > + dio->size += bytes = iov[seg].iov_len; > > /* Index into the first page of the first block */ > dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits; > @@ -956,6 +958,13 @@ > } > } /* end iovec loop */ > > + if (ret == -ENOTBLK && rw == WRITE) { > + /* > + * The remaining part of the request will be > + * be handled by buffered I/O when we return > + */ > + ret = 0; > + } > /* > * There may be some unwritten disk at the end of a part-written > * fs-block-sized block. Go zero that now. 
> @@ -986,19 +995,13 @@ > */ > if (dio->is_async) { > if (ret == 0) > - ret = dio->result; /* Bytes written */ > - if (ret == -ENOTBLK) { > - /* > - * The request will be reissued via buffered I/O > - * when we return; Any I/O already issued > - * effectively becomes redundant. > - */ > - dio->result = ret; > + ret = dio->result; > + if (ret > 0 && dio->result < dio->size && rw == WRITE) { > dio->waiter = current; > } > finished_one_bio(dio); /* This can free the dio */ > blk_run_queues(); > - if (ret == -ENOTBLK) { > + if (dio->waiter) { > /* > * Wait for already issued I/O to drain out and > * release its references to user-space pages > @@ -1032,7 +1035,8 @@ > } > dio_complete(dio, offset, ret); > /* We could have also come here on an AIO file extend */ > - if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK)) > + if (!is_sync_kiocb(iocb) && !(rw == WRITE && ret >= 0 && > + dio->result < dio->size)) > aio_complete(iocb, ret, 0); > kfree(dio); > } > diff -ur pure-mm3/mm/filemap.c linux-2.6.0-test9-mm3/mm/filemap.c > --- pure-mm3/mm/filemap.c 2003-11-14 09:15:08.000000000 +0530 > +++ linux-2.6.0-test9-mm3/mm/filemap.c 2003-11-15 11:11:16.000000000 +0530 > @@ -1895,14 +1895,16 @@ > */ > if (written >= 0 && file->f_flags & O_SYNC) > status = generic_osync_inode(inode, mapping, OSYNC_METADATA); > - if (written >= 0 && !is_sync_kiocb(iocb)) > + if (written >= count && !is_sync_kiocb(iocb)) > written = -EIOCBQUEUED; > - if (written != -ENOTBLK) > + if (written < 0 || written >= count) > goto out_status; > /* > * direct-io write to a hole: fall through to buffered I/O > + * for completing the rest of the request. > */ > - written = 0; > + pos += written; > + count -= written; > } > > buf = iov->iov_base; ^ permalink raw reply [flat|nested] 49+ messages in thread
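
The fallback rule in Suparna's patch description (buffered i/o only for the part of the request not already covered by DIO) comes down to the `pos += written; count -= written;` arithmetic in the mm/filemap.c hunk. A minimal user-space sketch of that arithmetic, under stated assumptions: `struct fallback_range`, `dio_needs_fallback()` and `remaining_after_dio()` are hypothetical names made up for illustration, not kernel functions, and the sketch ignores the AIO completion handling that the patch also changes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper type (not kernel code): what is left for buffered I/O. */
struct fallback_range {
	off_t  pos;	/* file offset where buffered I/O would resume */
	size_t count;	/* bytes still to be written */
};

/*
 * Mirrors the patched test "if (written < 0 || written >= count) goto out_status;":
 * fallback is needed only for a non-negative, short direct-I/O result.
 */
static int dio_needs_fallback(size_t count, ssize_t written)
{
	return written >= 0 && (size_t)written < count;
}

/*
 * Mirrors "pos += written; count -= written;": skip the part that direct I/O
 * already covered instead of reissuing the whole request.
 */
static struct fallback_range
remaining_after_dio(off_t pos, size_t count, ssize_t written)
{
	struct fallback_range r = { pos, count };

	/* Callers are expected to check dio_needs_fallback() first. */
	assert(dio_needs_fallback(count, written));
	r.pos += written;
	r.count -= (size_t)written;
	return r;
}
```

As an illustration with made-up numbers: for an 18432-byte write at offset 165888 of which direct I/O covered 16384 bytes, the sketch leaves 2048 bytes at offset 182272 for the buffered path. Before the patch, the corresponding kernel code set `written = 0` and reissued the entire request.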
* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-18  1:15 ` Daniel McNeil
@ 2003-11-18  1:37 ` Daniel McNeil
  2003-11-18 11:55 ` Suparna Bhattacharya
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-11-18  1:37 UTC
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Obviously, the ps output in my previous email showed that the hangs were
with 1k i/o sizes.

More testing using 2k, 4k, 16k, 32k, 64k, 128k, 256k and 512k all
completed correctly.

Even 11k and 17k worked.

$ ls -l
-rw-------    1 daniel   daniel   88289280 Jun  9 16:54 glibc-2.3.2.tar
-rw-rw-r--    1 daniel   daniel   88289280 Nov 17 17:32 ff2

So, only 1k is hanging so far.

Daniel

On Mon, 2003-11-17 at 17:15, Daniel McNeil wrote:
> Suparna,
>
> Good news and bad news.  Your patch does fix the non-power of two i/o
> size problems where AIO previously did not complete:
>
> $ ./aiodio_sparse -s 1751k -r 18k -w 11k
> $ aiodio_sparse -i 9 -dd -s 180k -r 18k -w 18k
> io_submit() return 9
> aiodio_sparse: 9 i/o in flight
> aiodio_sparse: offset 165888 filesize 184320 inflight 9
> aiodio_sparse: io_getevent() returned 1
> aiodio_sparse: io_getevent() res 18432 res2 0
> io_submit() return 1
> AIO DIO write done unlinking file
> dio_sparse done writing, kill children
> aiodio_sparse 0 children had errors
>
> But when testing using aiocp using O_DIRECT to copy a file to
> an already allocated file, the aiocp process hangs.  I used i/o
> size of 4k and that completed.  Using i/o size of 1k and 2k,
> the aiocp processes hung during io_submit() and are unkillable.
> Here are the stack traces:
>
> # ps -fu daniel | grep aiocp
> daniel    1920     1  0 16:45 ?        00:00:07 aiocp -b 1k -n 1 -f DIRECT glibc-2.3.2.tar ff2
> daniel    2083  2037  0 17:00 pts/2    00:00:03 aiocp -dd -b 1k -n 8 -f DIRECT glibc-2.3.2.tar ff2
>
>
> aiocp         D 00000001  1920      1          1902 (NOTLB)
> e70abd04 00200086 c18dbc80 00000001 00000003 c02897fc 00000060 00200246
>        f7cdb8b4 c16522f0 c18dbc80 0000309c 640a05eb 0000008b e6d9e660 c0289a16
>        f7cdb8b4 e87e95cc c18dbc80 00000000 00000001 e70abd10 c0123712 e70aa000
> Call Trace:
>  [<c02897fc>] generic_unplug_device+0x50/0xbd
>  [<c0289a16>] blk_run_queues+0xa9/0x15c
>  [<c0123712>] io_schedule+0x26/0x30
>  [<c0192242>] direct_io_worker+0x376/0x5ab
>  [<c014840f>] generic_file_direct_IO+0x70/0x89
>  [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
>  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
>  [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
>  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
>  [<c014840f>] generic_file_direct_IO+0x70/0x89
>  [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
>  [<c0121b70>] schedule+0x3ac/0x7ef
>  [<c0145f48>] generic_file_aio_read+0x33/0x37
>  [<c0194ad3>] aio_pread+0x34/0x5f
>  [<c0193bec>] aio_run_iocb+0xa6/0x1ed
>  [<c019316f>] __aio_get_req+0x27/0x158
>  [<c0194a9f>] aio_pread+0x0/0x5f
>  [<c0194f62>] io_submit_one+0x1ea/0x2b7
>  [<c0195110>] sys_io_submit+0xe1/0x194
>  [<c03c29a7>] syscall_call+0x7/0xb
>  [<c03c007b>] rpc_depopulate+0x1aa/0x24b
>
>
> aiocp         D 366EDC94  2083   2037                (NOTLB)
> e758bd04 00200082 f71ba000 366edc94 00000161 c02897fc 00000060 366edc94
>        00000161 f71ba000 c18d3c80 000069a9 366f5a0e 00000161 e8d4acc0 c0289a16
>        f7cdb8b4 e960465c c18d3c80 00000000 00000001 e758bd10 c0123712 e758a000
> Call Trace:
>  [<c02897fc>] generic_unplug_device+0x50/0xbd
>  [<c0289a16>] blk_run_queues+0xa9/0x15c
>  [<c0123712>] io_schedule+0x26/0x30
>  [<c0192242>] direct_io_worker+0x376/0x5ab
>  [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
>  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
>  [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
>  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
>  [<c014840f>] generic_file_direct_IO+0x70/0x89
>  [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
>  [<c0259d3e>] write_chan+0x165/0x21e
>  [<c0145f48>] generic_file_aio_read+0x33/0x37
>  [<c0194ad3>] aio_pread+0x34/0x5f
>  [<c0193bec>] aio_run_iocb+0xa6/0x1ed
>  [<c019316f>] __aio_get_req+0x27/0x158
>  [<c0194a9f>] aio_pread+0x0/0x5f
>  [<c02532ab>] tty_write+0x1e8/0x3b2
>  [<c0194f62>] io_submit_one+0x1ea/0x2b7
>  [<c0195110>] sys_io_submit+0xe1/0x194
>  [<c03c29a7>] syscall_call+0x7/0xb
>  [<c03c007b>] rpc_depopulate+0x1aa/0x24b
>
>
>
> Daniel
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-aio' in
> the body to majordomo@kvack.org.  For more info on Linux AIO,
> see: http://www.kvack.org/aio/
> Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-18  1:37 ` Daniel McNeil
@ 2003-11-18 11:55 ` Suparna Bhattacharya
  2003-11-18 23:47 ` Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-11-18 11:55 UTC
  To: Daniel McNeil
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

I don't seem to be able to recreate this at my end - even with 1k
block sizes. Did you notice if this problem occurs without
the latest patch?

Regards
Suparna

On Mon, Nov 17, 2003 at 05:37:14PM -0800, Daniel McNeil wrote:
> Obviously, the ps output in my previous email showed that the hangs were
> with 1k i/o sizes.
>
> More testing using 2k, 4k, 16k, 32k, 64k, 128k, 256k and 512k all
> completed correctly.
>
> Even 11k and 17k worked.
>
> $ ls -l
> -rw-------    1 daniel   daniel   88289280 Jun  9 16:54 glibc-2.3.2.tar
> -rw-rw-r--    1 daniel   daniel   88289280 Nov 17 17:32 ff2
>
>
> So, only 1k is hanging so far.
>
> Daniel

--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India
* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-18 11:55 ` Suparna Bhattacharya
@ 2003-11-18 23:47 ` Daniel McNeil
  2003-11-24  9:42 ` Suparna Bhattacharya
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-11-18 23:47 UTC
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Suparna,

I was unable to reproduce the hang in io_submit() without your patch.
I ran aiocp with 1k i/o size constantly for 2 hours and it never hung.

I re-ran with your patch with both as-iosched and deadline and both
hung in io_submit().  aiocp would run a few times, but I put the
aiocp in a while loop and it hung on the 1st or 2nd time.  It did
get most of the way through copying the file before hanging.
This is on a 2-proc machine with IDE disks running ext3.

Here is the stack trace and other info for as-iosched:

daniel    2005  0.7  0.0  1388  384 pts/0    D    13:51   0:08 aiocp -dd -b 1k -n 8 -f DIRECT glibc-2.3.2.tar ff2

$ cat /proc/2005/wchan
io_schedule

aiocp         D 00000001  2005   1870                (NOTLB)
e53cfc08 00200086 c18d3c80 00000001 00000003 c02897fc 00000060 00200246
       f7cdb8b4 c0191630 c18d3c80 0000bfc6 78d5d3e5 00000233 e4dc1980 c0289a16
       f7cdb8b4 d92978e4 c18d3c80 00000000 00000001 e53cfc14 c0123712 e53ce000
Call Trace:
 [<c02897fc>] generic_unplug_device+0x50/0xbd
 [<c0191630>] dio_bio_add_page+0x34/0x79
 [<c0289a16>] blk_run_queues+0xa9/0x15c
 [<c0123712>] io_schedule+0x26/0x30
 [<c0192242>] direct_io_worker+0x376/0x5ab
 [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c014840f>] generic_file_direct_IO+0x70/0x89
 [<c0147a80>] __generic_file_aio_write_nolock+0xa3a/0xda5
 [<c025b049>] pty_write+0x1c8/0x1ca
 [<c01480a4>] generic_file_aio_write+0x7e/0x115
 [<c0256d12>] opost+0x9e/0x1cf
 [<c01aa4a3>] ext3_file_write+0x3f/0xcc
 [<c0194b3a>] aio_pwrite+0x3c/0xad
 [<c0193bec>] aio_run_iocb+0xa6/0x1ed
 [<c019316f>] __aio_get_req+0x27/0x158
 [<c0194afe>] aio_pwrite+0x0/0xad
 [<c02532ab>] tty_write+0x1e8/0x3b2
 [<c0194f62>] io_submit_one+0x1ea/0x2b7
 [<c0195110>] sys_io_submit+0xe1/0x194
 [<c03c29a7>] syscall_call+0x7/0xb

For deadline iosched:

daniel    1889  0.1  0.0  1388  384 pts/0    D    15:12   0:01 aiocp -dd -b 1k -n 8 -f DIRECT glibc-2.3.2.tar ff2

$ cat /proc/1889/wchan
io_schedule

$ cat /sys/block/hdb/stat
209058 23145 45744 58542 209022 22069 0 20758 45210

aiocp         D 0AD7701D  1889   1752                (NOTLB)
ee2ddd04 00200086 f75e6660 0ad7701d 0000004e 00200282 ebd37cbc 0ad7701d
       0000004e f75e6660 c18d3c80 00060539 0ad7701d 0000004e f75e6000 0000006b
       ee2ddd10 c0192212 c18d3c80 00000000 00000001 ee2ddd10 c0123712 ee2dc000
Call Trace:
 [<c0192212>] direct_io_worker+0x346/0x5ab
 [<c0123712>] io_schedule+0x26/0x30
 [<c0192242>] direct_io_worker+0x376/0x5ab
 [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c014840f>] generic_file_direct_IO+0x70/0x89
 [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
 [<c0259d3e>] write_chan+0x165/0x21e
 [<c0145f48>] generic_file_aio_read+0x33/0x37
 [<c0194ad3>] aio_pread+0x34/0x5f
 [<c0193bec>] aio_run_iocb+0xa6/0x1ed
 [<c019316f>] __aio_get_req+0x27/0x158
 [<c0194a9f>] aio_pread+0x0/0x5f
 [<c02532ab>] tty_write+0x1e8/0x3b2
 [<c0194f62>] io_submit_one+0x1ea/0x2b7
 [<c0195110>] sys_io_submit+0xe1/0x194
 [<c03c29a7>] syscall_call+0x7/0xb

The hung processes are stuck in the 'D' state and unkillable, of course.

It would appear something is wrong with your patch.  Any ideas?

Daniel

On Tue, 2003-11-18 at 03:55, Suparna Bhattacharya wrote:
> I don't seem to be able to recreate this at my end - even with 1k
> block sizes. Did you notice if this problem occurs without
> the latest patch?
>
> Regards
> Suparna
* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-18 23:47 ` Daniel McNeil
@ 2003-11-24  9:42   ` Suparna Bhattacharya
  2003-11-25 23:49     ` [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-11-24  9:42 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

On Tue, Nov 18, 2003 at 03:47:53PM -0800, Daniel McNeil wrote:
> Suparna,
>
> I was unable to reproduce the hang in io_submit() without your patch.
> I ran aiocp with 1k i/o size constantly for 2 hours and it never hung.
>
> I re-ran with your patch with both as-iosched and deadline and both
> hung in io_submit().  aiocp would run a few times, but I put the
> aiocp in a while loop and it hung on the 1st or 2nd time.  It
> did get most of the way through copying the file before hanging.
> This is on a 2-proc to ide disks running ext3.
>

Found one race ... not sure if it's the one causing the hangs
you see. The attached patch is not a complete fix (there is one
other race to close), but it would be interesting to see if
this makes any difference for you.

Regards
Suparna

--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India

------------------------------------------------------

Don't access dio fields if it's possible that the dio could already have
been freed asynchronously during i/o completion.

Fixme: This still leaves a window between decrement of bio_count and
accessing dio->waiter during i/o completion wherein the dio could get
freed by the submission path.

--- pure-mm3/fs/direct-io.c	2003-11-24 13:00:33.000000000 +0530
+++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-24 14:15:30.000000000 +0530
@@ -994,14 +995,17 @@
 	 * reflect the number of to-be-processed BIOs.
 	 */
 	if (dio->is_async) {
-		if (ret == 0)
-			ret = dio->result;
-		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
+		int should_wait = 0;
+
+		if (dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
+			should_wait = 1;
 		}
+		if (ret == 0)
+			ret = dio->result;
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (dio->waiter) {
+		if (should_wait) {
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
@@ -1013,7 +1017,7 @@
 				set_current_state(TASK_UNINTERRUPTIBLE);
 			}
 			set_current_state(TASK_RUNNING);
-			dio->waiter = NULL;
+			kfree(dio);
 		}
 	} else {
 		finished_one_bio(dio);
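The ordering problem addressed by the patch above (reading dio->waiter after finished_one_bio() may already have freed the dio) can be sketched in plain user-space C. This is only an illustration under stated assumptions: `struct req`, `make_req()`, `put_req()` and `finish_submit()` are hypothetical stand-ins for `struct dio`, `finished_one_bio()` and the tail of `direct_io_worker()`; the sketch is single-threaded and the actual wait loop is elided.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dio; only what the sketch needs. */
struct req {
	size_t done;	/* bytes completed so far (cf. dio->result) */
	size_t size;	/* total bytes requested (cf. dio->size) */
	bool is_write;
};

static struct req *make_req(size_t done, size_t size, bool is_write)
{
	struct req *r = malloc(sizeof(*r));

	if (r) {
		r->done = done;
		r->size = size;
		r->is_write = is_write;
	}
	return r;
}

/*
 * Stand-in for finished_one_bio(): frees the request unless a short
 * write is falling back to buffered I/O, in which case the submitter
 * keeps ownership.
 */
static void put_req(struct req *r)
{
	if (!r->is_write || r->done >= r->size)
		free(r);
}

/* Returns true if the caller took the fallback path (wait, then free). */
static bool finish_submit(struct req *r)
{
	/* Latch the decision while r is still guaranteed to be valid. */
	bool should_wait = r->is_write && r->done < r->size;

	put_req(r);		/* r may already be freed after this call */

	if (should_wait) {
		/* ... wait for in-flight I/O to drain (elided) ... */
		free(r);	/* safe: put_req() left r alive on this path */
	}
	return should_wait;
}
```

The point of the local `should_wait` is that nothing after `put_req()` reads through `r` unless the sketch already knows the completion side left it alive.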
* [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-11-24  9:42 ` Suparna Bhattacharya
@ 2003-11-25 23:49   ` Daniel McNeil
  2003-11-26  7:55     ` Suparna Bhattacharya
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-11-25 23:49 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

[-- Attachment #1: Type: text/plain, Size: 1861 bytes --]

Suparna,

Yes, your patch did help.  I originally had CONFIG_DEBUG_SLAB=y which
was helping me see problems because the freed dio was getting
poisoned.  I also tested with CONFIG_DEBUG_PAGEALLOC=y which is
very good at catching these.

I updated your AIO fallback patch plus your AIO race fix, plus I fixed
the bio_count decrement race.  This patch has all three fixes and
it is working for me.

I fixed the bio_count race by changing bio_list_lock into bio_lock
and using that for all the bio fields.  I changed bio_count and
bios_in_flight from atomics into ints.  They are now protected by
the bio_lock.  I fixed the race in finished_one_bio() by
leaving the bio_count at 1 until after the dio_complete()
and then doing the bio_count decrement and wakeup while holding the
bio_lock.

Take a look, give it a try, and let me know what you think.
I've tested this on my 2-way and so far all my tests have passed.
I have more testing to do, but this is working better.

Thanks,

Daniel

On Mon, 2003-11-24 at 01:42, Suparna Bhattacharya wrote:
> On Tue, Nov 18, 2003 at 03:47:53PM -0800, Daniel McNeil wrote:
> > Suparna,
> >
> > I was unable to reproduce the hang in io_submit() without your patch.
> > I ran aiocp with 1k i/o size constantly for 2 hours and it never hung.
> >
> > I re-ran with your patch with both as-iosched and deadline and both
> > hung in io_submit().  aiocp would run a few times, but I put the
> > aiocp in a while loop and it hung on the 1st or 2nd time.  It
> > did get most of the way through copying the file before hanging.
> > This is on a 2-proc to ide disks running ext3.
> >
>
> Found one race ... not sure if it's the one causing the hangs
> you see. The attached patch is not a complete fix (there is one
> other race to close), but it would be interesting to see if
> this makes any difference for you.
>
> Regards
> Suparna

[-- Attachment #2: 2.6.0-test9-mm5.aio-dio-fallback-bio_count-race.patch --]
[-- Type: text/x-patch, Size: 9088 bytes --]

diff -rupN -X /home/daniel/dontdiff linux-2.6.0-test9-mm5/fs/direct-io.c linux-2.6.0-test9-mm5.ddm/fs/direct-io.c
--- linux-2.6.0-test9-mm5/fs/direct-io.c	2003-11-24 09:06:05.000000000 -0800
+++ linux-2.6.0-test9-mm5.ddm/fs/direct-io.c	2003-11-25 14:52:43.566103685 -0800
@@ -74,6 +74,7 @@ struct dio {
 					   been performed at the start of a
 					   write */
 	int pages_in_io;		/* approximate total IO pages */
+	size_t size;			/* total request size (doesn't change)*/
 	sector_t block_in_file;		/* Current offset into the underlying
 					   file in dio_block units. */
 	unsigned blocks_available;	/* At block_in_file.  changes */
@@ -115,9 +116,9 @@ struct dio {
 	int page_errors;		/* errno from get_user_pages() */
 
 	/* BIO completion state */
-	atomic_t bio_count;		/* nr bios to be completed */
-	atomic_t bios_in_flight;	/* nr bios in flight */
-	spinlock_t bio_list_lock;	/* protects bio_list */
+	spinlock_t bio_lock;		/* protects BIO fields below */
+	int bio_count;			/* nr bios to be completed */
+	int bios_in_flight;		/* nr bios in flight */
 	struct bio *bio_list;		/* singly linked via bi_private */
 	struct task_struct *waiter;	/* waiting task (NULL if none) */
 
@@ -221,20 +222,38 @@ static void dio_complete(struct dio *dio
  */
 static void finished_one_bio(struct dio *dio)
 {
-	if (atomic_dec_and_test(&dio->bio_count)) {
+	unsigned long flags;
+
+	spin_lock_irqsave(&dio->bio_lock, flags);
+	if (dio->bio_count == 1) {
 		if (dio->is_async) {
+			/*
+			 * Last reference to the dio is going away.
+			 * Drop spinlock and complete the DIO.
+			 */
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			dio_complete(dio, dio->block_in_file << dio->blkbits,
 					dio->result);
 			/* Complete AIO later if falling back to buffered i/o */
-			if (dio->result != -ENOTBLK) {
+			if (dio->result >= dio->size || dio->rw == READ) {
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
+				return;
 			} else {
+				/*
+				 * Falling back to buffered
+				 */
+				spin_lock_irqsave(&dio->bio_lock, flags);
+				dio->bio_count--;
 				if (dio->waiter)
 					wake_up_process(dio->waiter);
+				spin_unlock_irqrestore(&dio->bio_lock, flags);
+				return;
 			}
 		}
 	}
+	dio->bio_count--;
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 }
 
 static int dio_bio_complete(struct dio *dio, struct bio *bio);
@@ -268,13 +287,13 @@ static int dio_bio_end_io(struct bio *bi
 	if (bio->bi_size)
 		return 1;
 
-	spin_lock_irqsave(&dio->bio_list_lock, flags);
+	spin_lock_irqsave(&dio->bio_lock, flags);
 	bio->bi_private = dio->bio_list;
 	dio->bio_list = bio;
-	atomic_dec(&dio->bios_in_flight);
-	if (dio->waiter && atomic_read(&dio->bios_in_flight) == 0)
+	dio->bios_in_flight--;
+	if (dio->waiter && dio->bios_in_flight == 0)
 		wake_up_process(dio->waiter);
-	spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	return 0;
 }
 
@@ -307,10 +326,13 @@ dio_bio_alloc(struct dio *dio, struct bl
 static void dio_bio_submit(struct dio *dio)
 {
 	struct bio *bio = dio->bio;
+	unsigned long flags;
 
 	bio->bi_private = dio;
-	atomic_inc(&dio->bio_count);
-	atomic_inc(&dio->bios_in_flight);
+	spin_lock_irqsave(&dio->bio_lock, flags);
+	dio->bio_count++;
+	dio->bios_in_flight++;
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	if (dio->is_async && dio->rw == READ)
 		bio_set_pages_dirty(bio);
 	submit_bio(dio->rw, bio);
@@ -336,22 +358,22 @@ static struct bio *dio_await_one(struct
 	unsigned long flags;
 	struct bio *bio;
 
-	spin_lock_irqsave(&dio->bio_list_lock, flags);
+	spin_lock_irqsave(&dio->bio_lock, flags);
 	while (dio->bio_list == NULL) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (dio->bio_list == NULL) {
 			dio->waiter = current;
-			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			blk_run_queues();
 			io_schedule();
-			spin_lock_irqsave(&dio->bio_list_lock, flags);
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			dio->waiter = NULL;
 		}
 		set_current_state(TASK_RUNNING);
 	}
 	bio = dio->bio_list;
 	dio->bio_list = bio->bi_private;
-	spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	return bio;
 }
 
@@ -393,7 +415,12 @@ static int dio_await_completion(struct d
 	if (dio->bio)
 		dio_bio_submit(dio);
 
-	while (atomic_read(&dio->bio_count)) {
+	/*
+	 * The bio_lock is not held for the read of bio_count.
+	 * This is ok since it is the dio_bio_complete() that changes
+	 * bio_count.
+	 */
+	while (dio->bio_count) {
 		struct bio *bio = dio_await_one(dio);
 		int ret2;
 
@@ -420,10 +447,10 @@ static int dio_bio_reap(struct dio *dio)
 			unsigned long flags;
 			struct bio *bio;
 
-			spin_lock_irqsave(&dio->bio_list_lock, flags);
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			bio = dio->bio_list;
 			dio->bio_list = bio->bi_private;
-			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			ret = dio_bio_complete(dio, bio);
 		}
 		dio->reap_counter = 0;
@@ -889,6 +916,7 @@ direct_io_worker(int rw, struct kiocb *i
 	dio->blkbits = blkbits;
 	dio->blkfactor = inode->i_blkbits - blkbits;
 	dio->start_zero_done = 0;
+	dio->size = 0;
 	dio->block_in_file = offset >> blkbits;
 	dio->blocks_available = 0;
 	dio->cur_page = NULL;
@@ -913,9 +941,9 @@ direct_io_worker(int rw, struct kiocb *i
 	 * (or synchronous) device could take the count to zero while we're
 	 * still submitting BIOs.
 	 */
-	atomic_set(&dio->bio_count, 1);
-	atomic_set(&dio->bios_in_flight, 0);
-	spin_lock_init(&dio->bio_list_lock);
+	dio->bio_count = 1;
+	dio->bios_in_flight = 0;
+	spin_lock_init(&dio->bio_lock);
 	dio->bio_list = NULL;
 	dio->waiter = NULL;
 
@@ -925,7 +953,7 @@ direct_io_worker(int rw, struct kiocb *i
 
 	for (seg = 0; seg < nr_segs; seg++) {
 		user_addr = (unsigned long)iov[seg].iov_base;
-		bytes = iov[seg].iov_len;
+		dio->size += bytes = iov[seg].iov_len;
 
 		/* Index into the first page of the first block */
 		dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits;
@@ -956,6 +984,13 @@ direct_io_worker(int rw, struct kiocb *i
 		}
 	} /* end iovec loop */
 
+	if (ret == -ENOTBLK && rw == WRITE) {
+		/*
+		 * The remaining part of the request will be
+		 * handled by buffered I/O when we return
+		 */
+		ret = 0;
+	}
 	/*
 	 * There may be some unwritten disk at the end of a part-written
 	 * fs-block-sized block.  Go zero that now.
@@ -985,32 +1020,35 @@ direct_io_worker(int rw, struct kiocb *i
 	 * reflect the number of to-be-processed BIOs.
 	 */
 	if (dio->is_async) {
-		if (ret == 0)
-			ret = dio->result;	/* Bytes written */
-		if (ret == -ENOTBLK) {
-			/*
-			 * The request will be reissued via buffered I/O
-			 * when we return; Any I/O already issued
-			 * effectively becomes redundant.
-			 */
-			dio->result = ret;
+		int should_wait = 0;
+
+		if (dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
+			should_wait = 1;
 		}
+		if (ret == 0)
+			ret = dio->result;
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (ret == -ENOTBLK) {
+		if (should_wait) {
+			unsigned long flags;
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
 			 * before returning to fallback on buffered I/O
 			 */
+
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			while (atomic_read(&dio->bio_count)) {
+			while (dio->bio_count) {
+				spin_unlock_irqrestore(&dio->bio_lock, flags);
 				io_schedule();
+				spin_lock_irqsave(&dio->bio_lock, flags);
 				set_current_state(TASK_UNINTERRUPTIBLE);
 			}
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			set_current_state(TASK_RUNNING);
-			dio->waiter = NULL;
+			kfree(dio);
 		}
 	} else {
 		finished_one_bio(dio);
@@ -1032,7 +1070,8 @@ direct_io_worker(int rw, struct kiocb *i
 		}
 		dio_complete(dio, offset, ret);
 		/* We could have also come here on an AIO file extend */
-		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
+		if (!is_sync_kiocb(iocb) && !(rw == WRITE && ret >= 0 &&
+			dio->result < dio->size))
 			aio_complete(iocb, ret, 0);
 		kfree(dio);
 	}
diff -rupN -X /home/daniel/dontdiff linux-2.6.0-test9-mm5/mm/filemap.c linux-2.6.0-test9-mm5.ddm/mm/filemap.c
--- linux-2.6.0-test9-mm5/mm/filemap.c	2003-11-24 09:06:06.000000000 -0800
+++ linux-2.6.0-test9-mm5.ddm/mm/filemap.c	2003-11-21 14:20:09.000000000 -0800
@@ -1908,14 +1908,16 @@ __generic_file_aio_write_nolock(struct k
 		 */
 		if (written >= 0 && file->f_flags & O_SYNC)
 			status = generic_osync_inode(inode, mapping, OSYNC_METADATA);
-		if (written >= 0 && !is_sync_kiocb(iocb))
+		if (written >= count && !is_sync_kiocb(iocb))
 			written = -EIOCBQUEUED;
-		if (written != -ENOTBLK)
+		if (written < 0 || written >= count)
 			goto out_status;
 		/*
 		 * direct-io write to a hole: fall through to buffered I/O
+		 * for completing the rest of the request.
 		 */
-		written = 0;
+		pos += written;
+		count -= written;
 	}
 
 	buf = iov->iov_base;
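The counting scheme described in this message (one lock covering both counters, with the count held at 1 until the completion work has run) can be sketched in user-space C. This is a hedged illustration, not the kernel code: `struct io_ctx` and the `io_*` functions are hypothetical names, a pthread mutex stands in for `spin_lock_irqsave()` on `bio_lock`, and setting `completed` stands in for `dio_complete()`.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the BIO bookkeeping fields of struct dio. */
struct io_ctx {
	pthread_mutex_t lock;	/* plays the role of dio->bio_lock */
	int count;		/* cf. dio->bio_count */
	int in_flight;		/* cf. dio->bios_in_flight */
	bool completed;		/* set by the stand-in for dio_complete() */
};

static void io_ctx_init(struct io_ctx *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->count = 1;		/* the submitter's own reference */
	c->in_flight = 0;
	c->completed = false;
}

/* cf. dio_bio_submit(): both counters change under one lock. */
static void io_submit_one(struct io_ctx *c)
{
	pthread_mutex_lock(&c->lock);
	c->count++;
	c->in_flight++;
	pthread_mutex_unlock(&c->lock);
}

/* cf. dio_bio_end_io(): one bio is no longer in flight. */
static void io_end_one(struct io_ctx *c)
{
	pthread_mutex_lock(&c->lock);
	c->in_flight--;
	pthread_mutex_unlock(&c->lock);
}

/*
 * cf. finished_one_bio(): returns true if this call dropped the last
 * reference and ran the completion work.
 */
static bool io_finish_one(struct io_ctx *c)
{
	pthread_mutex_lock(&c->lock);
	if (c->count == 1) {
		/*
		 * Last reference. Run completion unlocked, but leave
		 * count at 1 so a waiter polling count cannot proceed
		 * (and free the context) before completion is done.
		 */
		pthread_mutex_unlock(&c->lock);
		c->completed = true;

		pthread_mutex_lock(&c->lock);
		c->count--;
		pthread_mutex_unlock(&c->lock);
		return true;
	}
	c->count--;
	pthread_mutex_unlock(&c->lock);
	return false;
}
```

The sketch does not model interrupt context or the wakeup of a sleeping waiter; it only shows why the final decrement is deferred until after the completion step.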
* Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-11-25 23:49 ` [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch Daniel McNeil
@ 2003-11-26  7:55   ` Suparna Bhattacharya
  2003-12-02  1:35     ` Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-11-26  7:55 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

On Tue, Nov 25, 2003 at 03:49:31PM -0800, Daniel McNeil wrote:
> Suparna,
>
> Yes your patch did help.  I originally had CONFIG_DEBUG_SLAB=y which
> was helping me see problems because the freed dio was getting
> poisoned.  I also tested with CONFIG_DEBUG_PAGEALLOC=y which is
> very good at catching these.

Ah I see - perhaps that explains why neither Janet nor I could
recreate the problem that you were hitting so easily. So we
should probably try running with CONFIG_DEBUG_SLAB and
CONFIG_DEBUG_PAGEALLOC as well.

>
> I updated your AIO fallback patch plus your AIO race fix, plus I fixed
> the bio_count decrement race.  This patch has all three fixes and
> it is working for me.
>
> I fixed the bio_count race by changing bio_list_lock into bio_lock
> and using that for all the bio fields.  I changed bio_count and
> bios_in_flight from atomics into ints.  They are now protected by
> the bio_lock.  I fixed the race in finished_one_bio() by
> leaving the bio_count at 1 until after the dio_complete()
> and then doing the bio_count decrement and wakeup while holding the
> bio_lock.
>
> Take a look, give it a try, and let me know what you think.

I had been trying a slightly different kind of fix -- appended is
the updated version of the patch I last posted. It uses the bio_list_lock
to protect the dio->waiter field, which finished_one_bio sets back
to NULL after it has issued the wakeup; and the code that waits for
i/o to drain out checks the dio->waiter field instead of bio_count.
This might not seem very obvious given the nomenclature of the
bio_list_lock, so I was holding back wondering if it could be
improved.

Your approach looks clearer in that sense -- it's pretty unambiguous
about what lock protects what fields. The only thing that bothers me (and
this is what I was trying to avoid in my patch) is the increased
use of spin_lock_irq's (overhead of turning interrupts off and on)
instead of simple atomic inc/dec in most places.

Thoughts?

Regards
Suparna

--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India

------------------------------------

Don't access dio fields if it's possible that the dio could already have
been freed asynchronously during i/o completion. The dio->bio_list_lock
protects the dio->waiter field as in the case of synchronous i/o.

--- pure-mm3/fs/direct-io.c	2003-11-24 13:00:33.000000000 +0530
+++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-25 14:08:26.000000000 +0530
@@ -231,8 +231,17 @@
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
 			} else {
-				if (dio->waiter)
-					wake_up_process(dio->waiter);
+				struct task_struct *waiter;
+				unsigned long flags;
+
+				spin_lock_irqsave(&dio->bio_list_lock, flags);
+				waiter = dio->waiter;
+				if (waiter) {
+					dio->waiter = NULL;
+					wake_up_process(waiter);
+				}
+				spin_unlock_irqrestore(&dio->bio_list_lock,
+					flags);
 			}
 		}
 	}
@@ -994,26 +1004,35 @@
 	 * reflect the number of to-be-processed BIOs.
 	 */
 	if (dio->is_async) {
-		if (ret == 0)
-			ret = dio->result;
-		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
+		int should_wait = 0;
+
+		if (dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
+			should_wait = 1;
 		}
+		if (ret == 0)
+			ret = dio->result;
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (dio->waiter) {
+		if (should_wait) {
+			unsigned long flags;
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
 			 * before returning to fallback on buffered I/O
 			 */
+			spin_lock_irqsave(&dio->bio_list_lock, flags);
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			while (atomic_read(&dio->bio_count)) {
+			while (dio->waiter) {
+				spin_unlock_irqrestore(&dio->bio_list_lock,
+					flags);
 				io_schedule();
 				set_current_state(TASK_UNINTERRUPTIBLE);
+				spin_lock_irqsave(&dio->bio_list_lock, flags);
 			}
 			set_current_state(TASK_RUNNING);
-			dio->waiter = NULL;
+			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			kfree(dio);
 		}
 	} else {
 		finished_one_bio(dio);
* Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-11-26  7:55 ` Suparna Bhattacharya
@ 2003-12-02  1:35   ` Daniel McNeil
  2003-12-02 15:25     ` Suparna Bhattacharya
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-12-02  1:35 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Suparna,

Sorry I did not respond sooner, I was on vacation.

Your patch should also fix the problem.  I like mine with the
cleaner locking.

I am not sure your approach has less overhead.  At least
on x86, cli/sti are fairly inexpensive.  The locked exchange or locked
inc/dec is what is expensive (from what I understand).

So comparing:

                my patch:               Your patch:

dio_bio_submit()
                spin_lock()             atomic_inc(bio_count);
                bio_count++             atomic_inc(bios_in_flight);
                bios_in_flight++
                spin_unlock

My guess is that the spin_lock/spin_unlock is faster than 2 atomic_inc's
since it is only 1 locked operation (spin_lock) versus 2 (atomic_inc's).

finished_one_bio() (normal case)

                My patch:
                spin_lock()             atomic_dec_and_test(bio_count)
                bio_count--
                spin_unlock()

1 locked instruction each, so very close -- atomic_dec_and_test() does
not disable interrupts, so it is probably a little bit faster.

finished_one_bio() (fallback case):

                spin_lock()             spin_lock()
                bio_count--;            dio->waiter = null
                spin_unlock()           spin_unlock()

Both approaches are the same.

dio_bio_complete()

                spin_lock()             spin_lock()
                bios_in_flight--        atomic_dec()
                spin_unlock             spin_unlock()

My patch is faster since it removed 1 locked instruction.

Conclusion:

My guess would be that both approaches are close, but my patch
has fewer locked instructions but does disable interrupts more.
My preference is for the cleaner locking approach that is easier
to understand and modify in the future.

Daniel
* Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-12-02  1:35 ` Daniel McNeil
@ 2003-12-02 15:25   ` Suparna Bhattacharya
  2003-12-03 23:14     ` Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-12-02 15:25 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

I suspect the degree to which the spin_lock_irq is costlier
than atomic_inc/dec would vary across architectures - cli/sti
is probably more expensive on certain archs than others.

The patch I sent just kept things the way they were in terms of
locking costs, assuming that those choices were thought through
at that time (should check with akpm). Yours changes it by
switching to spin_lock(unlock)_irq instead of atomic_dec in
the normal (common) path for finished_one_bio, for both sync
and async i/o. At the same time, for the sync i/o case, as
you observe it takes away one atomic_dec from dio_bio_end_io.

Since these probably aren't really very hot paths ... possibly
the difference doesn't matter that much. I do agree that your
patch makes the locking easier to follow.

Regards
Suparna

--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
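The trade-off debated in this exchange (two atomic increments per submission versus one lock acquisition guarding two plain integers) can be sketched with C11 atomics. This is an illustration under assumptions, not either patch: the struct and function names are hypothetical, the `atomic_flag` loop is a toy spinlock, and the sketch cannot model the interrupt disabling done by `spin_lock_irqsave()`, which is the cost raised above. It is not a benchmark.

```c
#include <stdatomic.h>

/* Variant A: two independent atomic counters (the pre-bio_lock layout). */
struct counters_atomic {
	atomic_int bio_count;
	atomic_int bios_in_flight;
};

static void counters_atomic_init(struct counters_atomic *c)
{
	atomic_init(&c->bio_count, 1);
	atomic_init(&c->bios_in_flight, 0);
}

/* Two atomic read-modify-write operations per submission. */
static void submit_atomic(struct counters_atomic *c)
{
	atomic_fetch_add(&c->bio_count, 1);
	atomic_fetch_add(&c->bios_in_flight, 1);
}

/* Variant B: plain ints guarded by one lock (the bio_lock layout). */
struct counters_locked {
	atomic_flag lock;	/* toy spinlock, hypothetical */
	int bio_count;
	int bios_in_flight;
};

static void counters_locked_init(struct counters_locked *c)
{
	atomic_flag_clear(&c->lock);
	c->bio_count = 1;
	c->bios_in_flight = 0;
}

/* One atomic read-modify-write (the acquire), then plain updates. */
static void submit_locked(struct counters_locked *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock,
						 memory_order_acquire))
		;	/* spin until the flag was clear */
	c->bio_count++;
	c->bios_in_flight++;
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}
```

Both variants leave the counters in the same state; which one is cheaper depends on the architecture and on the cost of masking interrupts, which is exactly the open question in the thread.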
* Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-12-02 15:25 ` Suparna Bhattacharya
@ 2003-12-03 23:14   ` Daniel McNeil
  2003-12-04  4:40     ` Suparna Bhattacharya
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-12-03 23:14 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Suparna,

I did a quick test of your patch and my patch by running my aiocp
program to write zeros to a file and to read a file.  I used
a 50MB file on ext2 file system on a ramdisk.  The machine
is a 2-proc IBM box with:

model name      : Intel(R) Xeon(TM) CPU 1700MHz
stepping        : 10
cpu MHz         : 1686.033
cache size      : 256 KB

The write test was:

time aiocp -n 32 -b 1k -s 50m -z -f DIRECT file

The read test was:

time aiocp -n 32 -b 1k -s 50 -w -f DIRECT file

I ran each test more than 10 times and here are the averages:

                my patch                your patch

aiocp write     real 0.7328             real 0.7756
                user 0.01425            user 0.01221
                sys  0.716              sys  0.76157

aiocp read      real 0.7250             real 0.7456
                user 0.0144             user 0.0130
                sys  0.07149            sys  0.7307

It looks like using the spin_lock instead of the atomic inc/dec
is very close performance-wise.  The spin_lock averages a bit
faster.  This is not testing the fallback case, but both patches
would be very similar in performance for that case.

I don't have any non-Intel hardware to test with.

Daniel

On Tue, 2003-12-02 at 07:25, Suparna Bhattacharya wrote: > I suspect the degree to which the spin_lock_irq is costlier > than atomic_inc/dec would vary across architectures - cli/sti > is probably more expensive on certain archs than others. > > The patch I sent just kept things the way they were in terms of > locking costs, assuming that those choices were thought through > at that time (should check with akpm). Yours changes it by > switching to spin_lock(unlock)_irq instead of atomic_dec in > the normal (common) path for finished_one_bio, for both sync > and async i/o.
At the same time, for the sync i/o case, as > you observe it takes away one atomic_dec from dio_bio_end_io. > > Since these probably aren't really very hot paths ... possibly > the difference doesn't matter that much. I do agree that your > patch makes the locking easier to follow. > > Regards > Suparna > > On Mon, Dec 01, 2003 at 05:35:18PM -0800, Daniel McNeil wrote: > > Suparna, > > > > Sorry I did not respond sooner, I was on vacation. > > > > Your patch should also fix the problem. I like mine with the > > cleaner locking. > > > > I am not sure your approach has less overhead. At least > > on x86, cli/sti are fairly inexpensive. The locked xchange or locked > > inc/dec is what is expensive (from what I understand). > > > > So comparing: > > > > my patch: Your patch: > > > > dio_bio_submit() > > spin_lock() atomic_inc(bio_count); > > bio_count++ atomic_inc(bios_in_flight); > > bios_in_flight++ > > spin_unlock > > > > My guess is that the spin_lock/spin_unlock is faster than 2 atomic_inc's > > since it is only 1 locked operation (spin_lock) verses 2 (atomic_inc's) > > > > finished_one_bio() (normal case) > > > > My patch: > > spin_lock() atomic_dec_and_test(bio_count) > > bio_count-- > > spin_unlock() > > > > 1 locked instruction each, so very close -- atomic_dec_and_test() does > > not disable interrupts, so it is probabably a little bit faster. > > > > finished_one-bio (fallback case): > > > > spin_lock() spin_lock() > > bio_count--; dio->waiter = null > > spin_unlock() spin_unlock() > > > > Both approaches are the same. > > > > dio_bio_complete() > > > > spin_lock() spin_lock() > > bios_in_flight-- atomic_dec() > > spin_unlock spin_unlock() > > > > My patch is faster since it removed 1 locked instruction. > > > > Conclusion: > > > > My guess would be that both approaches are close, but my patch > > has less locked instructions but does disable interrupts more. 
> > My preference is for the cleaner locking approach that is easier > > to understand and modify in the future. > > > > Daniel > > > > On Tue, 2003-11-25 at 23:55, Suparna Bhattacharya wrote: > > > On Tue, Nov 25, 2003 at 03:49:31PM -0800, Daniel McNeil wrote: > > > > Suparna, > > > > > > > > Yes your patch did help. I originally had CONFIG_DEBUG_SLAB=y which > > > > was helping me see problems because the the freed dio was getting > > > > poisoned. I also tested with CONFIG_DEBUG_PAGEALLOC=y which is > > > > very good at catching these. > > > > > > Ah I see - perhaps that explains why neither Janet nor I could > > > recreate the problem that you were hitting so easily. So we > > > should probably try running with CONFIG_DEBUG_SLAB and > > > CONFIG_DEBUG_PAGEALLOC as well. > > > > > > > > > > > I updated your AIO fallback patch plus your AIO race plus I fixed > > > > the bio_count decrement fix. This patch has all three fixes and > > > > it is working for me. > > > > > > > > I fixed the bio_count race, by changing bio_list_lock into bio_lock > > > > and using that for all the bio fields. I changed bio_count and > > > > bios_in_flight from atomics into int. They are now proctected by > > > > the bio_lock. I fixed the race, by in finished_one_bio() by > > > > leaving the bio_count at 1 until after the dio_complete() > > > > and then do the bio_count decrement and wakeup holding the bio_lock. > > > > > > > > Take a look, give it a try, and let me know what you think. > > > > > > I had been trying a slightly different kind of fix -- appended is > > > the updated version of the patch I last posted. It uses the bio_list_lock > > > to protect the dio->waiter field, which finished_one_bio sets back > > > to NULL after it has issued the wakeup; and the code that waits for > > > i/o to drain out checks the dio->waiter field instead of bio_count. 
> > > This might not seem very obvious given the nomenclature of the > > > bio_list_lock, so I was holding back wondering if it could be > > > improved. > > > > > > Your approach looks clearer in that sense -- its pretty unambiguous > > > about what lock protects what fields. The only thing that bothers me (and > > > this is what I was trying to avoid in my patch) is the increased > > > use of spin_lock_irq 's (overhead of turning interrupts off and on) > > > instead of simple atomic inc/dec in most places. > > > > > > Thoughts ? > > > > > > Regards > > > Suparna > > > > -- > > To unsubscribe, send a message with 'unsubscribe linux-aio' in > > the body to majordomo@kvack.org. For more info on Linux AIO, > > see: http://www.kvack.org/aio/ > > Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 49+ messages in thread
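
The per-step comparison quoted in this message can be made concrete in user space. What follows is only a minimal sketch, not code from either patch: it assumes C11 atomics, a hypothetical test-and-set lock (sketch_lock) standing in for the kernel spinlock, and hypothetical struct and function names. It cannot model interrupt disabling (the spin_lock_irq cost raised in the thread), so it only shows how many locked operations each scheme needs per step.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for a kernel spinlock (assumption). */
typedef struct { atomic_flag held; } sketch_lock;

static void sketch_lock_init(sketch_lock *l) { atomic_flag_clear(&l->held); }

static void sketch_lock_acquire(sketch_lock *l)
{
    /* One locked read-modify-write per attempt; spin until it succeeds. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void sketch_lock_release(sketch_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Scheme A: plain ints, both protected by a single lock. */
struct dio_locked {
    sketch_lock bio_lock;
    int bio_count;
    int bios_in_flight;
};

/* Scheme B: two independent atomic counters, no lock on the common path. */
struct dio_atomic {
    atomic_int bio_count;
    atomic_int bios_in_flight;
};

void locked_init(struct dio_locked *d)
{
    sketch_lock_init(&d->bio_lock);
    d->bio_count = 0;
    d->bios_in_flight = 0;
}

/* Submit: one locked operation (the acquire) covers both increments. */
void locked_submit(struct dio_locked *d)
{
    sketch_lock_acquire(&d->bio_lock);
    d->bio_count++;
    d->bios_in_flight++;
    sketch_lock_release(&d->bio_lock);
}

/* Finish one bio; returns 1 when this was the last outstanding bio. */
int locked_finish(struct dio_locked *d)
{
    int last;

    sketch_lock_acquire(&d->bio_lock);
    assert(d->bio_count > 0);
    last = (--d->bio_count == 0);
    sketch_lock_release(&d->bio_lock);
    return last;
}

void atomic_init_counts(struct dio_atomic *d)
{
    atomic_init(&d->bio_count, 0);
    atomic_init(&d->bios_in_flight, 0);
}

/* Submit: two locked operations, one per counter. */
void atomic_submit(struct dio_atomic *d)
{
    atomic_fetch_add(&d->bio_count, 1);
    atomic_fetch_add(&d->bios_in_flight, 1);
}

/* Finish one bio with a single locked decrement-and-test. */
int atomic_finish(struct dio_atomic *d)
{
    /* fetch_sub returns the previous value, so 1 means "now zero". */
    return atomic_fetch_sub(&d->bio_count, 1) == 1;
}
```

Calling locked_submit() or atomic_submit() twice and then the matching finish function twice reports "last" only on the second call in both schemes. Which variant is actually cheaper on a given machine still has to be measured, as with the aiocp runs in this message.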
* Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-12-03 23:14 ` Daniel McNeil
@ 2003-12-04  4:40 ` Suparna Bhattacharya
  0 siblings, 0 replies; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-12-04 4:40 UTC
To: Daniel McNeil
Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Interesting :)
Could it be the extra atomic_dec in dio_bio_end_io in the current
code that makes the difference? Or are spin_lock_irq and atomic_inc/dec
really that close? It would be good to know from other archs, just to
keep in mind in the future.

As for the fallback case, performance doesn't even matter - it's not
a typical situation. So we don't need to bother with that.

Unless someone thinks otherwise, we should go with your fix.

Regards
Suparna

On Wed, Dec 03, 2003 at 03:14:06PM -0800, Daniel McNeil wrote:
> Suparna,
>
> I did a quick test of your patch and my patch by running my aiocp
> program to write zeros to a file and to read a file. I used
> a 50MB file on an ext2 file system on a ramdisk. The machine
> is a 2-proc IBM box with:
>
> model name : Intel(R) Xeon(TM) CPU 1700MHz
> stepping   : 10
> cpu MHz    : 1686.033
> cache size : 256 KB
>
> The write test was:
>
> time aiocp -n 32 -b 1k -s 50m -z -f DIRECT file
>
> The read test was:
>
> time aiocp -n 32 -b 1k -s 50 -w -f DIRECT file
>
> I ran each test more than 10 times and here are the averages:
>
>                 my patch        your patch
>
> aiocp write     real 0.7328     real 0.7756
>                 user 0.01425    user 0.01221
>                 sys  0.716      sys  0.76157
>
> aiocp read      real 0.7250     real 0.7456
>                 user 0.0144     user 0.0130
>                 sys  0.07149    sys  0.7307
>
> It looks like using the spin_lock instead of the atomic inc/dec
> is very close performance-wise. The spin_lock averages a bit
> faster. This is not testing the fallback case, but both patches
> would be very similar in performance for that case.
>
> I don't have any non-intel hardware to test with.
>
> Daniel

--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
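
The fix agreed on in this thread is described only in prose: finished_one_bio() leaves bio_count at 1 until after dio_complete(), then does the decrement and wakeup while holding bio_lock. A minimal user-space sketch of that ordering is given below. It is an illustration under assumptions, not the actual patch: the lock is a hypothetical C11 test-and-set lock (demo_lock), the struct and helper names are invented for the example, and the "wakeup" is just a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's bio_lock (assumption). */
typedef struct { atomic_flag held; } demo_lock;

static void demo_lock_init(demo_lock *l) { atomic_flag_clear(&l->held); }

static void demo_lock_acquire(demo_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void demo_lock_release(demo_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Illustrative state only; the real struct dio has many more fields. */
struct dio_sketch {
    demo_lock bio_lock;
    int bio_count;              /* outstanding bios, protected by bio_lock */
    bool completed;             /* set by the completion step */
    bool waiter_woken;          /* stands in for waking the waiting task */
    int count_seen_by_complete; /* bio_count observed during completion */
};

void dio_sketch_init(struct dio_sketch *d, int bios)
{
    demo_lock_init(&d->bio_lock);
    d->bio_count = bios;
    d->completed = false;
    d->waiter_woken = false;
    d->count_seen_by_complete = -1;
}

/* Stand-in for dio_complete(); it still touches the dio, so the dio
 * must not be freed by anyone else while this runs. */
static void dio_complete_sketch(struct dio_sketch *d)
{
    d->completed = true;
    /* Only the last bio reaches here, so the count is stable at 1. */
    d->count_seen_by_complete = d->bio_count;
}

void finished_one_bio_sketch(struct dio_sketch *d)
{
    demo_lock_acquire(&d->bio_lock);
    assert(d->bio_count > 0);
    if (d->bio_count == 1) {
        /* Last bio: keep the count at 1 across completion so that a
         * waiter polling for zero cannot proceed (and free the dio)
         * while completion is still using it. */
        demo_lock_release(&d->bio_lock);
        dio_complete_sketch(d);
        demo_lock_acquire(&d->bio_lock);
        d->bio_count--;
        d->waiter_woken = true; /* decrement and wakeup under the lock */
        demo_lock_release(&d->bio_lock);
    } else {
        d->bio_count--;
        demo_lock_release(&d->bio_lock);
    }
}

/* What a waiter would check before tearing the dio down. */
bool dio_sketch_drained(struct dio_sketch *d)
{
    bool drained;

    demo_lock_acquire(&d->bio_lock);
    drained = (d->bio_count == 0);
    demo_lock_release(&d->bio_lock);
    return drained;
}
```

With two outstanding bios, the first call only decrements the count; the second runs the completion step while the count is still 1 and only then drops it to zero and marks the waiter woken. That ordering is presumably what keeps a drained check from succeeding before completion has finished with the dio.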
* Re: 2.6.0-test9-mm3 (compile stats)
  2003-11-13  7:30 2.6.0-test9-mm3 Andrew Morton
  2003-11-13 20:03 ` [PATCH] linux-2.6.0-test9-mm3_verbose-timesource-acpi-pm_A0 john stultz
  2003-11-13 22:03 ` 2.6.0-test9-mm3 - AIO test results Daniel McNeil
@ 2003-11-13 22:04 ` John Cherry
  2003-11-14  5:07 ` 2.6.0-test9-mm3 Martin J. Bligh
  2003-11-14 19:08 ` 2.6.0-test9-mm3 Martin J. Bligh
  4 siblings, 0 replies; 49+ messages in thread
From: John Cherry @ 2003-11-13 22:04 UTC
To: Andrew Morton; +Cc: linux-kernel

Linux 2.6 (mm tree) Compile Statistics (gcc 3.2.2)

Warnings/Errors Summary

Kernel            bzImage      bzImage   bzImage   modules   bzImage   modules
                  (defconfig)  (allno)   (allyes)  (allyes)  (allmod)  (allmod)
---------------   ----------   --------  --------  --------  --------  ---------
2.6.0-test9-mm3   0w/0e        0w/0e     172w/ 0e  12w/0e    3w/0e     211w/0e
2.6.0-test9-mm2   0w/0e        0w/0e     172w/ 0e  12w/0e    3w/0e     211w/1e
2.6.0-test9-mm1   0w/0e        0w/0e     179w/ 1e  12w/0e    3w/0e     213w/1e
2.6.0-test8-mm1   0w/0e        0w/0e     183w/ 1e  13w/0e    3w/0e     223w/1e
2.6.0-test7-mm1   0w/0e        1w/0e     176w/ 1e   9w/0e    3w/0e     231w/1e
2.6.0-test6-mm4   0w/0e        1w/0e     179w/ 1e   9w/0e    3w/0e     234w/1e
2.6.0-test6-mm3   0w/0e        1w/0e     178w/ 1e   9w/0e    3w/0e     252w/2e
2.6.0-test6-mm2   0w/0e        1w/0e     179w/ 1e   9w/0e    3w/0e     252w/2e
2.6.0-test6-mm1   0w/0e        1w/0e     179w/ 1e   9w/0e    3w/0e     252w/2e

Web page with links to complete details:
   http://developer.osdl.org/cherry/compile/

Version information for host [ cherrypit.pdx.osdl.net ]
   gcc: 3.2.2
   patch: 2.5.4

Kernel version: 2.6.0-test9-mm3

Kernel build:
   Making bzImage (defconfig): 0 warnings, 0 errors
   Making modules (defconfig): 0 warnings, 0 errors
   Making bzImage (allnoconfig): 0 warnings, 0 errors
   Making bzImage (allyesconfig): 172 warnings, 0 errors
   Making modules (allyesconfig): 12 warnings, 0 errors
   Making bzImage (allmodconfig): 3 warnings, 0 errors
   Making modules (allmodconfig): 211 warnings, 0 errors

Building directories: Building fs/adfs: clean Building fs/affs: clean Building fs/afs: clean Building fs/autofs: clean Building
fs/autofs4: clean Building fs/befs: clean Building fs/bfs: clean Building fs/cifs: clean Building fs/coda: clean Building fs/cramfs: clean Building fs/devfs: clean Building fs/devpts: clean Building fs/efs: clean Building fs/exportfs: clean Building fs/ext2: clean Building fs/ext3: clean Building fs/fat: clean Building fs/freevxfs: clean Building fs/hfs: clean Building fs/hpfs: clean Building fs/hugetlbfs: clean Building fs/intermezzo: clean Building fs/isofs: clean Building fs/jbd: clean Building fs/jffs: clean Building fs/jffs2: clean Building fs/jfs: clean Building fs/lockd: clean Building fs/minix: clean Building fs/msdos: clean Building fs/ncpfs: clean Building fs/nfs: clean Building fs/nfsd: clean Building fs/nls: clean Building fs/ntfs: clean Building fs/partitions: clean Building fs/proc: clean Building fs/qnx4: clean Building fs/ramfs: clean Building fs/reiserfs: clean Building fs/romfs: clean Building fs/smbfs: clean Building fs/sysfs: clean Building fs/sysv: clean Building fs/udf: clean Building fs/ufs: clean Building fs/vfat: clean Building fs/xfs: clean Building drivers/i2c: clean Building drivers/net: 31 warnings, 0 errors Building drivers/media: 1 warnings, 0 errors Building drivers/base: clean Building drivers/pci: clean Building drivers/eisa: clean Building drivers/isdn: clean Building drivers/char: 1 warnings, 0 errors Building drivers/acpi: clean Building drivers/serial: 1 warnings, 0 errors Building drivers/fc4: clean Building drivers/parport: clean Building drivers/mtd: 23 warnings, 0 errors Building drivers/usb: clean Building drivers/block: 1 warnings, 0 errors Building drivers/pcmcia: 3 warnings, 0 errors Building drivers/input: clean Building drivers/atm: clean Building drivers/ide: 30 warnings, 0 errors Building drivers/pnp: clean Building drivers/oprofile: clean Building drivers/ieee1394: clean Building drivers/cdrom: 3 warnings, 0 errors Building drivers/md: clean Building drivers/message: 1 warnings, 0 errors Building drivers/cpufreq: 
clean Building drivers/sbus: clean Building drivers/bluetooth: clean Building drivers/telephony: 5 warnings, 0 errors Building drivers/zorro: clean Building drivers/acorn: clean Building drivers/tc: clean Building drivers/mca: clean Building drivers/nubus: clean Building drivers/misc: clean Building drivers/dio: clean Building drivers/scsi/aacraid: clean Building drivers/scsi/aic7xxx: clean Building drivers/scsi/pcmcia: 4 warnings, 0 errors Building drivers/scsi/sym53c8xx_2: clean Building drivers/video/aty: 3 warnings, 0 errors Building drivers/video/console: 2 warnings, 0 errors Building drivers/video/i810: clean Building drivers/video/logo: clean Building drivers/video/matrox: 5 warnings, 0 errors Building drivers/video/riva: clean Building drivers/video/sis: 1 warnings, 0 errors Building sound/core: clean Building sound/drivers: clean Building sound/i2c: clean Building sound/isa: 3 warnings, 0 errors Building sound/oss: 33 warnings, 0 errors Building sound/pci: clean Building sound/pcmcia: clean Building sound/synth: clean Building sound/usb: clean Building arch/i386: clean Building crypto: clean Building lib: clean Building net: 9 warnings, 0 errors Building security: clean Building sound: clean Building usr: clean Building fs: clean Building drivers/video: 8 warnings, 0 errors Building drivers/scsi: 44 warnings, 0 errors Building drivers/net: 0 warnings, 1 errors Error Summary (individual module builds): drivers/net: 0 warnings, 1 errors Warning Summary (individual module builds): drivers/block: 1 warnings, 0 errors drivers/cdrom: 3 warnings, 0 errors drivers/char: 1 warnings, 0 errors drivers/ide: 30 warnings, 0 errors drivers/media: 1 warnings, 0 errors drivers/message: 1 warnings, 0 errors drivers/mtd: 23 warnings, 0 errors drivers/net: 31 warnings, 0 errors drivers/pcmcia: 3 warnings, 0 errors drivers/scsi/pcmcia: 4 warnings, 0 errors drivers/scsi: 44 warnings, 0 errors drivers/serial: 1 warnings, 0 errors drivers/telephony: 5 warnings, 0 errors 
drivers/video/aty: 3 warnings, 0 errors drivers/video/console: 2 warnings, 0 errors drivers/video/matrox: 5 warnings, 0 errors drivers/video/sis: 1 warnings, 0 errors drivers/video: 8 warnings, 0 errors net: 9 warnings, 0 errors sound/isa: 3 warnings, 0 errors sound/oss: 33 warnings, 0 errors Error List: make[1]: [arch/i386/boot/bzImage] Error 1 (ignored) make[2]: [drivers/net/wan/wanxlfw.inc] Error 127 (ignored) Warning List: arch/i386/kernel/cpu/cpufreq/powernow-k8.c:38:2: warning: #warning this driver has not been tested on a preempt system arch/i386/kernel/cpu/cpufreq/powernow-k8.c:938:2: warning: #warning pol->policy is in undefined state here drivers/cdrom/aztcd.c:379: warning: `pa_ok' defined but not used drivers/cdrom/isp16.c:124: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/cdrom/mcdx.h:180:2: warning: #warning You have not edited mcdx.h drivers/cdrom/mcdx.h:181:2: warning: #warning Perhaps irq and i/o settings are wrong. drivers/cdrom/sjcd.c:1700: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/char/applicom.c:522:2: warning: #warning "Je suis stupide. DW. 
- copy*user in cli" drivers/char/applicom.c:67: warning: `applicom_pci_tbl' defined but not used drivers/char/watchdog/alim1535_wdt.c:320: warning: `ali_pci_tbl' defined but not used drivers/ide/ide-probe.c:1326: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/ide-probe.c:1353: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494) drivers/ide/ide-tape.c:6213: warning: duplicate `const' drivers/ide/ide.c:2470: warning: implicit declaration of function `pnpide_init' drivers/ide/legacy/ide-cs.c:365: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/legacy/ide-cs.c:411: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494) drivers/ide/pci/aec62xx.c:533: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/alim15x3.c:871: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/amd74xx.c:451: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/cmd64x.c:755: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/cs5520.c:294: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/cs5530.c:416: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/cy82c693.c:437: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/hpt34x.c:334: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/hpt366.c:1223: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/ns87415.c:228: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/opti621.c:364: warning: `MOD_INC_USE_COUNT' is 
deprecated (declared at include/linux/module.h:482) drivers/ide/pci/pdc202xx_new.c:631: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/pdc202xx_old.c:925: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/piix.c:746: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/rz1000.c:65: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/sc1200.c:557: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/serverworks.c:804: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/siimage.c:1174: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/sis5513.c:956: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/slc90e66.c:376: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/triflex.c:227: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/trm290.c:378: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/ide/pci/trm290.c:406: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/ide/pci/via82cxxx.c:618: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/input/gameport/ns558.c:121: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/input/gameport/ns558.c:80: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/media/common/saa7146_vbi.c:6: warning: `vbi_workaround' defined but not used drivers/media/video/zoran_card.c:149: warning: `zr36067_pci_tbl' defined but not used drivers/message/fusion/mptscsih.c:6922: 
warning: `mptscsih_setup' defined but not used drivers/message/i2o/i2o_block.c:1506: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494) drivers/mtd/chips/amd_flash.c:783: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/mtd/chips/cfi_cmdset_0001.c:381: warning: unsigned int format, different type arg (arg 2) drivers/mtd/chips/cfi_cmdset_0001.c:965: warning: unsigned int format, different type arg (arg 2) drivers/mtd/chips/cfi_cmdset_0002.c:1157: warning: unsigned int format, different type arg (arg 4) drivers/mtd/chips/cfi_cmdset_0002.c:513: warning: unsigned int format, different type arg (arg 4) drivers/mtd/chips/cfi_cmdset_0002.c:651: warning: unsigned int format, different type arg (arg 4) drivers/mtd/chips/cfi_cmdset_0002.c:977: warning: unsigned int format, different type arg (arg 4) drivers/mtd/chips/cfi_cmdset_0020.c:1139: warning: unsigned int format, different type arg (arg 3) drivers/mtd/chips/cfi_cmdset_0020.c:1288: warning: unsigned int format, different type arg (arg 3) drivers/mtd/chips/cfi_cmdset_0020.c:493: warning: unsigned int format, different type arg (arg 3) drivers/mtd/chips/cfi_cmdset_0020.c:853: warning: unsigned int format, different type arg (arg 3) drivers/mtd/chips/sharp.c:157: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/mtd/cmdlinepart.c:344: warning: `mtdpart_setup' defined but not used drivers/mtd/devices/doc2000.c:567: warning: assignment from incompatible pointer type drivers/mtd/devices/doc2000.c:568: warning: assignment from incompatible pointer type drivers/mtd/devices/doc2001.c:376: warning: assignment from incompatible pointer type drivers/mtd/devices/doc2001.c:377: warning: assignment from incompatible pointer type drivers/mtd/nftlcore.c:354: warning: passing arg 7 of pointer to function makes pointer from integer without a cast drivers/mtd/nftlcore.c:358: warning: passing arg 7 of pointer to function 
makes pointer from integer without a cast drivers/mtd/nftlcore.c:363: warning: passing arg 7 of pointer to function makes pointer from integer without a cast drivers/mtd/nftlcore.c:632: warning: passing arg 7 of pointer to function makes pointer from integer without a cast drivers/mtd/nftlcore.c:696: warning: passing arg 7 of pointer to function makes pointer from integer without a cast drivers/mtd/nftlmount.c:220: warning: passing arg 7 of pointer to function makes pointer from integer without a cast drivers/net/3c515.c:529: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/net/acenic.c:135: warning: `acenic_pci_tbl' defined but not used drivers/net/arcnet/arc-rimi.c:319: warning: `dev_alloc' is deprecated (declared at include/linux/netdevice.h:525) drivers/net/arcnet/com20020-isa.c:152: warning: `dev_alloc' is deprecated (declared at include/linux/netdevice.h:525) drivers/net/arcnet/com20020-pci.c:71: warning: `dev_alloc' is deprecated (declared at include/linux/netdevice.h:525) drivers/net/arcnet/com90io.c:385: warning: `dev_alloc' is deprecated (declared at include/linux/netdevice.h:525) drivers/net/arcnet/com90xx.c:146: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/net/arcnet/com90xx.c:412: warning: `dev_alloc' is deprecated (declared at include/linux/netdevice.h:525) drivers/net/arcnet/com90xx.c:609: warning: `dev_alloc' is deprecated (declared at include/linux/netdevice.h:525) drivers/net/dgrs.c:124: warning: `dgrs_pci_tbl' defined but not used drivers/net/eepro.c:575: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/net/ewrk3.c:1291: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/net/ewrk3.c:1335: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/net/hp100.c:288: warning: `hp100_pci_tbl' defined but not used drivers/net/hp100.c:385: warning: `check_region' is 
deprecated (declared at include/linux/ioport.h:119) drivers/net/hp100.c:432: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/net/hp100.c:463: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/net/hp100.c:471: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/net/sk98lin/skaddr.c:1092: warning: `ReturnCode' might be used uninitialized in this function drivers/net/sk98lin/skaddr.c:1624: warning: `ReturnCode' might be used uninitialized in this function drivers/net/skfp/skfddi.c:185: warning: `skfddi_pci_tbl' defined but not used drivers/net/tokenring/smctr.c:3494: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/net/tokenring/smctr.c:733: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494) drivers/net/tulip/winbond-840.c:149: warning: `version' defined but not used drivers/net/wan/cycx_drv.c:430: warning: long unsigned int format, u32 arg (arg 2) drivers/net/wan/farsync.c:1316: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/net/wan/farsync.c:1329: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494) drivers/net/wan/hostess_sv11.c:125: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/net/wan/hostess_sv11.c:157: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494) drivers/net/wan/lmc/lmc_main.c:1063: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119) drivers/net/wan/lmc/lmc_main.c:1184: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) drivers/net/wan/lmc/lmc_main.c:1355: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494) drivers/net/wan/pc300_drv.c:3168: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482) 
drivers/net/wan/pc300_drv.c:3204: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494)
drivers/net/wan/sbni.c:308: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/pcmcia/i82365.c:680: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/pcmcia/i82365.c:817: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/pcmcia/tcic.c:340: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:1003: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:1008: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:700: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:704: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:708: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:712: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:716: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:720: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:973: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:988: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:993: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:998: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/NCR5380.c:396: warning: `phases' defined but not used
drivers/scsi/NCR5380.c:699: warning: `NCR5380_probe_irq' defined but not used
drivers/scsi/NCR5380.c:756: warning: `NCR5380_print_options' defined but not used
drivers/scsi/NCR53c406a.c:611: warning: `NCR53c406a_setup' defined but not used
drivers/scsi/NCR53c406a.c:660: warning: initialization from incompatible pointer type
drivers/scsi/NCR53c406a.c:669: warning: `wait_intr' defined but not used
drivers/scsi/advansys.c:10006: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/advansys.c:4622: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/aha152x.c:396: warning: `id_table' defined but not used
drivers/scsi/aha152x.c:793: warning: `aha152x_setup' defined but not used
drivers/scsi/aha152x.c:852: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/aha152x.c:870: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/atp870u.c:2350: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/atp870u.c:2422: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/cpqfcTSinit.c:1583: warning: unused variable `timeout'
drivers/scsi/cpqfcTSinit.c:1584: warning: unused variable `retries'
drivers/scsi/cpqfcTSinit.c:1585: warning: unused variable `scsi_cdb'
drivers/scsi/cpqfcTSinit.c:471: warning: `my_ioctl_done' defined but not used
drivers/scsi/dtc.c:187: warning: `dtc_setup' defined but not used
drivers/scsi/eata_pio.c:596: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/fd_mcs.c:300: warning: `fd_mcs_setup' defined but not used
drivers/scsi/fd_mcs.c:311: warning: initialization from incompatible pointer type
drivers/scsi/fd_mcs.h:27: warning: `fd_mcs_command' declared `static' but never defined
drivers/scsi/fdomain.c:763: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/g_NCR5380.c:926: warning: `id_table' defined but not used
drivers/scsi/gdth.c:881: warning: `gdthtable' defined but not used
drivers/scsi/inia100.h:70: warning: `inia100_detect' declared `static' but never defined
drivers/scsi/inia100.h:71: warning: `inia100_release' declared `static' but never defined
drivers/scsi/inia100.h:72: warning: `inia100_queue' declared `static' but never defined
drivers/scsi/inia100.h:73: warning: `inia100_abort' declared `static' but never defined
drivers/scsi/inia100.h:74: warning: `inia100_device_reset' declared `static' but never defined
drivers/scsi/inia100.h:75: warning: `inia100_bus_reset' declared `static' but never defined
drivers/scsi/libata-core.c:2133: warning: `ata_qc_push' defined but not used
drivers/scsi/psi240i.c:713: warning: initialization from incompatible pointer type
drivers/scsi/psi240i.c:714: warning: initialization from incompatible pointer type
drivers/scsi/sym53c416.c:627: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/sym53c416.c:715: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/scsi/wd7000.c:1611: warning: `wd7000_abort' defined but not used
drivers/serial/8250.c:693: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/telephony/ixj.c:7737: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/telephony/ixj.c:7799: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/telephony/ixj.c:7835: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/telephony/ixj.h:41: warning: `ixj_h_rcsid' defined but not used
drivers/usb/class/usb-midi.h:150: warning: `usb_midi_ids' defined but not used
drivers/video/aty/aty128fb.c:2335: warning: `aty128fb_exit' defined but not used
drivers/video/aty/aty128fb.c:254: warning: `mode' defined but not used
drivers/video/aty/aty128fb.c:256: warning: `nomtrr' defined but not used
drivers/video/console/mdacon.c:374: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/video/console/mdacon.c:384: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494)
drivers/video/hgafb.c:452: warning: `hgafb_fillrect' defined but not used
drivers/video/hgafb.c:472: warning: `hgafb_copyarea' defined but not used
drivers/video/hgafb.c:502: warning: `hgafb_imageblit' defined but not used
drivers/video/imsttfb.c:1089: warning: `imsttfb_load_cursor_image' defined but not used
drivers/video/imsttfb.c:1159: warning: `imstt_set_cursor' defined but not used
drivers/video/matrox/matroxfb_base.c:1250: warning: `inverse' defined but not used
drivers/video/matrox/matroxfb_g450.c:129: warning: duplicate `const'
drivers/video/matrox/matroxfb_g450.c:130: warning: duplicate `const'
drivers/video/matrox/matroxfb_maven.c:347: warning: duplicate `const'
drivers/video/matrox/matroxfb_maven.c:348: warning: duplicate `const'
drivers/video/sis/sis_main.c:622: warning: unused variable `reg'
drivers/video/tdfxfb.c:1005: warning: `tdfxfb_cursor' defined but not used
drivers/video/tdfxfb.c:198: warning: `inverse' defined but not used
drivers/video/tdfxfb.c:199: warning: `mode_option' defined but not used
drivers/video/tridentfb.c:455: warning: `tridentfb_fillrect' defined but not used
drivers/video/tridentfb.c:473: warning: `tridentfb_copyarea' defined but not used
include/linux/ixjuser.h:45: warning: `ixjuser_h_rcsid' defined but not used
include/linux/mca-legacy.h:12:2: warning: #warning "MCA legacy - please move your driver to the new sysfs api"
net/decnet/dn_nsp_in.c:805: warning: `skb_linearize' is deprecated (declared at include/linux/skbuff.h:1136)
net/decnet/dn_route.c:639: warning: `skb_linearize' is deprecated (declared at include/linux/skbuff.h:1136)
net/ipv4/ipcomp.c:189: warning: `skb_linearize' is deprecated (declared at include/linux/skbuff.h:1136)
net/ipv4/ipcomp.c:72: warning: `skb_linearize' is deprecated (declared at include/linux/skbuff.h:1136)
net/ipv6/ipcomp6.c:174: warning: `skb_linearize' is deprecated (declared at include/linux/skbuff.h:1136)
net/ipv6/ipcomp6.c:61: warning: `skb_linearize' is deprecated (declared at include/linux/skbuff.h:1136)
net/ipv6/netfilter/ip6_tables.c:349: warning: `skb_linearize' is deprecated (declared at include/linux/skbuff.h:1136)
net/ipv6/netfilter/ip6table_mangle.c:162: warning: `skb_linearize' is deprecated (declared at include/linux/skbuff.h:1136)
net/wanrouter/wanmain.c:729: warning: `dev_get' is deprecated (declared at include/linux/netdevice.h:514)
sound/isa/opti9xx/opti92x-ad1848.c:1670: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/isa/opti9xx/opti92x-ad1848.c:1686: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/isa/opti9xx/opti92x-ad1848.c:314: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/ad1848.c:1580: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/ad1848.c:2530: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/ad1848.c:2967: warning: `id_table' defined but not used
sound/oss/cmpci.c:1465: warning: unused variable `s'
sound/oss/cmpci.c:2865: warning: `cmpci_pci_tbl' defined but not used
sound/oss/cs4232.c:141: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/cs4232.c:193: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/gus_card.c:76: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/gus_card.c:78: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/gus_card.c:93: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/gus_card.c:94: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/mad16.c:322: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/maui.c:307: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/mpu401.c:1217: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/msnd.c:74: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
sound/oss/msnd.c:95: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494)
sound/oss/msnd_pinnacle.c:1123: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/msnd_pinnacle.c:1811: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/opl3sa.c:114: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/opl3sa.c:122: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/pss.c:1004: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/pss.c:191: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/pss.c:640: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/pss.c:710: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/sb_common.c:1224: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/sb_common.c:523: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/sgalaxy.c:89: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/sgalaxy.c:97: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/sscape.c:1113: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/sscape.c:1132: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/sscape.c:1137: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/sscape.c:737: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/trix.c:147: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/trix.c:292: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/trix.c:85: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/wavfront.c:2426: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
sound/oss/wf_midi.c:788: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: 2.6.0-test9-mm3
  2003-11-13 7:30 2.6.0-test9-mm3 Andrew Morton
  ` (2 preceding siblings ...)
  2003-11-13 22:04 ` 2.6.0-test9-mm3 (compile stats) John Cherry
@ 2003-11-14 5:07 ` Martin J. Bligh
  2003-11-14 20:57 ` 2.6.0-test9-mm3 Zwane Mwaikambo
  2003-11-14 19:08 ` 2.6.0-test9-mm3 Martin J. Bligh
  4 siblings, 1 reply; 49+ messages in thread
From: Martin J. Bligh @ 2003-11-14 5:07 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm

> - Several ext2 and ext3 allocator fixes. These need serious testing on big
>   SMP.

Survives kernbench and SDET on ext2 at least on 16-way. I'll try ext3
later.

M.

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: 2.6.0-test9-mm3
  2003-11-14 5:07 ` 2.6.0-test9-mm3 Martin J. Bligh
@ 2003-11-14 20:57 ` Zwane Mwaikambo
  2003-11-14 21:57 ` 2.6.0-test9-mm3 Martin J. Bligh
  0 siblings, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-14 20:57 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel, linux-mm

On Thu, 13 Nov 2003, Martin J. Bligh wrote:

> > - Several ext2 and ext3 allocator fixes. These need serious testing on big
> >   SMP.
>
> Survives kernbench and SDET on ext2 at least on 16-way. I'll try ext3
> later.

It's actually triple faulting my laptop (K6 family=5 model=8 step=12) when
i have CONFIG_X86_4G enabled and try and run X11. The same kernel is fine
on all my other test boxes. Any hints?

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: 2.6.0-test9-mm3
  2003-11-14 20:57 ` 2.6.0-test9-mm3 Zwane Mwaikambo
@ 2003-11-14 21:57 ` Martin J. Bligh
  2003-11-14 21:37 ` 2.6.0-test9-mm3 Zwane Mwaikambo
  2003-11-14 21:47 ` 2.6.0-test9-mm3 Linus Torvalds
  0 siblings, 2 replies; 49+ messages in thread
From: Martin J. Bligh @ 2003-11-14 21:57 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Andrew Morton, linux-kernel, linux-mm

>> > - Several ext2 and ext3 allocator fixes. These need serious testing on big
>> >   SMP.
>>
>> Survives kernbench and SDET on ext2 at least on 16-way. I'll try ext3
>> later.
>
> It's actually triple faulting my laptop (K6 family=5 model=8 step=12) when
> i have CONFIG_X86_4G enabled and try and run X11. The same kernel is fine
> on all my other test boxes. Any hints?

Linus had some debug thing for triple faults, a few months ago, IIRC ...
probably in the archives somewhere ...

M.

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: 2.6.0-test9-mm3
  2003-11-14 21:57 ` 2.6.0-test9-mm3 Martin J. Bligh
@ 2003-11-14 21:37 ` Zwane Mwaikambo
  2003-11-14 21:47 ` 2.6.0-test9-mm3 Linus Torvalds
  1 sibling, 0 replies; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-14 21:37 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel, linux-mm

On Fri, 14 Nov 2003, Martin J. Bligh wrote:

> >> > - Several ext2 and ext3 allocator fixes. These need serious testing on big
> >> >   SMP.
> >>
> >> Survives kernbench and SDET on ext2 at least on 16-way. I'll try ext3
> >> later.
> >
> > It's actually triple faulting my laptop (K6 family=5 model=8 step=12) when
> > i have CONFIG_X86_4G enabled and try and run X11. The same kernel is fine
> > on all my other test boxes. Any hints?
>
> Linus had some debug thing for triple faults, a few months ago, IIRC ...
> probably in the archives somewhere ...

It should all be in the kernel right now; arch/i386/kernel/doublefault.c
but i think i may be a bit low on luck =)

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: 2.6.0-test9-mm3
  2003-11-14 21:57 ` 2.6.0-test9-mm3 Martin J. Bligh
  2003-11-14 21:37 ` 2.6.0-test9-mm3 Zwane Mwaikambo
@ 2003-11-14 21:47 ` Linus Torvalds
  2003-11-15 0:55 ` 2.6.0-test9-mm3 Zwane Mwaikambo
  1 sibling, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2003-11-14 21:47 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Zwane Mwaikambo, Andrew Morton, linux-kernel, linux-mm

On Fri, 14 Nov 2003, Martin J. Bligh wrote:
>
> Linus had some debug thing for triple faults, a few months ago, IIRC ...
> probably in the archives somewhere ...

Triple faults you can't debug, they raise a line outside the CPU, and
normal PC hardware will cause that to just trigger a reboot.

But double faults do get caught, and that debugging stuff actually is in
the standard kernel. It won't give _nearly_ as good a debug report as a
"normal" oops, since I didn't want the double-fault handler to touch
anything even remotely unsafe, but it often gives a good hint about what
might be wrong. Certainly better than triple-faulting did (which we still
do for _catastrophic_ corruption, eg totally munged kernel page tables etc
- it's just very hard to avoid once you get corrupted enough).

		Linus

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: 2.6.0-test9-mm3
  2003-11-14 21:47 ` 2.6.0-test9-mm3 Linus Torvalds
@ 2003-11-15 0:55 ` Zwane Mwaikambo
  2003-11-15 19:34 ` [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops Zwane Mwaikambo
  0 siblings, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-15 0:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, linux-mm

On Fri, 14 Nov 2003, Linus Torvalds wrote:

> Triple faults you can't debug, they raise a line outside the CPU, and
> normal PC hardware will cause that to just trigger a reboot.
>
> But double faults do get caught, and that debugging stuff actually is in
> the standard kernel. It won't give _nearly_ as good a debug report as a
> "normal" oops, since I didn't want the double-fault handler to touch
> anything even remotely unsafe, but it often gives a good hint about what
> might be wrong. Certainly better than triple-faulting did (which we still
> do for _catastrophic_ corruption, eg totally munged kernel page tables etc
> - it's just very hard to avoid once you get corrupted enough).

"Catastrophic" seems to be rather apt here. 2.6.0-test8-mm1 produced the
following, i'm still doing a binary search.

Unable to handle kernel paging request at virtual address 00002000
 printing eip:
00007341
*pde = 00000000
Oops: 0004 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
CPU: 0
EIP: c000:[<00007341>] Not tainted VLI
EFLAGS: 00033246
EIP is at 0x7341
eax: 32454256 ebx: 00000000 ecx: 00000000 edx: 00000000
esi: 00000000 edi: 00002000 ebp: 00000fd6 esp: 08763f24
ds: 0000 es: 0000 ss: 0068
Process X (pid: 939, threadinfo=08762000 task=0890b330)
Stack: 00000fcb 00000100 00000000 0000c000 00000000 00000000 00000000 00000000
       00000005 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
Call Trace:

Code: Bad EIP value.

^ permalink raw reply [flat|nested] 49+ messages in thread
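A minimal sketch, not part of the thread, of how the "Oops:" and "EFLAGS:" values in the report above can be read. It assumes the usual i386 page-fault error-code layout (bit 0: protection violation, bit 1: write access, bit 2: fault taken at CPL 3) and the EFLAGS.VM bit at 0x00020000. The macro and function names are hypothetical and exist only for this illustration; they are not kernel identifiers.

```c
#include <stdbool.h>
#include <stdio.h>

/* i386 page-fault error-code bits (illustrative names, not kernel macros) */
#define PF_ERR_PROT  0x1u  /* 0: page not present, 1: protection violation */
#define PF_ERR_WRITE 0x2u  /* 0: read access,      1: write access         */
#define PF_ERR_USER  0x4u  /* 0: fault at CPL 0,   1: fault at CPL 3       */

/* EFLAGS.VM: set while the CPU runs in virtual-8086 mode */
#define EFLAGS_VM 0x00020000u

/* True if the saved EFLAGS value says the faulting context was vm86. */
static bool in_vm86(unsigned int eflags)
{
	return (eflags & EFLAGS_VM) != 0;
}

/* Print a one-line reading of an oops error code and EFLAGS value. */
static void describe_fault(unsigned int error_code, unsigned int eflags)
{
	printf("%s of a %s page from %s mode%s\n",
	       (error_code & PF_ERR_WRITE) ? "write" : "read",
	       (error_code & PF_ERR_PROT) ? "protected" : "not-present",
	       (error_code & PF_ERR_USER) ? "user" : "kernel",
	       in_vm86(eflags) ? " (vm86)" : "");
}
```

Under those assumptions, describe_fault(0x0004, 0x00033246) reads the report as a read of a not-present page from user mode with the VM flag set, which fits a fault taken while X was running vm86 code at c000:7341.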
* [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-15 0:55 ` 2.6.0-test9-mm3 Zwane Mwaikambo
@ 2003-11-15 19:34 ` Zwane Mwaikambo
  2003-11-15 19:52 ` Zwane Mwaikambo
  2003-11-17 21:46 ` Zwane Mwaikambo
  0 siblings, 2 replies; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-15 19:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins

The 4G/4G page fault handling path doesn't appear to handle faults
happening whilst in vm86. The regs->xcs != __USER_CS so it confused the in
kernel test.

However i'm still debugging the X11 triple fault in test9-mm3

Unable to handle kernel paging request at virtual address 00002000
 printing eip:
00007341
*pde = 00000000
Oops: 0004 [#1]
SMP DEBUG_PAGEALLOC
CPU: 0
EIP: c000:[<00007341>] Not tainted VLI
EFLAGS: 00033246
EIP is at 0x7341
eax: 32454256 ebx: 00000000 ecx: 00000000 edx: 00000000
esi: 00000000 edi: 00002000 ebp: 00000fd6 esp: 087bbf24
ds: 0000 es: 0000 ss: 0068
Process X (pid: 939, threadinfo=087ba000 task=0891c690)
Stack: 00000fcb 00000100 00000000 0000c000 00000000 00000000 00000000 00000000
       00000005 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
Call Trace:

Index: linux-2.6.0-test9-mm3/arch/i386/mm/fault.c
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test9-mm3/arch/i386/mm/fault.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 fault.c
--- linux-2.6.0-test9-mm3/arch/i386/mm/fault.c	13 Nov 2003 08:07:17 -0000	1.1.1.1
+++ linux-2.6.0-test9-mm3/arch/i386/mm/fault.c	15 Nov 2003 19:08:34 -0000
@@ -264,7 +264,9 @@ asmlinkage void do_page_fault(struct pt_
 		if (error_code & 3)
 			goto bad_area_nosemaphore;
 
-		goto vmalloc_fault;
+		/* If it's vm86 fall through */
+		if (!(error_code & 4))
+			goto vmalloc_fault;
 	}
 #else
 	if (unlikely(address >= TASK_SIZE)) {

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-15 19:34 ` [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops Zwane Mwaikambo
@ 2003-11-15 19:52 ` Zwane Mwaikambo
  2003-11-17 21:46 ` Zwane Mwaikambo
  1 sibling, 0 replies; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-15 19:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins

On Sat, 15 Nov 2003, Zwane Mwaikambo wrote:

> The 4G/4G page fault handling path doesn't appear to handle faults
> happening whilst in vm86. The regs->xcs != __USER_CS so it confused the in
> kernel test.

Perhaps this would be more desirable?

Index: linux-2.6.0-test9-mm3/arch/i386/mm/fault.c
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test9-mm3/arch/i386/mm/fault.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 fault.c
--- linux-2.6.0-test9-mm3/arch/i386/mm/fault.c	13 Nov 2003 08:07:17 -0000	1.1.1.1
+++ linux-2.6.0-test9-mm3/arch/i386/mm/fault.c	15 Nov 2003 19:40:17 -0000
@@ -264,7 +264,9 @@ asmlinkage void do_page_fault(struct pt_
 		if (error_code & 3)
 			goto bad_area_nosemaphore;
 
-		goto vmalloc_fault;
+		/* If it's vm86 fall through */
+		if (!(regs->eflags & VM_MASK))
+			goto vmalloc_fault;
 	}
 #else
 	if (unlikely(address >= TASK_SIZE)) {

^ permalink raw reply [flat|nested] 49+ messages in thread
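The difference between the two variants of the patch can be sketched as below. This is an illustration only, not the kernel code: `struct fake_regs`, `is_user_fault()` and `is_vm86_fault()` are hypothetical names, `VM_MASK` is assumed to be the EFLAGS.VM bit (0x00020000), and bit 2 of the error code is assumed to mean "fault taken at CPL 3". Since vm86 code runs at CPL 3, the error-code test is true for any CPL 3 fault, while the EFLAGS test is true only for vm86.

```c
#include <stdbool.h>
#include <stdint.h>

#define VM_MASK 0x00020000u  /* assumed value: EFLAGS.VM bit */
#define PF_USER 0x4u         /* assumed meaning: fault taken at CPL 3 */

/* Hypothetical stand-in for the two saved-register fields that matter here. */
struct fake_regs {
	uint32_t eflags;
	uint16_t xcs;
};

/* Test used by the first variant: true for any fault taken at CPL 3. */
static bool is_user_fault(uint32_t error_code)
{
	return (error_code & PF_USER) != 0;
}

/* Test used by the second variant: true only if the context was vm86. */
static bool is_vm86_fault(const struct fake_regs *regs)
{
	return (regs->eflags & VM_MASK) != 0;
}
```

With the values from the oops (error code 4, EFLAGS 00033246, cs c000) both tests are true, so either variant would skip the vmalloc_fault path for that fault. For an ordinary protected-mode process with the VM flag clear, only is_user_fault() would be true.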
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-15 19:34 ` [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops Zwane Mwaikambo
  2003-11-15 19:52 ` Zwane Mwaikambo
@ 2003-11-17 21:46 ` Zwane Mwaikambo
  2003-11-17 22:42 ` Linus Torvalds
  1 sibling, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-17 21:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins

On Sat, 15 Nov 2003, Zwane Mwaikambo wrote:

> The 4G/4G page fault handling path doesn't appear to handle faults
> happening whilst in vm86. The regs->xcs != __USER_CS so it confused the in
> kernel test.
>
> However i'm still debugging the X11 triple fault in test9-mm3

I've managed to `fix` the triple fault (see further below for the patch
in all it's glory). Unfortunately i have been unable to come up with a
simpler workaround which is fewer instructions and easier to debug. I have
tried the following;

mb()/barrier()
flush_tlb_all()
wbinvd()
outb(0x80,0x00)
local_irq_save(flags); local_irq_enable(); loop(); local_irq_restore(flags);
long_loop()

What i do know is that in the following code;

	__asm__ __volatile__(
		"xorl %%eax,%%eax; movl %%eax,%%fs; movl %%eax,%%gs\n\t"
		"movl %0,%%esp\n\t"
		"movl %1,%%ebp\n\t"
		"jmp resume_userspace"
		: /* no outputs */
		:"r" (&info->regs), "r" (tsk->thread_info) : "ax");

It does get to resume_userspace as putting a $0 into %ebp will oops in
__switch_to

And here is the current 'workaround'. Any hints?

Index: arch/i386/kernel/vm86.c
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test9-mm3/arch/i386/kernel/vm86.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 vm86.c
--- arch/i386/kernel/vm86.c	13 Nov 2003 08:07:17 -0000	1.1.1.1
+++ arch/i386/kernel/vm86.c	17 Nov 2003 21:45:13 -0000
@@ -312,6 +311,8 @@ static void do_sys_vm86(struct kernel_vm
 	tsk->thread.screen_bitmap = info->screen_bitmap;
 	if (info->flags & VM86_SCREEN_BITMAP)
 		mark_screen_rdonly(tsk);
+
+	printk("ooh la la\n");
 	__asm__ __volatile__(
 		"xorl %%eax,%%eax; movl %%eax,%%fs; movl %%eax,%%gs\n\t"
 		"movl %0,%%esp\n\t"

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-17 21:46 ` Zwane Mwaikambo
@ 2003-11-17 22:42 ` Linus Torvalds
  2003-11-17 23:01 ` Zwane Mwaikambo
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2003-11-17 22:42 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins

On Mon, 17 Nov 2003, Zwane Mwaikambo wrote:
>
> I've managed to `fix` the triple fault (see further below for the patch
> in all it's glory).

What's the generated assembly language for this function with and without
the "fix"?

If adding that printk fixes a triple fault, the issue is not likely to be
the printk itself as much as the difference in code that the compiler
generates - stack frame, memory re-ordering etc...

		Linus

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-17 22:42 ` Linus Torvalds
@ 2003-11-17 23:01 ` Zwane Mwaikambo
  2003-11-17 23:14 ` Zwane Mwaikambo
  0 siblings, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-17 23:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins

On Mon, 17 Nov 2003, Linus Torvalds wrote:

> What's the generated assembly language for this function with and without
> the "fix"?
>
> If adding that printk fixes a triple fault, the issue is not likely to be
> the printk itself as much as the difference in code that the compiler
> generates - stack frame, memory re-ordering etc...

This would be my 'trusty' gcc 3.2.2 from RedHat 9
(gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)

With the fix:

0x0210e860 <do_sys_vm86+0>: push %edi
0x0210e861 <do_sys_vm86+1>: mov $0xffffe000,%eax
0x0210e866 <do_sys_vm86+6>: push %esi
0x0210e867 <do_sys_vm86+7>: and %esp,%eax
0x0210e869 <do_sys_vm86+9>: push %ebx
0x0210e86a <do_sys_vm86+10>: mov 0x10(%esp,1),%edi
0x0210e86e <do_sys_vm86+14>: mov 0x14(%esp,1),%esi
0x0210e872 <do_sys_vm86+18>: movl $0x0,0x1c(%edi)
0x0210e879 <do_sys_vm86+25>: movl $0x0,0x20(%edi)
0x0210e880 <do_sys_vm86+32>: mov (%eax),%edx
0x0210e882 <do_sys_vm86+34>: mov 0x30(%edi),%eax
0x0210e885 <do_sys_vm86+37>: mov %eax,0x5b8(%edx)
0x0210e88b <do_sys_vm86+43>: mov 0x30(%edi),%edx
0x0210e88e <do_sys_vm86+46>: mov 0xbc(%edi),%eax
0x0210e894 <do_sys_vm86+52>: and $0xdd5,%edx
0x0210e89a <do_sys_vm86+58>: mov %edx,0x30(%edi)
0x0210e89d <do_sys_vm86+61>: mov 0x30(%eax),%eax
0x0210e8a0 <do_sys_vm86+64>: and $0xfffff22a,%eax
0x0210e8a5 <do_sys_vm86+69>: or %eax,%edx
0x0210e8a7 <do_sys_vm86+71>: mov 0x54(%edi),%eax
0x0210e8aa <do_sys_vm86+74>: or $0x20000,%edx
0x0210e8b0 <do_sys_vm86+80>: cmp $0x3,%eax
0x0210e8b3 <do_sys_vm86+83>: mov %edx,0x30(%edi)
0x0210e8b6 <do_sys_vm86+86>: je 0x210e9f0 <do_sys_vm86+400>
0x0210e8bc <do_sys_vm86+92>: cmp $0x3,%eax
0x0210e8bf <do_sys_vm86+95>: ja 0x210e9d5 <do_sys_vm86+373>
0x0210e8c5 <do_sys_vm86+101>: cmp $0x2,%eax
0x0210e8c8 <do_sys_vm86+104>: je 0x210e9c6 <do_sys_vm86+358>
0x0210e8ce <do_sys_vm86+110>: movl $0x247000,0x5bc(%esi)
0x0210e8d8 <do_sys_vm86+120>: mov 0xbc(%edi),%eax
0x0210e8de <do_sys_vm86+126>: movl $0x0,0x18(%eax)
0x0210e8e5 <do_sys_vm86+133>: mov 0x360(%esi),%eax
0x0210e8eb <do_sys_vm86+139>: mov %eax,0x5c0(%esi)
0x0210e8f1 <do_sys_vm86+145>: movl %fs,0x5c4(%esi)
0x0210e8f7 <do_sys_vm86+151>: movl %gs,0x5c8(%esi)
0x0210e8fd <do_sys_vm86+157>: mov $0xffffe000,%ebx
0x0210e902 <do_sys_vm86+162>: and %esp,%ebx
0x0210e904 <do_sys_vm86+164>: mov 0x14(%ebx),%eax
0x0210e907 <do_sys_vm86+167>: inc %eax
0x0210e908 <do_sys_vm86+168>: mov %eax,0x14(%ebx)
0x0210e90b <do_sys_vm86+171>: mov 0x10(%ebx),%eax
0x0210e90e <do_sys_vm86+174>: mov 0x4(%esi),%edx
0x0210e911 <do_sys_vm86+177>: shl $0x9,%eax
0x0210e914 <do_sys_vm86+180>: lea 0x26ff000(%eax),%ecx
0x0210e91a <do_sys_vm86+186>: lea 0x4c(%edi),%eax
0x0210e91d <do_sys_vm86+189>: mov %eax,0x360(%esi)
0x0210e923 <do_sys_vm86+195>: sub 0x1c(%edx),%eax
0x0210e926 <do_sys_vm86+198>: add 0x20(%edx),%eax
0x0210e929 <do_sys_vm86+201>: mov %eax,0x4(%ecx)
0x0210e92c <do_sys_vm86+204>: mov 0x25fe52c,%eax
0x0210e931 <do_sys_vm86+209>: test $0x800,%eax
0x0210e936 <do_sys_vm86+214>: je 0x210e942 <do_sys_vm86+226>
0x0210e938 <do_sys_vm86+216>: movl $0x0,0x364(%esi)
0x0210e942 <do_sys_vm86+226>: lea 0x340(%esi),%edx
0x0210e948 <do_sys_vm86+232>: mov 0x20(%edx),%eax
0x0210e94b <do_sys_vm86+235>: mov %eax,0x4(%ecx)
0x0210e94e <do_sys_vm86+238>: mov 0x10(%ecx),%ax
0x0210e952 <do_sys_vm86+242>: and $0xffff,%eax
0x0210e957 <do_sys_vm86+247>: cmp 0x24(%edx),%eax
0x0210e95a <do_sys_vm86+250>: jne 0x210e9b0 <do_sys_vm86+336>
0x0210e95c <do_sys_vm86+252>: mov 0x14(%ebx),%eax
0x0210e95f <do_sys_vm86+255>: dec %eax
0x0210e960 <do_sys_vm86+256>: mov %eax,0x14(%ebx)
0x0210e963 <do_sys_vm86+259>: mov 0x8(%ebx),%eax
0x0210e966 <do_sys_vm86+262>: and $0x8,%eax
0x0210e969 <do_sys_vm86+265>: jne 0x210e9a9 <do_sys_vm86+329>
0x0210e96b <do_sys_vm86+267>: push $0x255f121
0x0210e970 <do_sys_vm86+272>: call 0x21285a0 <printk>
0x0210e975 <do_sys_vm86+277>: mov 0x50(%edi),%eax
0x0210e978 <do_sys_vm86+280>: mov %eax,0x5b4(%esi)
0x0210e97e <do_sys_vm86+286>: pop %eax
0x0210e97f <do_sys_vm86+287>: testb $0x1,0x4c(%edi)
0x0210e983 <do_sys_vm86+291>: jne 0x210e9a0 <do_sys_vm86+320>
0x0210e985 <do_sys_vm86+293>: mov 0x4(%esi),%edx
0x0210e988 <do_sys_vm86+296>: xor %eax,%eax
0x0210e98a <do_sys_vm86+298>: mov %eax,%fs
0x0210e98c <do_sys_vm86+300>: mov %eax,%gs
0x0210e98e <do_sys_vm86+302>: mov %edi,%esp
0x0210e990 <do_sys_vm86+304>: mov %edx,%ebp
0x0210e992 <do_sys_vm86+306>: jmp 0xfffeb100 <resume_userspace>
0x0210e997 <do_sys_vm86+311>: pop %ebx
0x0210e998 <do_sys_vm86+312>: pop %esi
0x0210e999 <do_sys_vm86+313>: pop %edi
0x0210e99a <do_sys_vm86+314>: ret
0x0210e99b <do_sys_vm86+315>: nop
0x0210e99c <do_sys_vm86+316>: lea 0x0(%esi,1),%esi
0x0210e9a0 <do_sys_vm86+320>: push %esi
0x0210e9a1 <do_sys_vm86+321>: call 0x210e5b0 <mark_screen_rdonly>
0x0210e9a6 <do_sys_vm86+326>: pop %eax
0x0210e9a7 <do_sys_vm86+327>: jmp 0x210e985 <do_sys_vm86+293>
0x0210e9a9 <do_sys_vm86+329>: call 0x21222d0 <preempt_schedule>
0x0210e9ae <do_sys_vm86+334>: jmp 0x210e96b <do_sys_vm86+267>
0x0210e9b0 <do_sys_vm86+336>: mov 0x24(%edx),%ax
0x0210e9b4 <do_sys_vm86+340>: mov %ax,0x10(%ecx)
0x0210e9b8 <do_sys_vm86+344>: mov $0x174,%ecx
0x0210e9bd <do_sys_vm86+349>: mov 0x24(%edx),%eax
0x0210e9c0 <do_sys_vm86+352>: xor %edx,%edx
0x0210e9c2 <do_sys_vm86+354>: wrmsr
0x0210e9c4 <do_sys_vm86+356>: jmp 0x210e95c <do_sys_vm86+252>
0x0210e9c6 <do_sys_vm86+358>: movl $0x0,0x5bc(%esi)
0x0210e9d0 <do_sys_vm86+368>: jmp 0x210e8d8 <do_sys_vm86+120>
0x0210e9d5 <do_sys_vm86+373>: cmp $0x4,%eax
0x0210e9d8 <do_sys_vm86+376>: jne 0x210e8ce <do_sys_vm86+110>
0x0210e9de <do_sys_vm86+382>: movl $0x47000,0x5bc(%esi)
0x0210e9e8 <do_sys_vm86+392>: jmp 0x210e8d8 <do_sys_vm86+120>
0x0210e9ed <do_sys_vm86+397>: lea 0x0(%esi),%esi
0x0210e9f0 <do_sys_vm86+400>: movl $0x7000,0x5bc(%esi)
0x0210e9fa <do_sys_vm86+410>: jmp 0x210e8d8 <do_sys_vm86+120>

Without the fix:

0x0210e860 <do_sys_vm86+0>: push %edi
0x0210e861 <do_sys_vm86+1>: mov $0xffffe000,%eax
0x0210e866 <do_sys_vm86+6>: push %esi
0x0210e867 <do_sys_vm86+7>: and %esp,%eax
0x0210e869 <do_sys_vm86+9>: push %ebx
0x0210e86a <do_sys_vm86+10>: mov 0x10(%esp,1),%edi
0x0210e86e <do_sys_vm86+14>: mov 0x14(%esp,1),%esi
0x0210e872 <do_sys_vm86+18>: movl $0x0,0x1c(%edi)
0x0210e879 <do_sys_vm86+25>: movl $0x0,0x20(%edi)
0x0210e880 <do_sys_vm86+32>: mov (%eax),%edx
0x0210e882 <do_sys_vm86+34>: mov 0x30(%edi),%eax
0x0210e885 <do_sys_vm86+37>: mov %eax,0x5b8(%edx)
0x0210e88b <do_sys_vm86+43>: mov 0x30(%edi),%edx
0x0210e88e <do_sys_vm86+46>: mov 0xbc(%edi),%eax
0x0210e894 <do_sys_vm86+52>: and $0xdd5,%edx
0x0210e89a <do_sys_vm86+58>: mov %edx,0x30(%edi)
0x0210e89d <do_sys_vm86+61>: mov 0x30(%eax),%eax
0x0210e8a0 <do_sys_vm86+64>: and $0xfffff22a,%eax
0x0210e8a5 <do_sys_vm86+69>: or %eax,%edx
0x0210e8a7 <do_sys_vm86+71>: mov 0x54(%edi),%eax
0x0210e8aa <do_sys_vm86+74>: or $0x20000,%edx
0x0210e8b0 <do_sys_vm86+80>: cmp $0x3,%eax
0x0210e8b3 <do_sys_vm86+83>: mov %edx,0x30(%edi)
0x0210e8b6 <do_sys_vm86+86>: je 0x210e9e0 <do_sys_vm86+384>
0x0210e8bc <do_sys_vm86+92>: cmp $0x3,%eax
0x0210e8bf <do_sys_vm86+95>: ja 0x210e9c5 <do_sys_vm86+357>
0x0210e8c5 <do_sys_vm86+101>: cmp $0x2,%eax
0x0210e8c8 <do_sys_vm86+104>: je 0x210e9b6 <do_sys_vm86+342>
0x0210e8ce <do_sys_vm86+110>: movl $0x247000,0x5bc(%esi)
0x0210e8d8 <do_sys_vm86+120>: mov 0xbc(%edi),%eax
0x0210e8de <do_sys_vm86+126>: movl $0x0,0x18(%eax)
0x0210e8e5 <do_sys_vm86+133>: mov 0x360(%esi),%eax
0x0210e8eb <do_sys_vm86+139>: mov %eax,0x5c0(%esi)
0x0210e8f1 <do_sys_vm86+145>: movl %fs,0x5c4(%esi)
0x0210e8f7 <do_sys_vm86+151>: movl %gs,0x5c8(%esi)
0x0210e8fd <do_sys_vm86+157>: mov $0xffffe000,%ebx
0x0210e902 <do_sys_vm86+162>: and %esp,%ebx
0x0210e904 <do_sys_vm86+164>: mov 0x14(%ebx),%eax
0x0210e907 <do_sys_vm86+167>: inc %eax
0x0210e908 <do_sys_vm86+168>: mov %eax,0x14(%ebx)
0x0210e90b <do_sys_vm86+171>: mov 0x10(%ebx),%eax
0x0210e90e <do_sys_vm86+174>: mov 0x4(%esi),%edx
0x0210e911 <do_sys_vm86+177>: shl $0x9,%eax
0x0210e914 <do_sys_vm86+180>: lea 0x26ff000(%eax),%ecx
0x0210e91a <do_sys_vm86+186>: lea 0x4c(%edi),%eax
0x0210e91d <do_sys_vm86+189>: mov %eax,0x360(%esi)
0x0210e923 <do_sys_vm86+195>: sub 0x1c(%edx),%eax
0x0210e926 <do_sys_vm86+198>: add 0x20(%edx),%eax
0x0210e929 <do_sys_vm86+201>: mov %eax,0x4(%ecx)
0x0210e92c <do_sys_vm86+204>: mov 0x25fe52c,%eax
0x0210e931 <do_sys_vm86+209>: test $0x800,%eax
0x0210e936 <do_sys_vm86+214>: je 0x210e942 <do_sys_vm86+226>
0x0210e938 <do_sys_vm86+216>: movl $0x0,0x364(%esi)
0x0210e942 <do_sys_vm86+226>: lea 0x340(%esi),%edx
0x0210e948 <do_sys_vm86+232>: mov 0x20(%edx),%eax
0x0210e94b <do_sys_vm86+235>: mov %eax,0x4(%ecx)
0x0210e94e <do_sys_vm86+238>: mov 0x10(%ecx),%ax
0x0210e952 <do_sys_vm86+242>: and $0xffff,%eax
0x0210e957 <do_sys_vm86+247>: cmp 0x24(%edx),%eax
0x0210e95a <do_sys_vm86+250>: jne 0x210e9a0 <do_sys_vm86+320>
0x0210e95c <do_sys_vm86+252>: mov 0x14(%ebx),%eax
0x0210e95f <do_sys_vm86+255>: dec %eax
0x0210e960 <do_sys_vm86+256>: mov %eax,0x14(%ebx)
0x0210e963 <do_sys_vm86+259>: mov 0x8(%ebx),%eax
0x0210e966 <do_sys_vm86+262>: and $0x8,%eax
0x0210e969 <do_sys_vm86+265>: jne 0x210e999 <do_sys_vm86+313>
0x0210e96b <do_sys_vm86+267>: mov 0x50(%edi),%eax
0x0210e96e <do_sys_vm86+270>: mov %eax,0x5b4(%esi)
0x0210e974 <do_sys_vm86+276>: testb $0x1,0x4c(%edi)
0x0210e978 <do_sys_vm86+280>: jne 0x210e990 <do_sys_vm86+304>
0x0210e97a <do_sys_vm86+282>: mov 0x4(%esi),%edx
0x0210e97d <do_sys_vm86+285>: xor %eax,%eax
0x0210e97f <do_sys_vm86+287>: mov %eax,%fs
0x0210e981 <do_sys_vm86+289>: mov %eax,%gs
0x0210e983 <do_sys_vm86+291>: mov %edi,%esp
0x0210e985 <do_sys_vm86+293>: mov %edx,%ebp
0x0210e987 <do_sys_vm86+295>: jmp 0xfffeb100 <resume_userspace>
0x0210e98c <do_sys_vm86+300>: pop %ebx
0x0210e98d <do_sys_vm86+301>: pop %esi
0x0210e98e <do_sys_vm86+302>: pop %edi
0x0210e98f <do_sys_vm86+303>: ret
0x0210e990 <do_sys_vm86+304>: push %esi
0x0210e991 <do_sys_vm86+305>: call 0x210e5b0 <mark_screen_rdonly>
0x0210e996 <do_sys_vm86+310>: pop %eax
0x0210e997 <do_sys_vm86+311>: jmp 0x210e97a <do_sys_vm86+282>
0x0210e999 <do_sys_vm86+313>: call 0x21222c0 <preempt_schedule>
0x0210e99e <do_sys_vm86+318>: jmp 0x210e96b <do_sys_vm86+267>
0x0210e9a0 <do_sys_vm86+320>: mov 0x24(%edx),%ax
0x0210e9a4 <do_sys_vm86+324>: mov %ax,0x10(%ecx)
0x0210e9a8 <do_sys_vm86+328>: mov $0x174,%ecx
0x0210e9ad <do_sys_vm86+333>: mov 0x24(%edx),%eax
0x0210e9b0 <do_sys_vm86+336>: xor %edx,%edx
0x0210e9b2 <do_sys_vm86+338>: wrmsr
0x0210e9b4 <do_sys_vm86+340>: jmp 0x210e95c <do_sys_vm86+252>
0x0210e9b6 <do_sys_vm86+342>: movl $0x0,0x5bc(%esi)
0x0210e9c0 <do_sys_vm86+352>: jmp 0x210e8d8 <do_sys_vm86+120>
0x0210e9c5 <do_sys_vm86+357>: cmp $0x4,%eax
0x0210e9c8 <do_sys_vm86+360>: jne 0x210e8ce <do_sys_vm86+110>
0x0210e9ce <do_sys_vm86+366>: movl $0x47000,0x5bc(%esi)
0x0210e9d8 <do_sys_vm86+376>: jmp 0x210e8d8 <do_sys_vm86+120>
0x0210e9dd <do_sys_vm86+381>: lea 0x0(%esi),%esi
0x0210e9e0 <do_sys_vm86+384>: movl $0x7000,0x5bc(%esi)
0x0210e9ea <do_sys_vm86+394>: jmp 0x210e8d8 <do_sys_vm86+120>

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-17 23:01 ` Zwane Mwaikambo @ 2003-11-17 23:14 ` Zwane Mwaikambo 2003-11-18 7:21 ` Zwane Mwaikambo 0 siblings, 1 reply; 49+ messages in thread From: Zwane Mwaikambo @ 2003-11-17 23:14 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Mon, 17 Nov 2003, Zwane Mwaikambo wrote: > On Mon, 17 Nov 2003, Linus Torvalds wrote: > > > What's the generated assembly language for this function with and without > > the "fix"? > > > > If adding that printk fixes a triple fault, the issue is not likely to be > > the printk itself as much as the difference in code that the compiler > > generates - stack frame, memory re-ordering etc... > > This would be my 'trusty' gcc 3.2.2 from RedHat 9 > (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5) A little bird told me to send diffs... But there is a lot of noise due to offsets i'm afraid. --- buggy 2003-11-17 18:09:35.302964248 -0500 +++ works 2003-11-17 18:09:47.744072912 -0500 @@ -21,11 +21,11 @@ 0x0210e8aa <do_sys_vm86+74>: or $0x20000,%edx 0x0210e8b0 <do_sys_vm86+80>: cmp $0x3,%eax 0x0210e8b3 <do_sys_vm86+83>: mov %edx,0x30(%edi) -0x0210e8b6 <do_sys_vm86+86>: je 0x210e9e0 <do_sys_vm86+384> +0x0210e8b6 <do_sys_vm86+86>: je 0x210e9f0 <do_sys_vm86+400> 0x0210e8bc <do_sys_vm86+92>: cmp $0x3,%eax -0x0210e8bf <do_sys_vm86+95>: ja 0x210e9c5 <do_sys_vm86+357> +0x0210e8bf <do_sys_vm86+95>: ja 0x210e9d5 <do_sys_vm86+373> 0x0210e8c5 <do_sys_vm86+101>: cmp $0x2,%eax -0x0210e8c8 <do_sys_vm86+104>: je 0x210e9b6 <do_sys_vm86+342> +0x0210e8c8 <do_sys_vm86+104>: je 0x210e9c6 <do_sys_vm86+358> 0x0210e8ce <do_sys_vm86+110>: movl $0x247000,0x5bc(%esi) 0x0210e8d8 <do_sys_vm86+120>: mov 0xbc(%edi),%eax 0x0210e8de <do_sys_vm86+126>: movl $0x0,0x18(%eax) @@ -57,47 +57,52 @@ 0x0210e94e <do_sys_vm86+238>: mov 0x10(%ecx),%ax 0x0210e952 <do_sys_vm86+242>: and $0xffff,%eax 0x0210e957 <do_sys_vm86+247>: cmp 0x24(%edx),%eax 
-0x0210e95a <do_sys_vm86+250>: jne 0x210e9a0 <do_sys_vm86+320> +0x0210e95a <do_sys_vm86+250>: jne 0x210e9b0 <do_sys_vm86+336> 0x0210e95c <do_sys_vm86+252>: mov 0x14(%ebx),%eax 0x0210e95f <do_sys_vm86+255>: dec %eax 0x0210e960 <do_sys_vm86+256>: mov %eax,0x14(%ebx) 0x0210e963 <do_sys_vm86+259>: mov 0x8(%ebx),%eax 0x0210e966 <do_sys_vm86+262>: and $0x8,%eax -0x0210e969 <do_sys_vm86+265>: jne 0x210e999 <do_sys_vm86+313> -0x0210e96b <do_sys_vm86+267>: mov 0x50(%edi),%eax -0x0210e96e <do_sys_vm86+270>: mov %eax,0x5b4(%esi) -0x0210e974 <do_sys_vm86+276>: testb $0x1,0x4c(%edi) -0x0210e978 <do_sys_vm86+280>: jne 0x210e990 <do_sys_vm86+304> -0x0210e97a <do_sys_vm86+282>: mov 0x4(%esi),%edx -0x0210e97d <do_sys_vm86+285>: xor %eax,%eax -0x0210e97f <do_sys_vm86+287>: mov %eax,%fs -0x0210e981 <do_sys_vm86+289>: mov %eax,%gs -0x0210e983 <do_sys_vm86+291>: mov %edi,%esp -0x0210e985 <do_sys_vm86+293>: mov %edx,%ebp -0x0210e987 <do_sys_vm86+295>: jmp 0xfffeb100 <resume_userspace> -0x0210e98c <do_sys_vm86+300>: pop %ebx -0x0210e98d <do_sys_vm86+301>: pop %esi -0x0210e98e <do_sys_vm86+302>: pop %edi -0x0210e98f <do_sys_vm86+303>: ret -0x0210e990 <do_sys_vm86+304>: push %esi -0x0210e991 <do_sys_vm86+305>: call 0x210e5b0 <mark_screen_rdonly> -0x0210e996 <do_sys_vm86+310>: pop %eax -0x0210e997 <do_sys_vm86+311>: jmp 0x210e97a <do_sys_vm86+282> -0x0210e999 <do_sys_vm86+313>: call 0x21222c0 <preempt_schedule> -0x0210e99e <do_sys_vm86+318>: jmp 0x210e96b <do_sys_vm86+267> -0x0210e9a0 <do_sys_vm86+320>: mov 0x24(%edx),%ax -0x0210e9a4 <do_sys_vm86+324>: mov %ax,0x10(%ecx) -0x0210e9a8 <do_sys_vm86+328>: mov $0x174,%ecx -0x0210e9ad <do_sys_vm86+333>: mov 0x24(%edx),%eax -0x0210e9b0 <do_sys_vm86+336>: xor %edx,%edx -0x0210e9b2 <do_sys_vm86+338>: wrmsr -0x0210e9b4 <do_sys_vm86+340>: jmp 0x210e95c <do_sys_vm86+252> -0x0210e9b6 <do_sys_vm86+342>: movl $0x0,0x5bc(%esi) -0x0210e9c0 <do_sys_vm86+352>: jmp 0x210e8d8 <do_sys_vm86+120> -0x0210e9c5 <do_sys_vm86+357>: cmp $0x4,%eax -0x0210e9c8 
<do_sys_vm86+360>: jne 0x210e8ce <do_sys_vm86+110> -0x0210e9ce <do_sys_vm86+366>: movl $0x47000,0x5bc(%esi) -0x0210e9d8 <do_sys_vm86+376>: jmp 0x210e8d8 <do_sys_vm86+120> -0x0210e9dd <do_sys_vm86+381>: lea 0x0(%esi),%esi -0x0210e9e0 <do_sys_vm86+384>: movl $0x7000,0x5bc(%esi) -0x0210e9ea <do_sys_vm86+394>: jmp 0x210e8d8 <do_sys_vm86+120> +0x0210e969 <do_sys_vm86+265>: jne 0x210e9a9 <do_sys_vm86+329> +0x0210e96b <do_sys_vm86+267>: push $0x255f121 +0x0210e970 <do_sys_vm86+272>: call 0x21285a0 <printk> +0x0210e975 <do_sys_vm86+277>: mov 0x50(%edi),%eax +0x0210e978 <do_sys_vm86+280>: mov %eax,0x5b4(%esi) +0x0210e97e <do_sys_vm86+286>: pop %eax +0x0210e97f <do_sys_vm86+287>: testb $0x1,0x4c(%edi) +0x0210e983 <do_sys_vm86+291>: jne 0x210e9a0 <do_sys_vm86+320> +0x0210e985 <do_sys_vm86+293>: mov 0x4(%esi),%edx +0x0210e988 <do_sys_vm86+296>: xor %eax,%eax +0x0210e98a <do_sys_vm86+298>: mov %eax,%fs +0x0210e98c <do_sys_vm86+300>: mov %eax,%gs +0x0210e98e <do_sys_vm86+302>: mov %edi,%esp +0x0210e990 <do_sys_vm86+304>: mov %edx,%ebp +0x0210e992 <do_sys_vm86+306>: jmp 0xfffeb100 <resume_userspace> +0x0210e997 <do_sys_vm86+311>: pop %ebx +0x0210e998 <do_sys_vm86+312>: pop %esi +0x0210e999 <do_sys_vm86+313>: pop %edi +0x0210e99a <do_sys_vm86+314>: ret +0x0210e99b <do_sys_vm86+315>: nop +0x0210e99c <do_sys_vm86+316>: lea 0x0(%esi,1),%esi +0x0210e9a0 <do_sys_vm86+320>: push %esi +0x0210e9a1 <do_sys_vm86+321>: call 0x210e5b0 <mark_screen_rdonly> +0x0210e9a6 <do_sys_vm86+326>: pop %eax +0x0210e9a7 <do_sys_vm86+327>: jmp 0x210e985 <do_sys_vm86+293> +0x0210e9a9 <do_sys_vm86+329>: call 0x21222d0 <preempt_schedule> +0x0210e9ae <do_sys_vm86+334>: jmp 0x210e96b <do_sys_vm86+267> +0x0210e9b0 <do_sys_vm86+336>: mov 0x24(%edx),%ax +0x0210e9b4 <do_sys_vm86+340>: mov %ax,0x10(%ecx) +0x0210e9b8 <do_sys_vm86+344>: mov $0x174,%ecx +0x0210e9bd <do_sys_vm86+349>: mov 0x24(%edx),%eax +0x0210e9c0 <do_sys_vm86+352>: xor %edx,%edx +0x0210e9c2 <do_sys_vm86+354>: wrmsr +0x0210e9c4 <do_sys_vm86+356>: jmp 
0x210e95c <do_sys_vm86+252> +0x0210e9c6 <do_sys_vm86+358>: movl $0x0,0x5bc(%esi) +0x0210e9d0 <do_sys_vm86+368>: jmp 0x210e8d8 <do_sys_vm86+120> +0x0210e9d5 <do_sys_vm86+373>: cmp $0x4,%eax +0x0210e9d8 <do_sys_vm86+376>: jne 0x210e8ce <do_sys_vm86+110> +0x0210e9de <do_sys_vm86+382>: movl $0x47000,0x5bc(%esi) +0x0210e9e8 <do_sys_vm86+392>: jmp 0x210e8d8 <do_sys_vm86+120> +0x0210e9ed <do_sys_vm86+397>: lea 0x0(%esi),%esi +0x0210e9f0 <do_sys_vm86+400>: movl $0x7000,0x5bc(%esi) +0x0210e9fa <do_sys_vm86+410>: jmp 0x210e8d8 <do_sys_vm86+120> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-17 23:14 ` Zwane Mwaikambo @ 2003-11-18 7:21 ` Zwane Mwaikambo 2003-11-18 15:47 ` Linus Torvalds 0 siblings, 1 reply; 49+ messages in thread From: Zwane Mwaikambo @ 2003-11-18 7:21 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Mon, 17 Nov 2003, Zwane Mwaikambo wrote: > A little bird told me to send diffs... But there is a lot of noise due to > offsets i'm afraid. Another note from our avian friends; i seem to have sent a slightly different dump from the patch, although they do both achieve the same effect. I shall append it for completeness. 0x0210e860 <do_sys_vm86+0>: push %edi 0x0210e861 <do_sys_vm86+1>: mov $0xffffe000,%eax 0x0210e866 <do_sys_vm86+6>: push %esi 0x0210e867 <do_sys_vm86+7>: and %esp,%eax 0x0210e869 <do_sys_vm86+9>: push %ebx 0x0210e86a <do_sys_vm86+10>: mov 0x10(%esp,1),%edi 0x0210e86e <do_sys_vm86+14>: mov 0x14(%esp,1),%esi 0x0210e872 <do_sys_vm86+18>: movl $0x0,0x1c(%edi) 0x0210e879 <do_sys_vm86+25>: movl $0x0,0x20(%edi) 0x0210e880 <do_sys_vm86+32>: mov (%eax),%edx 0x0210e882 <do_sys_vm86+34>: mov 0x30(%edi),%eax 0x0210e885 <do_sys_vm86+37>: mov %eax,0x5b8(%edx) 0x0210e88b <do_sys_vm86+43>: mov 0x30(%edi),%edx 0x0210e88e <do_sys_vm86+46>: mov 0xbc(%edi),%eax 0x0210e894 <do_sys_vm86+52>: and $0xdd5,%edx 0x0210e89a <do_sys_vm86+58>: mov %edx,0x30(%edi) 0x0210e89d <do_sys_vm86+61>: mov 0x30(%eax),%eax 0x0210e8a0 <do_sys_vm86+64>: and $0xfffff22a,%eax 0x0210e8a5 <do_sys_vm86+69>: or %eax,%edx 0x0210e8a7 <do_sys_vm86+71>: mov 0x54(%edi),%eax 0x0210e8aa <do_sys_vm86+74>: or $0x20000,%edx 0x0210e8b0 <do_sys_vm86+80>: cmp $0x3,%eax 0x0210e8b3 <do_sys_vm86+83>: mov %edx,0x30(%edi) 0x0210e8b6 <do_sys_vm86+86>: je 0x210e9f0 <do_sys_vm86+400> 0x0210e8bc <do_sys_vm86+92>: cmp $0x3,%eax 0x0210e8bf <do_sys_vm86+95>: ja 0x210e9d5 <do_sys_vm86+373> 0x0210e8c5 <do_sys_vm86+101>: cmp $0x2,%eax 0x0210e8c8 <do_sys_vm86+104>: je 
0x210e9c6 <do_sys_vm86+358> 0x0210e8ce <do_sys_vm86+110>: movl $0x247000,0x5bc(%esi) 0x0210e8d8 <do_sys_vm86+120>: mov 0xbc(%edi),%eax 0x0210e8de <do_sys_vm86+126>: movl $0x0,0x18(%eax) 0x0210e8e5 <do_sys_vm86+133>: mov 0x360(%esi),%eax 0x0210e8eb <do_sys_vm86+139>: mov %eax,0x5c0(%esi) 0x0210e8f1 <do_sys_vm86+145>: movl %fs,0x5c4(%esi) 0x0210e8f7 <do_sys_vm86+151>: movl %gs,0x5c8(%esi) 0x0210e8fd <do_sys_vm86+157>: mov $0xffffe000,%ebx 0x0210e902 <do_sys_vm86+162>: and %esp,%ebx 0x0210e904 <do_sys_vm86+164>: mov 0x14(%ebx),%eax 0x0210e907 <do_sys_vm86+167>: inc %eax 0x0210e908 <do_sys_vm86+168>: mov %eax,0x14(%ebx) 0x0210e90b <do_sys_vm86+171>: mov 0x10(%ebx),%eax 0x0210e90e <do_sys_vm86+174>: mov 0x4(%esi),%edx 0x0210e911 <do_sys_vm86+177>: shl $0x9,%eax 0x0210e914 <do_sys_vm86+180>: lea 0x26ff000(%eax),%ecx 0x0210e91a <do_sys_vm86+186>: lea 0x4c(%edi),%eax 0x0210e91d <do_sys_vm86+189>: mov %eax,0x360(%esi) 0x0210e923 <do_sys_vm86+195>: sub 0x1c(%edx),%eax 0x0210e926 <do_sys_vm86+198>: add 0x20(%edx),%eax 0x0210e929 <do_sys_vm86+201>: mov %eax,0x4(%ecx) 0x0210e92c <do_sys_vm86+204>: mov 0x25fe52c,%eax 0x0210e931 <do_sys_vm86+209>: test $0x800,%eax 0x0210e936 <do_sys_vm86+214>: je 0x210e942 <do_sys_vm86+226> 0x0210e938 <do_sys_vm86+216>: movl $0x0,0x364(%esi) 0x0210e942 <do_sys_vm86+226>: lea 0x340(%esi),%edx 0x0210e948 <do_sys_vm86+232>: mov 0x20(%edx),%eax 0x0210e94b <do_sys_vm86+235>: mov %eax,0x4(%ecx) 0x0210e94e <do_sys_vm86+238>: mov 0x10(%ecx),%ax 0x0210e952 <do_sys_vm86+242>: and $0xffff,%eax 0x0210e957 <do_sys_vm86+247>: cmp 0x24(%edx),%eax 0x0210e95a <do_sys_vm86+250>: jne 0x210e9b0 <do_sys_vm86+336> 0x0210e95c <do_sys_vm86+252>: mov 0x14(%ebx),%eax 0x0210e95f <do_sys_vm86+255>: dec %eax 0x0210e960 <do_sys_vm86+256>: mov %eax,0x14(%ebx) 0x0210e963 <do_sys_vm86+259>: mov 0x8(%ebx),%eax 0x0210e966 <do_sys_vm86+262>: and $0x8,%eax 0x0210e969 <do_sys_vm86+265>: jne 0x210e9a9 <do_sys_vm86+329> 0x0210e96b <do_sys_vm86+267>: mov 0x50(%edi),%eax 0x0210e96e 
<do_sys_vm86+270>: mov %eax,0x5b4(%esi) 0x0210e974 <do_sys_vm86+276>: testb $0x1,0x4c(%edi) 0x0210e978 <do_sys_vm86+280>: jne 0x210e9a0 <do_sys_vm86+320> 0x0210e97a <do_sys_vm86+282>: push $0x255f121 0x0210e97f <do_sys_vm86+287>: call 0x21285a0 <printk> 0x0210e984 <do_sys_vm86+292>: mov 0x4(%esi),%edx 0x0210e987 <do_sys_vm86+295>: xor %eax,%eax 0x0210e989 <do_sys_vm86+297>: mov %eax,%fs 0x0210e98b <do_sys_vm86+299>: mov %eax,%gs 0x0210e98d <do_sys_vm86+301>: mov %edi,%esp 0x0210e98f <do_sys_vm86+303>: mov %edx,%ebp 0x0210e991 <do_sys_vm86+305>: jmp 0xfffeb100 <resume_userspace> 0x0210e996 <do_sys_vm86+310>: pop %esi 0x0210e997 <do_sys_vm86+311>: pop %ebx 0x0210e998 <do_sys_vm86+312>: pop %esi 0x0210e999 <do_sys_vm86+313>: pop %edi 0x0210e99a <do_sys_vm86+314>: ret 0x0210e99b <do_sys_vm86+315>: nop 0x0210e99c <do_sys_vm86+316>: lea 0x0(%esi,1),%esi 0x0210e9a0 <do_sys_vm86+320>: push %esi 0x0210e9a1 <do_sys_vm86+321>: call 0x210e5b0 <mark_screen_rdonly> 0x0210e9a6 <do_sys_vm86+326>: pop %eax 0x0210e9a7 <do_sys_vm86+327>: jmp 0x210e97a <do_sys_vm86+282> 0x0210e9a9 <do_sys_vm86+329>: call 0x21222d0 <preempt_schedule> 0x0210e9ae <do_sys_vm86+334>: jmp 0x210e96b <do_sys_vm86+267> 0x0210e9b0 <do_sys_vm86+336>: mov 0x24(%edx),%ax 0x0210e9b4 <do_sys_vm86+340>: mov %ax,0x10(%ecx) 0x0210e9b8 <do_sys_vm86+344>: mov $0x174,%ecx 0x0210e9bd <do_sys_vm86+349>: mov 0x24(%edx),%eax 0x0210e9c0 <do_sys_vm86+352>: xor %edx,%edx 0x0210e9c2 <do_sys_vm86+354>: wrmsr 0x0210e9c4 <do_sys_vm86+356>: jmp 0x210e95c <do_sys_vm86+252> 0x0210e9c6 <do_sys_vm86+358>: movl $0x0,0x5bc(%esi) 0x0210e9d0 <do_sys_vm86+368>: jmp 0x210e8d8 <do_sys_vm86+120> 0x0210e9d5 <do_sys_vm86+373>: cmp $0x4,%eax 0x0210e9d8 <do_sys_vm86+376>: jne 0x210e8ce <do_sys_vm86+110> 0x0210e9de <do_sys_vm86+382>: movl $0x47000,0x5bc(%esi) 0x0210e9e8 <do_sys_vm86+392>: jmp 0x210e8d8 <do_sys_vm86+120> 0x0210e9ed <do_sys_vm86+397>: lea 0x0(%esi),%esi 0x0210e9f0 <do_sys_vm86+400>: movl $0x7000,0x5bc(%esi) 0x0210e9fa 
<do_sys_vm86+410>: jmp 0x210e8d8 <do_sys_vm86+120> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-18 7:21 ` Zwane Mwaikambo @ 2003-11-18 15:47 ` Linus Torvalds 2003-11-18 16:16 ` Zwane Mwaikambo 0 siblings, 1 reply; 49+ messages in thread From: Linus Torvalds @ 2003-11-18 15:47 UTC (permalink / raw) To: Zwane Mwaikambo Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Tue, 18 Nov 2003, Zwane Mwaikambo wrote: > > Another note from our avian friends; i seem to have sent a slightly > different dump from the patch, although they do both achieve the same > effect. I shall append it for completeness. Hmm. I don't see anything. However, it's a lot easier to read the gcc-generated assembly ("make arch/i386/kernel/vm86.s") than it is to read the objdump disassembly. It's also a lot easier to see what the assembly language is when giving the -fno-reorder-blocks switch to gcc. Without it, modern gcc's tend to have _way_ too many jumps around. But maybe that actually changes the behaviour too. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-18 15:47 ` Linus Torvalds @ 2003-11-18 16:16 ` Zwane Mwaikambo 2003-11-18 16:37 ` Linus Torvalds 0 siblings, 1 reply; 49+ messages in thread From: Zwane Mwaikambo @ 2003-11-18 16:16 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Tue, 18 Nov 2003, Linus Torvalds wrote: > Hmm. I don't see anything. However, it's a lot easier to read the > gcc-generated assembly ("make arch/i386/kernel/vm86.s") than it is to read > the objdump disassembly. > > > It's also a lot easier to see what the assembly language is when giving > the > > -fno-reorder-blocks I'll recompile and verify that the bug can be reproduced and worked around with that flag. > switch to gcc. Without it, modern gcc's tend to have _way_ too many jumps > around. But maybe that actually changes the behaviour too. Here are diffs from the do_sys_vm86 only. --- asm-before 2003-11-18 10:56:02.967643808 -0500 +++ asm-after 2003-11-18 10:55:37.880457640 -0500 @@ -897,6 +897,10 @@ .LFE473: .Lfe4: .size sys_vm86,.Lfe4-sys_vm86 + .section .rodata.str1.1 +.LC6: + .string "ooh la la\n" + .text .p2align 4,,15 .type do_sys_vm86,@function do_sys_vm86: @@ -1053,29 +1057,37 @@ jne .L213 .L210: .loc 1 315 0 + pushl $.LC6 +.LCFI98: + call printk + .loc 1 316 0 movl 4(%esi), %edx #APP xorl %eax,%eax; movl %eax,%fs; movl %eax,%gs movl %edi,%esp movl %edx,%ebp jmp resume_userspace - .loc 1 323 0 #NO_APP - popl %ebx -.LCFI98: +.LBE53: popl %esi .LCFI99: - popl %edi + .loc 1 324 0 + popl %ebx .LCFI100: + popl %esi +.LCFI101: + popl %edi +.LCFI102: ret .loc 1 313 0 .p2align 4,,7 .L213: +.LBB65: pushl %esi -.LCFI101: +.LCFI103: call mark_screen_rdonly popl %eax -.LCFI102: +.LCFI104: jmp .L210 .loc 1 310 0 .L212: @@ -1083,7 +1095,7 @@ jmp .L197 .loc 14 454 0 .L211: -.LBB65: +.LBB66: movw 36(%edx), %ax movw %ax, 16(%ecx) .loc 14 455 0 @@ -1097,7 +1109,7 @@ .p2align 4,,7 .L183: .loc 1 283 0 -.LBE65: 
+.LBE66: movl $0, 1468(%esi) .loc 1 284 0 jmp .L182 @@ -1115,7 +1127,7 @@ movl $28672, 1468(%esi) .loc 1 287 0 jmp .L182 -.LBE53: +.LBE65: .LFE475: .Lfe5: .size do_sys_vm86,.Lfe5-do_sys_vm86 ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-18 16:16 ` Zwane Mwaikambo @ 2003-11-18 16:37 ` Linus Torvalds 2003-11-18 17:08 ` Zwane Mwaikambo 2003-11-19 20:32 ` Matt Mackall 0 siblings, 2 replies; 49+ messages in thread From: Linus Torvalds @ 2003-11-18 16:37 UTC (permalink / raw) To: Zwane Mwaikambo Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Tue, 18 Nov 2003, Zwane Mwaikambo wrote: > > Here are diffs from the do_sys_vm86 only. Ok. Much more readable. And there is something very suspicious there. The code with and without the printk() looks _identical_ apart from some trivial label renumbering, and the added pushl $.LC6 call printk .. asm .. popl %esi which all looks fine (esi is dead at that point, so the compiler is just using a "popl" as a shorter form of "addl $4,%esp"). Btw, you seem to compile with debugging, which makes the assembly language pretty much unreadable and accounts for most of the differences: the line numbers change. If you compile a kernel where the line numbers don't change (by commenting _out_ the printk rather than removing the whole line), your diff would be more readable. Anyway, there are _zero_ differences. Just for fun, try this: move the "printk()" to _below_ the "asm" statement. It will never actually get executed, but if it's an issue of some subtle code or data placement things (cache lines etc), maybe that also hides the oops, since all the same code and data will be generated, just not run... Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
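The advice above about keeping line numbers stable can be sketched in plain C. This is only an illustration, not code from the thread: `TRACE_RESUME` and `prepare_resume()` are hypothetical names, and a preprocessor switch is used here as one way to get the same effect as commenting the printk() out. Because the source file has the same number of lines in both builds, the debug line information for the unchanged code does not shift, so a diff of the generated assembly should show little beyond the added call.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical switch: build with -DTRACE_RESUME=1 to enable the message. */
#ifndef TRACE_RESUME
#define TRACE_RESUME 0
#endif

/*
 * Hypothetical stand-in for the code under test. The traced and untraced
 * builds come from the same source lines, so only the call itself differs.
 */
static int prepare_resume(int screen_bitmap_flag)
{
	int marked = 0;

	assert(screen_bitmap_flag == 0 || screen_bitmap_flag == 1);
	if (screen_bitmap_flag)
		marked = 1;
#if TRACE_RESUME
	fputs("ooh la la\n", stderr);	/* the debug line being toggled */
#endif
	return marked;
}
```

Compiling the file once with and once without `-DTRACE_RESUME=1` and diffing the two assembly outputs would then isolate the effect of the extra call.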
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-18 16:37 ` Linus Torvalds @ 2003-11-18 17:08 ` Zwane Mwaikambo 2003-11-18 17:38 ` Martin J. Bligh 2003-11-19 20:32 ` Matt Mackall 1 sibling, 1 reply; 49+ messages in thread From: Zwane Mwaikambo @ 2003-11-18 17:08 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Tue, 18 Nov 2003, Linus Torvalds wrote: > Ok. Much more readable. > > And there is something very suspicious there. > > The code with and without the printk() looks _identical_ apart from some > trivial label renumbering, and the added > > pushl $.LC6 > call printk > .. asm .. > popl %esi > > which all looks fine (esi is dead at that point, so the compiler is just > using a "popl" as a shorter form of "addl $4,%esp"). > > Btw, you seem to compile with debugging, which makes the assembly > language pretty much unreadable and accounts for most of the > differences: the line numbers change. If you compile a kernel where the > line numbers don't change (by commenting _out_ the printk rather than > removing the whole line), your diff would be more readable. Aha! Thanks for mentioning that, noted. > Anyway, there are _zero_ differences. > > Just for fun, try this: move the "printk()" to _below_ the "asm" > statement. It will never actually get executed, but if it's an issue of > some subtle code or data placement things (cache lines etc), maybe that > also hides the oops, since all the same code and data will be generated, > just not run... Ok i just tried that and it still fails. Matt Mackall suggested i also try writing a minimal printk which has the same effect. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-18 17:08 ` Zwane Mwaikambo @ 2003-11-18 17:38 ` Martin J. Bligh 2003-11-18 17:22 ` Zwane Mwaikambo 0 siblings, 1 reply; 49+ messages in thread From: Martin J. Bligh @ 2003-11-18 17:38 UTC (permalink / raw) To: Zwane Mwaikambo, Linus Torvalds Cc: Ingo Molnar, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins >> Btw, you seem to compile with debugging, which makes the assembly >> language pretty much unreadable and accounts for most of the >> differences: the line numbers change. If you compile a kernel where the >> line numbers don't change (by commenting _out_ the printk rather than >> removing the whole line), your diff would be more readable. > > Aha! Thanks for mentioning that, noted. > >> Anyway, there are _zero_ differences. >> >> Just for fun, try this: move the "printk()" to _below_ the "asm" >> statement. It will never actually get executed, but if it's an issue of >> some subtle code or data placement things (cache lines etc), maybe that >> also hides the oops, since all the same code and data will be generated, >> just not run... > > Ok i just tried that and it still fails. Matt Mackall suggested i also try > writing a minimal printk which has the same effect. The other thing I've found printks to hide before is timing bugs / races. Unfortunately I can't see one here, but maybe someone else can ;-) Maybe inserting a 1ms delay or something in place of the printk would have the same effect? M. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-18 17:38 ` Martin J. Bligh @ 2003-11-18 17:22 ` Zwane Mwaikambo 0 siblings, 0 replies; 49+ messages in thread From: Zwane Mwaikambo @ 2003-11-18 17:22 UTC (permalink / raw) To: Martin J. Bligh Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Tue, 18 Nov 2003, Martin J. Bligh wrote: > The other thing I've found printks to hide before is timing bugs / races. > Unfortunately I can't see one here, but maybe someone else can ;-) > Maybe inserting a 1ms delay or something in place of the printk would > have the same effect? I've tried a number of timing related workarounds, namely; schedule_timeout(2*HZ) and some long spinning loops. I've also thrown a schedule() in there at some point. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-18 16:37 ` Linus Torvalds 2003-11-18 17:08 ` Zwane Mwaikambo @ 2003-11-19 20:32 ` Matt Mackall 2003-11-19 23:09 ` Matt Mackall 1 sibling, 1 reply; 49+ messages in thread From: Matt Mackall @ 2003-11-19 20:32 UTC (permalink / raw) To: Linus Torvalds Cc: Zwane Mwaikambo, Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Tue, Nov 18, 2003 at 08:37:25AM -0800, Linus Torvalds wrote: > > On Tue, 18 Nov 2003, Zwane Mwaikambo wrote: > > > > Here are diffs from the do_sys_vm86 only. > > Ok. Much more readable. > > And there is something very suspicious there. > > The code with and without the printk() looks _identical_ apart from some > trivial label renumbering, and the added > > pushl $.LC6 > call printk > .. asm .. > popl %esi > > which all looks fine (esi is dead at that point, so the compiler is just > using a "popl" as a shorter form of "addl $4,%esp"). > > Btw, you seem to compile with debugging, which makes the assembly > language pretty much unreadable and accounts for most of the > differences: the line numbers change. If you compile a kernel where the > line numbers don't change (by commenting _out_ the printk rather than > removing the whole line), your diff would be more readable. > > Anyway, there are _zero_ differences. > > Just for fun, try this: move the "printk()" to _below_ the "asm" > statement. It will never actually get executed, but if it's an issue of > some subtle code or data placement things (cache lines etc), maybe that > also hides the oops, since all the same code and data will be generated, > just not run... Zwane's got a K6-2 500MHz. I've just managed to reproduce this on my 1.4GHz Opteron box (with Debian gcc 3.2). Here, the "ooh la la" bit doesn't help. 
So my suspicion is that the printk is changing the timing just enough on Zwane's box that he's getting a timer interrupt knocking him out of vm86 mode before he hits a fatal bit in the fault handling path for 4/4. Printks in handle_vm86_trap, handle_vm86_fault, do_trap:vm86_trap, and do_general_protection:gp_in_vm86 never fire so there's probably something amiss in the trampoline code. -- Matt Mackall : http://www.selenic.com : Linux development and consulting ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-19 20:32 ` Matt Mackall @ 2003-11-19 23:09 ` Matt Mackall 2003-11-20 7:14 ` Zwane Mwaikambo 2003-11-20 7:44 ` Matt Mackall 0 siblings, 2 replies; 49+ messages in thread From: Matt Mackall @ 2003-11-19 23:09 UTC (permalink / raw) To: Linus Torvalds Cc: Zwane Mwaikambo, Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Wed, Nov 19, 2003 at 02:32:10PM -0600, Matt Mackall wrote: > > Zwane's got a K6-2 500MHz. I've just managed to reproduce this on my > 1.4GHz Opteron box (with Debian gcc 3.2). Here, the "ooh la la" bit > doesn't help. So my suspicion is that the printk is changing the > timing just enough on Zwane's box that he's getting a timer interrupt > knocking him out of vm86 mode before he hits a fatal bit in the fault > handling path for 4/4. Printks in handle_vm86_trap, handle_vm86_fault, > do_trap:vm86_trap, and do_general_protection:gp_in_vm86 never fire so > there's probably something amiss in the trampoline code. Some more datapoints:

CPU          distro           compiler  video        X      result
K6-2/500     connectiva 9     2.96      trident      4.3    reboot (zwane)
K6-2/500     connectiva 9     3.2.2     trident      4.3    reboot (zwane)
Opteron 240  debian unstable  3.2       S3           4.2.1  reboot
Athlon 2100  debian unstable  3.2       radeon 7500  4.2.1  works
P4M 1800     debian unstable  3.2       radeon m7    4.2.1  reboot

-- Matt Mackall : http://www.selenic.com : Linux development and consulting ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-19 23:09 ` Matt Mackall @ 2003-11-20 7:14 ` Zwane Mwaikambo 2003-11-20 7:44 ` Matt Mackall 1 sibling, 0 replies; 49+ messages in thread From: Zwane Mwaikambo @ 2003-11-20 7:14 UTC (permalink / raw) To: Matt Mackall Cc: Linus Torvalds, Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Wed, 19 Nov 2003, Matt Mackall wrote: > On Wed, Nov 19, 2003 at 02:32:10PM -0600, Matt Mackall wrote: > > > > Zwane's got a K6-2 500MHz. I've just managed to reproduce this on my > > 1.4GHz Opteron box (with Debian gcc 3.2). Here, the "ooh la la" bit > > doesn't help. So my suspicion is that the printk is changing the > > timing just enough on Zwane's box that he's getting a timer interrupt > > knocking him out of vm86 mode before he hits a fatal bit in the fault > > handling path for 4/4. Printks in handle_vm86_trap, handle_vm86_fault, > > do_trap:vm86_trap, and do_general_protection:gp_in_vm86 never fire so > > there's probably something amiss in the trampoline code. > > Some more datapoints: Thanks for trying those out, i got another one to add.

> CPU          distro           compiler  video        X      result
> K6-2/500     connectiva 9     2.96      trident      4.3    reboot (zwane)
> K6-2/500     connectiva 9     3.2.2     trident      4.3    reboot (zwane)
> Opteron 240  debian unstable  3.2       S3           4.2.1  reboot
> Athlon 2100  debian unstable  3.2       radeon 7500  4.2.1  works
> P4M 1800     debian unstable  3.2       radeon m7    4.2.1  reboot
  P4/Xeon 2000 Fedora Core 1    3.3.2     ATI Rage XL  4.3.0  reboot

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-19 23:09 ` Matt Mackall 2003-11-20 7:14 ` Zwane Mwaikambo @ 2003-11-20 7:44 ` Matt Mackall 2003-11-20 7:53 ` Andrew Morton 2003-11-20 8:13 ` Matt Mackall 1 sibling, 2 replies; 49+ messages in thread From: Matt Mackall @ 2003-11-20 7:44 UTC (permalink / raw) To: Linus Torvalds Cc: Zwane Mwaikambo, Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Wed, Nov 19, 2003 at 05:09:28PM -0600, Matt Mackall wrote: > On Wed, Nov 19, 2003 at 02:32:10PM -0600, Matt Mackall wrote: > > > > Zwane's got a K6-2 500MHz. I've just managed to reproduce this on my > > 1.4GHz Opteron box (with Debian gcc 3.2). Here, the "ooh la la" bit > > doesn't help. So my suspicion is that the printk is changing the > > timing just enough on Zwane's box that he's getting a timer interrupt > > knocking him out of vm86 mode before he hits a fatal bit in the fault > > handling path for 4/4. Printks in handle_vm86_trap, handle_vm86_fault, > > do_trap:vm86_trap, and do_general_protection:gp_in_vm86 never fire so > > there's probably something amiss in the trampoline code. > > Some more datapoints: > > CPU distro compiler video X result > K6-2/500 connectiva 9 2.96 trident 4.3 reboot (zwane) > K6-2/500 connectiva 9 3.2.2 trident 4.3 reboot (zwane) > Opteron 240 debian unstable 3.2 S3 4.2.1 reboot > Athlon 2100 debian unstable 3.2 radeon 7500 4.2.1 works > P4M 1800 debian unstable 3.2 radeon m7 4.2.1 reboot And indeed it does turn out to be a problem with the trampoline mechanics. 
The fix for -mm4:

Fix triple faulting on some boxes with 4G/4G

 mm-mpm/arch/i386/kernel/vm86.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN arch/i386/kernel/vm86.c~virtual-esp arch/i386/kernel/vm86.c
--- mm/arch/i386/kernel/vm86.c~virtual-esp	2003-11-20 01:36:32.000000000 -0600
+++ mm-mpm/arch/i386/kernel/vm86.c	2003-11-20 01:36:32.000000000 -0600
@@ -306,7 +306,7 @@ static void do_sys_vm86(struct kernel_vm
 	tss->esp0 = virtual_esp0(tsk);
 	if (cpu_has_sep)
 		tsk->thread.sysenter_cs = 0;
-	load_esp0(tss, &tsk->thread);
+	load_virtual_esp0(tss, tsk);
 	put_cpu();
 
 	tsk->thread.screen_bitmap = info->screen_bitmap;
_

-- Matt Mackall : http://www.selenic.com : Linux development and consulting
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-20 7:44 ` Matt Mackall @ 2003-11-20 7:53 ` Andrew Morton 2003-11-20 8:13 ` Matt Mackall 1 sibling, 0 replies; 49+ messages in thread From: Andrew Morton @ 2003-11-20 7:53 UTC (permalink / raw) To: Matt Mackall; +Cc: torvalds, zwane, mingo, mbligh, linux-kernel, linux-mm, hugh Matt Mackall <mpm@selenic.com> wrote: > > - load_esp0(tss, &tsk->thread); > + load_virtual_esp0(tss, tsk); Thanks guys. Now I'll have to put something else in there to keep you amused ;) ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops 2003-11-20 7:44 ` Matt Mackall 2003-11-20 7:53 ` Andrew Morton @ 2003-11-20 8:13 ` Matt Mackall 1 sibling, 0 replies; 49+ messages in thread From: Matt Mackall @ 2003-11-20 8:13 UTC (permalink / raw) To: Linus Torvalds Cc: Zwane Mwaikambo, Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins On Thu, Nov 20, 2003 at 01:44:05AM -0600, Matt Mackall wrote: > On Wed, Nov 19, 2003 at 05:09:28PM -0600, Matt Mackall wrote: > > On Wed, Nov 19, 2003 at 02:32:10PM -0600, Matt Mackall wrote: > > > > > > Zwane's got a K6-2 500MHz. I've just managed to reproduce this on my > > > 1.4GHz Opteron box (with Debian gcc 3.2). Here, the "ooh la la" bit > > > doesn't help. So my suspicion is that the printk is changing the > > > timing just enough on Zwane's box that he's getting a timer interrupt > > > knocking him out of vm86 mode before he hits a fatal bit in the fault > > > handling path for 4/4. Printks in handle_vm86_trap, handle_vm86_fault, > > > do_trap:vm86_trap, and do_general_protection:gp_in_vm86 never fire so > > > there's probably something amiss in the trampoline code. > > > > Some more datapoints: > > > > CPU distro compiler video X result > > K6-2/500 connectiva 9 2.96 trident 4.3 reboot (zwane) > > K6-2/500 connectiva 9 3.2.2 trident 4.3 reboot (zwane) > > Opteron 240 debian unstable 3.2 S3 4.2.1 reboot > > Athlon 2100 debian unstable 3.2 radeon 7500 4.2.1 works > > P4M 1800 debian unstable 3.2 radeon m7 4.2.1 reboot > > And indeed it does turn out to be a problem with the trampoline > mechanics. 
The fix for -mm4:

Cleanup, as pointed out by Zwane:

Fix triple faulting on some boxes with 4G/4G

 mm-mpm/arch/i386/kernel/vm86.c |    3 +--
 1 files changed, 1 insertion(+), 2 deletions(-)

diff -puN arch/i386/kernel/vm86.c~virtual-esp arch/i386/kernel/vm86.c
--- mm/arch/i386/kernel/vm86.c~virtual-esp	2003-11-20 01:36:32.000000000 -0600
+++ mm-mpm/arch/i386/kernel/vm86.c	2003-11-20 02:08:38.000000000 -0600
@@ -303,10 +303,9 @@ static void do_sys_vm86(struct kernel_vm
 
 	tss = init_tss + get_cpu();
 	tsk->thread.esp0 = (unsigned long) &info->VM86_TSS_ESP0;
-	tss->esp0 = virtual_esp0(tsk);
 	if (cpu_has_sep)
 		tsk->thread.sysenter_cs = 0;
-	load_esp0(tss, &tsk->thread);
+	load_virtual_esp0(tss, tsk);
 	put_cpu();
 
 	tsk->thread.screen_bitmap = info->screen_bitmap;
_

-- Matt Mackall : http://www.selenic.com : Linux development and consulting
^ permalink raw reply [flat|nested] 49+ messages in thread
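A rough user-space model of what the patch above seems to change, offered as a sketch under stated assumptions rather than the actual -mm code. The disassembly posted earlier in the thread appears to show the TSS slot being written twice: once with a value adjusted by a sub/add pair against two thread_info fields, and then again with the unadjusted thread.esp0. All `model_*` names below are hypothetical, and the field names `real_stack` and `virtual_stack` are assumptions about the 4G/4G layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; names and layout are assumptions, not the -mm structures. */
struct model_thread_info {
	uintptr_t real_stack;		/* address of the stack in the kernel mapping */
	uintptr_t virtual_stack;	/* alias assumed to be valid at kernel entry */
};

struct model_task {
	struct model_thread_info *thread_info;
	uintptr_t esp0;			/* models tsk->thread.esp0 */
};

struct model_tss {
	uintptr_t esp0;			/* models tss->esp0 */
};

/* Translate esp0 from the real stack mapping to its virtual alias. */
static uintptr_t model_virtual_esp0(const struct model_task *tsk)
{
	assert(tsk && tsk->thread_info);
	return tsk->thread_info->virtual_stack +
	       (tsk->esp0 - tsk->thread_info->real_stack);
}

/* Plain load: copies the untranslated value into the TSS slot. */
static void model_load_esp0(struct model_tss *tss, const struct model_task *tsk)
{
	tss->esp0 = tsk->esp0;
}

/* Translating load: what the patched call is assumed to do. */
static void model_load_virtual_esp0(struct model_tss *tss, const struct model_task *tsk)
{
	tss->esp0 = model_virtual_esp0(tsk);
}

/* Sequence before the fix: the translated value is written, then overwritten. */
static uintptr_t tss_esp0_before_fix(const struct model_task *tsk)
{
	struct model_tss tss;

	tss.esp0 = model_virtual_esp0(tsk);
	model_load_esp0(&tss, tsk);
	return tss.esp0;
}

/* Sequence after the cleanup: a single translating load. */
static uintptr_t tss_esp0_after_fix(const struct model_task *tsk)
{
	struct model_tss tss;

	model_load_virtual_esp0(&tss, tsk);
	return tss.esp0;
}
```

In this model the pre-fix sequence leaves the untranslated address in the TSS slot, which would fit the observation that the printks in the vm86 fault handlers never fired. Whether that is exactly how the failure played out on real hardware is not something the model can show.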
* Re: 2.6.0-test9-mm3
  2003-11-13  7:30 2.6.0-test9-mm3 Andrew Morton
                   ` (3 preceding siblings ...)
  2003-11-14  5:07 ` 2.6.0-test9-mm3 Martin J. Bligh
@ 2003-11-14 19:08 ` Martin J. Bligh
  2003-11-14 18:59   ` 2.6.0-test9-mm3 Andrew Morton
  2003-11-14 19:10   ` 2.6.0-test9-mm3 Badari Pulavarty
  4 siblings, 2 replies; 49+ messages in thread
From: Martin J. Bligh @ 2003-11-14 19:08 UTC
To: Andrew Morton, linux-kernel, linux-mm

> - Several ext2 and ext3 allocator fixes.  These need serious testing on big
>   SMP.

OK, ext3 survived a swatting on the 16-way as well. It's still slow as
snot, but it does work ;-) No changes from before, methinks.

Diffprofile for kernbench (-j) from ext2 to ext3 on mm3

     27022    16.3% total
     24069    53.3% default_idle
       583     2.4% page_remove_rmap
       539   248.4% fd_install
       478   388.6% __blk_queue_bounce
       319     4.0% __d_lookup
       220   122.9% may_open
       204    68.2% filemap_nopage
       124     0.0% journal_add_journal_head
       122   321.1% __find_get_block_slow
       122     0.0% do_get_write_access
       101    57.1% generic_fillattr
...
       -52   -73.2% .text.lock.highmem
       -52   -94.5% generic_file_read
       -53   -18.7% do_generic_mapping_read
       -58    -3.3% do_no_page
       -65   -13.0% page_address
       -65   -60.2% kmap_high
       -74  -100.0% grab_block
       -75    -3.3% do_page_fault
       -85    -1.9% __copy_from_user_ll
      -273   -19.5% link_path_walk
      -299    -6.5% find_get_page
      -758  -100.0% generic_file_open

SDET:

   1726439   214.7% total
   1383611   345.4% default_idle
    115417     0.0% .text.lock.transaction
     79362     0.0% find_next_usable_block
     38003     0.0% do_get_write_access
     32429  2316.4% __down
     31231     0.0% journal_dirty_metadata
     15114   553.8% schedule
     14350  1253.3% __wake_up
     13459     0.0% start_this_handle
     13100     0.0% journal_stop
...
     -1105   -25.1% copy_mm
     -1144  -100.0% generic_file_open
     -1205   -45.0% .text.lock.dec_and_lock
     -1342  -100.0% ext2_new_inode
     -1365   -50.5% follow_mount
     -1453  -100.0% grab_block
     -1580   -30.5% remove_shared_vm_struct
     -1759   -11.0% copy_page_range
     -2145   -18.4% __d_lookup
     -2157   -35.6% path_lookup
     -2222   -33.7% atomic_dec_and_lock
     -2813   -25.0% release_pages
     -3764   -19.1% zap_pte_range
     -8954   -21.2% page_add_rmap
    -22707   -25.0% page_remove_rmap
* Re: 2.6.0-test9-mm3
  2003-11-14 19:08 ` 2.6.0-test9-mm3 Martin J. Bligh
@ 2003-11-14 18:59   ` Andrew Morton
  2003-11-14 19:32     ` 2.6.0-test9-mm3 Mike Fedyk
  2003-11-14 19:10   ` 2.6.0-test9-mm3 Badari Pulavarty
  1 sibling, 1 reply; 49+ messages in thread
From: Andrew Morton @ 2003-11-14 18:59 UTC
To: Martin J. Bligh; +Cc: linux-kernel, linux-mm

"Martin J. Bligh" <mbligh@aracnet.com> wrote:
>
> > - Several ext2 and ext3 allocator fixes.  These need serious testing on big
> >   SMP.
>
> OK, ext3 survived a swatting on the 16-way as well.

Great, thanks.

> It's still slow as snot, but it does work ;-)

I think SDET generates storms of metadata updates.  Making the journal
larger may help get that idle time down.

Probably the default journal size is too small nowadays.  Most tests seem
to run faster when it is enlarged.
* Re: 2.6.0-test9-mm3
  2003-11-14 18:59 ` 2.6.0-test9-mm3 Andrew Morton
@ 2003-11-14 19:32   ` Mike Fedyk
  2003-11-14 20:27     ` 2.6.0-test9-mm3 John Stoffel
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Fedyk @ 2003-11-14 19:32 UTC
To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel, linux-mm

On Fri, Nov 14, 2003 at 10:59:47AM -0800, Andrew Morton wrote:
> "Martin J. Bligh" <mbligh@aracnet.com> wrote:
> >
> > > - Several ext2 and ext3 allocator fixes.  These need serious testing on big
> > >   SMP.
> >
> > OK, ext3 survived a swatting on the 16-way as well.
>
> Great, thanks.
>
> > It's still slow as snot, but it does work ;-)
>
> I think SDET generates storms of metadata updates.  Making the journal
> larger may help get that idle time down.
>
> Probably the default journal size is too small nowadays.  Most tests seem
> to run faster when it is enlarged.

Or maybe if it didn't start sync committing from the journal once it hits
50%.
* Re: 2.6.0-test9-mm3
  2003-11-14 19:32 ` 2.6.0-test9-mm3 Mike Fedyk
@ 2003-11-14 20:27   ` John Stoffel
  2003-11-15  1:01     ` 2.6.0-test9-mm3 Mike Fedyk
  0 siblings, 1 reply; 49+ messages in thread
From: John Stoffel @ 2003-11-14 20:27 UTC
To: Mike Fedyk; +Cc: Andrew Morton, Martin J. Bligh, linux-kernel, linux-mm

Mike> Or maybe if it didn't start sync committing from the journal
Mike> once it hits 50%.

Instead of using a percentage like this, would it make sense to flush
the journal when there are only N free journal slots/entries left?
Now the question is how to compute N in a sane way that works for
small (memory) systems, as well as for larger systems.

You don't want to grow N too aggressively, or base it on the memory of
the system, do you?  When you have a 20mb journal, maybe starting
writeout after 10mb is used makes sense, because you've only got 10
transaction slots open.  But when you have a 200mb journal, does it
make sense to start writeout when you only have 100 transaction slots
left?

Since I don't know the internals of Ext3 at all, I'm probably
completely missing the idea here, but my gut feeling is that the
scaling we use in these cases shouldn't be linear at all, but more
likely inverse logarithmic instead.  Basically, the larger we get with
a resource, the slower we grow our usage, or the smaller we grow the
absolute size of the writeout buffer(s).

Hmmm... this doesn't sound clear even to me.  But the idea I think I'm
trying to get at is that if we have X size of a journal, we want to
start writeout when we have X/2 available.  But when we have Y size of
a journal, where Y is X*10 (or larger), we don't want Y/2 as the
cutover point, we want something like Y/10.  The idea is that we grow
the denominator here at a slow rate, since it will shrink the free
buffer percentage nicely, yet not let us get too close to a truly zero
sized buffer.

      X       X/N
  -----  --------
     10         5
    100        10
   1000        25
  10000       125

Does this make any sense to anyone?

John
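The sublinear scaling John describes can be sketched in C. This is only an illustration under stated assumptions: the integer square root is a stand-in curve (the table above does not follow a single formula), the unit is "journal slots" as in the table, and `isqrt_u64` and `flush_reserve_blocks` are hypothetical names, not ext3 or JBD functions.

```c
#include <stdint.h>

/* Integer square root by binary search: returns floor(sqrt(n)). */
uint64_t isqrt_u64(uint64_t n)
{
    uint64_t lo = 0;
    /* floor(sqrt(UINT64_MAX)) is 4294967295, so the root never exceeds it. */
    uint64_t hi = n < 4294967295ULL ? n : 4294967295ULL;

    while (lo < hi) {
        uint64_t mid = lo + (hi - lo + 1) / 2;  /* mid >= 1 inside the loop */

        if (mid <= n / mid)     /* same as mid * mid <= n, without overflow */
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * Hypothetical policy: how many free journal slots to keep in reserve
 * before writeout starts. ASSUMPTION: a square-root curve, capped at the
 * linear "half the journal" rule so tiny journals are never asked for
 * more than the existing 50% behaviour would give them.
 */
uint64_t flush_reserve_blocks(uint64_t journal_blocks)
{
    uint64_t half = journal_blocks / 2;
    uint64_t root = isqrt_u64(journal_blocks);

    return root < half ? root : half;
}
```

For the four journal sizes in the table this curve gives 3, 10, 31 and 100, which has the same shape as John's 5, 10, 25 and 125 without matching it exactly. Any other slowly growing function would serve the same purpose; the square root was picked only because it is easy to compute with integers.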
* Re: 2.6.0-test9-mm3
  2003-11-14 20:27 ` 2.6.0-test9-mm3 John Stoffel
@ 2003-11-15  1:01   ` Mike Fedyk
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Fedyk @ 2003-11-15 1:01 UTC
To: John Stoffel; +Cc: Andrew Morton, Martin J. Bligh, linux-kernel, linux-mm

On Fri, Nov 14, 2003 at 03:27:01PM -0500, John Stoffel wrote:
> You don't want to grow N too aggressively, or base it on the memory of
> the system, do you?  When you have a 20mb journal, maybe starting
> writeout after 10mb is used makes sense, because you've only got 10
> transaction slots open.  But when you have a 200mb journal, does it
> make sense to start writeout when you only have 100 transaction slots
> left?

The minimum transaction size is one block (since ext3 is the only journaling
FS to log entire blocks, instead of the specific logical changes made during
the transaction), and your blocks are 1k, 2k, or 4k.  Though many times
you'll have several blocks per transaction, since each transaction can change
bitmaps, directory blocks, etc.

> Since I don't know the internals of Ext3 at all, I'm probably
> completely missing the idea here, but my gut feeling is that the
> scaling we use in these cases shouldn't be linear at all, but more
> likely inverse logarithmic instead.  Basically, the larger we get with
> a resource, the slower we grow our usage, or the smaller we grow the
> absolute size of the writeout buffer(s).
>
> Hmmm... this doesn't sound clear even to me.  But the idea I think I'm
> trying to get at is that if we have X size of a journal, we want to
> start writeout when we have X/2 available.  But when we have Y size of
> a journal, where Y is X*10 (or larger), we don't want Y/2 as the
> cutover point, we want something like Y/10.  The idea is that we grow
> the denominator here at a slow rate, since it will shrink the free
> buffer percentage nicely, yet not let us get too close to a truly zero
> sized buffer.

Last I heard, ext3 will try to flush the journal with an async process, and
if that isn't able to keep up, once the journal hits 50% full, the system
will write synchronously until the journal is empty (or was that until it
was 25% full or less, I forget...).

AFAIK everyone agrees that this is not optimal, but nobody's taken the time
to fix it yet either.

Mike
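The behaviour Mike recalls amounts to a two-state machine with hysteresis, which can be sketched in C as below. This is a toy model of his description only, not the actual ext3/JBD code: the struct, the enum and `journal_model_update` are hypothetical names, and because Mike is unsure whether synchronous writeout stops at empty or at 25% full, the low-water mark is left as a caller-supplied parameter.

```c
#include <stdint.h>

/* Hypothetical model of the two flushing modes described above. */
enum flush_mode { FLUSH_ASYNC, FLUSH_SYNC };

struct journal_model {
    uint64_t total_blocks;      /* journal capacity in blocks */
    uint64_t used_blocks;       /* blocks currently holding log data */
    uint64_t low_water_blocks;  /* ASSUMPTION: sync mode ends at or below this */
    enum flush_mode mode;
};

/*
 * Re-evaluate the mode after used_blocks has changed.
 * Async -> sync once the journal is at least half full;
 * sync -> async only after it has drained to the low-water mark.
 */
enum flush_mode journal_model_update(struct journal_model *j)
{
    /* Half of the journal, rounded up, computed without overflow. */
    uint64_t half = j->total_blocks / 2 + j->total_blocks % 2;

    if (j->mode == FLUSH_ASYNC) {
        if (j->used_blocks >= half)
            j->mode = FLUSH_SYNC;
    } else if (j->used_blocks <= j->low_water_blocks) {
        j->mode = FLUSH_ASYNC;
    }
    return j->mode;
}
```

With `low_water_blocks` set to 0 the model matches the "until the journal is empty" reading; setting it to a quarter of `total_blocks` matches the "25% full or less" reading. Either way, the gap between the two thresholds is what keeps writers stalled long after the journal has dropped back below half full.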
* Re: 2.6.0-test9-mm3
  2003-11-14 19:08 ` 2.6.0-test9-mm3 Martin J. Bligh
  2003-11-14 18:59   ` 2.6.0-test9-mm3 Andrew Morton
@ 2003-11-14 19:10   ` Badari Pulavarty
  2003-11-14 20:29     ` 2.6.0-test9-mm3 Martin J. Bligh
  1 sibling, 1 reply; 49+ messages in thread
From: Badari Pulavarty @ 2003-11-14 19:10 UTC
To: Martin J. Bligh, Andrew Morton, linux-kernel, linux-mm

On Friday 14 November 2003 11:08 am, Martin J. Bligh wrote:
> > - Several ext2 and ext3 allocator fixes.  These need serious testing on
> >   big SMP.
>
> OK, ext3 survived a swatting on the 16-way as well. It's still slow as
> snot, but it does work ;-) No changes from before, methinks.
>
> Diffprofile for kernbench (-j) from ext2 to ext3 on mm3
>
>      27022    16.3% total
>      24069    53.3% default_idle
>        583     2.4% page_remove_rmap
>        539   248.4% fd_install
>        478   388.6% __blk_queue_bounce

What driver are you using? Why are you bouncing?

Thanks,
Badari
* Re: 2.6.0-test9-mm3
  2003-11-14 19:10 ` 2.6.0-test9-mm3 Badari Pulavarty
@ 2003-11-14 20:29   ` Martin J. Bligh
  2003-11-17 20:58     ` 2.6.0-test9-mm3 bill davidsen
  0 siblings, 1 reply; 49+ messages in thread
From: Martin J. Bligh @ 2003-11-14 20:29 UTC
To: Badari Pulavarty, Andrew Morton, linux-kernel, linux-mm

>> > - Several ext2 and ext3 allocator fixes.  These need serious testing on
>> >   big SMP.
>>
>> OK, ext3 survived a swatting on the 16-way as well. It's still slow as
>> snot, but it does work ;-) No changes from before, methinks.
>>
>> Diffprofile for kernbench (-j) from ext2 to ext3 on mm3
>>
>>      27022    16.3% total
>>      24069    53.3% default_idle
>>        583     2.4% page_remove_rmap
>>        539   248.4% fd_install
>>        478   388.6% __blk_queue_bounce
>
> What driver are you using? Why are you bouncing?

qlogicisp. Because the driver is crap? ;-)

M.
* Re: 2.6.0-test9-mm3
  2003-11-14 20:29 ` 2.6.0-test9-mm3 Martin J. Bligh
@ 2003-11-17 20:58   ` bill davidsen
  0 siblings, 0 replies; 49+ messages in thread
From: bill davidsen @ 2003-11-17 20:58 UTC
To: linux-kernel

In article <100480000.1068841761@flay>,
Martin J. Bligh <mbligh@aracnet.com> wrote:
| >> > - Several ext2 and ext3 allocator fixes.  These need serious testing on
| >> >   big SMP.
| >>
| >> OK, ext3 survived a swatting on the 16-way as well. It's still slow as
| >> snot, but it does work ;-) No changes from before, methinks.
| >>
| >> Diffprofile for kernbench (-j) from ext2 to ext3 on mm3
| >>
| >>      27022    16.3% total
| >>      24069    53.3% default_idle
| >>        583     2.4% page_remove_rmap
| >>        539   248.4% fd_install
| >>        478   388.6% __blk_queue_bounce
| >
| > What driver are you using? Why are you bouncing?
|
| qlogicisp. Because the driver is crap? ;-)

The question is, does that make your testing better or worse in terms of
checking the new code? Clearly you have done a good job of checking the
"disk can't keep up" case; is there a need to test further with a much
higher transaction rate?

I would assume that if there were lock issues they would have shown up,
which is probably all that's needed.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.