2.6.11-rc1-mm1

From: Andrew Morton @ 2005-01-14 8:23 UTC
To: linux-kernel

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc1/2.6.11-rc1-mm1/

- Added bk-xfs to the -mm "external trees" lineup.

- Added the Linux Trace Toolkit (and hence relayfs). Mainly because I
  haven't yet taken as close a look at LTT as I should have. Probably
  neither have you.

  It needs a bit of work on the kernel<->user periphery, which is not a big
  deal.

  As does relayfs, IMO. It seems to need some regularised way in which a
  userspace relayfs client can tell relayfs what file(s) to use. LTT is
  currently using some ghastly stick-a-pathname-in-/proc thing. Relayfs
  should provide this service.

  relayfs needs a closer look too. A lot of advanced instrumentation
  projects seem to require it, but none of them have been merged. Lots of
  people say "use netlink instead" and lots of other people say "err, we
  think relayfs is better". This is a discussion which needs to be had.

- The 2.6.10-mm3 announcement was munched by the vger filters, sorry. One of
  the uml patches had an inopportune substring in its name (oh pee tee hyphen
  oh you tee). Nice trick if you meant it ;)

- Big update to the ext3 extended attribute support. agruen, tridge and sct
  have been cooking this up for a while. samba4 proved to be a good stress
  test.

- davej's "2.6 post-Halloween features" document has been added to -mm as
  Documentation/feature-list-2.6.txt in the hope that someone will review it
  and help keep it up-to-date.

- Added FUSE (filesystem in userspace) for people to play with. Am agnostic
  as to whether it should be merged (haven't read it at all closely yet,
  either), but I am impressed by the amount of care which has obviously gone
  into it. Opinions sought.

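A note on reading the changelog that follows: entries prefixed with "-" have left the -mm series since the previous release (typically merged upstream or folded into another patch), entries prefixed with "+" are new, and unprefixed names such as linus.patch and the bk-*.patch trees are refreshed snapshots. Here is a minimal Python sketch for splitting such a listing. It assumes every patch name ends in ".patch" and that anything else is descriptive text; the function name classify is made up for illustration and is not part of any -mm tooling.

```python
def classify(changelog):
    """Split a -mm changelog into dropped ('-') and added ('+') patch names.

    Assumption: patch names are whitespace-separated tokens ending in
    '.patch'; every other token is description text and is skipped.
    """
    dropped, added = [], []
    for token in changelog.split():
        if not token.endswith(".patch"):
            continue  # description text or a heading such as "Merged"
        if token.startswith("-"):
            dropped.append(token[1:])
        elif token.startswith("+"):
            added.append(token[1:])
        # Unprefixed names (linus.patch, bk-*.patch) are refreshed trees
        # and are deliberately left out of both lists.
    return dropped, added


sample = """
 linus.patch

-dmi_iterate-fix.patch
-3c59x-pm-fix.patch

 Merged

+sparc64-nodemask-build-fix.patch

 sparc64 compile fix
"""

dropped, added = classify(sample)
print("dropped:", dropped)
print("added:", added)
```

Because the sketch splits on whitespace, it works equally well on a one-patch-per-line listing and on text where line breaks have been lost.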
Changes since 2.6.10-mm3:

 linus.patch
 bk-alsa.patch
 bk-arm.patch
 bk-cifs.patch
 bk-cpufreq.patch
 bk-drm-via.patch
 bk-i2c.patch
 bk-ide-dev.patch
 bk-input.patch
 bk-dtor-input.patch
 bk-kbuild.patch
 bk-kconfig.patch
 bk-netdev.patch
 bk-ntfs.patch
 bk-pci.patch
 bk-usb.patch
 bk-xfs.patch

 Latest versions of everyone's bk trees.

-m32r-include-nodemaskh-for-build-fix.patch
-acpi_smp_processor_id-warning-fix.patch
-sn2-trivial-nodemaskh-include-fix.patch
-split-bprm_apply_creds-into-two-functions.patch
-merge-_vm_enough_memorys-into-a-common-helper.patch
-ppc64-fix-iommu-cleanup-regression.patch
-ppc64-rename-perf-counter-register-defines.patch
-dmi_iterate-fix.patch
-arch-i386-kernel-cpu-mtrr-too-many-bits-are-masked-off-from-cr4.patch
-pm-introduce-pm_message_t.patch
-mark-older-power-managment-as-deprecated.patch
-swsusp-device-power-management-fix.patch
-swsusp-properly-suspend-and-resume-all-devices.patch
-m32r-employ-new-kernel-api-abi.patch
-m68k-update-defconfigs-for-2610.patch
-mmc_wbsd-depends-on-isa.patch
-m68k-remove-nowhere-referenced-files.patch
-direct-write-vs-truncate-deadlock.patch
-random-whitespace-cleanups.patch
-random-remove-pool-resizing-sysctl.patch
-cciss-update-to-version-264.patch
-reiserfs-vs-8115-test-adjustment.patch
-export-get_sb_pseudo.patch
-proc_kcore-correct-double-accounting-of-elf_buflen.patch
-remove-intermezzo-maintainers-entry.patch
-3c59x-reload-eeprom-values-at-rmmod-for-needy-cards.patch
-3c59x-remove-eeprom_reset-for-3c905b.patch
-3c59x-add-eeprom_reset-for-3c900-boomerang.patch
-3c59x-pm-fix.patch
-3c59x-missing-pci_disable_device.patch
-3c59x-use-netdev_priv.patch
-3c59x-make-use-of-generic_mii_ioctl.patch
-3c59x-vortex-select-mii.patch
-3c59x-support-more-ethtool_ops.patch
-inux-269-fs-proc-basec-array-size.patch
-linux-269-fs-proc-proc_ttyc-avoid-array.patch
-optimize-prefetch-usage-in-list_for_each_xxx.patch
-signalc-convert-assertion-to-bug_on.patch
-right-severity-level-for-fatal-message.patch
-remove-unused-drivers-char-rio-cdprotoh.patch
-remove-unused-drivers-char-rsf16fmih.patch
-mtd-added-nec-upd29f064115-support.patch
-ide-cd-is-very-noisy.patch
-signedness-fix-in-deadline-ioschedc.patch
-cleanup-virtual-console-selectionc-interface.patch
-warn-about-cli-sti-co-uses-even-on-up.patch
-remove-umsdos-from-tree.patch
-kill-quota_v2c-printk-of-size_t-warning.patch
-silence-numerous-size_t-warnings-in-drivers-acpi-processor_idlec.patch
-make-irda-string-tables-conditional-on-config_irda_debug.patch
-fix-unresolved-mtd-symbols-in-scx200_docflashc.patch
-fix-module_param-type-mismatch-in-drivers-char-n_hdlcc.patch
-drivers-char-misc-cleanups.patch
-pktcdvd-make-two-functions-static.patch
-pktcdvd-grep-friendly-function-prototypes.patch
-pktcdvd-small-documentation-update.patch
-isofs-remove-useless-include.patch
-synaptics-remove-unused-struct-member-variable.patch
-kill-one-if-x-vfreex-usage.patch
-smbfs-make-some-functions-static.patch
-mips-fixed-build-error-about-nec-vr4100-series.patch
-efs-make-a-struct-static-fwd.patch
-fs-ext3-possible-cleanups.patch
-fs-ext2-xattrc-make-ext2_xattr_list-static.patch
-fs-hugetlbfs-inodec-make-4-functions-static.patch
-remove-nr_super-define.patch
-i2o-fix-init-exit-section-usage.patch
-use-modern-format-for-pci-apic-irq-transform-printks.patch
-coda-bounds-checking.patch
-coda-use-list_for_each_entry_safe.patch
-coda-make-global-code-static.patch
-coda-remove-unused-coda_mknod.patch
-coda-rename-coda_psdev-to-coda.patch
-remove-outdated-smbfs-changelog.patch
-update-geerts-address-in-credits.patch
-cputime-introduce-cputime.patch
-cputime-microsecond-based-cputime-for-s390.patch
-4level-swapoff-hang-fix.patch
-snd-intel8x0-ac97-quirk-entries-for-hp-xw6200-xw8000.patch
-igxb-build-fix.patch
-eepro-build-fix.patch
-3c515-warning-fix.patch
-ixgb-whitespace-fix.patch
-fix-expand_stack-smp-race.patch
-ppc-fix-idle-with-interrupts-disabled.patch
-ppc-remove-duplicate-define.patch
-ppc-include-missing-header.patch
-ppc64-move-hotplug-cpu-functions-to-smp_ops.patch
-ppc64-kprobes-breaks-bug-handling.patch
-ppc64-fix-numa-build.patch
-ppc64-enhance-oops-printing.patch
-ppc64-fix-xmon-longjmp-handling.patch
-ppc64-make-xmon-print-bug-warnings.patch
-ppc64-xtime-gettimeofday-can-get-out-of-sync.patch
-ppc64-pci-cleanup.patch
-ppc64-remove-flush_instruction_cache.patch
-ppc64-interrupt-code-cleanup.patch
-ppc64-fix-rtas_set_indicator9005.patch
-ppc64-make-numa-code-handle-unexpected-layouts.patch
-ppc64-semicolon-in-rtasdc.patch
-improved-wait_8254_wraparound.patch
-kprobes-dont-steal-interrupts-from-vm86.patch
-apic-lapic-hanging-problems-on-nforce2-system.patch
-x86_64-work-around-another-aperture-bios-bug-on-opteron.patch
-x86_64-hack-to-disable-clustered-mode-on-amd-systems.patch
-x86_64-updates-for-x86-64-boot-optionstxt.patch
-x86_64-update-defconfig.patch
-x86_64-remove-old-checksumc.patch
-x86_64-fix-sparse-warnings.patch
-x86_64-fix-some-gcc-4-warnings-in-arch-x86_64.patch
-i386-port-missing-cpuid-bits-from-x86-64-to-i386.patch
-i386-amd-dual-core-support-for-i386.patch
-i386-count-both-multi-cores-and-smp-siblings-in.patch
-i386-count-both-multi-cores-and-smp-siblings-in-fix.patch
-i386-export-phys_proc_id.patch
-x86_64-move-memset_io-out-of-line-to-avoid-warnings.patch
-x86_64-fix-ioremap-attribute-restoration-on-i386-and.patch
-x86_64-fix-tlb-reporting-on-k8.patch
-x86_64-change_page_attr-logic-fixes-from-andrea.patch
-x86_64-fix-mptables-printk.patch
-x86_64-add-new-key-syscalls.patch
-x86_64-remove-direct-mem_map-references.patch
-x86_64-remove-check-that-limited-max-number-of-io-apic.patch
-x86_64-prevent-gcc-from-generating-mmx-code-by-mistake.patch
-x86_64-dont-sync-apic-arbs-on-p4s.patch
-x86_64-cleanups-preparing-for-memory-hotplug.patch
-x86_64-remove-unused-prototypes.patch
-x86_64-fix-a-lot-of-broken-white-space-in.patch
-x86_64-fix-signal-fpu-leak-on-i386-and-x86-64.patch
-x86_64-disable-conforming-bit-on-user32_cs-segment.patch
-x86_64-notify-user-of-mce-events.patch
-uml-add-some-pudding.patch
-uml-use-va_end-wherever-va_args-are-used.patch
-uml-split-out-arch-specific-syscalls-from-generic-ones.patch
-uml-three-level-page-table-support.patch
-uml-x86-64-core-support.patch
-uml-x86-64-config-support.patch
-uml-factor-out-register-saving-and-restoring.patch
-uml-x86_64-ptrace-support.patch
-uml-separate-out-signal-reception.patch
-uml-make-a-common-misconfiguration-impossible.patch
-uml-separate-out-the-time-code.patch
-uml-x86-64-headers.patch
-uml-split-out-arch-link-address-definitions.patch
-uml-dont-use-__nr_waitpid-on-arches-which-dont-have-it.patch
-uml-use-va_copy.patch
-uml-code-tidying.patch
-uml-use-for_each_cpu.patch
-uml-2610-ptrace-updates.patch
-uml-add-the-new-syscalls.patch
-uml-64-bit-cleanups.patch
-uml-silence-some-message-from-the-console-driver.patch
-uml-add-a-missing-include.patch
-uml-sparse-annotations.patch
-uml-fix-sys_call_table-syntax.patch
-uml-fix-make-clean.patch
-uml-define-config_input-better.patch
-uml-fix-a-compile-warning.patch
-seclvl-add-missing-dependency.patch
-binfmt_elf-fix-return-error-codes-and-early-corrupt-binary-detection.patch
-fix-setattr-attr_size-locking-for-nfsd.patch
-pcmcia-new-ds-cs-interface.patch
-pcmcia-call-device-drivers-from-ds-not-from-cs.patch
-pcmcia-unify-bind_mtd-and-pcmcia_bind_mtd.patch
-pcmcia-unfiy-bind_device-and-pcmcia_bind_device.patch
-pcmcia-device-model-integration-can-only-be-submitted-under-gpl.patch
-pcmcia-add-pcmcia_devices.patch
-pcmcia-remove-socket_bind_t-use-pcmcia_devices-instead.patch
-pcmcia-remove-internal-module-use-count-use-module_refcount-instead.patch
-pcmcia-set-drivers-owner-field.patch
-pcmcia-move-pcmcia_unregister_client-to-ds.patch
-pcmcia-device-model-integration-can-only-be-submitted-under-gpl-part-2.patch
-pcmcia-use-kref-instead-of-native-atomic-counter.patch
-pcmcia-add-pcmcia_putget_socket.patch
-pcmcia-grab-a-reference-to-the-cs-socket-in-ds.patch
-pcmcia-get-a-reference-to-ds-socket-for-each-pcmcia_device.patch
-pcmcia-add-a-pointer-to-client-in-struct-pcmcia_device.patch
-pcmcia-use-pcmcia_device-in-send_event.patch
-pcmcia-use-pcmcia_device-to-mark-clients-as-stale.patch
-pcmcia-code-moving-in-ds.patch
-pcmcia-use-pcmcia_device-in-register_client.patch
-pcmcia-direct-ordered-unbind-of-devices.patch
-pcmcia-bug-on-dev_list-=-null.patch
-pcmcia-bug-if-clients-are-kept-too-long.patch
-pcmcia-move-struct-client_t-inside-struct-pcmcia_device.patch
-pcmcia-use-driver_find-in-ds.patch
-pcmcia-set_netdev-for-network-devices.patch
-pcmcia-set_netdev-for-wireless-network-devices.patch
-pcmcia-reduce-stack-usage-in-ds_ioctl-randy-dunlap.patch
-pcmcia-add-disable_clkrun-option.patch
-pcmcia-rename-pcmcia-devices.patch
-pcmcia-pd6729-e-mail-update.patch
-pcmcia-pd6729-cleanups.patch
-pcmcia-pd6729-isa_irq-handling.patch
-pcmcia-remove-obsolete-code.patch
-pcmcia-remove-pending_events.patch
-pcmcia-remove-client_attributes.patch
-pcmcia-remove-unneeded-parameter-from-rsrc_mgr.patch
-pcmcia-remove-dev_info-from-client.patch
-pcmcia-remove-mtd-and-bulkmem-replaced-by-pcmciamtd.patch
-pcmcia-per-socket-resource-database.patch
-pcmcia-validate_mem-only-for-non-statically-mapped-sockets.patch
-pcmcia-adjust_io_region-only-for-non-statically-mapped-sockets.patch
-pcmcia-find_io_region-only-for-non-statically-mapped-sockets.patch
-pcmcia-find_mem_region-only-for-non-statically-mapped-sockets.patch
-pcmcia-adjust_-and-release_resources-only-for-non-statically-mapped-sockets.patch
-pcmcia-move-resource-handling-code-only-for-non-statically-mapped-sockets-to-other-file.patch
-pcmcia-make-rsrc_nonstatic-an-independend-module.patch
-pcmcia-allocate-resource-database-per-socket.patch
-pcmcia-remove-typedef.patch
-pcmcia-grab-lock-in-resource_release.patch
-sched-make-preempt_bkl-depend-on-preempt-alone.patch
-use-mmiowb-in-qla1280c.patch
-bug-on-error-handlings-in-ext3-under-i-o-failure.patch
-bug-on-error-handlings-in-ext3-under-i-o-failure-fix.patch
-scsi-aic7xxx-kill-kernel-22-ifdefs.patch

 Merged

+sparc64-nodemask-build-fix.patch

 sparc64 compile fix

+selinux-fix-error-handling-code-for-policy-load.patch

 SELinux fix

+generic-irq-code-missing-export-of-probe_irq_mask.patch

 parisc fix

+infiniband-ipoib-use-correct-static-rate-in-ipoib.patch
+infiniband-mthca-trivial-formatting-fix.patch
+infiniband-mthca-support-rdma-atomic-attributes-in-qp-modify.patch
+infiniband-mthca-clean-up-allocation-mapping-of-hca-context-memory.patch
+infiniband-mthca-add-needed-rmb-in-event-queue-poll.patch
+infiniband-core-remove-debug-printk.patch
+infiniband-make-more-code-static.patch
+infiniband-core-set-byte_cnt-correctly-in-mad-completion.patch
+infiniband-core-add-qp-number-to-work-completion-struct.patch
+infiniband-core-add-node_type-and-phys_state-sysfs-attrs.patch
+infiniband-mthca-clean-up-computation-of-hca-memory-map.patch
+infiniband-core-fix-handling-of-0-hop-directed-route-mads.patch
+infiniband-core-add-more-parameters-to-process_mad.patch
+infiniband-core-add-qp_type-to-struct-ib_qp.patch
+infiniband-core-add-ib_find_cached_gid-function.patch
+infiniband-update-copyrights-for-new-year.patch
+infiniband-ipoib-move-structs-from-stack-to-device-private-struct.patch
+infiniband-core-rename-handle_outgoing_smp.patch

 infiniband updates

+seagate-st3200822as-sata-disk-needs-to-be-in-sil_blacklist-as-well.patch

 SATA blacklist entry

-agpgart-allow-multiple-backends-to-be-initialized-fix.patch
-agpgart-add-bridge-assignment-missed-in-agp_allocate_memory.patch

 Folded into agpgart-allow-multiple-backends-to-be-initialized.patch

+agpgart-add-agp_find_bridge-function.patch
+agpgart-allow-drivers-to-allocate-memory-local-to.patch
-agp-x86_64-build-fix.patch

 More work on the support-multiple-agp-busses patches.

+orphaned-pagecache-memleak-fix.patch

 Fix a weird memory leak on the page LRU. This isn't right yet.

+mark-page-accessed-in-filemapc-not-quite-right.patch

 Page aging fix

+netpoll-fix-napi-polling-race-on-smp.patch

 netpoll oops fix

+tun-tan-arp-monitor-support.patch

 Make the tun/tap driver play right with ARP monitoring.

+atmel_cs-add-support-lg-lw2100n-wlan-pcmcia-card.patch

 Add firmware support for another wlan card.

+ppc32-fix-mpc8272ads.patch
+ppc32-add-freescale-pq2fads-support.patch

 ppc32 updates

+ppc64-make-hvlpevent_unregisterhandler-work.patch
+ppc64-make-iseries_veth-call-flush_scheduled_work.patch
+ppc64-iommu-avoid-isa-io-space-on-power3.patch

 ppc64 updates

+frv-remove-mandatory-single-step-debugging-diversion.patch
+frv-excess-whitespace-cleanup.patch

 arch/frv updates

+x86_64-i386-increase-command-line-size.patch
+x86_64-add-brackets-to-bitops.patch
+x86_64-move-early-cpu-detection-earlier.patch
+x86_64-disable-uselib-when-possible.patch
+x86_64-optimize-nodemask-operations-slightly.patch
+x86_64-fix-a-bug-in-timer_suspend.patch
+x86-consolidate-code-segment-base-calculation.patch

 x86_64 update

+swsusp-more-small-fixes.patch
+swsusp-dm-use-right-levels-for-device_suspend.patch
+swsusp-update-docs.patch
+acpi-comment-whitespace-updates.patch
+make-suspend-work-with-ioapic.patch
+swsusp-refrigerator-cleanups.patch

 swsusp update

+uml-avoid-null-dereference-in-linec.patch
+uml-readd-config_magic_sysrq-for-uml.patch
+uml-commentary-addition-to-recent-sysemu-fix.patch
+uml-drop-unused-buffer_headh-header-from-hostfs.patch
+uml-delete-unused-header-umnh.patch
+uml-commentary-about-sigwinch-handling-for-consoles.patch
+uml-fail-xterm_open-when-we-have-no-display.patch
+uml-depend-on-usermode-in-drivers-block-kconfig-and-drop-arch-um-kconfig_block.patch
+uml-makefile-simplification-and-correction.patch
+uml-fix-compilation-for-missing-headers.patch
+uml-fix-some-uml-own-initcall-macros.patch
+uml-refuse-to-run-without-skas-if-no-tt-mode-in.patch
+uml-for-ubd-cmdline-param-use-colon-as-delimiter.patch
+uml-allow-free-ubd-flag-ordering.patch
+uml-move-code-from-ubd_user-to-ubd_kern.patch
+uml-fix-and-cleanup-code-in-ubd_kernc-coming-from-ubd_userc.patch
+uml-add-stack-content-to-dumps.patch
+uml-add-stack-addresses-to-dumps.patch
+uml-update-ld-scripts-to-newer-binutils.patch

 UML update

+reintroduce-export_symboltask_nice-for-binfmt_elf32.patch

 s/390 build fix

+csum_and_copy_from_user-gcc4-warning-fixes-m32r-fix.patch

 m32r build fix

+fixups-for-block2mtd.patch

 block2mtd update

+poll-mini-optimisations.patch

 teeny poll() speedup

+file_tableexpand_files-code-cleanup.patch
+file_tableexpand_files-code-cleanup-remove-debug.patch

 code consolidation

+mtrr-size-and-base-debug.patch

 Debug an mtrr bug.

+minor-ext3-speedup.patch

 Reduce ext3 CPU consumption a little.

+move-read-only-and-immutable-checks-into-permission.patch
+factor-out-common-code-around-follow_link-invocation.patch

 Code cleanups/consolidation

+relayfs-doc.patch
+relayfs-common-files.patch
+relayfs-locking-lockless-implementation.patch
+relayfs-headers.patch

 relayfs

+ltt-core-implementation.patch
+ltt-core-headers.patch
+ltt-kconfig-fix.patch
+ltt-kernel-events.patch
+ltt-kernel-events-tidy.patch
+ltt-kernel-events-build-fix.patch
+ltt-fs-events.patch
+ltt-fs-events-tidy.patch
+ltt-ipc-events.patch
+ltt-mm-events.patch
+ltt-net-events.patch
+ltt-architecture-events.patch

 LTT.

+lock-initializer-cleanup-ppc.patch
+lock-initializer-cleanup-m32r.patch
+lock-initializer-cleanup-video.patch
+lock-initializer-cleanup-ide.patch
+lock-initializer-cleanup-sound.patch
+lock-initializer-cleanup-sh.patch
+lock-initializer-cleanup-ppc64.patch
+lock-initializer-cleanup-security.patch
+lock-initializer-cleanup-core.patch
+lock-initializer-cleanup-media-drivers.patch
+lock-initializer-cleanup-networking.patch
+lock-initializer-cleanup-block-devices.patch
+lock-initializer-cleanup-s390.patch
+lock-initializer-cleanup-usermode.patch
+lock-initializer-cleanup-scsi.patch
+lock-initializer-cleanup-sparc.patch
+lock-initializer-cleanup-v850.patch
+lock-initializer-cleanup-i386.patch
+lock-initializer-cleanup-drm.patch
+lock-initializer-cleanup-firewire.patch
+lock-initializer-cleanup-arm26.patch
+lock-initializer-cleanup-m68k.patch
+lock-initializer-cleanup-network-drivers.patch
+lock-initializer-cleanup-mtd.patch
+lock-initializer-cleanup-x86_64.patch
+lock-initializer-cleanup-filesystems.patch
+lock-initializer-cleanup-ia64.patch
+lock-initializer-cleanup-raid.patch
+lock-initializer-cleanup-isdn.patch
+lock-initializer-cleanup-parisc.patch
+lock-initializer-cleanup-sparc64.patch
+lock-initializer-cleanup-arm.patch
+lock-initializer-cleanup-misc-drivers.patch
+lock-initializer-cleanup-alpha.patch
+lock-initializer-cleanup-character-devices.patch
+lock-initializer-cleanup-drivers-serial.patch
+lock-initializer-cleanup-frv.patch

 spinlock and rwlock initialiser cleanups

+ext3-ea-revert-cleanup.patch
+ext3-ea-revert-old-ea-in-inode.patch
+ext3-ea-mbcache-cleanup.patch
+ext2-ea-race-in-ext-xattr-sharing-code.patch
+ext3-ea-ext3-do-not-use-journal_release_buffer.patch
+ext3-ea-ext3-factor-our-common-xattr-code-unnecessary-lock.patch
+ext3-ea-ext-no-spare-xattr-handler-slots-needed.patch
+ext3-ea-cleanup-and-prepare-ext3-for-in-inode-xattrs.patch
+ext3-ea-hide-ext3_get_inode_loc-in_mem-option.patch
+ext3-ea-in-inode-extended-attributes-for-ext3.patch

 Big ext3+EA
 update with various fixes

+fix-race-between-core-dumping-and-exec.patch
+fix-exec-deadlock-when-ptrace-used-inside-the-thread-group.patch
+ptrace-unlocked-access-to-last_siginfo-resending.patch
+clear-false-pending-signal-indication-in-core-dump.patch

 Various ptrace/signal/coredump fixes

+pcmcia-remove-irq_type_time.patch
+pcmcia-ignore-driver-irq-mask.patch
+pcmcia-remove-irq_mask-and-irq_list-parameters-from-pcmcia-drivers.patch
+pcmcia-use-irq_mask-to-mark-irqs-as-unusable.patch
+pcmcia-remove-racy-try_irq.patch
+pcmcia-modify-irq_mask-via-sysfs.patch
+pcmcia-remove-includes-in-rsrc_mgr-which-arent-necessary-any-longer.patch

 pcmcia updates.

+sched-fix-preemption-race-core-i386.patch
+sched-make-use-of-preempt_schedule_irq-ppc.patch
+sched-make-use-of-preempt_schedule_irq-arm.patch

 CPU scheduler preemption fix

+fbdev-cleanup-broken-edid-fixup-code.patch
+fbcon-catch-blank-events-on-both-device-and-console-level.patch
+fbcon-fix-compile-error.patch
+fbdev-fbmon-cleanup.patch
+i810fb-module-param-fix.patch
+atyfb-fix-module-parameter-descriptions.patch
+radeonfb-fix-init-exit-section-usage.patch
+pxafb-reorder-add_wait_queue-and-set_current_state.patch
+sa1100fb-reorder-add_wait_queue-and-set_current_state.patch
+backlight-add-backlight-lcd-device-basic-support.patch
+fbdev-add-w100-framebuffer-driver.patch

 fbdev/fbcon update

+post-halloween-doc.patch

 davej's 2.6 feature list

+fuse-maintainers-kconfig-and-makefile-changes.patch
+fuse-core.patch
+fuse-device-functions.patch
+fuse-read-only-operations.patch
+fuse-read-write-operations.patch
+fuse-file-operations.patch
+fuse-mount-options.patch
+fuse-extended-attribute-operations.patch
+fuse-readpages-operation.patch
+fuse-nfs-export.patch
+fuse-direct-i-o.patch

 Filesystem in userspace.

+ieee1394-adds-a-disable_irm-option-to-ieee1394ko.patch

 New command line option for firewire.

+fix-typo-in-arch-i386-kconfig.patch

 Fix a tpyo.

+random-whitespace-doh.patch
+random-entropy-debugging-improvements.patch
+random-run-time-configurable-debugging.patch
+random-periodicity-detection-fix.patch
+random-add_input_randomness.patch

 random driver updates

+various-kconfig-fixes.patch

 Fix a huge number of Kconfig typos and brainos.


number of patches in -mm: 434
number of changesets in external trees: 314
number of patches in -mm only: 417
total patches: 731


All 434 patches:

linus.patch

sparc64-nodemask-build-fix.patch
  sparc64: nodemask build fix

selinux-fix-error-handling-code-for-policy-load.patch
  SELinux: fix error handling code for policy load

generic-irq-code-missing-export-of-probe_irq_mask.patch
  generic irq code missing export of probe_irq_mask()

infiniband-ipoib-use-correct-static-rate-in-ipoib.patch
  InfiniBand/IPoIB: use correct static rate in IPoIB

infiniband-mthca-trivial-formatting-fix.patch
  InfiniBand/mthca: trivial formatting fix

infiniband-mthca-support-rdma-atomic-attributes-in-qp-modify.patch
  InfiniBand/mthca: support RDMA/atomic attributes in QP modify

infiniband-mthca-clean-up-allocation-mapping-of-hca-context-memory.patch
  InfiniBand/mthca: clean up allocation mapping of HCA context memory

infiniband-mthca-add-needed-rmb-in-event-queue-poll.patch
  InfiniBand/mthca: add needed rmb() in event queue poll

infiniband-core-remove-debug-printk.patch
  InfiniBand/core: remove debug printk

infiniband-make-more-code-static.patch
  InfiniBand: make more code static

infiniband-core-set-byte_cnt-correctly-in-mad-completion.patch
  InfiniBand/core: set byte_cnt correctly in MAD completion

infiniband-core-add-qp-number-to-work-completion-struct.patch
  InfiniBand/core: add QP number to work completion struct

infiniband-core-add-node_type-and-phys_state-sysfs-attrs.patch
  InfiniBand/core: add node_type and phys_state sysfs attrs

infiniband-mthca-clean-up-computation-of-hca-memory-map.patch
  InfiniBand/mthca: clean up computation of HCA memory map

infiniband-core-fix-handling-of-0-hop-directed-route-mads.patch
  InfiniBand/core: fix handling of 0-hop directed route MADs

infiniband-core-add-more-parameters-to-process_mad.patch
  InfiniBand/core: add more parameters to process_mad

infiniband-core-add-qp_type-to-struct-ib_qp.patch
  InfiniBand/core: add qp_type to struct ib_qp

infiniband-core-add-ib_find_cached_gid-function.patch
  InfiniBand/core: add ib_find_cached_gid function

infiniband-update-copyrights-for-new-year.patch
  InfiniBand: update copyrights for new year

infiniband-ipoib-move-structs-from-stack-to-device-private-struct.patch
  InfiniBand/ipoib: move structs from stack to device private struct

infiniband-core-rename-handle_outgoing_smp.patch
  InfiniBand/core: rename handle_outgoing_smp

ia64-acpi-build-fix.patch
  ia64 acpi build fix

ia64-config_apci_numa-fix.patch
  ia64 CONFIG_ACPI_NUMA fix

bk-acpi-revert-20041210.patch
  bk-acpi-revert-20041210

acpi-report-errors-in-fanc.patch
  ACPI: report errors in fan.c

acpi-flush-tlb-when-pagetable-changed.patch
  acpi: flush TLB when pagetable changed

acpi-kfree-fix.patch
  a

bk-alsa.patch

bk-arm.patch

bk-cifs.patch

bk-cpufreq.patch

bk-drm-via.patch

bk-i2c.patch

bk-ide-dev.patch

ide-dev-build-fix.patch
  ide-dev-build-fix

bk-input.patch

bk-dtor-input.patch

alps-touchpad-detection-fix.patch
  ALPS touchpad detection fix

bk-kbuild.patch

bk-kconfig.patch

seagate-st3200822as-sata-disk-needs-to-be-in-sil_blacklist-as-well.patch
  Seagate ST3200822AS SATA disk needs to be in sil_blacklist as well

bk-netdev.patch

bk-ntfs.patch

bk-pci.patch

bk-usb.patch

bk-xfs.patch

mm.patch
  add -mmN to EXTRAVERSION

fix-smm-failures-on-e750x-systems.patch
  fix SMM failures on E750x systems

agpgart-allow-multiple-backends-to-be-initialized.patch
  agpgart: allow multiple backends to be initialized
  agpgart-allow-multiple-backends-to-be-initialized fix
  agpgart: add bridge assignment missed in agp_allocate_memory
  x86_64 agp failure fix

agpgart-add-agp_find_bridge-function.patch
  agpgart: add agp_find_bridge function

agpgart-allow-drivers-to-allocate-memory-local-to.patch
  agpgart:
  allow drivers to allocate memory local to the bridge

drm-add-support-for-new-multiple-agp-bridge-agpgart-api.patch
  drm: add support for new multiple agp bridge agpgart api

fb-add-support-for-new-multiple-agp-bridge-agpgart-api.patch
  fb: add support for new multiple agp bridge agpgart api

agpgart-add-bridge-parameter-to-driver-functions.patch
  agpgart: add bridge parameter to driver functions

vm-pageout-throttling.patch
  vm: pageout throttling

make-tree_lock-an-rwlock.patch
  make mapping->tree_lock an rwlock

orphaned-pagecache-memleak-fix.patch
  orphaned pagecache memleak fix

mark-page-accessed-in-filemapc-not-quite-right.patch
  mark-page-accessed in filemap.c not quite right

must-fix.patch
  must fix lists update
  must fix list update
  mustfix update
  must-fix update
  mustfix lists

pcnet32-79c976-with-fiber-optic.patch
  pcnet32: 79c976 with fiber optic fix

add-omap-support-to-smc91x-ethernet-driver.patch
  Add OMAP support to smc91x Ethernet driver

restore-net-sched-iptc-after-iptables-kmod-cleanup.patch
  Restore net/sched/ipt.c After iptables Kmod Cleanup

b44-bounce-buffer-fix.patch
  b44 bounce buffering fix

netpoll-fix-napi-polling-race-on-smp.patch
  netpoll: fix NAPI polling race on SMP

tun-tan-arp-monitor-support.patch
  tun/tap ARP monitor support

atmel_cs-add-support-lg-lw2100n-wlan-pcmcia-card.patch
  atmel_cs: Add support LG LW2100N WLAN PCMCIA card

ppc32-fix-mpc8272ads.patch
  ppc32: Fix mpc8272ads

ppc32-add-freescale-pq2fads-support.patch
  ppc32: Add Freescale PQ2FADS support

ppc64-make-hvlpevent_unregisterhandler-work.patch
  ppc64: make HvLpEvent_unregisterHandler() work

ppc64-make-iseries_veth-call-flush_scheduled_work.patch
  ppc64: make iseries_veth call flush_scheduled_work()

ppc64-iommu-avoid-isa-io-space-on-power3.patch
  ppc64: iommu: avoid ISA io space on POWER3

ppc64-reloc_hide.patch

frv-remove-mandatory-single-step-debugging-diversion.patch
  FRV: Remove mandatory single-step debugging diversion

frv-excess-whitespace-cleanup.patch
  FRV: Excess whitespace cleanup

superhyway-bus-support.patch
  SuperHyway bus support

x86_64-i386-increase-command-line-size.patch
  x86_64/i386: increase command line size

x86_64-add-brackets-to-bitops.patch
  x86_64: Add brackets to bitops

x86_64-move-early-cpu-detection-earlier.patch
  x86_64: Move early CPU detection earlier

x86_64-disable-uselib-when-possible.patch
  x86_64: Disable uselib when possible

x86_64-optimize-nodemask-operations-slightly.patch
  x86_64: Optimize nodemask operations slightly

x86_64-fix-a-bug-in-timer_suspend.patch
  Fix a bug in timer_suspend() on x86_64

x86-consolidate-code-segment-base-calculation.patch
  x86: consolidate code segment base calculation

xen-vmm-4-add-ptep_establish_new-to-make-va-available.patch
  Xen VMM #4: add ptep_establish_new to make va available

xen-vmm-4-return-code-for-arch_free_page.patch
  Xen VMM #4: return code for arch_free_page

xen-vmm-4-return-code-for-arch_free_page-fix.patch
  Get rid of arch_free_page() warning

xen-vmm-4-runtime-disable-of-vt-console.patch
  Xen VMM #4: runtime disable of VT console

xen-vmm-4-has_arch_dev_mem.patch
  Xen VMM #4: HAS_ARCH_DEV_MEM

xen-vmm-4-split-free_irq-into-teardown_irq.patch
  Xen VMM #4: split free_irq into teardown_irq

swsusp-more-small-fixes.patch
  swsusp: more small fixes

swsusp-dm-use-right-levels-for-device_suspend.patch
  swsusp/dm: Use right levels for device_suspend()

swsusp-update-docs.patch
  swsusp: update docs

acpi-comment-whitespace-updates.patch
  acpi: comment/whitespace updates

make-suspend-work-with-ioapic.patch
  make suspend work with ioapic

swsusp-refrigerator-cleanups.patch
  swsusp: refrigerator cleanups

uml-avoid-null-dereference-in-linec.patch
  uml: avoid NULL dereference in line.c

uml-readd-config_magic_sysrq-for-uml.patch
  uml: readd CONFIG_MAGIC_SYSRQ for UML

uml-commentary-addition-to-recent-sysemu-fix.patch
  uml: Commentary addition to recent SYSEMU fix.

uml-drop-unused-buffer_headh-header-from-hostfs.patch
  uml: drop unused buffer_head.h header from hostfs

uml-delete-unused-header-umnh.patch
  uml: delete unused header umn.h

uml-commentary-about-sigwinch-handling-for-consoles.patch
  uml: commentary about SIGWINCH handling for consoles

uml-fail-xterm_open-when-we-have-no-display.patch
  uml: fail xterm_open when we have no $DISPLAY

uml-depend-on-usermode-in-drivers-block-kconfig-and-drop-arch-um-kconfig_block.patch
  uml: depend on !USERMODE in drivers/block/Kconfig and drop arch/um/Kconfig_block

uml-makefile-simplification-and-correction.patch
  uml: Makefile simplification and correction.

uml-fix-compilation-for-missing-headers.patch
  uml: fix compilation for missing headers

uml-fix-some-uml-own-initcall-macros.patch
  uml: fix some UML own initcall macros

uml-refuse-to-run-without-skas-if-no-tt-mode-in.patch
  uml: refuse to run without skas if no tt mode in

uml-for-ubd-cmdline-param-use-colon-as-delimiter.patch
  uml: for ubd cmdline param use colon as delimiter

uml-allow-free-ubd-flag-ordering.patch
  uml: allow free ubd flag ordering

uml-move-code-from-ubd_user-to-ubd_kern.patch
  uml: move code from ubd_user to ubd_kern

uml-fix-and-cleanup-code-in-ubd_kernc-coming-from-ubd_userc.patch
  uml: fix and cleanup code in ubd_kern.c coming from ubd_user.c

uml-add-stack-content-to-dumps.patch
  uml: add stack content to dumps

uml-add-stack-addresses-to-dumps.patch
  uml: add stack addresses to dumps

uml-update-ld-scripts-to-newer-binutils.patch
  uml: update ld scripts to newer binutils

reintroduce-export_symboltask_nice-for-binfmt_elf32.patch
  reintroduce task_nice export for binfmt_elf32

wacom-tablet-driver.patch
  wacom tablet driver

force-feedback-support-for-uinput.patch
  Force feedback support for uinput

kmap_atomic-takes-char.patch
  kmap_atomic takes char*

kmap_atomic-takes-char-fix.patch
  kmap_atomic-takes-char-fix

kmap_atomic-fallout.patch
  kmap_atomic fallout

kunmap-fallout-more-fixes.patch
  kunmap-fallout-more-fixes

make-sysrq-f-call-oom_kill.patch
  make sysrq-F call oom_kill()

allow-admin-to-enable-only-some-of-the-magic-sysrq-functions.patch
  Allow admin to enable only some of the Magic-Sysrq functions

sort-out-pci_rom_address_enable-vs-ioresource_rom_enable.patch
  Sort out PCI_ROM_ADDRESS_ENABLE vs IORESOURCE_ROM_ENABLE

csum_and_copy_from_user-gcc4-warning-fixes.patch
  csum_and_copy_from_user gcc4 warning fixes

csum_and_copy_from_user-gcc4-warning-fixes-m32r-fix.patch
  csum_and_copy_from_user-gcc4-warning-fixes m32r fix

smbfs-fixes.patch
  smbfs fixes

irqpoll.patch
  irqpoll

fixups-for-block2mtd.patch
  fixups for block2mtd

poll-mini-optimisations.patch
  poll: mini optimisations

file_tableexpand_files-code-cleanup.patch
  file_table:expand_files() code cleanup

file_tableexpand_files-code-cleanup-remove-debug.patch
  file_tableexpand_files-code-cleanup-remove-debug

mtrr-size-and-base-debug.patch
  mtrr size-and-base debugging

minor-ext3-speedup.patch
  Minor ext3 speedup

move-read-only-and-immutable-checks-into-permission.patch
  move read-only and immutable checks into permission()

factor-out-common-code-around-follow_link-invocation.patch
  factor out common code around ->follow_link invocation

relayfs-doc.patch
  relayfs: doc

relayfs-common-files.patch
  relayfs: common files

relayfs-locking-lockless-implementation.patch
  relayfs: locking/lockless implementation

relayfs-headers.patch
  relayfs: headers

ltt-core-implementation.patch
  ltt: core implementation

ltt-core-headers.patch
  ltt: core headers

ltt-kconfig-fix.patch
  ltt kconfig fix

ltt-kernel-events.patch
  ltt: kernel/ events

ltt-kernel-events-tidy.patch
  ltt-kernel-events tidy

ltt-kernel-events-build-fix.patch
  ltt-kernel-events-build-fix

ltt-fs-events.patch
  ltt: fs/ events

ltt-fs-events-tidy.patch
  ltt-fs-events tidy

ltt-ipc-events.patch
  ltt: ipc/ events

ltt-mm-events.patch
  ltt: mm/ events

ltt-net-events.patch
  ltt: net/ events

ltt-architecture-events.patch
  ltt: architecture events

lock-initializer-cleanup-ppc.patch
  Lock initializer cleanup: PPC

lock-initializer-cleanup-m32r.patch Lock initializer cleanup: M32R lock-initializer-cleanup-video.patch Lock initializer cleanup: Video lock-initializer-cleanup-ide.patch Lock initializer cleanup: IDE lock-initializer-cleanup-sound.patch Lock initializer cleanup: sound lock-initializer-cleanup-sh.patch Lock initializer cleanup: SH lock-initializer-cleanup-ppc64.patch Lock initializer cleanup: PPC64 lock-initializer-cleanup-security.patch Lock initializer cleanup: Security lock-initializer-cleanup-core.patch Lock initializer cleanup: Core lock-initializer-cleanup-media-drivers.patch Lock initializer cleanup: media drivers lock-initializer-cleanup-networking.patch Lock initializer cleanup: Networking lock-initializer-cleanup-block-devices.patch Lock initializer cleanup: Block devices lock-initializer-cleanup-s390.patch Lock initializer cleanup: S390 lock-initializer-cleanup-usermode.patch Lock initializer cleanup: UserMode lock-initializer-cleanup-scsi.patch Lock initializer cleanup: SCSI lock-initializer-cleanup-sparc.patch Lock initializer cleanup: SPARC lock-initializer-cleanup-v850.patch Lock initializer cleanup: V850 lock-initializer-cleanup-i386.patch Lock initializer cleanup: I386 lock-initializer-cleanup-drm.patch Lock initializer cleanup: DRM lock-initializer-cleanup-firewire.patch Lock initializer cleanup: Firewire lock-initializer-cleanup-arm26.patch Lock initializer cleanup - (ARM26) lock-initializer-cleanup-m68k.patch Lock initializer cleanup: M68K lock-initializer-cleanup-network-drivers.patch Lock initializer cleanup: Network drivers lock-initializer-cleanup-mtd.patch Lock initializer cleanup: MTD lock-initializer-cleanup-x86_64.patch Lock initializer cleanup: X86_64 lock-initializer-cleanup-filesystems.patch Lock initializer cleanup: Filesystems lock-initializer-cleanup-ia64.patch Lock initializer cleanup: IA64 lock-initializer-cleanup-raid.patch Lock initializer cleanup: Raid lock-initializer-cleanup-isdn.patch Lock initializer cleanup: ISDN 
lock-initializer-cleanup-parisc.patch Lock initializer cleanup: PARISC lock-initializer-cleanup-sparc64.patch Lock initializer cleanup: SPARC64 lock-initializer-cleanup-arm.patch Lock initializer cleanup: ARM lock-initializer-cleanup-misc-drivers.patch Lock initializer cleanup: Misc drivers lock-initializer-cleanup-alpha.patch Lock initializer cleanup - (ALPHA) lock-initializer-cleanup-character-devices.patch Lock initializer cleanup: character devices lock-initializer-cleanup-drivers-serial.patch Lock initializer cleanup: drivers/serial lock-initializer-cleanup-frv.patch Lock initializer cleanup: FRV ext3-ea-revert-cleanup.patch ext3-ea-revert-cleanup ext3-ea-revert-old-ea-in-inode.patch revert old ea-in-inode patch ext3-ea-mbcache-cleanup.patch ext3/EA: mbcache cleanup ext2-ea-race-in-ext-xattr-sharing-code.patch ext3/EA: Race in ext[23] xattr sharing code ext3-ea-ext3-do-not-use-journal_release_buffer.patch ext3/EA: Ext3: do not use journal_release_buffer ext3-ea-ext3-factor-our-common-xattr-code-unnecessary-lock.patch ext3/EA: Ext3: factor our common xattr code; unnecessary lock ext3-ea-ext-no-spare-xattr-handler-slots-needed.patch ext3/EA: Ext[23]: no spare xattr handler slots needed ext3-ea-cleanup-and-prepare-ext3-for-in-inode-xattrs.patch ext3/EA: Cleanup and prepare ext3 for in-inode xattrs ext3-ea-hide-ext3_get_inode_loc-in_mem-option.patch ext3/EA: Hide ext3_get_inode_loc in_mem option ext3-ea-in-inode-extended-attributes-for-ext3.patch ext3/EA: In-inode extended attributes for ext3 speedup-proc-pid-maps.patch Speed up /proc/pid/maps speedup-proc-pid-maps-fix.patch Speed up /proc/pid/maps fix speedup-proc-pid-maps-fix-fix.patch speedup-proc-pid-maps fix fix speedup-proc-pid-maps-fix-fix-fix.patch speedup /proc/<pid>/maps(4th version) inotify.patch inotify ioctl-rework-2.patch ioctl rework #2 ioctl-rework-2-fix.patch ioctl-rework-2 fix make-standard-conversions-work-with-compat_ioctl.patch make standard conversions work with compat_ioctl. 
fget_light-fput_light-for-ioctls.patch fget_light/fput_light for ioctls macros-to-detect-existance-of-unlocked_ioctl-and-ioctl_compat.patch macros to detect existance of unlocked_ioctl and ioctl_compat fix-coredump_wait-deadlock-with-ptracer-tracee-on-shared-mm.patch fix coredump_wait deadlock with ptracer & tracee on shared mm fix-race-between-core-dumping-and-exec.patch fix race between core dumping and exec with shared mm fix-exec-deadlock-when-ptrace-used-inside-the-thread-group.patch fix exec deadlock when ptrace used inside the thread group ptrace-unlocked-access-to-last_siginfo-resending.patch ptrace: unlocked access to last_siginfo (resending) clear-false-pending-signal-indication-in-core-dump.patch clear false pending signal indication in core dump pcmcia-remove-irq_type_time.patch pcmcia: remove IRQ_TYPE_TIME pcmcia-ignore-driver-irq-mask.patch pcmcia: ignore driver IRQ mask pcmcia-remove-irq_mask-and-irq_list-parameters-from-pcmcia-drivers.patch pcmcia: remove irq_mask and irq_list parameters from PCMCIA drivers pcmcia-use-irq_mask-to-mark-irqs-as-unusable.patch pcmcia: use irq_mask to mark IRQs as (un)usable pcmcia-remove-racy-try_irq.patch pcmcia: remove racy try_irq() pcmcia-modify-irq_mask-via-sysfs.patch pcmcia: modify irq_mask via sysfs pcmcia-remove-includes-in-rsrc_mgr-which-arent-necessary-any-longer.patch pcmcia: remove #includes in rsrc_mgr which aren't necessary any longer kgdb-ga.patch kgdb stub for ia32 (George Anzinger's one) kgdbL warning fix kgdb buffer overflow fix kgdbL warning fix kgdb: CONFIG_DEBUG_INFO fix x86_64 fixes correct kgdb.txt Documentation link (against 2.6.1-rc1-mm2) kgdb: fix for recent gcc kgdb warning fixes THREAD_SIZE fixes for kgdb Fix stack overflow test for non-8k stacks kgdb-ga.patch fix for i386 single-step into sysenter fix TRAP_BAD_SYSCALL_EXITS on i386 add TRAP_BAD_SYSCALL_EXITS config for i386 kgdb-is-incompatible-with-kprobes kgdb-ga-build-fix kgdb-ga-fixes kgdb-kill-off-highmem_start_page.patch kgdb: kill 
off highmem_start_page kgdboe-netpoll.patch kgdb-over-ethernet via netpoll kgdboe: fix configuration of MAC address kgdb-x86_64-support.patch kgdb-x86_64-support.patch for 2.6.2-rc1-mm3 kgdb-x86_64-warning-fixes kgdb-x86_64-fix kgdb-x86_64-serial-fix kprobes exception notifier fix dev-mem-restriction-patch.patch /dev/mem restriction patch dev-mem-restriction-patch-allow-reads.patch dev-mem-restriction-patch: allow reads jbd-remove-livelock-avoidance.patch JBD: remove livelock avoidance code in journal_dirty_data() journal_add_journal_head-debug.patch journal_add_journal_head-debug list_del-debug.patch list_del debug check unplug-can-sleep.patch unplug functions can sleep firestream-warnings.patch firestream warnings perfctr-core.patch perfctr: core perfctr: remove bogus perfctr_sample_thread() calls perfctr-i386.patch perfctr: i386 perfctr-x86-core-updates.patch perfctr x86 core updates perfctr-x86-driver-updates.patch perfctr x86 driver updates perfctr-x86-driver-cleanup.patch perfctr: x86 driver cleanup perfctr-prescott-fix.patch Prescott fix for perfctr perfctr-x86-update-2.patch perfctr x86 update 2 perfctr-x86_64.patch perfctr: x86_64 perfctr-x86_64-core-updates.patch perfctr x86_64 core updates perfctr-ppc.patch perfctr: PowerPC perfctr-ppc32-driver-update.patch perfctr: ppc32 driver update perfctr-ppc32-mmcr0-handling-fixes.patch perfctr ppc32 MMCR0 handling fixes perfctr-ppc32-update.patch perfctr ppc32 update perfctr-ppc32-update-2.patch perfctr ppc32 update perfctr-virtualised-counters.patch perfctr: virtualised counters perfctr-remap_page_range-fix.patch virtual-perfctr-illegal-sleep.patch virtual perfctr illegal sleep make-perfctr_virtual-default-in-kconfig-match-recommendation.patch Make PERFCTR_VIRTUAL default in Kconfig match recommendation in help text perfctr-ifdef-cleanup.patch perfctr ifdef cleanup perfctr-update-2-6-kconfig-related-updates.patch perfctr: Kconfig-related updates perfctr-virtual-updates.patch perfctr virtual updates 
perfctr-virtual-cleanup.patch perfctr: virtual cleanup perfctr-ppc32-preliminary-interrupt-support.patch perfctr ppc32 preliminary interrupt support perfctr-update-5-6-reduce-stack-usage.patch perfctr: reduce stack usage perfctr-interrupt-support-kconfig-fix.patch perfctr interrupt_support Kconfig fix perfctr-low-level-documentation.patch perfctr low-level documentation perfctr-inheritance-1-3-driver-updates.patch perfctr inheritance: driver updates perfctr-inheritance-2-3-kernel-updates.patch perfctr inheritance: kernel updates perfctr-inheritance-3-3-documentation-updates.patch perfctr inheritance: documentation updates perfctr-inheritance-locking-fix.patch perfctr inheritance locking fix perfctr-api-changes-first-step.patch perfctr API changes: first step perfctr-virtual-update.patch perfctr virtual update perfctr-x86-64-ia32-emulation-fix.patch perfctr x86-64 ia32 emulation fix perfctr-sysfs-update-1-4-core.patch perfctr sysfs update: core perfctr-sysfs-update.patch Perfctr sysfs update perfctr-sysfs-update-2-4-x86.patch perfctr sysfs update: x86 perfctr-sysfs-update-3-4-x86-64.patch perfctr sysfs update: x86-64 perfctr: syscall numbers in x86-64 ia32-emulation perfctr x86_64 native syscall numbers fix perfctr-sysfs-update-4-4-ppc32.patch perfctr sysfs update: ppc32 sched-fix-preemption-race-core-i386.patch sched: fix preemption race (Core/i386) sched-make-use-of-preempt_schedule_irq-ppc.patch sched: make use of preempt_schedule_irq() (PPC) sched-make-use-of-preempt_schedule_irq-arm.patch sched: make use of preempt_schedule_irq (ARM) add-do_proc_doulonglongvec_minmax-to-sysctl-functions.patch Add do_proc_doulonglongvec_minmax to sysctl functions add-do_proc_doulonglongvec_minmax-to-sysctl-functions-fix add-do_proc_doulonglongvec_minmax-to-sysctl-functions fix 2 add-sysctl-interface-to-sched_domain-parameters.patch Add sysctl interface to sched_domain parameters allow-modular-ide-pnp.patch allow modular ide-pnp 
allow-x86_64-to-reenable-interrupts-on-contention.patch Allow x86_64 to reenable interrupts on contention i386-cpu-hotplug-updated-for-mm.patch i386 CPU hotplug updated for -mm ppc64-fix-cpu-hotplug.patch ppc64: fix hotplug cpu serialize-access-to-ide-devices.patch serialize access to ide devices disable-atykb-warning.patch disable atykb "too many keys pressed" warning export-file_ra_state_init-again.patch Export file_ra_state_init() again cachefs-filesystem.patch CacheFS filesystem numa-policies-for-file-mappings-mpol_mf_move-cachefs.patch numa-policies-for-file-mappings-mpol_mf_move for cachefs cachefs-release-search-records-lest-they-return-to-haunt-us.patch CacheFS: release search records lest they return to haunt us fix-64-bit-problems-in-cachefs.patch Fix 64-bit problems in cachefs cachefs-fixed-typos-that-cause-wrong-pointer-to-be-kunmapped.patch cachefs: fixed typos that cause wrong pointer to be kunmapped cachefs-return-the-right-error-upon-invalid-mount.patch CacheFS: return the right error upon invalid mount fix-cachefs-barrier-handling-and-other-kernel-discrepancies.patch Fix CacheFS barrier handling and other kernel discrepancies remove-error-from-linux-cachefsh.patch Remove #error from linux/cachefs.h cachefs-warning-fix-2.patch cachefs warning fix 2 cachefs-linkage-fix-2.patch cachefs linkage fix cachefs-build-fix.patch cachefs build fix cachefs-documentation.patch CacheFS documentation add-page-becoming-writable-notification.patch Add page becoming writable notification add-page-becoming-writable-notification-fix.patch do_wp_page_mk_pte_writable() fix add-page-becoming-writable-notification-build-fix.patch add-page-becoming-writable-notification build fix provide-a-filesystem-specific-syncable-page-bit.patch Provide a filesystem-specific sync'able page bit provide-a-filesystem-specific-syncable-page-bit-fix.patch provide-a-filesystem-specific-syncable-page-bit-fix provide-a-filesystem-specific-syncable-page-bit-fix-2.patch 
provide-a-filesystem-specific-syncable-page-bit-fix-2 make-afs-use-cachefs.patch Make AFS use CacheFS afs-cachefs-dependency-fix.patch afs-cachefs-dependency-fix split-general-cache-manager-from-cachefs.patch Split general cache manager from CacheFS turn-cachefs-into-a-cache-backend.patch Turn CacheFS into a cache backend rework-the-cachefs-documentation-to-reflect-fs-cache-split.patch Rework the CacheFS documentation to reflect FS-Cache split update-afs-client-to-reflect-cachefs-split.patch Update AFS client to reflect CacheFS split assign_irq_vector-section-fix.patch assign_irq_vector __init section fix kexec-i8259-shutdowni386.patch kexec: i8259-shutdown.i386 kexec-i8259-shutdown-x86_64.patch kexec: x86_64 i8259 shutdown kexec-apic-virtwire-on-shutdowni386patch.patch kexec: apic-virtwire-on-shutdown.i386.patch kexec-apic-virtwire-on-shutdownx86_64.patch kexec: apic-virtwire-on-shutdown.x86_64 kexec-ioapic-virtwire-on-shutdowni386.patch kexec: ioapic-virtwire-on-shutdown.i386 kexec-apic-virt-wire-fix.patch kexec: apic-virt-wire fix kexec-ioapic-virtwire-on-shutdownx86_64.patch kexec: ioapic-virtwire-on-shutdown.x86_64 kexec-e820-64bit.patch kexec: e820-64bit kexec-kexec-generic.patch kexec: kexec-generic kexec-ide-spindown-fix.patch kexec-ide-spindown-fix kexec-ifdef-cleanup.patch kexec ifdef cleanup kexec-machine_shutdownx86_64.patch kexec: machine_shutdown.x86_64 kexec-kexecx86_64.patch kexec: kexec.x86_64 kexec-kexecx86_64-4level-fix.patch kexec-kexecx86_64-4level-fix kexec-kexecx86_64-4level-fix-unfix.patch kexec-kexecx86_64-4level-fix unfix kexec-machine_shutdowni386.patch kexec: machine_shutdown.i386 kexec-kexeci386.patch kexec: kexec.i386 kexec-use_mm.patch kexec: use_mm kexec-loading-kernel-from-non-default-offset.patch kexec: loading kernel from non-default offset kexec-loading-kernel-from-non-default-offset-fix.patch kdump: fix bss compile error kexec-enabling-co-existence-of-normal-kexec-kernel-and-panic-kernel.patch kexec: nabling co-existence of 
normal kexec kernel and panic kernel kexec-ppc-support.patch kexec: ppc support crashdump-documentation.patch crashdump: documentation crashdump-memory-preserving-reboot-using-kexec.patch crashdump: memory preserving reboot using kexec crashdump-memory-preserving-reboot-using-kexec-fix.patch kdump: Fix for boot problems on SMP kdump-config_discontigmem-fix.patch kdump: CONFIG_DISCONTIGMEM fix crashdump-routines-for-copying-dump-pages.patch crashdump: routines for copying dump pages crashdump-routines-for-copying-dump-pages-kmap-fiddle.patch crashdump-routines-for-copying-dump-pages-kmap-fiddle crashdump-kmap-build-fix.patch crashdump kmap build fix crashdump-register-snapshotting-before-kexec-boot.patch crashdump: register snapshotting before kexec boot crashdump-elf-format-dump-file-access.patch crashdump: ELF format dump file access crashdump-linear-raw-format-dump-file-access.patch crashdump: linear/raw format dump file access crashdump-minor-bug-fixes-to-kexec-crashdump-code.patch crashdump: minor bug fixes to kexec crashdump code crashdump-cleanups-to-the-kexec-based-crashdump-code.patch crashdump: cleanups to the kexec based crashdump code x86-rename-apic_mode_exint.patch x86: rename APIC_MODE_EXINT x86-local-apic-fix.patch x86: local apic fix new-bitmap-list-format-for-cpusets.patch new bitmap list format (for cpusets) cpusets-big-numa-cpu-and-memory-placement.patch cpusets - big numa cpu and memory placement cpusets-config_cpusets-depends-on-smp.patch Cpusets: CONFIG_CPUSETS depends on SMP cpusets-move-cpusets-above-embedded.patch move CPUSETS above EMBEDDED cpusets-fix-cpuset_get_dentry.patch cpusets : fix cpuset_get_dentry() cpusets-fix-race-in-cpuset_add_file.patch cpusets: fix race in cpuset_add_file() cpusets-remove-more-casts.patch cpusets: remove more casts cpusets-make-config_cpusets-the-default-in-sn2_defconfig.patch cpusets: make CONFIG_CPUSETS the default in sn2_defconfig cpusets-document-proc-status-allowed-fields.patch cpusets: document proc 
status allowed fields cpusets-dont-export-proc_cpuset_operations.patch Cpusets - Dont export proc_cpuset_operations cpusets-display-allowed-masks-in-proc-status.patch cpusets: display allowed masks in proc status cpusets-simplify-cpus_allowed-setting-in-attach.patch cpusets: simplify cpus_allowed setting in attach cpusets-remove-useless-validation-check.patch cpusets: remove useless validation check cpusets-tasks-file-simplify-format-fixes.patch Cpusets tasks file: simplify format, fixes cpusets-simplify-memory-generation.patch Cpusets: simplify memory generation cpusets-interoperate-with-hotplug-online-maps.patch cpusets: interoperate with hotplug online maps cpusets-alternative-fix-for-possible-race-in.patch cpusets: alternative fix for possible race in cpuset_tasks_read() cpusets-remove-casts.patch cpusets: remove void* typecasts reiser4-sb_sync_inodes.patch reiser4: vfs: add super_operations.sync_inodes() reiser4-allow-drop_inode-implementation.patch reiser4: export vfs inode.c symbols reiser4-truncate_inode_pages_range.patch reiser4: vfs: add truncate_inode_pages_range() reiser4-export-remove_from_page_cache.patch reiser4: export pagecache add/remove functions to modules reiser4-export-page_cache_readahead.patch reiser4: export page_cache_readahead to modules reiser4-reget-page-mapping.patch reiser4: vfs: re-check page->mapping after calling try_to_release_page() reiser4-rcu-barrier.patch reiser4: add rcu_barrier() synchronization point reiser4-export-inode_lock.patch reiser4: export inode_lock to modules reiser4-export-pagevec-funcs.patch reiser4: export pagevec functions to modules reiser4-export-radix_tree_preload.patch reiser4: export radix_tree_preload() to modules reiser4-export-find_get_pages.patch reiser4-radix-tree-tag.patch reiser4: add new radix tree tag reiser4-radix_tree_lookup_slot.patch reiser4: add radix_tree_lookup_slot() reiser4-perthread-pages.patch reiser4: per-thread page pools reiser4-include-reiser4.patch reiser4: add to build system 
reiser4-doc.patch reiser4: documentation reiser4-only.patch reiser4: main fs reiser4-recover-read-performance.patch reiser4: recover read performance reiser4-export-find_get_pages_tag.patch reiser4-export-find_get_pages_tag reiser4-add-missing-context.patch add-acpi-based-floppy-controller-enumeration.patch Add ACPI-based floppy controller enumeration. possible-dcache-bug-debugging-patch.patch Possible dcache BUG: debugging patch serial-add-support-for-non-standard-xtals-to-16c950-driver.patch serial: add support for non-standard XTALs to 16c950 driver add-support-for-possio-gcc-aka-pcmcia-siemens-mc45.patch Add support for Possio GCC AKA PCMCIA Siemens MC45 mpsc-driver-patch.patch serial: MPSC driver generic-serial-cli-conversion.patch generic-serial cli() conversion specialix-io8-cli-conversion.patch Specialix/IO8 cli() conversion sx-cli-conversion.patch SX cli() conversion revert-allow-oem-written-modules-to-make-calls-to-ia64-oem-sal-functions.patch revert "allow OEM written modules to make calls to ia64 OEM SAL functions" md-add-interface-for-userspace-monitoring-of-events.patch md: add interface for userspace monitoring of events. 
make-acpi_bus_register_driver-consistent-with-pci_register_driver-again.patch make acpi_bus_register_driver() consistent with pci_register_driver() remove-lock_section-from-x86_64-spin_lock-asm.patch remove LOCK_SECTION from x86_64 spin_lock asm kfree_skb-dump_stack.patch kfree_skb-dump_stack cancel_rearming_delayed_work.patch cancel_rearming_delayed_work() make cancel_rearming_delayed_workqueue static ipvs-deadlock-fix.patch ipvs deadlock fix minimal-ide-disk-updates.patch Minimal ide-disk updates use-find_trylock_page-in-free_swap_and_cache-instead-of-hand-coding.patch use find_trylock_page in free_swap_and_cache instead of hand coding fbdev-cleanup-broken-edid-fixup-code.patch fbdev: Cleanup broken edid fixup code fbcon-catch-blank-events-on-both-device-and-console-level.patch fbcon: Catch blank events on both device and console level fbcon-fix-compile-error.patch fbcon: Fix compile error fbdev-fbmon-cleanup.patch fbdev: Fbmon cleanup i810fb-module-param-fix.patch i810fb: Module param fix atyfb-fix-module-parameter-descriptions.patch atyfb: Fix module parameter descriptions radeonfb-fix-init-exit-section-usage.patch radeonfb: Fix init/exit section usage pxafb-reorder-add_wait_queue-and-set_current_state.patch pxafb: Reorder add_wait_queue() and set_current_state() sa1100fb-reorder-add_wait_queue-and-set_current_state.patch sa1100fb: Reorder add_wait_queue() and set_current_state() backlight-add-backlight-lcd-device-basic-support.patch backlight: Add Backlight/LCD device basic support fbdev-add-w100-framebuffer-driver.patch fbdev: Add w100 framebuffer driver raid5-overlapping-read-hack.patch raid5 overlapping read hack figure-out-who-is-inserting-bogus-modules.patch Figure out who is inserting bogus modules detect-atomic-counter-underflows.patch detect atomic counter underflows waiting-10s-before-mounting-root-filesystem.patch retry mounting the root filesystem at boot time post-halloween-doc.patch post halloween doc 
periodically-scan-redzone-entries-and-slab-control-structures.patch periodically scan redzone entries and slab control structures fuse-maintainers-kconfig-and-makefile-changes.patch Subject: [PATCH 1/11] FUSE - MAINTAINERS, Kconfig and Makefile changes fuse-core.patch Subject: [PATCH 2/11] FUSE - core fuse-device-functions.patch Subject: [PATCH 3/11] FUSE - device functions fuse-read-only-operations.patch Subject: [PATCH 4/11] FUSE - read-only operations fuse-read-write-operations.patch Subject: [PATCH 5/11] FUSE - read-write operations fuse-file-operations.patch Subject: [PATCH 6/11] FUSE - file operations fuse-mount-options.patch Subject: [PATCH 7/11] FUSE - mount options fuse-extended-attribute-operations.patch Subject: [PATCH 8/11] FUSE - extended attribute operations fuse-readpages-operation.patch Subject: [PATCH 9/11] FUSE - readpages operation fuse-nfs-export.patch Subject: [PATCH 10/11] FUSE - NFS export fuse-direct-i-o.patch Subject: [PATCH 11/11] FUSE - direct I/O ieee1394-adds-a-disable_irm-option-to-ieee1394ko.patch ieee1394: add a disable_irm option to ieee1394.ko fix-typo-in-arch-i386-kconfig.patch Fix typo in arch/i386/Kconfig random-whitespace-doh.patch random: whitespace doh random-entropy-debugging-improvements.patch random: entropy debugging improvements random-run-time-configurable-debugging.patch random: run-time configurable debugging random-periodicity-detection-fix.patch random: periodicity detection fix random-add_input_randomness.patch random: add_input_randomness various-kconfig-fixes.patch various Kconfig fixes ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
From: Andi Kleen @ 2005-01-14 8:47 UTC
To: Andrew Morton; +Cc: linux-kernel

Andrew Morton <akpm@osdl.org> writes:
>
> - Added the Linux Trace Toolkit (and hence relayfs). Mainly because I
>   haven't yet taken as close a look at LTT as I should have. Probably neither
>   have you.

I think it would be better to have a standard set of kprobes instead
of all the ugly LTT hooks. kprobes could then log to relayfs or another
fast logging mechanism.

The advantage of this would be that it has no impact on fast paths
unless enabled (LTT slows down a kernel quite considerably just
by being compiled in).

>   As does relayfs, IMO. It seems to need some regularised way in which a
>   userspace relayfs client can tell relayfs what file(s) to use. LTT is
>   currently using some ghastly stick-a-pathname-in-/proc thing. Relayfs
>   should provide this service.
>
>   relayfs needs a closer look too. A lot of advanced instrumentation
>   projects seem to require it, but none of them have been merged. Lots of
>   people say "use netlink instead" and lots of other people say "err, we think
>   relayfs is better". This is a discussion which needs to be had.

IMHO relayfs and netlink are for completely different problem spaces.
relayfs is for relaying a lot of data quickly (e.g. for kernel
instrumentation). There it fills a niche that printk doesn't fill
(since it's too slow). netlink is quite slow (it allocates data for each
event and does lots of other gunk), but it is a useful extensible format
for low-frequency events.

For the problems that relayfs solves, netlink is totally unusable
due to its low efficiency (you could as well use printk, but that is
also too slow). I think a low-overhead logging mechanism is very
much needed, because I find myself reinventing it quite often
when I need to debug some timing-sensitive problem. Trying to
tackle these with printk is hopeless because it changes timing too much.

The problem relayfs has, IMHO, is that it is too complicated. It
seems to suffer from either an overfull specification or the second-system
effect. There are lots of different options to do everything,
instead of a nice simple fast path that does one thing efficiently.
IMHO before merging it should go on a diet, keeping only
the paths that are actually needed and dropping a lot of the current
baggage.

Preferably that would be only the fastest options (an extremely simple
per-CPU buffer with an inlined fast path that drops data on buffer overflow),
leaving out anything more complicated. My ideal is something
like the old SGI ktrace, which was an extremely simple mechanism
for lockless per-CPU logging of binary data that a user daemon
could read efficiently.

-Andi
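To make the "simple per-CPU buffer that drops data on overflow" idea concrete, here is a minimal userspace sketch in C. It is not relayfs or SGI ktrace code; `NR_CPUS`, `BUF_SIZE`, `struct cpu_log`, `log_event()` and `log_drain()` are names invented for illustration, and it assumes each buffer is only ever written by its own CPU (a kernel version would also have to deal with preemption and interrupts).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS  4   /* hypothetical CPU count for this sketch */
#define BUF_SIZE 64  /* bytes per CPU buffer; deliberately tiny */

struct cpu_log {
    uint8_t       data[BUF_SIZE];
    size_t        used;     /* bytes currently stored */
    unsigned long dropped;  /* records discarded because the buffer was full */
};

static struct cpu_log logs[NR_CPUS];

/*
 * Fast path: one bounds check and one memcpy, no locks, never blocks.
 * Returns 1 if the record was stored, 0 if it was dropped.
 */
static inline int log_event(unsigned cpu, const void *rec, size_t len)
{
    struct cpu_log *l;

    assert(cpu < NR_CPUS);
    l = &logs[cpu];

    if (len > BUF_SIZE - l->used) {
        l->dropped++;  /* overflow: drop the record instead of waiting */
        return 0;
    }
    memcpy(l->data + l->used, rec, len);
    l->used += len;
    return 1;
}

/*
 * Reader side (the "user daemon" role): copy out up to out_len bytes
 * from one CPU's buffer and reset it. Anything that does not fit in
 * out is discarded, to keep the sketch short.
 */
static size_t log_drain(unsigned cpu, void *out, size_t out_len)
{
    struct cpu_log *l;
    size_t n;

    assert(cpu < NR_CPUS);
    l = &logs[cpu];

    n = l->used < out_len ? l->used : out_len;
    memcpy(out, l->data, n);
    l->used = 0;
    return n;
}
```

The `dropped` counter is the one concession to the "lossy logging is useless" objection: the reader can at least tell that records were lost and that the buffer needs to be larger.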
* Re: 2.6.11-rc1-mm1
From: Karim Yaghmour @ 2005-01-14 9:27 UTC
To: Andi Kleen
Cc: Andrew Morton, linux-kernel, Tom Zanussi, Larry Kessler,
    Richard J Moore, Robert Wisniewski, Michel Dagenais

Andi Kleen wrote:
> I think it would be better to have a standard set of kprobes instead
> of all the ugly LTT hooks. kprobes could then log to relayfs or another
> fast logging mechanism.
>
> The advantage of this would be that it has no impact on fast paths
> unless enabled (LTT slows down a kernel quite considerably just
> by being compiled in).

There are different ways to look at this. For one thing, the current
ltt hooks aren't as fast as they should be (i.e. we check whether
tracing is enabled for a certain event way too far along the code path).
This should be rather simple to fix, whether by checking for the
event's logging as early as possible or by using one of the hooking
frameworks that generate no-ops which cost nothing until tracing is
enabled. None of this is really difficult. What is difficult is trying
to maintain the LTT patches outside the kernel while trying to add all
the bells and whistles that make such a thing lightweight and effective.

As far as kprobes go, you still need some form or another of marking
the code for key events, unless you keep maintaining a set of
kprobes-able points separately, which really makes it unusable for the
rest of us, as the users of LTT have discovered over time (having to
create a new patch for every new kernel that comes out).

Yet I do see the point of being able to add the stuff dynamically. So
lately I've been thinking that there may be a middle ground here where
everyone could be happy. Define three states for the hooks: disabled,
static, marker. The third one just adds some info into System.map to
allow automating the insertion of kprobes hooks (though you would still
need the debugging info to find the values of the variables that you
want to log). Hence, you get to choose which type of poison you prefer.
For my part, I think the no-op/early-check approach should be sufficient
to get better performance from the existing hook set.

> IMHO relayfs and netlink are for completely different problem spaces.
> relayfs is for relaying a lot of data quickly (e.g. for kernel
> instrumentation). There it fills a niche that printk doesn't fill
> (since it's too slow). netlink is quite slow (it allocates data for each
> event and does lots of other gunk), but it is a useful extensible format
> for low-frequency events.
>
> For the problems that relayfs solves, netlink is totally unusable
> due to its low efficiency (you could as well use printk, but that is
> also too slow). I think a low-overhead logging mechanism is very
> much needed, because I find myself reinventing it quite often
> when I need to debug some timing-sensitive problem. Trying to
> tackle these with printk is hopeless because it changes timing too much.

This is a very positive review, thanks.

> The problem relayfs has, IMHO, is that it is too complicated. It
> seems to suffer from either an overfull specification or the second-system
> effect. There are lots of different options to do everything,
> instead of a nice simple fast path that does one thing efficiently.
> IMHO before merging it should go on a diet, keeping only
> the paths that are actually needed and dropping a lot of the current
> baggage.
>
> Preferably that would be only the fastest options (an extremely simple
> per-CPU buffer with an inlined fast path that drops data on buffer overflow),
> leaving out anything more complicated. My ideal is something
> like the old SGI ktrace, which was an extremely simple mechanism
> for lockless per-CPU logging of binary data that a user daemon
> could read efficiently.

Certainly we are more than willing to accommodate any reasonable
changes. Some of the "overfeatures" you've noticed actually stem from
our trying to implement a number of things over relayfs. For example,
we've ported printk over to relayfs and have been able to obtain
lossless printk by implementing dynamically resizable buffers. That
doesn't mean there isn't room for improvement. If there are any
specific changes you think are required, we'd be glad to take a look
at them.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546
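The "early check" variant of a hook can be sketched as follows. This is a hedged illustration, not LTT code: the event IDs, `trace_enabled_mask`, `trace_hook()`, `trace_log_slow()` and the `TRACE_HOOKS_DISABLED` switch are all hypothetical names. It covers only the "disabled" and "static" states; the "marker" state depends on build tooling (System.map annotations) that a small C example cannot show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event IDs, one bit each in the enable mask. */
enum { EV_SYSCALL_ENTRY, EV_SCHED_SWITCH, EV_MAX };

static uint32_t trace_enabled_mask;    /* bit n set => event n is traced */
static unsigned long slow_path_calls;  /* how often the expensive path ran */

/* Stand-in for the expensive part: formatting and writing the record. */
static void trace_log_slow(int event, long arg)
{
    (void)event;
    (void)arg;
    slow_path_calls++;
}

#ifdef TRACE_HOOKS_DISABLED
/* "disabled" state: the hook compiles away entirely. */
#define trace_hook(event, arg) do { } while (0)
#else
/*
 * "static" state: the enable test is the very first thing the hook
 * does, so an untraced event costs one load, one mask and one branch.
 */
#define trace_hook(event, arg)                                  \
    do {                                                        \
        if (trace_enabled_mask & (UINT32_C(1) << (event)))      \
            trace_log_slow((event), (arg));                     \
    } while (0)
#endif

static void trace_enable(int event)
{
    assert(event >= 0 && event < EV_MAX);
    trace_enabled_mask |= UINT32_C(1) << event;
}
```

With hooks compiled in but no event enabled, `trace_log_slow()` is never reached, which is the property the "check as early as possible" fix is after.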
* Re: 2.6.11-rc1-mm1
From: Nikita Danilov @ 2005-01-14 10:27 UTC
To: Andi Kleen; +Cc: linux-kernel

Andi Kleen <ak@muc.de> writes:

[...]

>
> Preferably that would be only the fastest options (an extremely simple
> per-CPU buffer with an inlined fast path that drops data on buffer overflow),

A logging mechanism that loses data is worse than useless. All too
often one spends a lot of time trying to reproduce some condition
with logging on, only to find out that nothing was logged.

[...]

>
> -Andi

Nikita.
* Re: 2.6.11-rc1-mm1
From: Andi Kleen @ 2005-01-14 10:38 UTC
To: Nikita Danilov; +Cc: linux-kernel

On Fri, Jan 14, 2005 at 01:27:27PM +0300, Nikita Danilov wrote:
> Andi Kleen <ak@muc.de> writes:
>
> [...]
>
> >
> > Preferably that would be only the fastest options (an extremely simple
> > per-CPU buffer with an inlined fast path that drops data on buffer overflow),
>
> A logging mechanism that loses data is worse than useless. All too
> often one spends a lot of time trying to reproduce some condition
> with logging on, only to find out that nothing was logged.

When you have a timing bug and your logger starts to block randomly,
you won't debug anything either. The fix is to make your buffers bigger.

-Andi
* Re: 2.6.11-rc1-mm1 2005-01-14 10:38 ` 2.6.11-rc1-mm1 Andi Kleen @ 2005-01-14 11:06 ` Karim Yaghmour 2005-01-14 15:31 ` 2.6.11-rc1-mm1 Roman Zippel 0 siblings, 1 reply; 142+ messages in thread From: Karim Yaghmour @ 2005-01-14 11:06 UTC (permalink / raw) To: Andi Kleen; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi

Andi Kleen wrote:
> When you have a timing bug and your logger starts to block randomly
> you also won't debug anything. Fix is to make your buffers bigger.

relayfs allows you to choose which is best for you.

From Documentation/filesystems/relayfs.txt:
...
int relay_open(channel_path, bufsize, nbufs, channel_flags,
               channel_callbacks, start_reserve, end_reserve,
               rchan_start_reserve, resize_min, resize_max, mode,
               init_buf, init_buf_size)
...
- resize_min - if set, this signifies that the channel is auto-resizeable. The value specifies the size that the channel will try to maintain as a normal working size, and that it won't go below. The client makes use of the resizing callbacks and relay_realloc_buffer() and relay_replace_buffer() to actually effect the resize.

- resize_max - if set, this signifies that the channel is auto-resizeable. The value specifies the maximum size the channel can have as a result of resizing.
...

LTT uses fixed-sized channels, but the implementation of printk-over-relayfs used resize_min and resize_max to allow automatic sizing (grep for relay_open):
http://www.opersys.com/ftp/pub/relayfs/patch-printk-on-relayfs-2.6.0-test1

... now I'm going to get some sleep ... I'll catch up later with further discussion ...

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 11:06 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-14 15:31 ` Roman Zippel 2005-01-14 21:11 ` 2.6.11-rc1-mm1 Karim Yaghmour 0 siblings, 1 reply; 142+ messages in thread From: Roman Zippel @ 2005-01-14 15:31 UTC (permalink / raw) To: Karim Yaghmour; +Cc: Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi

Hi,

On Fri, 14 Jan 2005, Karim Yaghmour wrote:

> Andi Kleen wrote:
> > When you have a timing bug and your logger starts to block randomly
> > you also won't debug anything. Fix is to make your buffers bigger.
>
> relayfs allows you to choose which is best for you.
>
> From Documentation/filesystems/relayfs.txt:
> ...
> int relay_open(channel_path, bufsize, nbufs, channel_flags,
>                channel_callbacks, start_reserve, end_reserve,
>                rchan_start_reserve, resize_min, resize_max, mode,
>                init_buf, init_buf_size)

You don't think that's a little overkill?
BTW it should return a pointer, not an id; an id needs to be looked up at every further access, killing the effects of any lockless mechanism.

bye, Roman

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 15:31 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-14 21:11 ` Karim Yaghmour 2005-01-14 22:58 ` 2.6.11-rc1-mm1 Tim Bird ` (2 more replies) 0 siblings, 3 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-14 21:11 UTC (permalink / raw) To: Roman Zippel; +Cc: Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi

Hello Roman,

Roman Zippel wrote:
> You don't think that's a little overkill?

I can see why you'd say this as a first impression, but really it isn't.

Here's a simple primer to this call's parameters:

channel_path, mode:
        Where does this appear in relayfs and what rights do
        user-space apps have over it (rwx).
bufsize, nbufs:
        Usually things have to be subdivided in sub-buffers to make
        both writing and reading simple. LTT uses this to allow,
        among other things, random trace access.
channel_flags, channel_callbacks:
        General channel management (should we write over unread data,
        is data delivered in bulk or in units, what granularity of
        timestamping is required, who should we call to initialize/
        finalize the content of a sub-buffer.) All of these are used
        by LTT, for example, in a number of ways.
start_reserve, end_reserve, rchan_start_reserve:
        Some subsystems, like LTT, need to be able to write some key
        data at sub-buffer boundaries. This is to specify how much
        space is required for said data.
resize_min, resize_max:
        Allow for dynamic resizing of buffer.
init_buf, init_buf_size:
        Is there an initial buffer containing some data that should
        be used to initialize the channel's content. If you're doing
        init-time tracing, for example, you need to have a
        pre-allocated static buffer that is copied to relayfs once
        relayfs is mounted.

As you can see, most of this is already used in one way or another by LTT. The only thing LTT doesn't use is the dynamic resizing, but as was said earlier in this thread, some people actually want to have this.
If it really came to it, we could drop this and resubmit when somebody actually requests this, but my understanding is that the previous poster did indeed indicate his need for this. > BTW it should return a pointer not an id, at every further access it needs > to be looked up, killing the effects of any lockless mechanism. Sounds reasonable. We will review this. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
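As a rough illustration of how bufsize, nbufs and the reserve parameters from the primer above interact, here is a minimal user-space sketch. The struct and helper names are hypothetical; only the field names echo the relay_open() parameters quoted in the thread. It ignores rchan_start_reserve and resizing, and assumes nbufs is greater than zero.

```c
#include <stddef.h>

/* Hypothetical description of a channel's layout (illustration only). */
struct chan_layout {
    size_t bufsize;        /* size of one sub-buffer in bytes */
    size_t nbufs;          /* number of sub-buffers, assumed > 0 */
    size_t start_reserve;  /* bytes reserved at the start of each sub-buffer */
    size_t end_reserve;    /* bytes reserved at the end of each sub-buffer */
};

/* Bytes left for events in one sub-buffer (0 if the reserves do not fit). */
size_t payload_per_subbuf(const struct chan_layout *l)
{
    size_t reserved = l->start_reserve + l->end_reserve;

    return reserved >= l->bufsize ? 0 : l->bufsize - reserved;
}

/* Offset, within the whole channel, of the first event byte of sub-buffer
 * number idx; sub-buffers are reused round-robin. */
size_t payload_offset(const struct chan_layout *l, size_t idx)
{
    return (idx % l->nbufs) * l->bufsize + l->start_reserve;
}
```

With four 1024-byte sub-buffers, a 16-byte start reserve and a 12-byte end reserve, each sub-buffer carries 996 bytes of events, and a reader can jump straight to any sub-buffer boundary, which is the random access property mentioned for LTT.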
* Re: 2.6.11-rc1-mm1 2005-01-14 21:11 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-14 22:58 ` Tim Bird 2005-01-15 0:20 ` 2.6.11-rc1-mm1 Andi Kleen 2005-01-15 4:25 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-15 1:06 ` 2.6.11-rc1-mm1 Roman Zippel 2005-01-16 16:14 ` 2.6.11-rc1-mm1 Christoph Hellwig 2 siblings, 2 replies; 142+ messages in thread From: Tim Bird @ 2005-01-14 22:58 UTC (permalink / raw) To: karim Cc: Roman Zippel, Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi, ltt-dev Karim Yaghmour wrote: > Roman Zippel wrote: >>You don't think that's a little overkill? > >Based on the descriptions below, I think Roman is right. There's too much going on here for the average user. I haven't looked closely, but some of the stuff seems to be for esoteric use cases. There are two ways to approach it: - add a simplified API for the most common usage - strip out the stuff that's not really needed, and figure out workarounds for things (like tracing initialization) that need special assistance. Some of these options (e.g. bufsize) are available to the user via tracedaemon. I can honestly say I haven't got a clue what to use for some of them, and so always leave them at defaults. > I can see why you'd say this as a first impression, but really it isn't. > > Here's a simple primer to this call's parameters: > channel_path, mode: > Where does this appear in relayfs and what rights do > user-space apps have over it (rwx). > bufsize, nbufs: > Usually things have to be subdivided in sub-buffers to make > both writing and reading simple. LTT uses this to allow, > among other things, random trace access. Could these be simplified to a few enumerated modes? > channel_flags, channel_callbacks: > start_reserve, end_reserve, rchan_start_reserve: > resize_min, resize_max: > init_buf, init_buf_size: It seems like you could remove these from relay_open() and move them to get()/set() operations if you wanted to simplify the open API. 
Or, you could create other (separate) APIs to pre-fill the buffer or reserve space. Do you want me to take a look at this and propose some specific changes? (I won't get to this until Monday, though). ============================= Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Electronics ============================= ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 22:58 ` 2.6.11-rc1-mm1 Tim Bird @ 2005-01-15 0:20 ` Andi Kleen 2005-01-15 4:25 ` 2.6.11-rc1-mm1 Karim Yaghmour 1 sibling, 0 replies; 142+ messages in thread From: Andi Kleen @ 2005-01-15 0:20 UTC (permalink / raw) To: Tim Bird Cc: karim, Roman Zippel, Nikita Danilov, linux-kernel, Tom Zanussi, ltt-dev On Fri, Jan 14, 2005 at 02:58:38PM -0800, Tim Bird wrote: > > Roman Zippel wrote: > >>You don't think that's a little overkill? > > > >Based on the descriptions below, I think Roman is right. There's > too much going on here for the average user. I haven't looked closely, > but some of the stuff seems to be for esoteric use cases. There are > two ways to approach it: > - add a simplified API for the most common usage > - strip out the stuff that's not really needed, and figure out > workarounds for things (like tracing initialization) that need > special assistance. > > Some of these options (e.g. bufsize) are available to the user > via tracedaemon. I can honestly say I haven't got a clue what > to use for some of them, and so always leave them at defaults. This is a strong cue that they are unneeded. > > I can see why you'd say this as a first impression, but really it isn't. > > > > Here's a simple primer to this call's parameters: > > channel_path, mode: > > Where does this appear in relayfs and what rights do > > user-space apps have over it (rwx). > > bufsize, nbufs: > > Usually things have to be subdivided in sub-buffers to make > > both writing and reading simple. LTT uses this to allow, > > among other things, random trace access. > Could these be simplified to a few enumerated modes? Just make it a global single define in the source. > > > channel_flags, channel_callbacks: > > start_reserve, end_reserve, rchan_start_reserve: > > resize_min, resize_max: > > init_buf, init_buf_size: > > It seems like you could remove these from relay_open() and move them to > get()/set() operations if you wanted to simplify the open API. 
I think everything for which no clear need is demonstrated should be removed. If there is a real need it can still be re-added later. But in the current form it is far too complicated and too fat. > Or, you could create other (separate) APIs to pre-fill the buffer or > reserve space. Do you want me to take a look at this and propose > some specific changes? (I won't get to this until Monday, though). No, no, it needs far fewer APIs, not more. -Andi ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 22:58 ` 2.6.11-rc1-mm1 Tim Bird 2005-01-15 0:20 ` 2.6.11-rc1-mm1 Andi Kleen @ 2005-01-15 4:25 ` Karim Yaghmour 1 sibling, 0 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-15 4:25 UTC (permalink / raw) To: Tim Bird Cc: Roman Zippel, Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi, ltt-dev Tim Bird wrote: > Some of these options (e.g. bufsize) are available to the user > via tracedaemon. I can honestly say I haven't got a clue what > to use for some of them, and so always leave them at defaults. Yes, but those defaults were chosen by a person who understood the kernel part's use of the buffer space, right? Presumably if you are writing your own relayfs client you know what type of throughput to expect and what size you'd like your buffers to be (bufsize and nbufs), so you need to be able to set this somehow and it only seems right that this be done upon instantiation. > Could these be simplified to a few enumerated modes? I don't see how. Do you have actual examples? As for the other fields, please see my response to Roman. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 21:11 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-14 22:58 ` 2.6.11-rc1-mm1 Tim Bird @ 2005-01-15 1:06 ` Roman Zippel 2005-01-15 4:18 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-16 16:14 ` 2.6.11-rc1-mm1 Christoph Hellwig 2 siblings, 1 reply; 142+ messages in thread From: Roman Zippel @ 2005-01-15 1:06 UTC (permalink / raw) To: Karim Yaghmour; +Cc: Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi

Hi,

On Fri, 14 Jan 2005, Karim Yaghmour wrote:

> As you can see, most of this is already used in one way or another by
> LTT. The only thing LTT doesn't use is the dynamic resizing, but as was
> said earlier in this thread, some people actually want to have this.

This doesn't mean everything has to be put into a single call. Several parameters can still be set after creation.

> start_reserve, end_reserve, rchan_start_reserve:
> Some subsystems, like LTT, need to be able to write some key
> data at sub-buffer boundaries. This is to specify how much
> space is required for said data.

Why should a subsystem care about the details of the buffer management? You could move all this into the relay layer by making a relay channel an event channel. I know you want to save space, but having a magic event_struct_size array is not a good idea. If you have that many events, that a little more overhead causes problems, the tracing results won't be reliable anymore anyway.
Simplicity and maintainability are far more important than saving a few bytes, the general case should be fast and simple, leave the complexity to the special cases.

bye, Roman

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-15 1:06 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-15 4:18 ` Karim Yaghmour 2005-01-16 2:38 ` 2.6.11-rc1-mm1 Roman Zippel 0 siblings, 1 reply; 142+ messages in thread From: Karim Yaghmour @ 2005-01-15 4:18 UTC (permalink / raw) To: Roman Zippel; +Cc: Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi

Hello Roman,

Roman Zippel wrote:
> This doesn't mean everything has to be put into a single call. Several
> parameters can still be set after creation.

I don't have a problem with that. If that's preferable, then we can do it this way too.

> Why should a subsystem care about the details of the buffer management?

Because it wants to enforce a data format on buffer boundaries. Let me explain how this applies in the case of LTT, but this easily generalizes itself to any sort of subsystem that needs to transfer large amounts of information between the kernel and user-space. And to avoid any confusion, let me repeat that relayfs is not intended just for conveying debug/performance/trace info.

Basically, in the case of LTT at least, the kernel tracing infrastructure must provide a stream of data to the user-space tools that they will in turn process and display to the user. At this point it must be said that what you write and how you write it in the trace depends largely on a few key issues. Namely:
- How much data you expect to be generating.
- What you intend to do with it.

Given ltt's target audience (mainstream developers, sysadmins, and power-users), one of the goals was to have a trace format that provided easy browsing forward and backwards, and random access. Initially, this was implemented using two 1MB buffers, one that was being written to while the other one was being written to disk. So, in essence, we had random access at 1MB boundaries. For reading backwards, the size of the event is written at the end of the event and we just need to read 2 bytes prior to the current event to know where the previous event started.
Eventually we found that this format was rather bulky, and that it recorded superfluous data. Amongst other things we relied on a single buffer, so with each event we logged the CPU-ID of the processor on which the event occurred. So, in order to reduce the amount of data recorded and in trying to obtain better performance at runtime by avoiding a call to do_gettimeofday for every event, we did the following:
- Eliminate the CPU-ID => use per-cpu buffers instead.
- Stop calling do_gettimeofday when possible => instead write a complete time-stamp at sub-buffer boundaries (beginning and end; because of clock drift) and only read the lower half of the TSC for each event. Determining an event's actual time is done in post-mortem in user-space.

So how does this translate in practice? Here's the trace header. This is written only once at the start of the trace:

/* Information logged when a trace is started */
typedef struct _ltt_trace_start {
        u32 magic_number;
        u32 arch_type;
        u32 arch_variant;
        u32 system_type;
        u8 major_version;
        u8 minor_version;
        u32 buffer_size;
        ltt_event_mask event_mask;
        ltt_event_mask details_mask;
        u8 log_cpuid;
        u8 use_tsc;
        u8 flight_recorder;
} LTT_PACKED_STRUCT ltt_trace_start;

This is written at the beginning of every new sub-buffer:

/* Start of trace buffer information */
typedef struct _ltt_buffer_start {
        struct timeval time;    /* Time stamp of this buffer */
        u32 tsc;                /* TSC of this buffer, if applicable */
        u32 id;                 /* Unique buffer ID */
} LTT_PACKED_STRUCT ltt_buffer_start;

This is written at the end of every sub-buffer:

typedef struct _ltt_buffer_end {
        struct timeval time;    /* Time stamp of this buffer */
        u32 tsc;                /* TSC of this buffer, if applicable */
} LTT_PACKED_STRUCT ltt_buffer_end;

As you can see, we can't just dump this information in an event channel. This is really intrinsic to how the trace data is going to be read later on.
Removing this data would require more data for each event to be logged, and require parsing through the trace before reading it in order to obtain markers allowing random access. This wouldn't be so bad if we were expecting users to use LTT sporadically for very short periods of time. However, given ltt's target audience (i.e. need to run traces for hours, maybe days, weeks), traces would rapidly become useless because while plowing through a few hundred KBs of data and allocating RAM for building internal structures as you go is fine, plowing through tens of GBs of data, possibly hundreds, requires that you come up with a format that won't require unreasonable resources from your system, while incurring negligible runtime costs for generating it. We believe the format we currently have achieves the right balance here.

So what happens now is that ltt tells relayfs when creating a channel how much space it needs for these basic structures, and provides it with callbacks which are invoked at boundaries for filling the actual reserved space.

In all other circumstances, here's what we are writing into the relayfs buffer for each event:
- Event ID (1 byte)
- Time delta (4 bytes) => this is the low 32 bits of the TSC or a diff between the current do_gettimeofday and the one at buffer start.
- Event details (variable length, see include/linux/ltt-events.h)
- Event size (2 bytes)

Of course there are possible improvements. For one thing, we've discussed dropping the "event size" altogether and rely on smaller buffers and dynamically create sub-buffer indexing tables for reading backwards. This is still part of a work in progress which aims at creating an even better and more flexible format. Of course in an ideal world this new format and the corresponding user tools would be available as we speak, but there's only so much that can be done without having an existing solid base to work off of. As usual, we're open to any other outside suggestions.
> You could move all this into the relay layer by making a relay channel
> an event channel. I know you want to save space, but having a magic
> event_struct_size array is not a good idea. If you have that many events,
> that a little more overhead causes problems, the tracing results won't be
> reliable anymore anyway.

I hope what I said above explains why this isn't possible.

> Simplicity and maintainability are far more important than saving a few
> bytes, the general case should be fast and simple, leave the complexity to
> the special cases.

I agree. I also realize that not all relayfs clients will have the same requirements as ltt. Already, ltt uses a few things from relayfs that others are unlikely to need. For example, it directly invokes relay_lock_channel() to directly lock a channel and relay_write_direct() to directly write to the buffers without relying on the usual relay_write() which takes care of both. This allows LTT to do zero-copy (i.e. no need to pack a buffer before committing it.) Other subsystems may actually not use any relayfs function to write, but instead write directly to a channel as if it was an allocated buffer (which in fact it is). In all cases, though, the open(), mmap(), write() semantic makes it very simple for user-space applications to process channeled data.

So here's a suggested change. Instead of the current relay_open() API, here are three replacement functions (inspired by Tim's input and your comments above):

relay_open(channel_path, mode, bufsize, nbufs);
relay_set_property(property, value);
relay_get_property(property, &value);

Is this more palatable?

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
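The per-event layout described in this message (1-byte event ID, 4-byte time delta, variable-length details, 2-byte trailing size) is what makes backward browsing cheap. The following user-space sketch shows the idea only; it is not LTT's code. The function names are hypothetical, and two details are assumptions the message does not state: that the trailing size covers the whole event including the size field itself, and that multi-byte fields are stored little-endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EV_HEADER  5  /* 1-byte event ID + 4-byte time delta */
#define EV_TRAILER 2  /* 2-byte total event size (assumed to include itself) */

/* Append one event at buf + pos. Returns the new write position,
 * or 0 if the event does not fit. */
size_t ev_append(uint8_t *buf, size_t cap, size_t pos, uint8_t id,
                 uint32_t delta, const void *details, uint16_t dlen)
{
    size_t total = EV_HEADER + (size_t)dlen + EV_TRAILER;

    if (total > 0xFFFF || pos > cap || total > cap - pos)
        return 0;
    buf[pos] = id;
    buf[pos + 1] = (uint8_t)(delta & 0xFF);          /* little-endian delta */
    buf[pos + 2] = (uint8_t)((delta >> 8) & 0xFF);
    buf[pos + 3] = (uint8_t)((delta >> 16) & 0xFF);
    buf[pos + 4] = (uint8_t)((delta >> 24) & 0xFF);
    if (dlen)
        memcpy(buf + pos + EV_HEADER, details, dlen);
    buf[pos + total - 2] = (uint8_t)(total & 0xFF);  /* trailing size */
    buf[pos + total - 1] = (uint8_t)(total >> 8);
    return pos + total;
}

/* Given the offset just past an event, return the offset where that event
 * starts, or SIZE_MAX if the trailer is inconsistent. */
size_t ev_prev_start(const uint8_t *buf, size_t end)
{
    size_t total;

    if (end < EV_HEADER + EV_TRAILER)
        return SIZE_MAX;
    total = (size_t)buf[end - 2] | ((size_t)buf[end - 1] << 8);
    if (total < EV_HEADER + EV_TRAILER || total > end)
        return SIZE_MAX;
    return end - total;
}
```

Walking backwards is then a loop over `ev_prev_start()` starting from the current write position; no index of event offsets has to be built first.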
* Re: 2.6.11-rc1-mm1 2005-01-15 4:18 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-16 2:38 ` Roman Zippel 2005-01-16 6:00 ` 2.6.11-rc1-mm1 Karim Yaghmour 0 siblings, 1 reply; 142+ messages in thread From: Roman Zippel @ 2005-01-16 2:38 UTC (permalink / raw) To: Karim Yaghmour; +Cc: Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi

Hi,

On Fri, 14 Jan 2005, Karim Yaghmour wrote:

> > Why should a subsystem care about the details of the buffer management?
>
> Because it wants to enforce a data format on buffer boundaries.

It's interesting to read more about ltt's requirements, but I still think it's possible to leave this work to the relayfs layer.
Why not just move the ltt buffer management into relayfs and provide a small library, which extracts the event stream again? Otherwise you have to duplicate this work for every serious relayfs user anyway.
Completely abstracting the buffer management would make the whole interface simpler and it would be a lot easier to change without breaking everything. E.g. it would be possible to use per cpu buffers and remove the need for different locking mechanisms, for a good tracing mechanism it's not just important that it's lockless, but also that the cpus don't share cache lines in the fast path. In this regard relayfs/ltt has really still too much overhead and the complex relayfs API isn't really making it easy to fix this.

bye, Roman

^ permalink raw reply [flat|nested] 142+ messages in thread
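The point about CPUs not sharing cache lines in the fast path can be sketched in user-space C11. Everything here is hypothetical: the names, the CPU count, and the 64-byte line size (real hardware varies). Each CPU gets its own write cursor, aligned so that no two cursors land on the same cache line; the sketch assumes each cursor is only ever touched by its owning CPU, which is why no atomics are used.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64  /* assumed cache line size */
#define SKETCH_NR_CPUS   4   /* hypothetical CPU count */

/* One write cursor per CPU, aligned (and therefore padded) to a full line. */
struct percpu_cursor {
    alignas(CACHE_LINE_BYTES) size_t write_off;
};

struct percpu_cursor cursors[SKETCH_NR_CPUS];

/* Advance the cursor owned by `cpu`; returns the offset to write at. */
size_t cursor_advance(unsigned int cpu, size_t len)
{
    size_t off = cursors[cpu].write_off;

    cursors[cpu].write_off = off + len;
    return off;
}
```

Because of the alignment, `sizeof(struct percpu_cursor)` is a full cache line, so a write by CPU 0 never invalidates the line holding CPU 1's cursor.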
* Re: 2.6.11-rc1-mm1 2005-01-16 2:38 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-16 6:00 ` Karim Yaghmour 2005-01-16 16:52 ` 2.6.11-rc1-mm1 Roman Zippel 2005-01-16 19:05 ` 2.6.11-rc1-mm1 Tom Zanussi 0 siblings, 2 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-16 6:00 UTC (permalink / raw) To: Roman Zippel; +Cc: Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi

Hello Roman,

Roman Zippel wrote:
> It's interesting to read more about ltt's requirements, but I still think
> it's possible to leave this work to the relayfs layer.

Ok, I'm willing to play ball, but can you be a little bit more specific?

> Why not just move the ltt buffer management into relayfs and provide a
> small library, which extracts the event stream again? Otherwise you have
> to duplicate this work for every serious relayfs user anyway.

Ok, I've been meditating over what you say above for some time in order to understand how best to follow what you are suggesting. So here's what I've been able to come up with. Let me know if you have other suggestions:

Drop the buffer-start/end callbacks altogether. Instead, allow user to specify in the channel properties whether they want to have sub-buffer delimiters. If so, relayfs would automatically prepend and append the structures currently written by ltt:

/* Start of trace buffer information */
typedef struct _ltt_buffer_start {
        struct timeval time;    /* Time stamp of this buffer */
        u32 tsc;                /* TSC of this buffer, if applicable */
        u32 id;                 /* Unique buffer ID */
} LTT_PACKED_STRUCT ltt_buffer_start;

/* End of trace buffer information */
typedef struct _ltt_buffer_end {
        struct timeval time;    /* Time stamp of this buffer */
        u32 tsc;                /* TSC of this buffer, if applicable */
} LTT_PACKED_STRUCT ltt_buffer_end;

This would also allow dropping the start_reserve, end_reserve, and channel_start_reserve. The latter can be added by ltt as its first event.

Is this what you are looking for, and is there something else we should be doing?
> Completely abstracting the buffer management would make the whole
> interface simpler and it would be a lot easier to change without breaking
> everything. E.g. it would be possible to use per cpu buffers and remove
> the need for different locking mechanisms, for a good tracing mechanism
> it's not just important that it's lockless, but also that the cpus don't
> share cache lines in the fast path. In this regard relayfs/ltt has really
> still too much overhead and the complex relayfs API isn't really making it
> easy to fix this.

The per-cpu buffering issue is really specific to the client. It just so happens that LTT creates one channel for each CPU. Not everyone who needs to ship lots of data to user-space needs/wants one channel per cpu. You could, for example, use a relayfs channel as a big chunk of memory visible to both a user-space app and its kernel buddy in order to exchange data without ever needing more than one such channel for your entire subsystem.

As for lockless vs. locking there is a need for both. Not having to get locks has obvious advantages, but if you require strict timing you will want to use the locking scheme because its logging time is linear (see Thomas' complaints about lockless elsewhere in this thread, and Ingo's complaints about relayfs somewhere back in October.)
But in trying to make things simpler, here's a reworked API:

rchan* relay_open(channel_path, mode, bufsize, nbufs);
int relay_close(*rchan);
int relay_reset(*rchan);
int relay_write(*rchan, *data_ptr, count, **wrote-pos);
int relay_info(*rchan, *channel_info);
void relay_set_property(*rchan, property, value);
void relay_get_property(*rchan, property, *value);

For direct writing (currently already used by ltt, for example):

char* relay_reserve(*rchan, len, *ts, *td, *err, *interrupting);
void relay_commit(*rchan, *from, len, reserve_code, interrupting);

These are the related macros:

#define relay_write_direct(DEST, SRC, SIZE) \
#define relay_lock_channel(RCHAN, FLAGS) \
#define relay_unlock_channel(RCHAN, FLAGS) \

As I hinted elsewhere, we would now have three modes for relayfs channels:
- locking => relies on local_irq_save.
- lockless => relies on try_reserve/fail->retry (based on cmpxchg).
- kdebug => this is for kernel debugging.

The last one could be based on Ingo's tracing code, or any implementation suggestions by Thomas. It wouldn't do all the checks and provide all the capabilities of the other two mechanisms, but would really be a hot-path logger with only minimalistic provisions for content loss and other such things.

(note to Tom: time_delta_offset that used to be in relay_write should be a property set using relay_set_property).

What I'm dropping for now is all the functions that allow a subsystem to read from a channel from within the kernel. So, for example, if you want to obtain large amounts of data from user-space via a relayfs channel you won't be able to.
Here are the functions that would go:

rchan_reader *add_rchan_reader(channel_id, auto_consume)
int remove_rchan_reader(rchan_reader *reader)
rchan_reader *add_map_reader(channel_id)
int remove_map_reader(rchan_reader *reader)
int relay_read(reader, buf, count, wait, *actual_read_offset)
void relay_buffers_consumed(reader, buffers_consumed)
void relay_bytes_consumed(reader, bytes_consumed, read_offset)
int relay_bytes_avail(reader)
int rchan_full(reader)
int rchan_empty(reader)

We could add these at a later time when/if needed. Removing these changes nothing for ltt.

Also, we should try to get rid of the following. They are there for allowing dynamically-resizable buffers, but if we are to make buffer-management opaque, then this should be done internally (Tom: I can't remember the rationale for these. Let me know if there's a reason why they must be kept.)

int relay_realloc_buffer(*rchan, nbufs, async)
int relay_replace_buffer(*rchan)

I think this is a pretty major change and simplification of the API along the lines of what others have asked for. Let me know what you think.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
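The "lockless" mode mentioned in this message is described as try_reserve/fail->retry based on cmpxchg. A minimal user-space analogue using C11 atomics might look like the sketch below. It is not the relayfs implementation: `struct rbuf` and `rbuf_reserve()` are hypothetical names, there is a single fixed buffer with no sub-buffer switching, and a full buffer simply makes the reservation fail.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RBUF_SIZE 128  /* arbitrary size for the sketch */

struct rbuf {
    atomic_size_t write_off;        /* next free byte */
    unsigned char data[RBUF_SIZE];
};

/* Try to reserve len bytes. Returns a pointer to the reserved region,
 * or NULL when the buffer has no room left. Never takes a lock. */
void *rbuf_reserve(struct rbuf *b, size_t len)
{
    size_t old = atomic_load(&b->write_off);
    size_t new_off;

    do {
        if (len > RBUF_SIZE - old)
            return NULL;            /* full: caller decides what to do */
        new_off = old + len;
        /* If another writer moved write_off first, the exchange fails,
         * `old` is refreshed with the current value, and we retry. */
    } while (!atomic_compare_exchange_weak(&b->write_off, &old, new_off));

    return b->data + old;
}
```

The retry loop is also why the time spent logging is not bounded under contention, which is the argument made in the thread for keeping a locking mode for strict-timing users.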
* Re: 2.6.11-rc1-mm1 2005-01-16 6:00 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-16 16:52 ` Roman Zippel 2005-01-16 21:18 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-16 19:05 ` 2.6.11-rc1-mm1 Tom Zanussi 1 sibling, 1 reply; 142+ messages in thread From: Roman Zippel @ 2005-01-16 16:52 UTC (permalink / raw) To: Karim Yaghmour; +Cc: Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi

Hi,

On Sun, 16 Jan 2005, Karim Yaghmour wrote:

> The per-cpu buffering issue is really specific to the client. It just
> so happens that LTT creates one channel for each CPU. Not everyone
> who needs to ship lots of data to user-space needs/wants one channel
> per cpu. You could, for example, use a relayfs channel as a big
> chunk of memory visible to both a user-space app and its kernel buddy
> in order to exchange data without ever needing more
> than one such channel for your entire subsystem.

It seems we first need to specify, what relayfs actually is supposed to be. Is it a relaying mechanism for large amount of data from kernel to user space or is it a general communication channel between kernel and user space? You have to choose one, if you mix contradicting requirements, you'll never get a simple abstraction layer and relayfs will always be a pain to work with.

> > Why not just move the ltt buffer management into relayfs and provide a
> > small library, which extracts the event stream again? Otherwise you have
> > to duplicate this work for every serious relayfs user anyway.
>
> Ok, I've been meditating over what you say above for some time in order
> to understand how best to follow what you are suggesting. So here's
> what I've been able to come up with. Let me know if you have other
> suggestions:
>
> Drop the buffer-start/end callbacks altogether. Instead, allow user
> to specify in the channel properties whether they want to have
> sub-buffer delimiters.
> If so, relayfs would automatically prepend
> and append the structures currently written by ltt:
> /* Start of trace buffer information */
> typedef struct _ltt_buffer_start {
>         struct timeval time;    /* Time stamp of this buffer */
>         u32 tsc;                /* TSC of this buffer, if applicable */
>         u32 id;                 /* Unique buffer ID */
> } LTT_PACKED_STRUCT ltt_buffer_start;
>
> /* End of trace buffer information */
> typedef struct _ltt_buffer_end {
>         struct timeval time;    /* Time stamp of this buffer */
>         u32 tsc;                /* TSC of this buffer, if applicable */
> } LTT_PACKED_STRUCT ltt_buffer_end;

You can make it even simpler by dropping this completely. Every buffer is simply a list of events and you can let ltt write periodically a timer event. In userspace you can randomly seek at buffer boundaries and search for the timer events. It will require a bit more work for userspace, but even a large amount of tracing data stays manageable.

> As for lockless vs. locking there is a need for both. Not having
> to get locks has obvious advantages, but if you require strict
> timing you will want to use the locking scheme because its logging
> time is linear (see Thomas' complaints about lockless elsewhere
> in this thread, and Ingo's complaints about relayfs somewhere back
> in October.)

But why has it to be done in relayfs? Simply leave it to the user to write an extra id field:

        event_id = atomic_inc_return(&event_cnt);

Userspace can then easily restore the original order of events.

bye, Roman

^ permalink raw reply [flat|nested] 142+ messages in thread
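The suggestion above (stamp each event with a global sequence number, then let userspace restore the order) can be sketched in user-space C11. The kernel call `atomic_inc_return()` is replaced here by `atomic_fetch_add(...) + 1`, which yields the same post-increment value; `struct ev`, `ev_stamp()` and `ev_sort()` are hypothetical names, and counter wraparound is ignored.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event record: a sequence number plus some payload. */
struct ev {
    uint32_t seq;
    uint32_t payload;
};

static atomic_uint event_cnt;  /* global event counter, starts at 0 */

/* Stamp an event with the next sequence number (1, 2, 3, ...). */
void ev_stamp(struct ev *e, uint32_t payload)
{
    e->seq = atomic_fetch_add(&event_cnt, 1) + 1;
    e->payload = payload;
}

static int ev_cmp(const void *a, const void *b)
{
    const struct ev *x = a, *y = b;

    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Userspace side: restore the original order of events that were merged
 * from several buffers in arbitrary order. */
void ev_sort(struct ev *evs, size_t n)
{
    qsort(evs, n, sizeof *evs, ev_cmp);
}
```

Sorting needs the whole trace (or at least a bounded window of it) in memory, which is the cost raised in the reply for traces measured in gigabytes.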
* Re: 2.6.11-rc1-mm1 2005-01-16 16:52 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-16 21:18 ` Karim Yaghmour 2005-01-17 1:37 ` 2.6.11-rc1-mm1 Thomas Gleixner ` (2 more replies) 0 siblings, 3 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-16 21:18 UTC (permalink / raw) To: Roman Zippel; +Cc: Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi Hello Roman, Roman Zippel wrote: > It seems we first need to specify, what relayfs actually is supposed to > be. Is it a relaying mechanism for large amount of data from kernel to > user space or is it a general communication channel between kernel and > user space? You have to choose one, if you mix contradicting requirements, > you'll never get a simple abstraction layer and relayfs will always be a > pain to work with. I think we want to concentrate on the former, though I suspect the latter will happen eventually. But let's keep our focus on providing a mechanism for relaying large amounts of data from the kernel to user-space. > You can make it even simpler by dropping this completely. Every buffer is > simply a list of events and you can let ltt write periodically a timer > event. In userspace you can randomly seek at buffer boundaries and search > for the timer events. It will require a bit more work for userspace, but > even large amount of tracing data stays managable. We already do write a heartbeat event periodically to have readable traces in the case where the lower 32 bits of the TSC wrap-around. As I mentioned elsewhere, please don't think of this in terms of kbs or mbs of data. What we're talking about here is gbs if not 100gbs of data. Having to start reading each sub-buffer until you hit a heartbeat really is a killer for such large traces. If there was a significant impact on relayfs for having this I would have understood the argument, but relayfs needs to do buffer-management anyway, so I don't see that much complexity being added by allowing the channel user to ask relayfs for delimiters. 
> Userspace can then easily restore the original order of events. As above, restoring the original order of events is fine if you are looking at mbs or kbs of data. It's just totally unrealistic for the amounts of data we want to handle. But like I said earlier, the added relayfs mode (kdebug) would allow for exactly what you are suggesting: event_id = atomic_inc_return(&event_cnt); So here's the new API based on input from Christoph and Tom: rchan* relay_open(channel_path, bufsize, nbufs); int relay_close(*rchan); int relay_reset(*rchan) int relay_write(*rchan, *data_ptr, count, **wrote-pos); int relay_info(*rchan, *channel_info) void relay_set_property(*rchan, property, value); void relay_get_property(*rchan, property, *value); For direct writing (currently already used by ltt, for example): char* relay_reserve(*rchan, len, *ts, *td, *err, *interrupting) void relay_commit(*rchan, *from, len, reserve_code, interrupting); void relay_buffers_consumed(*rchan, u32) These are the related macros: #define relay_write_direct(DEST, SRC, SIZE) \ #define relay_lock_channel(RCHAN, FLAGS) \ #define relay_unlock_channel(RCHAN, FLAGS) \ What we are dropping for later review: read/write semantics from user-space. It has to be understood that we believe that this is a major drawback. For one thing, you won't be able to do something like: $ cat /relayfs/xchg/my-file > ~/test-data Instead, you will have to write a custom app that does open(), mmap(), write(). We could still provide a small app/library that did this automagically, but you've got to admit that nothing beats the real thing. Also note that there are people who currently use this already, so there will be some unhappy campers. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-16 21:18 ` 2.6.11-rc1-mm1 Karim Yaghmour
@ 2005-01-17 1:37 ` Thomas Gleixner
  2005-01-17 2:24 ` 2.6.11-rc1-mm1 Karim Yaghmour
  2005-01-17 13:54 ` 2.6.11-rc1-mm1 Roman Zippel
  2005-01-17 17:02 ` 2.6.11-rc1-mm1 Tom Zanussi
  2 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2005-01-17 1:37 UTC (permalink / raw)
To: Karim Yaghmour
Cc: Roman Zippel, Andi Kleen, Nikita Danilov, LKML, Tom Zanussi

On Sun, 2005-01-16 at 16:18 -0500, Karim Yaghmour wrote:
> We already do write a heartbeat event periodically to have readable
> traces in the case where the lower 32 bits of the TSC wrap-around.

Which is every 1.42 seconds on a 3GHz machine. I guess we don't have
GB's of data when the 1.42 seconds elapse without an event.

> > Userspace can then easily restore the original order of events.
>
> As above, restoring the original order of events is fine if you are
> looking at mbs or kbs of data. It's just totally unrealistic for
> the amounts of data we want to handle.

I still don't see the point. The implicit ability of LTT to allow
tracing of up to 8192 bytes of user data, strings and XML makes this
necessary. I do not see any necessity to integrate these special usage
modes instead of a generically usable instrumentation implementation.

If relayfs is giving those users the ability to do so then they can do
it, but I object to the fact that LTT/relayfs is occupying the place of a
more generic implementation in the way it is implemented now.

For normal event tracing you have about 32-64 bytes of data per event. So
disabling interrupts in order to copy this amount of information into a
buffer is cheaper on most architectures than doing the whole magic in
LTT and relayfs. This also keeps your buffers consistent and does not
need any magic for postprocessing.

Sorting out disabled events in the hot path and moving the if
(pid/gid/grp) whatever stuff into userspace postprocessing is not an
alien request.

You are talking of Gigabytes of data. In what time?

Let's do some math.

For simplicity all events use 64 bytes of event space.

~ 64kB/sec for 1000 events/s  (event frequency   1kHz) ( 1 ms)
1024kB/sec for   16 events/ms (event frequency  16kHz) (62 us)
2048kB/sec for   32 events/ms (event frequency  32kHz) (31 us)
4096kB/sec for   64 events/ms (event frequency  64kHz) (15 us)
8192kB/sec for  128 events/ms (event frequency 128kHz) ( 8 us)

where a 100Mbit network can theoretically transport 10240kB/sec and
practically does 4000-8000 kB/sec.

An event period of 8us even on a 3 GHz machine is a complete illusion,
because we already spend a couple of usecs in servicing the legacy 8254
timer.

So the realistic assumption on a 3GHz machine is definitely below 64kHz,
which means we have to handle max. 4MB of data per second.

I'm not impressed. Disabling interrupts for a couple of nanoseconds to
store the trace data in the buffer does not hurt at all. Running through
a big bunch of out-of-cache-line instructions does.

If you try to trace more than this amount you are toast anyway.

Please spare me the "reality has bitten" arguments. The whole if(..)
scenario in _ltt_event_log() is doing postprocessing, which can be done
in userspace. I don't care about the required time as long as it does
not introduce additional burden into the kernel.

> Also note that there are people who currently use this already,
> so there will be some unhappy campers.

Be aware that there are some unhappy campers in the kernel community too
when the special purpose tracing is included instead of a generally usable
framework.

tglx

^ permalink raw reply [flat|nested] 142+ messages in thread
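The arithmetic behind the figures above (data rate per event frequency, and the 32-bit TSC wrap interval) is easy to reproduce. The helpers below are a minimal sketch with hypothetical names (`trace_bytes_per_sec`, `tsc_wrap_seconds`); they assume fixed-size events and decimal kB, as the table does.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second produced at a given event rate and fixed event size. */
static uint64_t trace_bytes_per_sec(uint64_t events_per_sec, uint64_t event_size)
{
    return events_per_sec * event_size;
}

/* Seconds until the low `bits` bits of a cycle counter wrap at clock rate `hz`. */
static double tsc_wrap_seconds(unsigned int bits, double hz)
{
    assert(bits < 64 && hz > 0.0);
    return (double)(UINT64_C(1) << bits) / hz;
}
```

For example, `trace_bytes_per_sec(128000, 64)` gives 8192000 bytes/s, the last row of the table, and `tsc_wrap_seconds(32, 3e9)` comes out at roughly 1.43 seconds, in line with the figure quoted for a 3GHz machine.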
* Re: 2.6.11-rc1-mm1
  2005-01-17 1:37 ` 2.6.11-rc1-mm1 Thomas Gleixner
@ 2005-01-17 2:24 ` Karim Yaghmour
  2005-01-17 12:20 ` 2.6.11-rc1-mm1 Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: Karim Yaghmour @ 2005-01-17 2:24 UTC (permalink / raw)
To: tglx
Cc: Roman Zippel, Andi Kleen, Nikita Danilov, LKML, Tom Zanussi, Robert Wisniewski

Thomas Gleixner wrote:
> Which is every 1.42 seconds on a 3GHz machine. I guess we don't have
> GB's of data when the 1.42 seconds elapse without an event.

My argument was about being able to browse the amount of data I was
referring to. The heartbeat thing was an aside to Roman as to the fact
that we already do what he's suggesting.

> I still don't see the point. The implicit ability of LTT to allow
> tracing of up to 8192 bytes of user data, strings and XML makes this
> necessary. I do not see any necessity to integrate these special usage
> modes instead of a generically usable instrumentation implementation.

I've already clarified your mischaracterization of custom events, you
are being disingenuous here. If you want a generalized hooking mechanism,
feel free to ask Andrew to take kernel hooks:
http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/

> If relayfs is giving those users the ability to do so then they can do
> it, but I object to the fact that LTT/relayfs is occupying the place of a
> more generic implementation in the way it is implemented now.

Again, damned if we do, damned if we don't. LTT isn't meant for kernel
debugging per se, though you can use it to that end to a certain
extent. However, if you are kernel debugging, you will find the ad-hoc
mode I'm talking about adding to relayfs quite useful.

> For normal event tracing you have about 32-64 bytes of data per event. So
> disabling interrupts in order to copy this amount of information into a
> buffer is cheaper on most architectures than doing the whole magic in
> LTT and relayfs. This also keeps your buffers consistent and does not
> need any magic for postprocessing.

Oh, now you want to lighten the weight on postprocessing? Come on, Thomas,
please stop wasting my time. Note, however, that we are thinking of
dropping the lockless scheme for now. We will pick up this discussion
separately further down the road.

> Sorting out disabled events in the hot path and moving the if
> (pid/gid/grp) whatever stuff into userspace postprocessing is not an
> alien request.

It is. Have you even read what I suggested to change in my other mail:

    if ((any_filtering) && !(ltt_filter(event_id, event_struct, data)))
        return -EINVAL;

You're not honestly telling me that checking for any_filtering is going
to ruin your day.

> You are talking of Gigabytes of data. In what time?
>
> Let's do some math.
>
> For simplicity all events use 64 bytes of event space.
>
> ~ 64kB/sec for 1000 events/s  (event frequency   1kHz) ( 1 ms)
> 1024kB/sec for   16 events/ms (event frequency  16kHz) (62 us)
> 2048kB/sec for   32 events/ms (event frequency  32kHz) (31 us)
> 4096kB/sec for   64 events/ms (event frequency  64kHz) (15 us)
> 8192kB/sec for  128 events/ms (event frequency 128kHz) ( 8 us)
>
> where a 100Mbit network can theoretically transport 10240kB/sec and
> practically does 4000-8000 kB/sec.
>
> An event period of 8us even on a 3 GHz machine is a complete illusion,
> because we already spend a couple of usecs in servicing the legacy 8254
> timer.
>
> So the realistic assumption on a 3GHz machine is definitely below 64kHz,
> which means we have to handle max. 4MB of data per second.

Actually, on a PII-350MHz, I was already generating 0.5MB/s of data
just by running an X session. If we assume that a machine 10 times
faster generates 10 times as many events, we've already got 5MB/s,
and I'm sure that there are heavier cases than X. Here's the paper
if you want to read it:
http://www.opersys.com/ftp/pub/LTT/Documentation/ltt-usenix.ps.gz

> I'm not impressed. Disabling interrupts for a couple of nanoseconds to
> store the trace data in the buffer does not hurt at all. Running through
> a big bunch of out-of-cache-line instructions does.

Like I said above, fighting for/against lockless is not our immediate
goal, and we will likely remove it.

> If you try to trace more than this amount you are toast anyway.
>
> Please spare me the "reality has bitten" arguments. The whole if(..)
> scenario in _ltt_event_log() is doing postprocessing, which can be done
> in userspace. I don't care about the required time as long as it does
> not introduce additional burden into the kernel.

Not even Ingo hinted at getting rid of filtering. Remember the earlier
e-mail I referred to? Here's what he was suggesting:

> void trace(event, data1, data2, data3)
> {
>     int cpu = smp_processor_id();
>     int idx, pending, *curr = curr_idx + cpu;
>     struct trace_event *t;
>     unsigned long flags;
>
>     if (!event_wanted(current, event, data1, data2, data3))
>         return;
>
>     local_irq_save(flags);
>
>     idx = ++curr_idx[cpu] & (NR_TRACE_ENTRIES - 1);
>     pending = ++curr_pending[cpu];
>
>     t = trace_ring[cpu] + idx;
>
>     t->event = event;
>     rdtscll(t->timestamp);
>     t->data1 = data1;
>     t->data2 = data2;
>     t->data3 = data3;
>
>     if (curr_pending == TRACE_LOW_WATERMARK && tracer_task)
>         wake_up_process(tracer_task);
>
>     local_irq_restore(flags);
> }

Notice the "event_wanted()"? Original found here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103273730326318&w=2

Again, Thomas, I don't mind hearing you out, but please don't waste my
time.

> Be aware that there are some unhappy campers in the kernel community too
> when the special purpose tracing is included instead of a generally usable
> framework.

Like I said, we are willing to accommodate those who want to be able
to use relayfs for kernel debugging purposes, but we can hardly
be blamed for not making LTT a generic kernel debugging tool as this
is exactly the excuse many kernel developers had for not including
LTT to start with. It's just totally disingenuous to give us
grief for claiming that we are doing something and then later turn
around and blame us for not doing it ... sheesh ...

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-17 2:24 ` 2.6.11-rc1-mm1 Karim Yaghmour
@ 2005-01-17 12:20 ` Thomas Gleixner
  2005-01-17 20:32 ` 2.6.11-rc1-mm1 Karim Yaghmour
  0 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2005-01-17 12:20 UTC (permalink / raw)
To: Karim Yaghmour
Cc: Roman Zippel, Andi Kleen, Nikita Danilov, LKML, Tom Zanussi, Robert Wisniewski

On Sun, 2005-01-16 at 21:24 -0500, Karim Yaghmour wrote:
> > Sorting out disabled events in the hot path and moving the if
> > (pid/gid/grp) whatever stuff into userspace postprocessing is not an
> > alien request.
>
> It is. Have you even read what I suggested to change in my other mail:
>     if ((any_filtering) && !(ltt_filter(event_id, event_struct, data)))
>         return -EINVAL;

Sorting out disabled events is the filtering you have to do in kernel
and you should do it in the hot path or remove the unnecessary
tracepoints at compile time.

> > 4096kB/sec for 64 events/ms (event frequency 64kHz) (15 us)
> > 8192kB/sec for 128 events/ms (event frequency 128kHz) ( 8 us)

> Actually, on a PII-350MHz, I was already generating 0.5MB/s of data
> just by running an X session. If we assume that a machine 10 times
> faster generates 10 times as many events, we've already got 5MB/s,
> and I'm sure that there are heavier cases than X.

You are not answering my argument. 8MB/sec is an event frequency of
128kHz when we assume 64 bytes/event. It's one event every 8us. So every
unnecessary computation, every departure from the hot path for nothing is
just giving you performance loss.

> Not even Ingo hinted at getting rid of filtering. Remember the earlier
> e-mail I referred to? Here's what he was suggesting:

I said:
> > Sorting out disabled events in the hot path

s/Sorting/Filtering/

I never said this should not be done.

> Like I said, we are willing to accommodate those who want to be able
> to use relayfs for kernel debugging purposes, but we can hardly
> be blamed for not making LTT a generic kernel debugging tool as this
> is exactly the excuse many kernel developers had for not including
> LTT to start with. It's just totally disingenuous to give us
> grief for claiming that we are doing something and then later turn
> around and blame us for not doing it ... sheesh ...

Separating layers as I suggested before is not making it a generic
debugging tool. It makes parts of those layers available for other usage
and gives us the chance to reuse the parts for cleaning up already
available code which has the same hardwired structure.

tglx

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-17 12:20 ` 2.6.11-rc1-mm1 Thomas Gleixner
@ 2005-01-17 20:32 ` Karim Yaghmour
  2005-01-17 22:31 ` 2.6.11-rc1-mm1 Thomas Gleixner
  0 siblings, 1 reply; 142+ messages in thread
From: Karim Yaghmour @ 2005-01-17 20:32 UTC (permalink / raw)
To: tglx
Cc: Roman Zippel, Andi Kleen, Nikita Danilov, LKML, Tom Zanussi, Robert Wisniewski

Thomas Gleixner wrote:
> Sorting out disabled events is the filtering you have to do in kernel
> and you should do it in the hot path or remove the unnecessary
> tracepoints at compile time.

Do you actually read my replies or do you just grep for something you
can object to? If you care to read my replies you will see that this
has already been answered.

> You are not answering my argument. 8MB/sec is an event frequency of
> 128kHz when we assume 64 bytes/event. It's one event every 8us. So every
> unnecessary computation, every departure from the hot path for nothing is
> just giving you performance loss.

I have, you just choose not to read. Here's what I said earlier:
> Note, however, that we are thinking of dropping the lockless scheme
> for now. We will pick up this discussion separately further down the
> road.

IOW, we will be using cli/sti. So there is no "leaving the hotpath".

> I said:
>
>>>Sorting out disabled events in the hot path
>
>
> s/Sorting/Filtering/
>
> I never said this should not be done.

You're either on crack or I don't know how to read English. Here's what
you said:
> Sorting out disabled events in the hot path and moving the if
> (pid/gid/grp) whatever stuff into userspace postprocessing is not an
> alien request.

Clearly you are suggesting moving the filtering into user-space.

> Separating layers as I suggested before is not making it a generic
> debugging tool. It makes parts of those layers available for other usage
> and gives us the chance to reuse the parts for cleaning up already
> available code which has the same hardwired structure.

This has already been answered.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-17 20:32 ` 2.6.11-rc1-mm1 Karim Yaghmour
@ 2005-01-17 22:31 ` Thomas Gleixner
  2005-01-17 22:42 ` 2.6.11-rc1-mm1 Robert Wisniewski
  2005-01-17 23:41 ` 2.6.11-rc1-mm1 Karim Yaghmour
  0 siblings, 2 replies; 142+ messages in thread
From: Thomas Gleixner @ 2005-01-17 22:31 UTC (permalink / raw)
To: Karim Yaghmour
Cc: Roman Zippel, Andi Kleen, Nikita Danilov, LKML, Tom Zanussi, Robert Wisniewski

On Mon, 2005-01-17 at 15:32 -0500, Karim Yaghmour wrote:
> You're either on crack or I don't know how to read english. Here's what
> you said:

Maybe you should read your own comment about ad-hominem attacks earlier
in this thread and consider if it might apply to you.

I know what I have said. I said reduce the filtering to the absolute
minimum and do the rest in userspace.

The now built-in filters are defined to fit somebody's needs or idea of
what the user should / wants to see. They will not fit everybody's
needs / ideas. So we start modifying, adding and #ifdefing kernel
filters, which is a scary vision.

Enabling and disabling events is a valid basic filter request, which
should live in the kernel. Anything else should go into userspace, IMO.

tglx

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-17 22:31 ` 2.6.11-rc1-mm1 Thomas Gleixner
@ 2005-01-17 22:42 ` Robert Wisniewski
  2005-01-17 23:26 ` 2.6.11-rc1-mm1 Thomas Gleixner
  2005-01-17 23:41 ` 2.6.11-rc1-mm1 Karim Yaghmour
  1 sibling, 1 reply; 142+ messages in thread
From: Robert Wisniewski @ 2005-01-17 22:42 UTC (permalink / raw)
To: tglx
Cc: Karim Yaghmour, Roman Zippel, Andi Kleen, Nikita Danilov, LKML, Tom Zanussi, Robert Wisniewski

Thomas Gleixner writes:
> On Mon, 2005-01-17 at 15:32 -0500, Karim Yaghmour wrote:
> > You're either on crack or I don't know how to read english. Here's what
> > you said:
>
> Maybe you should read your own comment about ad-hominem attacks earlier
> in this thread and consider if it might apply to you.
>
> I know what I have said. I said reduce the filtering to the absolute
> minimum and do the rest in userspace.
>
> The now built-in filters are defined to fit somebody's needs or idea of
> what the user should / wants to see. They will not fit everybody's
> needs / ideas. So we start modifying, adding and #ifdefing kernel
> filters, which is a scary vision.
>
> Enabling and disabling events is a valid basic filter request, which
> should live in the kernel. Anything else should go into userspace, IMO.
>
> tglx

I believe (and Karim can correct me if I'm wrong) the idea is to have
groups of events that can be disabled and enabled via a one word mask. No
checking multiple variables, no #ifdefing, something very streamlined. By
userspace I assume you mean post-processing, i.e., if the user/library/etc
needs to log events they use the same simple facility.

I think we agree to optimize/streamline performance for the gathering and
do work in the post processing. There is an outstanding patch that makes
strides in this direction.

-bob

Robert Wisniewski
The K42 MP OS Project
http://www.research.ibm.com/K42/
bob@watson.ibm.com

^ permalink raw reply [flat|nested] 142+ messages in thread
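The "one word mask" Robert describes can be pictured as one bit per event group, tested with a single shift and AND in the logging path. This is only an illustrative sketch: the group names, the 32-bit width and the `group_*` helpers are assumptions, not taken from LTT or from the outstanding patch he mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event groups; each one owns a single bit of the mask word. */
enum event_group {
    EVG_SYSCALL = 0,
    EVG_IRQ     = 1,
    EVG_SCHED   = 2,
    EVG_FS      = 3
};

static uint32_t enabled_groups;  /* one word: bit n set means group n is traced */

static void group_enable(enum event_group g)
{
    assert((unsigned int)g < 32);
    enabled_groups |= UINT32_C(1) << g;
}

static void group_disable(enum event_group g)
{
    assert((unsigned int)g < 32);
    enabled_groups &= ~(UINT32_C(1) << g);
}

/* The only check the hot path would need: one shift and one AND. */
static int group_wanted(enum event_group g)
{
    return (enabled_groups >> g) & 1u;
}
```

Anything finer-grained than the group bit (per-pid, per-gid and so on) would then be left to post-processing, which is the split both sides seem to be converging on.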
* Re: 2.6.11-rc1-mm1
  2005-01-17 22:42 ` 2.6.11-rc1-mm1 Robert Wisniewski
@ 2005-01-17 23:26 ` Thomas Gleixner
  0 siblings, 0 replies; 142+ messages in thread
From: Thomas Gleixner @ 2005-01-17 23:26 UTC (permalink / raw)
To: Robert Wisniewski
Cc: Karim Yaghmour, Roman Zippel, Andi Kleen, Nikita Danilov, LKML, Tom Zanussi

On Mon, 2005-01-17 at 17:42 -0500, Robert Wisniewski wrote:
> I believe (and Karim can correct me if I'm wrong) the idea is to have
> groups of events that can be disabled and enabled via a one word mask. No
> checking multiple variables, no #ifdefing, something very streamlined. By
> userspace I assume you mean post-processing, i.e., if the user/library/etc
> needs to log events they use the same simple facility.

Yes, I was talking about postprocessing in userspace. The logging of
userspace events is a completely separate issue. You have to solve the
timestamp problem and do the correlation to kernel events in the
postprocessing.

> I think we agree to optimize/streamline performance for the gathering and
> do work in the post processing. There is an outstanding patch that makes
> strides in this direction.

Ack.

Have you any plans to separate the layers into different pieces, so they
provide better reusability?

tglx

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-17 22:31 ` 2.6.11-rc1-mm1 Thomas Gleixner
  2005-01-17 22:42 ` 2.6.11-rc1-mm1 Robert Wisniewski
@ 2005-01-17 23:41 ` Karim Yaghmour
  2005-01-18 0:02 ` 2.6.11-rc1-mm1 Thomas Gleixner
  1 sibling, 1 reply; 142+ messages in thread
From: Karim Yaghmour @ 2005-01-17 23:41 UTC (permalink / raw)
To: tglx
Cc: Roman Zippel, Andi Kleen, Nikita Danilov, LKML, Tom Zanussi, Robert Wisniewski

Thomas Gleixner wrote:
> I know what I have said. I said reduce the filtering to the absolute
> minimum and do the rest in userspace.

You keep adopting the interpretation which best suits you, taking
quotes out of context, and keep repeating things that have already
been answered. There are limits to one's patience.

What you did is change your position twice. It's there for anyone to see.

> The now built-in filters are defined to fit somebody's needs or idea of
> what the user should / wants to see. They will not fit everybody's
> needs / ideas. So we start modifying, adding and #ifdefing kernel
> filters, which is a scary vision.

Ah, finally. Here's an actual suggestion. _IF_ you want, I'll just
export a ltt_set_filter(*callback) and rewrite the if in
_ltt_log_event() to:

    if ((ltt_filter != NULL) && !(ltt_filter(event_id, event_struct, data)))
        return -EINVAL;

You're always welcome to do the following from anywhere in your code:

    ltt_set_filter(NULL);

> Enabling and disabling events is a valid basic filter request, which
> should live in the kernel. Anything else should go into userspace, IMO.

What you are suggesting is that a system administrator that wants to
monitor his sendmail server over a period of three weeks should
just postprocess 1.8TB (1MB/s) of data because Thomas Gleixner didn't
like the idea of kernel event filtering based on anything but events.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-17 23:41 ` 2.6.11-rc1-mm1 Karim Yaghmour
@ 2005-01-18 0:02 ` Thomas Gleixner
  2005-01-18 3:05 ` 2.6.11-rc1-mm1 Karim Yaghmour
  0 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2005-01-18 0:02 UTC (permalink / raw)
To: Karim Yaghmour
Cc: Roman Zippel, Andi Kleen, Nikita Danilov, LKML, Tom Zanussi, Robert Wisniewski

On Mon, 2005-01-17 at 18:41 -0500, Karim Yaghmour wrote:
> Thomas Gleixner wrote:
> > I know what I have said. I said reduce the filtering to the absolute
> > minimum and do the rest in userspace.
>
> You keep adopting the interpretation which best suits you, taking
> quotes out of context, and keep repeating things that have already
> been answered. There are limits to one's patience.

I said before:
"Sorting out disabled events is the filtering you have to do in kernel
and you should do it in the hot path or remove the unnecessary
tracepoints at compile time."

This is exactly what I stated above. I omitted the addition of "do the
rest in userspace", as it was obvious enough.

> What you did is change your position twice. It's there for anyone to see.

Sorry, I didn't know that you are representing anyone.

> > The now built-in filters are defined to fit somebody's needs or idea of
> > what the user should / wants to see. They will not fit everybody's
> > needs / ideas. So we start modifying, adding and #ifdefing kernel
> > filters, which is a scary vision.
>
> Ah, finally. Here's an actual suggestion. _IF_ you want, I'll just
> export a ltt_set_filter(*callback) and rewrite the if in
> _ltt_log_event() to:
>     if ((ltt_filter != NULL) && !(ltt_filter(event_id, event_struct, data)))
>         return -EINVAL;
>
> You're always welcome to do the following from anywhere in your code:
>     ltt_set_filter(NULL);

Provide a hook, export it and load your filters as a module, but keep
the filters out of the mainline kernel code.

> > Enabling and disabling events is a valid basic filter request, which
> > should live in the kernel. Anything else should go into userspace, IMO.
>
> What you are suggesting is that a system administrator that wants to
> monitor his sendmail server over a period of three weeks should
> just postprocess 1.8TB (1MB/s) of data because Thomas Gleixner didn't
> like the idea of kernel event filtering based on anything but events.

A real common scenario with a broad range of users. And everybody has to
like the idea of hardwired filters in the kernel code to make the life
of this sysadmin easier.

See above about hooks.

Maybe some simple pipe would be helpful too:

    read_stream | prefilter | buildbuffers | storeit

tglx

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-18 0:02 ` 2.6.11-rc1-mm1 Thomas Gleixner @ 2005-01-18 3:05 ` Karim Yaghmour 0 siblings, 0 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-18 3:05 UTC (permalink / raw) To: tglx Cc: Roman Zippel, Andi Kleen, Nikita Danilov, LKML, Tom Zanussi, Robert Wisniewski Thomas Gleixner wrote: > Provide a hook, export it and load your filters as a module, but keep > the filters out of the mainline kernel code. Great idea! I will do exactly that. Thanks, Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
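The hook the two ended up agreeing on can be modelled in userspace as a function pointer that is NULL when no filter is installed. This is a sketch under stated assumptions: `ltt_set_filter` and `ltt_filter` are the names from Karim's proposal, but the callback signature, the `log_event()` wrapper and the `keep_even_events()` example filter are hypothetical, and nothing here is actual LTT code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed callback shape: return non-zero to keep the event, zero to drop it. */
typedef int (*ltt_filter_fn)(int event_id, const void *event_struct, const void *data);

static ltt_filter_fn ltt_filter;  /* NULL means no filter is installed */

/* Install a filter, or pass NULL to remove it (as a filter module would on unload). */
static void ltt_set_filter(ltt_filter_fn fn)
{
    ltt_filter = fn;
}

/* Stand-in for the check at the top of the logging function. */
static int log_event(int event_id, const void *event_struct, const void *data)
{
    if ((ltt_filter != NULL) && !ltt_filter(event_id, event_struct, data))
        return -EINVAL;
    /* ... the event would be written to the trace buffer here ... */
    return 0;
}

/* Example filter, purely illustrative: keep only even event ids. */
static int keep_even_events(int event_id, const void *event_struct, const void *data)
{
    (void)event_struct;
    (void)data;
    return (event_id % 2) == 0;
}
```

With no filter installed the cost in the logging path is a single pointer comparison, which matches the earlier argument that checking for filtering should not be expensive.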
* Re: 2.6.11-rc1-mm1 2005-01-16 21:18 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-17 1:37 ` 2.6.11-rc1-mm1 Thomas Gleixner @ 2005-01-17 13:54 ` Roman Zippel 2005-01-17 21:27 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-17 17:02 ` 2.6.11-rc1-mm1 Tom Zanussi 2 siblings, 1 reply; 142+ messages in thread From: Roman Zippel @ 2005-01-17 13:54 UTC (permalink / raw) To: Karim Yaghmour; +Cc: Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi Hi, On Sun, 16 Jan 2005, Karim Yaghmour wrote: > > You can make it even simpler by dropping this completely. Every buffer is > > simply a list of events and you can let ltt write periodically a timer > > event. In userspace you can randomly seek at buffer boundaries and search > > for the timer events. It will require a bit more work for userspace, but > > even large amount of tracing data stays managable. > > We already do write a heartbeat event periodically to have readable > traces in the case where the lower 32 bits of the TSC wrap-around. > > As I mentioned elsewhere, please don't think of this in terms of > kbs or mbs of data. What we're talking about here is gbs if not > 100gbs of data. Having to start reading each sub-buffer until you > hit a heartbeat really is a killer for such large traces. If there > was a significant impact on relayfs for having this I would have > understood the argument, but relayfs needs to do buffer-management > anyway, so I don't see that much complexity being added by allowing > the channel user to ask relayfs for delimiters. Periodically can also mean a buffer start call back from relayfs (although that would mean the first entry is not guaranteed) or a (per cpu) eventcnt from the subsystem. The amount of needed search would be limited. The main point is from the relayfs POV the buffer structure has always the same (simple) structure. You have to be more specific, what's so special about this amount of data. 
You likely want to (incrementally) build an index file, so you don't have to repeat the searches, but even with your current format you would benefit from such an index file. > > Userspace can then easily restore the original order of events. > > As above, restoring the original order of events is fine if you are > looking at mbs or kbs of data. It's just totally unrealistic for > the amounts of data we want to handle. Why is it "totally unrealistic"? > But like I said earlier, the added relayfs mode (kdebug) would allow > for exactly what you are suggesting: > event_id = atomic_inc_return(&event_cnt); Actually that would be already too much for low level kernel debugging. Why do you want to put this into relayfs? What are the _specific_ reasons you need these various modes, why can't you build any special requirements on top of a very light weight relay mechanism? bye, Roman ^ permalink raw reply [flat|nested] 142+ messages in thread
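The index file Roman mentions could, for instance, hold one entry per sub-buffer, recording the first timestamp and the file offset, so that a viewer can jump close to a given time without rescanning the trace. The sketch below is an in-memory model only: `struct subbuf_index` and `find_subbuf()` are hypothetical names, and it assumes the entries are already sorted by timestamp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index entry: one per sub-buffer of the trace file. */
struct subbuf_index {
    uint64_t first_ts;  /* timestamp of the first event in the sub-buffer */
    uint64_t offset;    /* byte offset of the sub-buffer in the trace file */
};

/*
 * Binary search for the last sub-buffer whose first timestamp is <= ts.
 * Returns its position in idx[], or -1 if ts precedes the whole trace.
 * Assumes idx[] is sorted by first_ts.
 */
static long find_subbuf(const struct subbuf_index *idx, size_t n, uint64_t ts)
{
    size_t lo = 0, hi = n;  /* search window is [lo, hi) */

    assert(idx != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (idx[mid].first_ts <= ts)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (long)lo - 1;
}
```

Whether the first timestamp comes from a buffer-start delimiter written by relayfs or from the first timer event found by searching is exactly the point under discussion; the lookup itself is the same either way.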
* Re: 2.6.11-rc1-mm1 2005-01-17 13:54 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-17 21:27 ` Karim Yaghmour 2005-01-17 23:57 ` 2.6.11-rc1-mm1 Roman Zippel 2005-01-18 1:13 ` 2.6.11-rc1-mm1 Roman Zippel 0 siblings, 2 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-17 21:27 UTC (permalink / raw) To: Roman Zippel; +Cc: Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi Hello Roman, Roman Zippel wrote: > Periodically can also mean a buffer start call back from relayfs > (although that would mean the first entry is not guaranteed) or a > (per cpu) eventcnt from the subsystem. The amount of needed search would > be limited. The main point is from the relayfs POV the buffer structure > has always the same (simple) structure. But two e-mails ago, you told us to drop the start_reserve and end_reserve and move the details of the buffer management into relayfs and out of ltt? Either we have a callback, like you suggest, and then we need to reserve some space to make sure that the callback is guaranteed to have the first entry, or we drop the callback and provide an option to the user for relayfs to write this first entry for him. Providing a callback without reservation is no different than relying purely on the heartbeat, which, like I said before and for the reasons illustrated below, is unrealistic. > You have to be more specific, what's so special about this amount of data. > You likely want to (incrementally) build an index file, so you don't have > to repeat the searches, but even with your current format you would > benefit from such an index file. [snip] >>As above, restoring the original order of events is fine if you are >>looking at mbs or kbs of data. It's just totally unrealistic for >>the amounts of data we want to handle. > > > Why is it "totally unrealistic"? Ok, let's expand a little here on the amount of data. Say you're getting 2MB/s of data (which is not unrealistic on a loaded system.) 
That means that if I'm tracing for 2 days, I've got 345GB of data (~7.5GB/hour). In practice, users aren't necessarily interested in plowing through the entire 345GB, they just want to view a given portion of it. Now, if I follow what you are suggesting, I have to go through the entire 345GB to: a) create indexes, b) reorder events, and likely c) have to rewrite another 345GB of data. And I haven't yet discussed the kind of problems you would encounter in trying to reorder such a beast that contains, by definition, variable-sized events. For one thing, if event N+1 doesn't follow N, then you would be forced to browse forward until you actually found it before you could write a properly ordered trace. And it just takes a few processes that are interrupted and forced to sleep here and there to make this unusable. That's without the RAM or fs space required to store those index tables ... At 3 to 12 bytes per events, that's a lot of space for indexes ... If I keep things as they are with ordered events and delimiters on buffer boundaries, I can skip to any place within this 345GB and start processing from there. And that's for two days. If you're a sysadmin encountering a transient problem on a server, you may actually want more than that. >>But like I said earlier, the added relayfs mode (kdebug) would allow >>for exactly what you are suggesting: >> event_id = atomic_inc_return(&event_cnt); > > > Actually that would be already too much for low level kernel debugging. > Why do you want to put this into relayfs? I don't. I was just saying that with the adhoc mode, a relayfs client could use the code snippet you were suggesting. > What are the _specific_ reasons you need these various modes, why can't > you build any special requirements on top of a very light weight relay > mechanism? Because of the opposite requirements. 
Here are the two modes I'm suggesting in relayfs and how they operate:

Managed:
- Presumes active user-space daemon interested in catching _all_ events.
- Allows N buffers in buffer ring.
- Provides limit-checking (callback on end of sub-buffer).
- Provides buffer delimiters (writes timestamp at beginning and end).
- Suited for all types of event sizes (both fixed and variable) at very high frequency.
- Daemon is woken up when buffer is ready for writing, executes a write() on an mmaped area and notifies relevant kernel subsystem, which in turn notifies relayfs that buffer can now be reused.
- Relies on proper abstraction of cli/sti.

Ad-Hoc:
- Presumes transient userspace tool interested in event snapshots.
- Single circular buffer.
- No limits checking (or very basic: as in stop if overwrite).
- No buffer delimiters.
- Best suited for fixed-size events at extremely high frequency.
- User-space tool simply does a write() on an mmaped area and exits or goes back to sleep.
- Relies on proper abstraction of cli/sti.

Basically, the ad-hoc mode abides by the principles of KISS, whereas the managed mode is a more elaborate one for clients like LTT.

Rhetorical: Couldn't the ad-hoc mode be a special case of the managed mode? In theory yes, in practice no. The various conditionals and code paths for switching buffers, invoking callbacks, writing delimiters and the like, which make this mode useful to clients like LTT, will always be a problem for those seeking the shortest path to buffer committal. In the case of Ingo, for example, I'm sure he'd probably go in the code and "#if 0" it to make sure it doesn't slow him down.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
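As an illustration of the point made in the message above about buffer delimiters, here is a minimal user-space sketch of how a trace reader might jump to a point in time when every sub-buffer begins with a timestamp. The layout (a fixed `SUBBUF_SIZE` and a 64-bit begin timestamp at offset 0) and the names `subbuf_begin_ts` and `find_subbuf` are assumptions made for the example, not the actual relayfs or LTT on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: fixed-size sub-buffers, each starting with a 64-bit
 * begin timestamp. This is hypothetical, not the real trace format. */
#define SUBBUF_SIZE 64u

/* Read the begin timestamp stored at the start of sub-buffer idx. */
static uint64_t subbuf_begin_ts(const unsigned char *trace, size_t idx)
{
    uint64_t ts;

    memcpy(&ts, trace + idx * SUBBUF_SIZE, sizeof(ts));
    return ts;
}

/*
 * Return the index of the last sub-buffer whose begin timestamp is
 * <= target (or 0 if target precedes the whole trace). Only about
 * log2(n_subbufs) delimiters are read; no event is scanned.
 */
static size_t find_subbuf(const unsigned char *trace, size_t n_subbufs,
                          uint64_t target)
{
    size_t lo = 0, hi = n_subbufs; /* the answer lies in [lo, hi) */

    assert(trace != NULL && n_subbufs > 0);
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;

        if (subbuf_begin_ts(trace, mid) <= target)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

With delimiters at known offsets the lookup cost depends on the number of sub-buffers, not on the number of events, which is the property being argued for on multi-hundred-gigabyte traces.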
* Re: 2.6.11-rc1-mm1 2005-01-17 21:27 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-17 23:57 ` Roman Zippel 2005-01-18 4:03 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-18 1:13 ` 2.6.11-rc1-mm1 Roman Zippel 1 sibling, 1 reply; 142+ messages in thread From: Roman Zippel @ 2005-01-17 23:57 UTC (permalink / raw) To: Karim Yaghmour; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi Hi, On Mon, 17 Jan 2005, Karim Yaghmour wrote: > > Periodically can also mean a buffer start call back from relayfs > > (although that would mean the first entry is not guaranteed) or a > > (per cpu) eventcnt from the subsystem. The amount of needed search would > > be limited. The main point is from the relayfs POV the buffer structure > > has always the same (simple) structure. > > But two e-mails ago, you told us to drop the start_reserve and end_reserve > and move the details of the buffer management into relayfs and out of > ltt? Either we have a callback, like you suggest, and then we need to > reserve some space to make sure that the callback is guaranteed to have > the first entry, or we drop the callback and provide an option to the > user for relayfs to write this first entry for him. Providing a callback > without reservation is no different than relying purely on the heartbeat, > which, like I said before and for the reasons illustrated below, is > unrealistic. Why is so important that it's at the start of the buffer? What's wrong with a special event _near_ the start of a buffer? > > Why is it "totally unrealistic"? > > Ok, let's expand a little here on the amount of data. Say you're getting > 2MB/s of data (which is not unrealistic on a loaded system.) That means > that if I'm tracing for 2 days, I've got 345GB of data (~7.5GB/hour). > In practice, users aren't necessarily interested in plowing through the > entire 345GB, they just want to view a given portion of it. 
Now, if I > follow what you are suggesting, I have to go through the entire 345GB to: > a) create indexes, b) reorder events, and likely c) have to rewrite > another 345GB of data. And I haven't yet discussed the kind of problems > you would encounter in trying to reorder such a beast that contains, > by definition, variable-sized events. For one thing, if event N+1 doesn't > follow N, then you would be forced to browse forward until you actually > found it before you could write a properly ordered trace. And it just > takes a few processes that are interrupted and forced to sleep here and > there to make this unusable. That's without the RAM or fs space required > to store those index tables ... At 3 to 12 bytes per events, that's a lot > of space for indexes ... > > If I keep things as they are with ordered events and delimiters on buffer > boundaries, I can skip to any place within this 345GB and start processing > from there. What gives you the idea, that you can't do this with what I proposed? You can still seek freely within the data at buffer boundaries and you only have to search a little into the buffer to find the delimiter. Events are not completely at random, so that the little reordering can be done at runtime. Sorry, but I don't get what kind of unsolvable problems you see here. > Rhetorical: Couldn't the ad-hoc mode case be a special case of the > managed mode? Wrong question. What compromises can be made on both sides to create a common simple framework? Your unwillingness to compromise a little on the ltt requirements really amazes me. bye, Roman ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-17 23:57 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-18 4:03 ` Karim Yaghmour 2005-01-18 4:30 ` 2.6.11-rc1-mm1 Aaron Cohen 2005-01-18 15:31 ` 2.6.11-rc1-mm1 Roman Zippel 0 siblings, 2 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-18 4:03 UTC (permalink / raw) To: Roman Zippel; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi

Hello Roman,

Roman Zippel wrote:
> Why is so important that it's at the start of the buffer? What's wrong
> with a special event _near_ the start of a buffer?
[snip]
> What gives you the idea, that you can't do this with what I proposed?
> You can still seek freely within the data at buffer boundaries and you
> only have to search a little into the buffer to find the delimiter. Events
> are not completely at random, so that the little reordering can be done at
> runtime. Sorry, but I don't get what kind of unsolvable problems you see
> here.

Actually I just checked the code and this is a non-issue. The callback can only be called when the condition is met, which itself happens only on buffer switch, which itself only happens when we try to reserve something bigger than what is left in the buffer. IOW, there is no need for reserving anything. Here's what the code does:

	if (!finalizing) {
		bytes_written = rchan->callbacks->buffer_start ...
		cur_write_pos(rchan) += bytes_written;
	}

With that said, I hope we've agreed that we'll have a callback for letting relayfs clients know that they need to write the beginning of the buffer event. There won't be any associated reserve. Conversely, I hope it is not too much to ask to have an end-of-buffer callback.

> Wrong question. What compromises can be made on both sides to create a
> common simple framework? Your unwillingness to compromise a little on the
> ltt requirements really amazes me.

Roman, of all people I've been more than happy to change my stuff following your recommendations. Do I have to list how far relayfs has been stripped down? I mean, we got rid of the lockless scheme (which was one of ltt's explicit requirements), we got rid of the read/write capabilities for user-space, etc. And we are now only left with the bare-bones API:

	rchan* relay_open(channel_path, bufsize, nbufs, flags, *callbacks);
	int relay_close(*rchan);
	int relay_reset(*rchan);
	int relay_write(*rchan, *data_ptr, count, **wrote-pos);

	char* relay_reserve(*rchan, len, *ts, *td, *err, *interrupting);
	void relay_commit(*rchan, *from, len, reserve_code, interrupting);
	void relay_buffers_consumed(*rchan, u32);

	#define relay_write_direct(DEST, SRC, SIZE) \
	#define relay_lock_channel(RCHAN, FLAGS) \
	#define relay_unlock_channel(RCHAN, FLAGS) \

This is a far cry from what we had before; have a look at the relayfs.txt file in 2.6.11-rc1-mm1's Documentation/filesystems if you want to compare. Please at least acknowledge as much.

I'm more than willing to compromise, but at least give me something substantive to feed on. I've explained why I believe there needs to be two modes for relayfs. If you don't think they are appropriate, then please explain why. Either my experience blinds me or it rightly compels me to continue defending it.

You ask what compromises can be found from both sides to obtain a single implementation. I have looked at this, and given how stripped down it has become, anything less from relayfs will make it useless for LTT. IOW, I would have to reimplement a buffering scheme within LTT outside of relayfs.

Can't you see that not all buffering schemes are adapted to all applications and that it's preferable to have a single API transparently providing separate mechanisms instead of a single mechanism that doesn't satisfy any of its users?

If I can't convince you of the concept, can I at least convince you to withhold your final judgement until you actually see the code for the managed vs. ad-hoc schemes?

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-18 4:03 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-18 4:30 ` Aaron Cohen 2005-01-18 4:46 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-18 15:31 ` 2.6.11-rc1-mm1 Roman Zippel 1 sibling, 1 reply; 142+ messages in thread From: Aaron Cohen @ 2005-01-18 4:30 UTC (permalink / raw) To: karim; +Cc: Roman Zippel, Nikita Danilov, linux-kernel, Tom Zanussi Hi, I'm very much a newbie to all of this, but I'm finding this discussion fairly interesting. I've got a quick question and I just want to be clear that it doesn't have a political agenda behind it. Here goes, why can't LTT and/or relayfs, work similar to the way syslog does and just fill a buffer (aka ring-buffer or whatever is appropriate), while a userspace daemon of some kind periodically reads that buffer and massages it. I'm probably being naive but if the difficulty is with huge several hundred-gig files, the daemon if it monitors the buffer often enough could stuff it into a database or whatever high-performance format you need. It also seems to me that Linus' nascent "splice and tee" work would be really useful for something like this to avoid a lot of unnecessary copying by the userspace daemon. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-18 4:30 ` 2.6.11-rc1-mm1 Aaron Cohen @ 2005-01-18 4:46 ` Karim Yaghmour 2005-01-18 8:07 ` 2.6.11-rc1-mm1 Tom Zanussi 0 siblings, 1 reply; 142+ messages in thread From: Karim Yaghmour @ 2005-01-18 4:46 UTC (permalink / raw) To: Aaron Cohen; +Cc: Roman Zippel, Nikita Danilov, linux-kernel, Tom Zanussi

Aaron Cohen wrote:
> I've got a quick question and I just want to be clear that it
> doesn't have a political agenda behind it.

:)

> Here goes, why can't LTT and/or relayfs, work similar to the way
> syslog does and just fill a buffer (aka ring-buffer or whatever is
> appropriate), while a userspace daemon of some kind periodically reads
> that buffer and massages it. I'm probably being naive but if the
> difficulty is with huge several hundred-gig files, the daemon if it
> monitors the buffer often enough could stuff it into a database or
> whatever high-performance format you need.

Because of the bandwidth it is not possible to do any sort of live processing. The only thing the daemon can possibly do is write large blocks of tracing info to disk as rapidly as possible.

> It also seems to me that Linus' nascent "splice and tee" work would
> be really useful for something like this to avoid a lot of unnecessary
> copying by the userspace daemon.

There is no copying by the userspace daemon. All it does is open(), then mmap(), and then it sleeps until it is woken up by the ltt kernel subsystem. When that happens, it only does a write() on the mmaped area, tells the ltt subsystem that it committed X number of sub-buffers and goes back to sleep. This is all zero-copy.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
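The write step of the daemon described in the message above can be sketched as follows. This is only a user-space approximation under stated assumptions: `map` stands for the mmap()ed channel area, `flush_subbuf` is a hypothetical helper name, and the wake-up and "buffers consumed" notifications to the ltt subsystem are left out because their interface is not shown in the thread. The point is that the mapped bytes are handed straight to write(), with no intermediate user-space buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write sub-buffer idx of the mapped channel area directly to out_fd.
 * Returns subbuf_size on success, -1 on error. Short writes and EINTR
 * are retried so that a whole sub-buffer always reaches the file.
 */
static ssize_t flush_subbuf(int out_fd, const unsigned char *map,
                            size_t subbuf_size, size_t idx)
{
    const unsigned char *p = map + idx * subbuf_size;
    size_t left = subbuf_size;

    assert(map != NULL && subbuf_size > 0);
    while (left > 0) {
        ssize_t n = write(out_fd, p, left);

        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before any byte was written */
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)subbuf_size;
}
```

In a real daemon this would be called once per ready sub-buffer after each wake-up, followed by whatever call tells the kernel side that the sub-buffers can be reused.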
* Re: 2.6.11-rc1-mm1 2005-01-18 4:46 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-18 8:07 ` Tom Zanussi 2005-01-18 16:40 ` 2.6.11-rc1-mm1 Karim Yaghmour 0 siblings, 1 reply; 142+ messages in thread From: Tom Zanussi @ 2005-01-18 8:07 UTC (permalink / raw) To: karim Cc: Aaron Cohen, Roman Zippel, Nikita Danilov, linux-kernel, Tom Zanussi Karim Yaghmour writes: > > Aaron Cohen wrote: > > I've got a quick question and I just want to be clear that it > > doesn't have a political agenda behind it. > > :) > > > Here goes, why can't LTT and/or relayfs, work similar to the way > > syslog does and just fill a buffer (aka ring-buffer or whatever is > > appropriate), while a userspace daemon of some kind periodically reads > > that buffer and massages it. I'm probably being naive but if the > > difficulty is with huge several hundred-gig files, the daemon if it > > monitors the buffer often enough could stuff it into a database or > > whatever high-performance format you need. > > Because of the bandwidth it is not possible to do any sort of live > processing of any kind. The only thing the daemon can possibly do > is write large blocks of tracing info to disk as rapidly as possible. I have to disagree. Awhile back, if you remember, I posted a patch to the LTT daemon that would monitor the trace stream in real time, and process it using an embedded Perl interpreter, no less: http://marc.theaimsgroup.com/?l=linux-kernel&m=109405724500237&w=2 It didn't seem to have any problems keeping up with the trace stream even though it was monitoring all LTT event types (and a couple of others - custom events injected using kprobes) and not doing any filtering in the kernel, through kernel compiles, normal X traffic, etc. I don't know what volume of event traffic would cause this model to break down, but I think it shows that at least some level of non-trivial live processing is possible... 
Tom > > > It also seems to me that Linus' nascent "splice and tee" work would > > be really useful for something like this to avoid a lot of unnecessary > > copying by the userspace daemon. > > There is no copying by the userspace daemon. All it does is open(), > then mmap(), and then it sleeps until it is woken up by the ltt > kernel subsystem. When that happens, it only does a write() on the > mmaped area, tells the ltt subsystem that it commited X number of > sub-buffers and goes back asleep. This is all zero-copy. > > Karim > -- > Author, Speaker, Developer, Consultant > Pushing Embedded and Real-Time Linux Systems Beyond the Limits > http://www.opersys.com || karim@opersys.com || 1-866-677-4546 -- Regards, Tom Zanussi <zanussi@us.ibm.com> IBM Linux Technology Center/RAS ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-18 8:07 ` 2.6.11-rc1-mm1 Tom Zanussi @ 2005-01-18 16:40 ` Karim Yaghmour 2005-01-18 19:37 ` 2.6.11-rc1-mm1 Tom Zanussi 0 siblings, 1 reply; 142+ messages in thread From: Karim Yaghmour @ 2005-01-18 16:40 UTC (permalink / raw) To: Tom Zanussi; +Cc: Aaron Cohen, Roman Zippel, Nikita Danilov, linux-kernel Tom Zanussi wrote: > I have to disagree. Awhile back, if you remember, I posted a patch to > the LTT daemon that would monitor the trace stream in real time, and > process it using an embedded Perl interpreter, no less: > > http://marc.theaimsgroup.com/?l=linux-kernel&m=109405724500237&w=2 > > It didn't seem to have any problems keeping up with the trace stream > even though it was monitoring all LTT event types (and a couple of > others - custom events injected using kprobes) and not doing any > filtering in the kernel, through kernel compiles, normal X traffic, > etc. I don't know what volume of event traffic would cause this model > to break down, but I think it shows that at least some level of > non-trivial live processing is possible... Good Point. My bad. Thanks for bringing this up. Obviously this didn't get as much attention as it should've had the last time it was posted, especially as it allows very easy scripting of filtering in userspace. That email you refer to is pretty loaded and I'm sure those who are interested will dig through it. But in the interest of helping everyone get a rapid understanding of what it does and how it does it, can you break it down in to a short description, possibly with a diagram? I'm sure many will find this very interesting. Thanks, Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-18 16:40 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-18 19:37 ` Tom Zanussi 0 siblings, 0 replies; 142+ messages in thread From: Tom Zanussi @ 2005-01-18 19:37 UTC (permalink / raw) To: karim Cc: Tom Zanussi, Aaron Cohen, Roman Zippel, Nikita Danilov, linux-kernel Karim Yaghmour writes: > > Tom Zanussi wrote: > > I have to disagree. Awhile back, if you remember, I posted a patch to > > the LTT daemon that would monitor the trace stream in real time, and > > process it using an embedded Perl interpreter, no less: > > > > http://marc.theaimsgroup.com/?l=linux-kernel&m=109405724500237&w=2 > > > > It didn't seem to have any problems keeping up with the trace stream > > even though it was monitoring all LTT event types (and a couple of > > others - custom events injected using kprobes) and not doing any > > filtering in the kernel, through kernel compiles, normal X traffic, > > etc. I don't know what volume of event traffic would cause this model > > to break down, but I think it shows that at least some level of > > non-trivial live processing is possible... > > Good Point. > > My bad. Thanks for bringing this up. Obviously this didn't get as > much attention as it should've had the last time it was posted, > especially as it allows very easy scripting of filtering in userspace. > That email you refer to is pretty loaded and I'm sure those who > are interested will dig through it. But in the interest of helping > everyone get a rapid understanding of what it does and how it does it, > can you break it down in to a short description, possibly with a > diagram? I'm sure many will find this very interesting. It's so simple it doesn't really deserve a diagram, which I'm pretty bad at anyway... Basically all it does is loop around the received buffer, reading each event and sending it off to a handler. 
In this case the handler massages the data into a form that allows it to be passed to the Perl interpreter as arguments to a Perl function that in turn acts as callback handler in the Perl interpreter. At that point, the Perl callback can do whatever it wants with the data - save events matching a certain pid and discard everything else, keep running counts or time totals e.g. total syscall counts for each pid, function call tracing (if you dynamically instrumented function call entry/exit with kprobes for example), etc, etc, etc.

Probably even more useful is the ability to monitor the event stream looking for sporadically occurring events, again under the control of the Perl interpreter, so your criteria for deciding what an 'important event' is can be arbitrarily complex and incorporate past history. It also means that you don't have to save anything at all to disk until you detect your specified condition (which makes tracing for days or weeks on end more practical), at which point you can dump out the currently mapped buffer containing the last bufsize number of events most likely to be of interest anyway.

Perl makes this kind of quick and dirty processing extremely easy and it has a lot of powerful language features such as nested hashes built in, which is why I chose it, but you could of course avoid the extra layer and the interpreter and do your filtering in straight C, or create a binding for any language you want.

IMHO being able to do most of the filtering in user space like this opens up a lot of avenues for not only one-off problem determination hacks, but a proliferation of more substantial tools, considering how easy it is to put together applications using for instance the copious number of Perl modules available.
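For readers who want the "filtering in straight C" variant spelled out, here is a minimal sketch of the loop described above: walk the received buffer, decode each event, and pass it to a handler. The event header `struct ev_hdr`, the helper `put_event`, and the per-pid counter are all hypothetical and chosen only for the example; the real LTT event format is different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event header, not the real LTT layout. */
struct ev_hdr {
    uint16_t id;  /* event type */
    uint16_t len; /* payload bytes following the header */
    uint32_t pid; /* process the event is attributed to */
};

typedef void (*ev_handler)(const struct ev_hdr *hdr,
                           const unsigned char *payload, void *ctx);

/* Walk one received buffer and hand every complete event to handler.
 * Returns the number of events dispatched. */
static size_t dispatch_events(const unsigned char *buf, size_t used,
                              ev_handler handler, void *ctx)
{
    size_t pos = 0, n = 0;

    assert(buf != NULL && handler != NULL);
    while (used - pos >= sizeof(struct ev_hdr)) {
        struct ev_hdr hdr;

        memcpy(&hdr, buf + pos, sizeof(hdr)); /* avoids unaligned reads */
        if (hdr.len > used - pos - sizeof(hdr))
            break; /* truncated event at the end of the buffer */
        handler(&hdr, buf + pos + sizeof(hdr), ctx);
        pos += sizeof(hdr) + hdr.len;
        n++;
    }
    return n;
}

/* Example handler: count events for one pid, ignore everything else. */
struct pid_filter {
    uint32_t pid;
    size_t hits;
};

static void count_pid(const struct ev_hdr *hdr,
                      const unsigned char *payload, void *ctx)
{
    struct pid_filter *f = ctx;

    (void)payload;
    if (hdr->pid == f->pid)
        f->hits++;
}

/* Helper used only to build sample buffers; returns the new offset. */
static size_t put_event(unsigned char *buf, size_t pos, uint16_t id,
                        uint32_t pid, const void *payload, uint16_t len)
{
    struct ev_hdr hdr = { id, len, pid };

    memcpy(buf + pos, &hdr, sizeof(hdr));
    if (len)
        memcpy(buf + pos + sizeof(hdr), payload, len);
    return pos + sizeof(hdr) + len;
}
```

The handler signature with an opaque `ctx` pointer is one way to keep state across events (running counts, past history) without globals; an embedded interpreter would sit behind the same callback.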
Tom > > Thanks, > > Karim > -- > Author, Speaker, Developer, Consultant > Pushing Embedded and Real-Time Linux Systems Beyond the Limits > http://www.opersys.com || karim@opersys.com || 1-866-677-4546 -- Regards, Tom Zanussi <zanussi@us.ibm.com> IBM Linux Technology Center/RAS ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-18 4:03 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-18 4:30 ` 2.6.11-rc1-mm1 Aaron Cohen @ 2005-01-18 15:31 ` Roman Zippel 2005-01-21 6:26 ` 2.6.11-rc1-mm1 Karim Yaghmour 1 sibling, 1 reply; 142+ messages in thread From: Roman Zippel @ 2005-01-18 15:31 UTC (permalink / raw) To: Karim Yaghmour; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi

Hi,

On Mon, 17 Jan 2005, Karim Yaghmour wrote:

> With that said, I hope we've agreed that we'll have a callback for
> letting relayfs clients know that they need to write the beginning of
> the buffer event. There won't be any associated reserve. Conversely,
> I hope it is not too much to ask to have an end-of-buffer callback.

There of course has to be some kind of end marker, but that's less critical as it's not the active buffer anymore.

> Roman, of all people I've been more than happy to change my stuff following
> your recommendations. Do I have to list how far down relayfs has been
> stripped down?

Sorry, you misunderstood me. At the moment I'm only secondarily interested in the API details, primarily I want to work out the details of what exactly relayfs/ltt are supposed to do. One main question I can't answer yet is why you insist on multiple relayfs modes. This is what I basically have in mind for the relay_write function:

	cpu = get_cpu();
	buffer = relay_get_buffer(chan, cpu);
	while(1) {
		offset = local_add_return(buffer->offset, length);
		if (likely(offset + length <= buffer->size))
			break;
		buffer = relay_switch_buffer(chan, buffer, offset);
	}
	memcpy(buffer->data + offset, data, length);
	put_cpu();

ltt_log_event should only be a few lines more (for writing header and event data). What I'd like to know now are the reasons why you need more than this. It's not the amount of data and any timing requirements have to be done by the caller. During processing you either take the events in the order they were recorded (often that's good enough) or you sort them which is not that difficult.
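The reserve-then-copy idea in the outline above can be tried out in user space. The sketch below is a single-threaded model under stated assumptions: C11 `atomic_fetch_add` stands in for the per-CPU counter update, two fixed-size sub-buffers stand in for the channel, and `model_relay_write`, `switch_buffer` and `chan_init` are hypothetical names. Note one deliberate difference: `atomic_fetch_add` returns the old offset (the start of the reserved slot), whereas the kernel's `local_add_return` returns the new value, so the bounds check here is written against the old offset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 32u
#define N_BUFS 2u

struct relay_buf {
    atomic_size_t offset; /* next free byte in this sub-buffer */
    unsigned char data[BUF_SIZE];
};

struct relay_chan {
    struct relay_buf bufs[N_BUFS];
    unsigned cur;      /* index of the active sub-buffer */
    unsigned switches; /* number of buffer switches so far */
};

static void chan_init(struct relay_chan *chan)
{
    for (unsigned i = 0; i < N_BUFS; i++) {
        atomic_init(&chan->bufs[i].offset, 0);
        memset(chan->bufs[i].data, 0, BUF_SIZE);
    }
    chan->cur = 0;
    chan->switches = 0;
}

/* Move to the next sub-buffer and start filling it from offset 0.
 * A real implementation would first hand the full one to a consumer. */
static struct relay_buf *switch_buffer(struct relay_chan *chan)
{
    chan->cur = (chan->cur + 1) % N_BUFS;
    chan->switches++;
    atomic_store(&chan->bufs[chan->cur].offset, 0);
    return &chan->bufs[chan->cur];
}

/* Reserve length bytes, switching buffers when the active one is full,
 * then copy the record in. Returns where the record was stored. */
static void *model_relay_write(struct relay_chan *chan, const void *data,
                               size_t length)
{
    struct relay_buf *buf = &chan->bufs[chan->cur];
    size_t offset;

    assert(length <= BUF_SIZE);
    for (;;) {
        /* Old value of the counter = start of the slot we just claimed. */
        offset = atomic_fetch_add(&buf->offset, length);
        if (offset + length <= BUF_SIZE)
            break;
        buf = switch_buffer(chan);
    }
    memcpy(buf->data + offset, data, length);
    return buf->data + offset;
}
```

The model leaves out everything the thread is actually arguing about (interrupt context, delimiters, delivery callbacks); it only shows how small the common fast path is when those are absent.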
> You ask what compromises can be found from both sides to obtain a > single implementation. I have looked at this, and given how > stripped down it has become, anything less from relayfs will make > it useless for LTT. IOW, I would have to reimplement a buffering > scheme within LTT outside of relayfs. I know you don't want to touch the topic of kernel debugging, but its requirements greatly overlap with what you want to do with ltt, e.g. one needs very often information about scheduling events as many kernel processes rely more and more on kernel threads. The only real requirement for kernel debugging is low runtime overhead, which you certainly like to have as well. So what exactly are these requirements and why can't there be no reasonable alternative? bye, Roman ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-18 15:31 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-21 6:26 ` Karim Yaghmour 2005-01-21 22:23 ` 2.6.11-rc1-mm1 Roman Zippel 0 siblings, 1 reply; 142+ messages in thread From: Karim Yaghmour @ 2005-01-21 6:26 UTC (permalink / raw) To: Roman Zippel; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi OK, I finally come around to answering this ... Roman Zippel wrote: > Sorry, you missunderstood me. At the moment I'm only secondarily > interested in the API details, primarily I want to work out the details of > what exactly relayfs/ltt are supposed to do. One main question here I > can't answer yet, why you insist on multiple relayfs modes. I should have avoided earlier confusing the use of a certain type of relayfs channel for a given purpose (i.e. LTT should not necessarily depend on the managed mode.) I believe that there is a need for more than one mode in relayfs independently of LTT. There are users who want to be able to manage the data in a buffer (by manage I mean: receive notification of important buffer events, be able to insert important data at boundaries, etc.), and there are users who just want to dump as much information as possible in as fast a way as possible without having to deal with non-essential codepaths. > This is what I basically have in mind for the relay_write function: > > cpu = get_cpu(); > buffer = relay_get_buffer(chan, cpu); > while(1) { > offset = local_add_return(buffer->offset, length); > if (likely(offset + length <= buffer->size)) > break; > buffer = relay_switch_buffer(chan, buffer, offset); > } > memcpy(buffer->data + offset, data, length); > put_cpu(); looking at this code: 1) get_cpu() and put_cpu() won't do. You need to outright disable interrupts because you may be called from an interrupt handler. 2) You assume that relayfs creates one buffer per cpu for each channel. We think this is wrong. 
Relayfs should not need to care about the number of CPUs; it's the
clients' responsibility to create as many channels as they see fit,
whether it be one channel per CPU or 10 channels per CPU or 1 channel
per interrupt, etc.

3) I'm unclear about the need for local_add_return(), why not just:
    if (likely(buffer->offset + length <= buffer->size))
In any case, here's what we do in relay_write():
    write_pos = relay_reserve(rchan, count, &reserve_code, &interrupting);
If there's any buffer switching required, that will be done in
relay_reserve(). This has the added advantage that clients that want
to write directly to the buffer without using relay_write() can do so
by calling relay_reserve() and not care about required buffer switching.

4) After securing the area, you simply go ahead and do a memcpy() and
leave. We think that this is insufficient. Here's what we do:
    if (likely(write_pos != NULL)) {
        relay_write_direct(write_pos, data_ptr, count);
        relay_commit(rchan, write_pos, count, reserve_code, interrupting);
        *wrote_pos = write_pos;
The relay_write_direct() is basically a memcpy(). We also do a
relay_commit(). This actually effects the delivery of the event. If,
for example, there had been a buffer switch at the previous
relay_reserve(), then this call to relay_commit() will generate a call
to the client's deliver() callback function. In the case of LTT, for
example, this is how it knows that it's got to notify the user-space
daemon that there are buffers to consume (i.e. write to disk.)

> ltt_log_event should only be a few lines more (for writing header and
> event data).

Actually no, you don't want ltt_log_event using relay_write(), for one
thing because it can generate variable size events.
Instead, ltt_log_event does (basically):
    data_size = sizeof(event_id) + sizeof(time_delta) + sizeof(data_size);
    relay_lock_channel();
    relay_reserve();
    relay_write_direct(&event_id, sizeof(event_id));
    relay_write_direct(&time_delta, sizeof(time_delta));
    if (var_data) {
        relay_write_direct(var_data, var_data_len);
        data_size += var_data_len;
    }
    relay_write_direct(&data_size, sizeof(data_size));
    relay_commit();
    relay_unlock_channel();

> What I'd like to know now are the reasons why you need more than this.

I hope the above explanation clarifies things.

> It's not the amount of data and any timing requirements have to be done by
> the caller. During processing you either take the events in the order they
> were recorded (often that's good enough) or you sort them which is not
> that difficult.

Ordering is a non-issue to be honest. Unless you've got some hardware
scope in there, it's almost impossible to pinpoint exactly when an
event occurred. There is no single line of code where an event occurs,
so it's all an educated guess anyway. You want things to resemble what
really happened as much as possible though.

> I know you don't want to touch the topic of kernel debugging, but its
> requirements greatly overlap with what you want to do with ltt, e.g. one
> needs very often information about scheduling events as many kernel
> processes rely more and more on kernel threads. The only real requirement
> for kernel debugging is low runtime overhead, which you certainly like to
> have as well. So what exactly are these requirements and why can't there
> be no reasonable alternative?

ok, ok, ok, ok, ok, ok, OK! You've hit it enough times on its head that
I'll actually have to answer. In terms of low runtime overhead, you are
correct, the requirements overlap, and I will agree to do my best to
trim down LTT to make it useable for kernel tracing without jeopardizing
its existing purpose. I'll start this separately in a "Ripping LTT
apart" thread.
In regards to relayfs, I think that LTT should run on both modes
transparently. Unlike what I said before, no single mode should be
tied to LTT. If you want tracing with the ad-hoc mode, then fine, you
should be able to do that.

There is merit in keeping both relayfs modes, irrespective of what
modes LTT uses. A review of the managed and ad-hoc code should consider
all clients, including LTT, as potential users of both. Sure, we'll
want to optimize the managed mode as much as possible, but its
functionality stands on its own and is different from that of the
ad-hoc mode. The difference between these modes is akin to the
difference between GFP_KERNEL, GFP_ATOMIC, GFP_USER, etc.: same API,
different underlying functionality.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
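The variable-size record layout Karim describes for ltt_log_event (event id, time delta, optional payload, then the total size) can be sketched in user space as follows. This is only an illustration under stated assumptions: the evbuf_* names, the field widths and the single fixed-size buffer are hypothetical and are not the relayfs or LTT API, and locking, buffer switching and the commit/deliver step are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one relayfs sub-buffer (not the real API). */
struct evbuf {
    unsigned char data[256];
    size_t offset;              /* next free byte */
};

/* Reserve len bytes; returns NULL when the record does not fit. */
void *evbuf_reserve(struct evbuf *b, size_t len)
{
    void *p;

    if (len > sizeof(b->data) - b->offset)
        return NULL;
    p = b->data + b->offset;
    b->offset += len;
    return p;
}

/* Copy len bytes to pos and return the position just past them. */
unsigned char *evbuf_write_direct(unsigned char *pos, const void *src, size_t len)
{
    memcpy(pos, src, len);
    return pos + len;
}

/*
 * Log one event: id, time delta, optional payload, then the total record
 * size. The whole record is reserved once, so the payload is written
 * straight into the buffer without an intermediate structure.
 * Returns the record size, or 0 if there was no room.
 */
uint16_t evbuf_log_event(struct evbuf *b, uint8_t event_id, uint32_t time_delta,
                         const void *var_data, size_t var_len)
{
    uint16_t data_size;
    unsigned char *pos;

    if (!var_data)
        var_len = 0;
    if (var_len > sizeof(b->data))
        return 0;               /* can never fit; also keeps data_size in range */

    data_size = (uint16_t)(sizeof(event_id) + sizeof(time_delta) +
                           sizeof(data_size) + var_len);
    pos = evbuf_reserve(b, data_size);
    if (!pos)
        return 0;

    pos = evbuf_write_direct(pos, &event_id, sizeof(event_id));
    pos = evbuf_write_direct(pos, &time_delta, sizeof(time_delta));
    if (var_len)
        pos = evbuf_write_direct(pos, var_data, var_len);
    evbuf_write_direct(pos, &data_size, sizeof(data_size));
    return data_size;
}
```

Note that each field is written with its own sizeof, and that the reservation covers the payload up front; a fixed-size relay_write() would instead need the record assembled elsewhere first.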
* Re: 2.6.11-rc1-mm1 2005-01-21 6:26 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-21 22:23 ` Roman Zippel 2005-01-23 7:43 ` 2.6.11-rc1-mm1 Karim Yaghmour 0 siblings, 1 reply; 142+ messages in thread From: Roman Zippel @ 2005-01-21 22:23 UTC (permalink / raw) To: Karim Yaghmour; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi Hi, On Fri, 21 Jan 2005, Karim Yaghmour wrote: > I should have avoided earlier confusing the use of a certain type of > relayfs channel for a given purpose (i.e. LTT should not necessarily > depend on the managed mode.) I believe that there is a need for > more than one mode in relayfs independently of LTT. There are users > who want to be able to manage the data in a buffer (by manage I mean: > receive notification of important buffer events, be able to insert > important data at boundaries, etc.), and there are users who just > want to dump as much information as possible in as fast a way as > possible without having to deal with non-essential codepaths. Well, let's concentrate for a moment on the last thing and check later if and how they fit into relayfs. Since ltt will be first main user, let's optimize it for this. Also since relayfs is intended for large, fast data transfers, per cpu buffers are pretty much always required, so it would make sense to leave this to relayfs (less to get wrong for the client). > looking at this code: I have to modify it a little (only the if (!buffer) part is new): cpu = get_cpu(); buffer = relay_get_buffer(chan, cpu); while(1) { offset = local_add_return(buffer->offset, length); if (likely(offset + length <= buffer->size)) break; buffer = relay_switch_buffer(chan, buffer, offset); if (!buffer) { put_cpu(); return; } } memcpy(buffer->data + offset, data, length); put_cpu(); This has a very short fast path and I need very good reasons to change/add anything here. OTOH the slow path with relay_switch_buffer() is less critical and still leaves a lot of flexibility. > 1) get_cpu() and put_cpu() won't do. 
You need to outright disable > interrupts because you may be called from an interrupt handler. Look closer, it's already interrupt safe, the synchronization for the buffer switch is left to relay_switch_buffer(). > 3) I'm unclear about the need for local_add_return(), why not > just: > if (likely(buffer->offset + length <= buffer->size) > In any case, here's what we do in relay_write(): > write_pos = relay_reserve(rchan, count, &reserve_code, &interrupting); Ok, let's take a closer look at the fast path of relay_write (via relay_managed.c): > rchan_get(rchan); This is not needed, it's the responsibility of the client to keep a reference to the channel. A synchronize_kernel() is enough to get rid of current users of the channel on other cpus. > relay_lock_channel(rchan, flags); what becomes: > FLAGS = 0; > if (RCHAN->flags & RELAY_USAGE_SMP) local_irq_save(FLAGS); > else spin_lock_irqsave(&(RCHAN)->mode.managed.lock, FLAGS); This adds a conditional and is not really needed. Above shows how to make it interrupt safe and if the clients wants to reuse the same buffer, leave the locking to the client. > write_pos = relay_reserve(rchan, count, &reserve_code, &interrupting); what becomes: > if (rchan == NULL) ... Is this really needed? > if (slot_len >= rchan->buf_size) ... You can leave it to caller to check for this, a BUG_ON should be enough here. > if (rchan->initialized == 0) ... Does this really have to be in the fast path? > if (in_progress_event_size(rchan)) ... What's the point of this? You already disable interrupts, so how can anything else be in progress? > if (cur_write_pos(rchan) + slot_len > write_limit(rchan)) ... Ok. This leads to the slow path and not interesting right now. > if (likely(write_pos != NULL)) { After 7 conditions we finally have a valid write position (and that's without ltt). > relay_write_direct(write_pos, data_ptr, count); If write_pos is just a normal memory pointer, why not also just use memcpy? 
> relay_commit(rchan, write_pos, count, reserve_code, interrupting);

which becomes:

> if (rchan == NULL)
> return;

Hopefully no comment needed.

> if (interrupting) ...

Same comment as above for in_progress_event_size().

> if (deliver) ...
> ...
> if (deliver && waitqueue_active(&rchan->mmap_read_wait))

Why is that hook needed here? Why can't this be done by the client?
A buffer switch notification can be done somewhere else.

> relay_unlock_channel(rchan, flags);
> rchan_put(rchan);

Same comment as above.

That's quite a lot of code with at least 14 conditions (or 13 conditions
too many) and this is just relayfs.

> The difference between these modes is akin to the
> difference between GFP_KERNEL, GFP_ATOMIC, GFP_USER, etc.: same API,
> different underlying functionality.

That's not always true; where performance matters we provide different
functions (e.g. spinlocks), so having an alternative version of
relay_write is a possibility (although I'd like to see the user first).

bye, Roman

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-21 22:23 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-23 7:43 ` Karim Yaghmour 2005-01-23 7:52 ` 2.6.11-rc1-mm1 Karim Yaghmour ` (2 more replies) 0 siblings, 3 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-23 7:43 UTC (permalink / raw) To: Roman Zippel; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi Hello Roman, Roman Zippel wrote: > Well, let's concentrate for a moment on the last thing and check later > if and how they fit into relayfs. Since ltt will be first main user, let's > optimize it for this. > Also since relayfs is intended for large, fast data transfers, per cpu > buffers are pretty much always required, so it would make sense to leave > this to relayfs (less to get wrong for the client). But how does relayfs organize the namespace then? What if I have multiple channels per CPU, each for a different type of data, will all channels for the same CPU be under the same directory or will each type of data have its own directory with one entry per CPU? I don't have an answer to that, and I don't know that we should. Why not just leave it to the client to organize his data as he wishes. If we must assume that everyone will have at least one channel per CPU, then why not provide helper functions built on top of very basic functions instead of fixing the namespace in stone? > I have to modify it a little (only the if (!buffer) part is new): > > cpu = get_cpu(); > buffer = relay_get_buffer(chan, cpu); > while(1) { > offset = local_add_return(buffer->offset, length); > if (likely(offset + length <= buffer->size)) > break; > buffer = relay_switch_buffer(chan, buffer, offset); > if (!buffer) { > put_cpu(); > return; > } > } > memcpy(buffer->data + offset, data, length); > put_cpu(); > > This has a very short fast path and I need very good reasons to change/add > anything here. OTOH the slow path with relay_switch_buffer() is less > critical and still leaves a lot of flexibility. 
This is not good for any client that doesn't know beforehand the exact
size of their data units, as in the case of LTT. If LTT has to use this
code that means we are going to lose performance because we will need to
fill an intermediate data structure which will only be used for relay_write().
Instead of zero-copy, we would have an extra unnecessary copy. There has
got to be a way for clients to directly reserve and write as they wish.
Even Zach Brown recognized this in his tracepipe proposal, here's a line
from his patch:
+ * - let caller reserve space and get a pointer into buf

>>1) get_cpu() and put_cpu() won't do. You need to outright disable
>>interrupts because you may be called from an interrupt handler.
>
>
> Look closer, it's already interrupt safe, the synchronization for the
> buffer switch is left to relay_switch_buffer().

Sorry, I'm still missing something. What exactly does local_add_return()
do? I assume this code has got to be interrupt safe? Something like:
#define local_add_return(OFFSET, LEN) \
do {\
    ...
    local_irq_save(); \
    OFFSET += LEN; \
    local_irq_restore(); \
    ...
} while(0);

I'm assuming local_irq_XXX because we were told by quite a few people
in the related thread to avoid atomic ops because they are more expensive
on most CPUs than cli/sti.

Also how does relay_get_buffer() operate? What if I'm writing an event
from within a system call and I'm about to switch buffers and get
an interrupt at the if(likely(...))? Isn't relay_get_buffer() going to
return the same pointer as the one obtained for the syscall, and aren't
both cases now going to effect relay_switch_buffer(), one of which will
be superfluous?

> This adds a conditional and is not really needed. Above shows how to make
> it interrupt safe and if the client wants to reuse the same buffer, leave
> the locking to the client.

Fine, but how is the client going to be able to reuse the same buffer if
relayfs always assumes per-CPU buffer as you said above?
This would be solved if at its core relayfs' functions worked on single channels and additional code provided helpers for making the SMP case very simple. > That's quite a lot of code with at least 14 conditions (or 13 conditions > too much) and this is just relayfs. I believe Tom has refactored the code with your comments in mind, and has something ready for review. I just want to clear up the above before we make this final. Among other things, he just dropped all modes, and there's only a basic relay_write() that closely resembles what you have above. > That's not always true, where perfomance matters we provide different > functions (e.g. spinlocks), so having an alternative version of > relay_write is a possibility (although I'd like to see the user first). Sure, see above in the case of LTT. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-23  7:43 ` 2.6.11-rc1-mm1 Karim Yaghmour
@ 2005-01-23  7:52   ` Karim Yaghmour
  2005-01-23  8:28   ` 2.6.11-rc1-mm1 Karim Yaghmour
  2005-01-24  0:38   ` 2.6.11-rc1-mm1 Roman Zippel
  2 siblings, 0 replies; 142+ messages in thread
From: Karim Yaghmour @ 2005-01-23  7:52 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi

Karim Yaghmour wrote:
> This is not good for any client that doesn't know beforehand the exact
> size of their data units, as in the case of LTT. If LTT has to use this
> code that means we are going to lose performance because we will need to
> fill an intermediate data structure which will only be used for relay_write().
> Instead of zero-copy, we would have an extra unnecessary copy. There has
> got to be a way for clients to directly reserve and write as they wish.
> Even Zach Brown recognized this in his tracepipe proposal, here's from
> his patch:
> + * - let caller reserve space and get a pointer into buf

Actually, come to think of it, this code is not good for any client that
needs to fill complex data structures, whether they be fixed-size or not,
because it requires having a prepackaged structure already available.
Any client that wants zero-copy will want to write data directly into
the buffer instead of filling an intermediate buffer first. And this
requires being able to atomically reserve.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-23  7:43 ` 2.6.11-rc1-mm1 Karim Yaghmour
  2005-01-23  7:52   ` 2.6.11-rc1-mm1 Karim Yaghmour
@ 2005-01-23  8:28   ` Karim Yaghmour
  2005-01-24  0:38   ` 2.6.11-rc1-mm1 Roman Zippel
  2 siblings, 0 replies; 142+ messages in thread
From: Karim Yaghmour @ 2005-01-23  8:28 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi

Karim Yaghmour wrote:
> This is not good for any client that doesn't know beforehand the exact
> size of their data units, as in the case of LTT. If LTT has to use this
> code that means we are going to lose performance because we will need to
> fill an intermediate data structure which will only be used for relay_write().
> Instead of zero-copy, we would have an extra unnecessary copy. There has
> got to be a way for clients to directly reserve and write as they wish.
> Even Zach Brown recognized this in his tracepipe proposal, here's from
> his patch:
> + * - let caller reserve space and get a pointer into buf

Also, if the reserve is exported, then a client that so chooses can do
something like:
    local_irq_save();
    relay_reserve();
    write();
    write();
    write();
    ...
    local_irq_restore();

And therefore enforce in-order events if he so chooses.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-23  7:43 ` 2.6.11-rc1-mm1 Karim Yaghmour
  2005-01-23  7:52   ` 2.6.11-rc1-mm1 Karim Yaghmour
  2005-01-23  8:28   ` 2.6.11-rc1-mm1 Karim Yaghmour
@ 2005-01-24  0:38   ` Roman Zippel
  2005-01-25  9:12     ` 2.6.11-rc1-mm1 Karim Yaghmour
  2 siblings, 1 reply; 142+ messages in thread
From: Roman Zippel @ 2005-01-24  0:38 UTC (permalink / raw)
  To: Karim Yaghmour; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi

Hi,

On Sun, 23 Jan 2005, Karim Yaghmour wrote:

> But how does relayfs organize the namespace then? What if I have
> multiple channels per CPU, each for a different type of data, will
> all channels for the same CPU be under the same directory or will
> each type of data have its own directory with one entry per CPU?

I'd say the latter, you already do this for ltt.

> I don't have an answer to that, and I don't know that we should. Why
> not just leave it to the client to organize his data as he wishes.
> If we must assume that everyone will have at least one channel per
> CPU, then why not provide helper functions built on top of very
> basic functions instead of fixing the namespace in stone?

How simple do you want these helper functions to be? Isn't something
like relay_create(path, num_chan) simple enough?
I don't think a directory structure is that bad, as that allows adding
more control files to the relay stream and still leaves the option to
write out all buffers into one file.

> > I have to modify it a little (only the if (!buffer) part is new):
> >
> > cpu = get_cpu();
> > buffer = relay_get_buffer(chan, cpu);
> > while(1) {
> > offset = local_add_return(buffer->offset, length);
> > if (likely(offset + length <= buffer->size))
> > break;
> > buffer = relay_switch_buffer(chan, buffer, offset);
> > if (!buffer) {
> > put_cpu();
> > return;
> > }
> > }
> > memcpy(buffer->data + offset, data, length);
> > put_cpu();
> >
> > This has a very short fast path and I need very good reasons to change/add
> > anything here.
> > OTOH the slow path with relay_switch_buffer() is less
> > critical and still leaves a lot of flexibility.
>
> This is not good for any client that doesn't know beforehand the exact
> size of their data units, as in the case of LTT. If LTT has to use this
> code that means we are going to lose performance because we will need to
> fill an intermediate data structure which will only be used for relay_write().
> Instead of zero-copy, we would have an extra unnecessary copy. There has
> got to be a way for clients to directly reserve and write as they wish.

Ok, let's change it a little so it's more familiar. :)

void *relay_reserve(chan, length, cpu)
{
    buffer = relay_get_buffer(chan, cpu);
    while(1) {
        offset = local_add_return(buffer->offset, length);
        if (likely(offset + length <= buffer->size))
            return buffer->data + offset;
        buffer = relay_switch_buffer(chan, buffer, offset);
        if (!buffer)
            return NULL;
    }
}

All you have to do is to put it between get_cpu()/put_cpu(). The same is
also possible as a macro, which allows you to directly jump out of it to
the failure code and avoid one test.

> > Look closer, it's already interrupt safe, the synchronization for the
> > buffer switch is left to relay_switch_buffer().
>
> Sorry, I'm still missing something. What exactly does local_add_return()
> do? I assume this code has got to be interrupt safe? Something like:
> #define local_add_return(OFFSET, LEN) \
> do {\
> ...
> local_irq_save(); \
> OFFSET += LEN;
> local_irq_restore(); \
> ...
> } while(0);
>
> I'm assuming local_irq_XXX because we were told by quite a few people
> in the related thread to avoid atomic ops because they are more expensive
> on most CPUs than cli/sti.

That would be about the generic implementation, but it allows archs to
provide more efficient implementations in <asm/local.h>, e.g. i386 can
use xadd.

> Also how does relay_get_buffer() operate?
#define relay_get_buffer(chan, cpu) ((chan)->buffer[cpu])

> What if I'm writing an event
> from within a system call and I'm about to switch buffers and get
> an interrupt at the if(likely(...))? Isn't relay_get_buffer() going to
> return the same pointer as the one obtained for the syscall, and aren't
> both cases now going to effect relay_switch_buffer(), one of which will
> be superfluous?

The synchronization has to be done in relay_switch_buffer(), but catching
it there is still cheaper than in the fast path.

> > This adds a conditional and is not really needed. Above shows how to make
> > it interrupt safe and if the client wants to reuse the same buffer, leave
> > the locking to the client.
>
> Fine, but how is the client going to be able to reuse the same buffer if
> relayfs always assumes per-CPU buffer as you said above? This would be
> solved if at its core relayfs' functions worked on single channels and
> additional code provided helpers for making the SMP case very simple.

What do you mean? Why not make the SMP case simple (less to get wrong)?
The client can still serialize everything with a simple spinlock.

> > That's quite a lot of code with at least 14 conditions (or 13 conditions
> > too many) and this is just relayfs.
>
> I believe Tom has refactored the code with your comments in mind, and has
> something ready for review. I just want to clear up the above before we
> make this final. Among other things, he just dropped all modes, and there's
> only a basic relay_write() that closely resembles what you have above.

Ok, great.
BTW I don't really expect the first version to be fully optimized (unless
you want to :) ), but once the basics are right, that can still be added
later.

bye, Roman

^ permalink raw reply [flat|nested] 142+ messages in thread
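Roman's reserve loop can be modelled in user space as below. This is a sketch under stated assumptions, not kernel code: the pcpu_* names and sizes are hypothetical, C11 atomic_fetch_add() stands in for the local_add_return() of his pseudocode (it returns the old offset, which is how the pseudocode uses the result), and the switch always wraps to the next sub-buffer with none of the synchronization a real relay_switch_buffer() would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define PCPU_NR_CPUS   2
#define PCPU_NR_SUBBUF 2
#define PCPU_BUF_SIZE  64

/* One sub-buffer; offset is bumped with a single fetch-and-add. */
struct pcpu_buffer {
    atomic_size_t offset;
    size_t size;
    unsigned char data[PCPU_BUF_SIZE];
};

/* Hypothetical channel: a ring of sub-buffers per CPU. */
struct pcpu_chan {
    struct pcpu_buffer sub[PCPU_NR_CPUS][PCPU_NR_SUBBUF];
    struct pcpu_buffer *buffer[PCPU_NR_CPUS];   /* current sub-buffer */
    unsigned int switches;                      /* slow-path counter */
};

void pcpu_chan_init(struct pcpu_chan *chan)
{
    for (unsigned int cpu = 0; cpu < PCPU_NR_CPUS; cpu++) {
        for (unsigned int i = 0; i < PCPU_NR_SUBBUF; i++) {
            atomic_init(&chan->sub[cpu][i].offset, 0);
            chan->sub[cpu][i].size = PCPU_BUF_SIZE;
        }
        chan->buffer[cpu] = &chan->sub[cpu][0];
    }
    chan->switches = 0;
}

/*
 * Slow path: move this CPU to its next sub-buffer, overwriting old data.
 * A real implementation must serialize concurrent switchers here.
 */
struct pcpu_buffer *pcpu_switch_buffer(struct pcpu_chan *chan, unsigned int cpu)
{
    size_t idx = (size_t)(chan->buffer[cpu] - chan->sub[cpu]);
    struct pcpu_buffer *next = &chan->sub[cpu][(idx + 1) % PCPU_NR_SUBBUF];

    atomic_store(&next->offset, 0);
    chan->buffer[cpu] = next;
    chan->switches++;
    return next;
}

/* Fast path: one fetch-and-add and one comparison when the record fits. */
void *pcpu_reserve(struct pcpu_chan *chan, size_t length, unsigned int cpu)
{
    struct pcpu_buffer *buffer = chan->buffer[cpu];

    /* Left to the caller in the thread (a BUG_ON); asserted in this model. */
    assert(cpu < PCPU_NR_CPUS && length <= PCPU_BUF_SIZE);

    for (;;) {
        size_t offset = atomic_fetch_add(&buffer->offset, length);

        if (offset + length <= buffer->size)
            return buffer->data + offset;
        buffer = pcpu_switch_buffer(chan, cpu);
    }
}
```

A failed reservation leaves the old sub-buffer's offset past its size, so any later writer on that sub-buffer also falls through to the slow path; that is why all the bookkeeping can live in the switch function.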
* Re: 2.6.11-rc1-mm1 2005-01-24 0:38 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-25 9:12 ` Karim Yaghmour 0 siblings, 0 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-25 9:12 UTC (permalink / raw) To: Roman Zippel; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi Roman Zippel wrote: > Ok, great. > BTW I don't really expect the first version to be fully optimized (unless > you want to :) ), but once the basics are right, that can still be added > later. Agreed. Tom will post updated patches sometime this week. I'll follow up with the LTT stuff separately as agreed. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-17 21:27 ` 2.6.11-rc1-mm1 Karim Yaghmour
  2005-01-17 23:57   ` 2.6.11-rc1-mm1 Roman Zippel
@ 2005-01-18  1:13   ` Roman Zippel
  2005-01-18  2:52     ` 2.6.11-rc1-mm1 Karim Yaghmour
  1 sibling, 1 reply; 142+ messages in thread
From: Roman Zippel @ 2005-01-18  1:13 UTC (permalink / raw)
  To: Karim Yaghmour; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi

Hi,

On Mon, 17 Jan 2005, Karim Yaghmour wrote:

> a) create indexes, b) reorder events, and likely c) have to rewrite

An additional comment about the order of events. What you're doing in
lockless_reserve is bogus anyway. There is no single correct time to
write into the event. By artificially synchronizing event order and event
time you only cheat yourself. Either you take into account during
postprocessing that events can be interrupted, or the time stamp is not
that important, but there is nothing you can do during the recording of
the event except for completely disabling interrupts.

bye, Roman

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-18 1:13 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-18 2:52 ` Karim Yaghmour 0 siblings, 0 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-18 2:52 UTC (permalink / raw) To: Roman Zippel; +Cc: Nikita Danilov, linux-kernel, Tom Zanussi Hello Roman, Roman Zippel wrote: > An additional comment about the order of events. What you're doing in > lockless_reserve is bogus anyway. There is no single correct time to > write into the event. By artificially synchronizing event order and event > time you only cheat yourself. You either take it into account during > postprocessing that events can be interrupted or the time stamp doesn't > seem to be that important, but there is nothing you can do during the > recording of the event except of completely disabling interrupts. Correct and like I said before, we are dropping the lockless scheme. Ergo, disabling interrupts we will. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
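The postprocessing alternative Roman mentions, sorting recorded events afterwards instead of forcing order at record time, might look like the following sketch. The trace_event layout and the function names are hypothetical, and the sketch assumes time stamps from different CPUs are comparable, which the thread does not establish.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical decoded event, as a postprocessing tool might hold it. */
struct trace_event {
    uint64_t timestamp;     /* assumed comparable across CPUs */
    unsigned int cpu;       /* buffer the event was read from */
    unsigned int seq;       /* position within that CPU's buffer */
};

/* Order by time stamp, then by CPU and recording order to break ties. */
int trace_event_cmp(const void *a, const void *b)
{
    const struct trace_event *x = a;
    const struct trace_event *y = b;

    if (x->timestamp != y->timestamp)
        return x->timestamp < y->timestamp ? -1 : 1;
    if (x->cpu != y->cpu)
        return x->cpu < y->cpu ? -1 : 1;
    if (x->seq != y->seq)
        return x->seq < y->seq ? -1 : 1;
    return 0;
}

/* Merge events gathered from all per-CPU buffers into one timeline. */
void trace_sort_events(struct trace_event *ev, size_t n)
{
    assert(ev != NULL || n == 0);
    if (n > 1)
        qsort(ev, n, sizeof(*ev), trace_event_cmp);
}
```

The tie-break on (cpu, seq) keeps the result deterministic when two events carry the same time stamp, since qsort() itself is not guaranteed to be stable.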
* Re: 2.6.11-rc1-mm1 2005-01-16 21:18 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-17 1:37 ` 2.6.11-rc1-mm1 Thomas Gleixner 2005-01-17 13:54 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-17 17:02 ` Tom Zanussi 2 siblings, 0 replies; 142+ messages in thread From: Tom Zanussi @ 2005-01-17 17:02 UTC (permalink / raw) To: karim; +Cc: Roman Zippel, Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi Karim Yaghmour writes: > > Hello Roman, > > > What we are dropping for later review: read/write semantics from > user-space. It has to be understood that we believe that this is > a major drawback. For one thing, you won't be able to do something > like: > $ cat /relayfs/xchg/my-file > ~/test-data > > Instead, you will have to write a custom app that does open(), > mmap(), write(). We could still provide a small app/library that > did this automagically, but you've got to admit that nothing > beats the real thing. > Maybe we could use FUSE to provide read()/write() for relayfs files - opening a FUSE relayfs file would open and mmap the actual relayfs file, read() would move around in the buffer using basically the current relayfs read logic moved down into the FUSE filesystem read fileop, and write() could write directly to the buffer... Tom > Also note that there are people who currently use this already, > so there will be some unhappy campers. > > Karim > -- > Author, Speaker, Developer, Consultant > Pushing Embedded and Real-Time Linux Systems Beyond the Limits > http://www.opersys.com || karim@opersys.com || 1-866-677-4546 -- Regards, Tom Zanussi <zanussi@us.ibm.com> IBM Linux Technology Center/RAS ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-16 6:00 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-16 16:52 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-16 19:05 ` Tom Zanussi 2005-01-19 11:14 ` 2.6.11-rc1-mm1 Christoph Hellwig 1 sibling, 1 reply; 142+ messages in thread From: Tom Zanussi @ 2005-01-16 19:05 UTC (permalink / raw) To: karim Cc: Roman Zippel, Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi, frankeh Karim Yaghmour writes: > > What I'm dropping for now is all the functions that allow a > subsystem to read from a channel from within the kernel. So, > for example, if you want to obtain large amounts of data from > user-space via a relayfs channel you won't be able to. Here > are the functions that would go: > > rchan_reader *add_rchan_reader(channel_id, auto_consume) > int remove_rchan_reader(rchan_reader *reader) > rchan_reader *add_map_reader(channel_id) > int remove_map_reader(rchan_reader *reader) > int relay_read(reader, buf, count, wait, *actual_read_offset) > void relay_buffers_consumed(reader, buffers_consumed) > void relay_bytes_consumed(reader, bytes_consumed, read_offset) > int relay_bytes_avail(reader) > int rchan_full(reader) > int rchan_empty(reader) > > We could add these at a later time when/if needed. Removing > these changes nothing for ltt. One of the things that uses these functions to read from a channel from within the kernel is the relayfs code that implements read(2), so taking them away means you wouldn't be able to use read() on a relayfs file. That wouldn't matter for ltt since it mmaps the file, but there are existing users of relayfs that do use relayfs this way. In fact, most of the bug reports I've gotten are from people using it in this mode. That doesn't mean though that it's necessarily the right thing for relayfs or these users to be doing if they have suitable alternatives for passing lower-volume messages in this way. 
As others have mentioned, that seems to be the major question - should
relayfs concentrate on being solely a high-speed data relay mechanism or
should it try to be more, as it currently is implemented? If the former,
then I wonder if you need a filesystem at all - all you have is a
collection of mmappable buffers and the only thing the filesystem
provides is the namespace. Removing read()/write() and filesystem
support would of course greatly simplify the code; I'd like to hear from
any existing users though and see what they'd be missing.

ltt would still need at least relay_buffers_consumed() though. This is
used to support the 'no-overwrite' option, which means that when the
buffers are full, i.e. the daemon has fallen behind and needs to catch
up, channel writing is 'suspended' until it catches up.

> > Also, we should try to get rid of the following. They are there
> for allowing dynamically-resizable buffers, but if we are to
> make buffer-management opaque, then this should be done
> internally (Tom: I can't remember the rationale for these. Let
> me know if there's a reason why they must be kept.)
>
> int relay_realloc_buffer(*rchan, nbufs, async)
> int relay_replace_buffer(*rchan)

relay_realloc_buffer actually does the work of allocating the new buffer
space used for resizing, and since it can sleep, it's done in the
background using a work queue. When everything's ready, the channel
buffer can then be replaced, thus relay_replace_buffer(). The only user
of channel resizing that I know of is the 'dynamically resizeable printk
replacement' I posted a while back, and that apparently doesn't have any
users, so I'd be happy to get rid of all the resizing code.

Tom

> > I think this is a pretty major change and simplification of the
> API along the lines of what others have asked for. Let me know
> what you think.
> > Karim > -- > Author, Speaker, Developer, Consultant > Pushing Embedded and Real-Time Linux Systems Beyond the Limits > http://www.opersys.com || karim@opersys.com || 1-866-677-4546 -- Regards, Tom Zanussi <zanussi@us.ibm.com> IBM Linux Technology Center/RAS ^ permalink raw reply [flat|nested] 142+ messages in thread
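The 'no-overwrite' behaviour Tom describes, where writing is suspended until the reader reports consumed buffers, can be sketched with two counters. The nowr_* names are hypothetical and this is only an accounting model; it does not reproduce how relay_buffers_consumed() is implemented in relayfs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical no-overwrite bookkeeping for one channel. */
struct nowr_chan {
    unsigned int n_subbufs;     /* sub-buffers in the ring */
    unsigned int produced;      /* sub-buffers completed by the writer */
    unsigned int consumed;      /* sub-buffers released by the reader */
};

/*
 * Writer side: called when the current sub-buffer is full. Returns false,
 * meaning writing stays suspended, if the next sub-buffer still holds
 * data the reader has not consumed.
 */
bool nowr_switch(struct nowr_chan *c)
{
    if (c->produced + 1 - c->consumed >= c->n_subbufs)
        return false;
    c->produced++;
    return true;
}

/* Reader side: report n sub-buffers as consumed (e.g. written to disk). */
void nowr_buffers_consumed(struct nowr_chan *c, unsigned int n)
{
    unsigned int unread = c->produced - c->consumed;

    c->consumed += (n > unread) ? unread : n;   /* never pass the writer */
}
```

With two sub-buffers, the writer can switch once, is then refused until the reader consumes one sub-buffer, and can switch again afterwards.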
* Re: 2.6.11-rc1-mm1 2005-01-16 19:05 ` 2.6.11-rc1-mm1 Tom Zanussi @ 2005-01-19 11:14 ` Christoph Hellwig 2005-01-19 16:53 ` 2.6.11-rc1-mm1 Tom Zanussi 0 siblings, 1 reply; 142+ messages in thread From: Christoph Hellwig @ 2005-01-19 11:14 UTC (permalink / raw) To: Tom Zanussi Cc: karim, Roman Zippel, Andi Kleen, Nikita Danilov, linux-kernel, frankeh On Sun, Jan 16, 2005 at 01:05:19PM -0600, Tom Zanussi wrote: > One of the things that uses these functions to read from a channel > from within the kernel is the relayfs code that implements read(2), so > taking them away means you wouldn't be able to use read() on a relayfs > file. Removing them from the public API is different from disallowing the read operation. > That wouldn't matter for ltt since it mmaps the file, but there > are existing users of relayfs that do use relayfs this way. In fact, > most of the bug reports I've gotten are from people using it in this > mode. That doesn't mean though that it's necessarily the right thing > for relayfs or these users to be doing if they have suitable > alternatives for passing lower-volume messages in this way. As others > have mentioned, that seems to be the major question - should relayfs > concentrate on being solely a high-speed data relay mechanism or > should it try to be more, as it currently is implemented? I'd say let it do one thing well, that is high-volume data transfer. > If the > former, then I wonder if you need a filesystem at all - all you have > is a collection of mmappable buffers and the only thing the filesystem > provides is the namespace. Removing read()/write() and filesystem > support would of course greatly simplify the code; I'd like to hear > from any existing users though and see what they'd be missing. What else would manage the namespace? ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-19 11:14 ` 2.6.11-rc1-mm1 Christoph Hellwig @ 2005-01-19 16:53 ` Tom Zanussi 0 siblings, 0 replies; 142+ messages in thread From: Tom Zanussi @ 2005-01-19 16:53 UTC (permalink / raw) To: Christoph Hellwig Cc: Tom Zanussi, karim, Roman Zippel, Andi Kleen, Nikita Danilov, linux-kernel, frankeh Christoph Hellwig wrote: > On Sun, Jan 16, 2005 at 01:05:19PM -0600, Tom Zanussi wrote: > >>One of the things that uses these functions to read from a channel >>from within the kernel is the relayfs code that implements read(2), so >>taking them away means you wouldn't be able to use read() on a relayfs >>file. > > > Removing them from the public API is different from disallowing the > read operation. > Right, but we were planning on removing all that code in the interest of stripping relayfs down to its bare minimum as a high-speed data transfer mechanism. > >>That wouldn't matter for ltt since it mmaps the file, but there >>are existing users of relayfs that do use relayfs this way. In fact, >>most of the bug reports I've gotten are from people using it in this >>mode. That doesn't mean though that it's necessarily the right thing >>for relayfs or these users to be doing if they have suitable >>alternatives for passing lower-volume messages in this way. As others >>have mentioned, that seems to be the major question - should relayfs >>concentrate on being solely a high-speed data relay mechanism or >>should it try to be more, as it currently is implemented? > > > I'd say let it do one thing well, that is high-volume data transfer. Yes, I think that's the one thing everyone's agreed on. > > >>If the >>former, then I wonder if you need a filesystem at all - all you have >>is a collection of mmappable buffers and the only thing the filesystem >>provides is the namespace. Removing read()/write() and filesystem >>support would of course greatly simplify the code; I'd like to hear >>from any existing users though and see what they'd be missing. 
>
>
> What else would manage the namespace?

I have to confess I haven't had the time to look at it in detail, but I
previously suggested that we might be able to recover the read()
operations by providing them in userspace on top of the mmapped relayfs
buffer, using FUSE. If we did that, our FUSE filesystem could also
provide the namespace, I assume. Anyway, I don't think I've seen any
objections in principle to the filesystem part of relayfs, so maybe it's
not an issue - any other suggestions would be welcome, of course...

Tom

>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 21:11 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-14 22:58 ` 2.6.11-rc1-mm1 Tim Bird 2005-01-15 1:06 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-16 16:14 ` Christoph Hellwig 2005-01-16 19:47 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-16 20:30 ` 2.6.11-rc1-mm1 Tom Zanussi 2 siblings, 2 replies; 142+ messages in thread From: Christoph Hellwig @ 2005-01-16 16:14 UTC (permalink / raw) To: Karim Yaghmour Cc: Roman Zippel, Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi On Fri, Jan 14, 2005 at 04:11:38PM -0500, Karim Yaghmour wrote: > Where does this appear in relayfs and what rights do > user-space apps have over it (rwx). Why would you want anything but read access? > bufsize, nbufs: > Usually things have to be subdivided in sub-buffers to make > both writing and reading simple. LTT uses this to allow, > among other things, random trace access. I think random access is overkill. Keeping the code simple is more important and user-space can post-process it. > resize_min, resize_max: > Allow for dynamic resizing of buffer. Auto-resizing sounds like a really bad idea. > init_buf, init_buf_size: > Is there an initial buffer containing some data that should > be used to initialize the channel's content. If you're doing > init-time tracing, for example, you need to have a pre-allocated > static buffer that is copied to relayfs once relayfs is mounted. And why can't you do this from that code? It just needs an initcall-like thing that runs after mounting of relayfs. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-16 16:14 ` 2.6.11-rc1-mm1 Christoph Hellwig @ 2005-01-16 19:47 ` Karim Yaghmour 2005-01-16 20:30 ` 2.6.11-rc1-mm1 Tom Zanussi 1 sibling, 0 replies; 142+ messages in thread
From: Karim Yaghmour @ 2005-01-16 19:47 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Roman Zippel, Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi

Hello Christoph,

Christoph Hellwig wrote:
> Why would you want anything but read access?

Fine, we can make it read-only; we'll drop the "mode" field.

> I think random access is overkill. Keeping the code simple is more
> important and user-space can post-process it.

It's overkill if you're thinking in terms of KBs or MBs of data. It
isn't if you're looking at GBs and 100GBs. Please read my other posting
as to who is using this and how.

But regardless of access, you have to have some way of telling relayfs
the size of the channel you want. bufsize and nbufs just tell relayfs
the size of the buffers you want and how many buffers there are in the
ring, both of which are really basic to any sort of buffering scheme.

> Auto-resizing sounds like a really bad idea.

Ok, it will go.

> And why can't you do this from that code? It just needs an initcall-like
> thing that runs after mounting of relayfs.

Ok, we'll leave it to the caller to do a relay_write() with his
init-bufs at startup.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
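Karim's point about bufsize and nbufs in the message above comes down to simple arithmetic: the channel holds bufsize * nbufs bytes, and a running byte count maps onto a sub-buffer index and an offset inside it. The sketch below shows that mapping only. `struct ring_geom`, `ring_capacity()` and `ring_locate()` are hypothetical names chosen for illustration and are not part of the relayfs API, and the sketch assumes records are packed back to back without padding at sub-buffer boundaries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel geometry: nbufs sub-buffers of bufsize bytes each. */
struct ring_geom {
    size_t bufsize;
    size_t nbufs;
};

/* Where a given byte lands inside the ring. */
struct ring_pos {
    size_t buf; /* sub-buffer index, 0 .. nbufs-1 */
    size_t off; /* byte offset inside that sub-buffer */
};

/* Total bytes the channel can hold before it wraps. */
static size_t ring_capacity(const struct ring_geom *g)
{
    return g->bufsize * g->nbufs;
}

/* Map the number of bytes written so far to a position in the ring. */
static struct ring_pos ring_locate(const struct ring_geom *g, size_t written)
{
    struct ring_pos p;

    assert(g->bufsize > 0 && g->nbufs > 0);
    p.buf = (written / g->bufsize) % g->nbufs;
    p.off = written % g->bufsize;
    return p;
}
```

For example, with four sub-buffers of 4096 bytes the capacity is 16384 bytes, byte 5000 falls at offset 904 of sub-buffer 1, and byte 16394 has wrapped around to offset 10 of sub-buffer 0. Fixed-size sub-buffers are also what make the random access Karim mentions cheap, since a reader can jump straight to any sub-buffer boundary.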
* Re: 2.6.11-rc1-mm1 2005-01-16 16:14 ` 2.6.11-rc1-mm1 Christoph Hellwig 2005-01-16 19:47 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-16 20:30 ` Tom Zanussi 2005-01-19 11:11 ` 2.6.11-rc1-mm1 Christoph Hellwig 1 sibling, 1 reply; 142+ messages in thread From: Tom Zanussi @ 2005-01-16 20:30 UTC (permalink / raw) To: Christoph Hellwig Cc: Karim Yaghmour, Roman Zippel, Andi Kleen, Nikita Danilov, linux-kernel, Tom Zanussi Christoph Hellwig writes: > On Fri, Jan 14, 2005 at 04:11:38PM -0500, Karim Yaghmour wrote: > > Where does this appear in relayfs and what rights do > > user-space apps have over it (rwx). > > Why would you want anything but read access? This would allow an application to write trace events of its own to a trace stream for instance. Also, I added a user-requested 'feature' whereby write()s on a relayfs channel would be sent to a callback that could be used to interpret 'out-of-band' commands sent from the userspace application. And if lockless logging were being used, this could provide a cheaper way for applications to write to the trace buffer than having to do it via syscall. > > > bufsize, nbufs: > > Usually things have to be subdivided in sub-buffers to make > > both writing and reading simple. LTT uses this to allow, > > among other things, random trace access. > > I think random access is overkill. Keeping the code simple is more > important and user-space can post-process it. > > > resize_min, resize_max: > > Allow for dynamic resizing of buffer. > > Auto-resizing sounds like a really bad idea. It also doesn't seem to be really useful to anyone, so we should probably remove it. Tom > > > init_buf, init_buf_size: > > Is there an initial buffer containing some data that should > > be used to initialize the channel's content. If you're doing > > init-time tracing, for example, you need to have a pre-allocated > > static buffer that is copied to relayfs once relayfs is mounted. > > And why can't you do this from that code? 
It just needs an initcall-like > thing that runs after mounting of relayfs. > -- Regards, Tom Zanussi <zanussi@us.ibm.com> IBM Linux Technology Center/RAS ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-16 20:30 ` 2.6.11-rc1-mm1 Tom Zanussi @ 2005-01-19 11:11 ` Christoph Hellwig 0 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2005-01-19 11:11 UTC (permalink / raw)
To: Tom Zanussi
Cc: Christoph Hellwig, Karim Yaghmour, Roman Zippel, Andi Kleen, Nikita Danilov, linux-kernel

On Sun, Jan 16, 2005 at 02:30:33PM -0600, Tom Zanussi wrote:
> This would allow an application to write trace events of its own to a
> trace stream for instance.

I don't think this is a good idea. Userspace could just as easily write
its trace into shared memory segments.

> Also, I added a user-requested 'feature'
> whereby write()s on a relayfs channel would be sent to a callback that
> could be used to interpret 'out-of-band' commands sent from the
> userspace application.

Now write as a control channel makes lots of sense, but I'd encapsulate
that differently. Basically a net ctl file for each stream (and get rid
of ioctl in favour of this one while we're at it)

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 8:47 ` 2.6.11-rc1-mm1 Andi Kleen 2005-01-14 9:27 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-14 10:27 ` 2.6.11-rc1-mm1 Nikita Danilov @ 2005-01-14 15:24 ` Roman Zippel 2005-01-18 11:19 ` 2.6.11-rc1-mm1 Masami Hiramatsu 3 siblings, 0 replies; 142+ messages in thread
From: Roman Zippel @ 2005-01-14 15:24 UTC (permalink / raw)
To: Andi Kleen; +Cc: Andrew Morton, linux-kernel

Hi,

On Fri, 14 Jan 2005, Andi Kleen wrote:

> > - Added the Linux Trace Toolkit (and hence relayfs). Mainly because I
> > haven't yet taken as close a look at LTT as I should have. Probably neither
> > have you.
>
> I think it would be better to have a standard set of kprobes instead
> of all the ugly LTT hooks. kprobes could then log to relayfs or another
> fast logging mechanism.

kprobes is not portable.

> The problem relayfs has IMHO is that it is too complicated. It
> seems to either suffer from an overfull specification or second system
> effect. There are lots of different options to do everything,
> instead of a nice simple fast path that does one thing efficiently.

I have to agree with this. relayfs should resemble a very simple pipe,
maybe making it possible to write them directly to disk.
ltt has the same problem. It still does way too much at event time; it
should just pump the data to disk and postprocess it later. I think it's
better to implement multiple traces in user space via a daemon, which
synchronizes multiple users.

> IMHO before merging it should go through a diet and only keep
> the paths that are actually needed and dropping a lot of the current
> baggage.

While I agree this is needed, I don't think it's a reason against
merging; it should just be made clear that the API is not stable and
will change.

bye, Roman

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 8:47 ` 2.6.11-rc1-mm1 Andi Kleen ` (2 preceding siblings ...) 2005-01-14 15:24 ` 2.6.11-rc1-mm1 Roman Zippel @ 2005-01-18 11:19 ` Masami Hiramatsu 2005-01-18 11:46 ` 2.6.11-rc1-mm1 Andi Kleen 3 siblings, 1 reply; 142+ messages in thread From: Masami Hiramatsu @ 2005-01-18 11:19 UTC (permalink / raw) To: Andi Kleen; +Cc: Andrew Morton, linux-kernel, lkst-develop Hello, I’m a developer of yet another kernel tracer, LKST. I and co-developers are very glad to hear that LTT was merged into -mm tree and to talk about the kernel tracer on this ML. Because we think that the kernel event tracer is useful to debug Linux systems, and to improve the kernel reliability. Andi Kleen wrote: > Andrew Morton <akpm@osdl.org> writes: > >>- Added the Linux Trace Toolkit (and hence relayfs). Mainly because I >> haven't yet taken as close a look at LTT as I should have. Probably neither >> have you. > > > I think it would be better to have a standard set of kprobes instead > of all the ugly LTT hooks. kprobes could then log to relayfs or another > fast logging mechanism. I agree. I’m interested in kprobes. Currently, LKST can switch off and on each hook. But, even if a hook was disabled, there is a little overhead-time (one conditional-jump instruction should be executed). I think kprobes-based hooks can completely remove this overhead-time. Moreover, kprobes-based hooks can be inserted dynamically into the code-point specified by user. This feature is greatly useful for debugging. So, I have an idea to renew LKST to kprobes-based hooks. Also, I’m developing a prototype implementation. > The problem relayfs has IMHO is that it is too complicated. It > seems to either suffer from a overfull specification or second system > effect. There are lots of different options to do everything, > instead of a nice simple fast path that does one thing efficiently. 
>
> IMHO before merging it should go through a diet and only keep
> the paths that are actually needed and dropping a lot of the current
> baggage.
>
> Preferably that would be only the fastest options (extremely simple
> per CPU buffer with inlined fast path that drop data on buffer overflow),
> with leaving out anything more complicated. My ideal is something
> like the old SGI ktrace which was an extremely simple mechanism
> to do lockless per CPU logging of binary data efficiently and
> reading that from a user daemon.

LKST’s logging buffer is (much) simpler than relayfs. It is just a
linked per-CPU buffer. If you are interested in this, please try LKST.

--
Masami HIRAMATSU
Hitachi, Ltd., Systems Development Laboratory
E-mail: hiramatu@sdl.hitachi.co.jp

^ permalink raw reply [flat|nested] 142+ messages in thread
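The "extremely simple per CPU buffer ... that drop data on buffer overflow" quoted in the message above can be illustrated with a small userspace model. It is neither the SGI ktrace nor the LKST implementation, which this thread does not show. All names (`struct cpu_log`, `logs`, `log_event()`, `NR_CPUS_SKETCH`, `BUF_BYTES`) are hypothetical, and the CPU number is passed in by the caller, whereas kernel code would obtain it itself with preemption disabled. The point is only that no lock is needed when each CPU writes to its own buffer, and that the overflow path is a single counter increment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS_SKETCH 4  /* assumption: small fixed CPU count for the model */
#define BUF_BYTES      64 /* assumption: tiny buffer so overflow is easy to hit */

/* One private log buffer per CPU; nothing here is shared between CPUs. */
struct cpu_log {
    unsigned char data[BUF_BYTES];
    size_t used;           /* bytes already stored */
    unsigned long dropped; /* records discarded because the buffer was full */
};

static struct cpu_log logs[NR_CPUS_SKETCH];

/*
 * Fast path: append one binary record to the calling CPU's buffer.
 * Returns 1 if the record was stored, 0 if it was dropped on overflow.
 */
static int log_event(unsigned int cpu, const void *rec, size_t len)
{
    struct cpu_log *l;

    assert(cpu < NR_CPUS_SKETCH);
    l = &logs[cpu];
    if (len > BUF_BYTES - l->used) {
        l->dropped++; /* no waiting, no resizing: just count the loss */
        return 0;
    }
    memcpy(l->data + l->used, rec, len);
    l->used += len;
    return 1;
}
```

A user daemon would periodically copy out data[0..used) for each CPU, read the dropped counter and reset used. That draining step is left out here because it is where the designs discussed in this thread differ most.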
* Re: 2.6.11-rc1-mm1 2005-01-18 11:19 ` 2.6.11-rc1-mm1 Masami Hiramatsu @ 2005-01-18 11:46 ` Andi Kleen 2005-01-18 14:52 ` [Lkst-develop] 2.6.11-rc1-mm1 Masami Hiramatsu 0 siblings, 1 reply; 142+ messages in thread
From: Andi Kleen @ 2005-01-18 11:46 UTC (permalink / raw)
To: Masami Hiramatsu; +Cc: Andrew Morton, linux-kernel, lkst-develop

On Tue, Jan 18, 2005 at 08:19:18PM +0900, Masami Hiramatsu wrote:
> Hello,
>
> I’m a developer of yet another kernel tracer, LKST. I and co-developers
> are very glad to hear that LTT was merged into -mm tree and to talk
> about the kernel tracer on this ML. Because we think that the kernel
> event tracer is useful to debug Linux systems, and to improve the kernel
> reliability.

I haven't looked at your code, but I would suggest you also post
it for review so that it can be evaluated in the same way
as other more noisy proposals.

Perhaps Andrew can test both for some time in MM like he used
to do for the various schedulers.

-Andi

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [Lkst-develop] Re: 2.6.11-rc1-mm1 2005-01-18 11:46 ` 2.6.11-rc1-mm1 Andi Kleen @ 2005-01-18 14:52 ` Masami Hiramatsu 0 siblings, 0 replies; 142+ messages in thread
From: Masami Hiramatsu @ 2005-01-18 14:52 UTC (permalink / raw)
To: Andi Kleen; +Cc: Andrew Morton, linux-kernel, lkst-develop

Hi,

Andi Kleen wrote:
> On Tue, Jan 18, 2005 at 08:19:18PM +0900, Masami Hiramatsu wrote:
>
>>Hello,
>>
>>I’m a developer of yet another kernel tracer, LKST. I and co-developers
>>are very glad to hear that LTT was merged into -mm tree and to talk
>>about the kernel tracer on this ML. Because we think that the kernel
>>event tracer is useful to debug Linux systems, and to improve the kernel
>>reliability.
>
>
> I haven't looked at your code, but I would suggest you also post
> it for review so that it can be evaluated in the same way
> as other more noisy proposals.
>
> Perhaps Andrew can test both for some time in MM like he used
> to do for the various schedulers.

Thanks for your advice.
The latest release package of LKST based on linux-2.6.9 can be
downloaded from http://sourceforge.net/projects/lkst/
I'll release the LKST based on the latest kernel as soon as possible.

Regards,

--
Masami HIRAMATSU
Hitachi, Ltd., Systems Development Laboratory
E-mail: hiramatu@sdl.hitachi.co.jp

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 8:23 2.6.11-rc1-mm1 Andrew Morton 2005-01-14 8:47 ` 2.6.11-rc1-mm1 Andi Kleen @ 2005-01-14 12:36 ` Miklos Szeredi 2005-01-14 13:04 ` 2.6.11-rc1-mm1 Kasper Sandberg ` (9 subsequent siblings) 11 siblings, 0 replies; 142+ messages in thread From: Miklos Szeredi @ 2005-01-14 12:36 UTC (permalink / raw) To: akpm; +Cc: linux-kernel > - Added FUSE (filesystem in userspace) for people to play with. Am agnostic > as to whether it should be merged (haven't read it at all closely yet, > either), but I am impressed by the amount of care which has obviously gone > into it. Opinions sought. Great, thanks Andrew! Miklos ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 8:23 2.6.11-rc1-mm1 Andrew Morton 2005-01-14 8:47 ` 2.6.11-rc1-mm1 Andi Kleen 2005-01-14 12:36 ` 2.6.11-rc1-mm1 Miklos Szeredi @ 2005-01-14 13:04 ` Kasper Sandberg 2005-01-14 18:35 ` 2.6.11-rc1-mm1 Andrew Morton 2005-01-14 19:02 ` 2.6.11-rc1-mm1 Bill Davidsen 2005-01-14 15:07 ` 2.6.11-rc1-mm1 Barry K. Nathan ` (8 subsequent siblings) 11 siblings, 2 replies; 142+ messages in thread From: Kasper Sandberg @ 2005-01-14 13:04 UTC (permalink / raw) To: Andrew Morton; +Cc: LKML Mailinglist On Fri, 2005-01-14 at 00:23 -0800, Andrew Morton wrote: > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc1/2.6.11-rc1-mm1/ > > > - Added bk-xfs to the -mm "external trees" lineup. > > - Added the Linux Trace Toolkit (and hence relayfs). Mainly because I > haven't yet taken as close a look at LTT as I should have. Probably neither > have you. > > It needs a bit of work on the kernel<->user periphery, which is not a big > deal. > > As does relayfs, IMO. It seems to need some regularised way in which a > userspace relayfs client can tell relayfs what file(s) to use. LTT is > currently using some ghastly stick-a-pathname-in-/proc thing. Relayfs > should provide this service. > > relayfs needs a closer look too. A lot of advanced instrumentation > projects seem to require it, but none of them have been merged. Lots of > people say "use netlink instead" and lots of other people say "err, we think > relayfs is better". This is a discussion which needs to be had. > > - The 2.6.10-mm3 announcement was munched by the vger filters, sorry. One of > the uml patches had an inopportune substring in its name (oh pee tee hyphen > oh you tee). Nice trick if you meant it ;) > > - Big update to the ext3 extended attribute support. agruen, tridge and sct > have been cooking this up for a while. samba4 proved to be a good > stress test. 
> > - davej's "2.6 post-Halloween features" document has been added to -mm as > Documentation/feature-list-2.6.txt in the hope that someone will review it > and help keep it up-to-date. > > - Added FUSE (filesystem in userspace) for people to play with. Am agnostic > as to whether it should be merged (haven't read it at all closely yet, > either), but I am impressed by the amount of care which has obviously gone > into it. Opinions sought. i really believe fuse is a good thing to have merged, i use it, and it works really really good. my vote is to get it in <snip> ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 13:04 ` 2.6.11-rc1-mm1 Kasper Sandberg @ 2005-01-14 18:35 ` Andrew Morton 2005-01-14 19:08 ` 2.6.11-rc1-mm1 Rogério Brito ` (2 more replies) 2005-01-14 19:02 ` 2.6.11-rc1-mm1 Bill Davidsen 1 sibling, 3 replies; 142+ messages in thread From: Andrew Morton @ 2005-01-14 18:35 UTC (permalink / raw) To: Kasper Sandberg; +Cc: linux-kernel Kasper Sandberg <lkml@metanurb.dk> wrote: > > i really believe fuse is a good thing to have merged, i use it, and it > works really really good. What filesystem(s) do you use, and why? ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 18:35 ` 2.6.11-rc1-mm1 Andrew Morton @ 2005-01-14 19:08 ` Rogério Brito 2005-01-14 19:41 ` 2.6.11-rc1-mm1 Peter Buckingham 2005-01-17 17:04 ` 2.6.11-rc1-mm1 Matthias Urlichs 2 siblings, 0 replies; 142+ messages in thread From: Rogério Brito @ 2005-01-14 19:08 UTC (permalink / raw) To: linux-kernel On Jan 14 2005, Andrew Morton wrote: > Kasper Sandberg <lkml@metanurb.dk> wrote: > > i really believe fuse is a good thing to have merged, i use it, and it > > works really really good. > > What filesystem(s) do you use, and why? I'm not the person to whom you asked the question, but I will answer anyway. I have never used a -mm kernel tree before, but seeing that fuse got included made me download the patch to try it. I'll be using gmailfs (which needs fuse) just to see how things work with Debian's testing (sarge) userland. Hope this is another data point of interest, Rogério. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Rogério Brito - rbrito@ime.usp.br - http://www.ime.usp.br/~rbrito =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 18:35 ` 2.6.11-rc1-mm1 Andrew Morton 2005-01-14 19:08 ` 2.6.11-rc1-mm1 Rogério Brito @ 2005-01-14 19:41 ` Peter Buckingham 2005-01-17 17:04 ` 2.6.11-rc1-mm1 Matthias Urlichs 2 siblings, 0 replies; 142+ messages in thread From: Peter Buckingham @ 2005-01-14 19:41 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel Andrew Morton wrote: > Kasper Sandberg <lkml@metanurb.dk> wrote: > >>i really believe fuse is a good thing to have merged, i use it, and it >> works really really good. > > > What filesystem(s) do you use, and why? we're currently prototyping a lightweight network filesystem proxy using fuse. peter ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 18:35 ` 2.6.11-rc1-mm1 Andrew Morton 2005-01-14 19:08 ` 2.6.11-rc1-mm1 Rogério Brito 2005-01-14 19:41 ` 2.6.11-rc1-mm1 Peter Buckingham @ 2005-01-17 17:04 ` Matthias Urlichs 2 siblings, 0 replies; 142+ messages in thread From: Matthias Urlichs @ 2005-01-17 17:04 UTC (permalink / raw) To: linux-kernel Hi, Andrew Morton schrub am Fri, 14 Jan 2005 10:35:34 -0800: > What filesystem(s) do you use, and why? sshfs (best idea for file access through firewalls). gmailfs (best free off-site backup facility). Will use encfs as soon as FUSE is in mainline (I'm using cryptoloop now, but that's not sanely backupable.) -- Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 13:04 ` 2.6.11-rc1-mm1 Kasper Sandberg 2005-01-14 18:35 ` 2.6.11-rc1-mm1 Andrew Morton @ 2005-01-14 19:02 ` Bill Davidsen 1 sibling, 0 replies; 142+ messages in thread From: Bill Davidsen @ 2005-01-14 19:02 UTC (permalink / raw) To: Kasper Sandberg; +Cc: Andrew Morton, LKML Mailinglist Kasper Sandberg wrote: > On Fri, 2005-01-14 at 00:23 -0800, Andrew Morton wrote: > >>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc1/2.6.11-rc1-mm1/ >> >> >>- Added bk-xfs to the -mm "external trees" lineup. >> >>- Added the Linux Trace Toolkit (and hence relayfs). Mainly because I >> haven't yet taken as close a look at LTT as I should have. Probably neither >> have you. >> >> It needs a bit of work on the kernel<->user periphery, which is not a big >> deal. >> >> As does relayfs, IMO. It seems to need some regularised way in which a >> userspace relayfs client can tell relayfs what file(s) to use. LTT is >> currently using some ghastly stick-a-pathname-in-/proc thing. Relayfs >> should provide this service. >> >> relayfs needs a closer look too. A lot of advanced instrumentation >> projects seem to require it, but none of them have been merged. Lots of >> people say "use netlink instead" and lots of other people say "err, we think >> relayfs is better". This is a discussion which needs to be had. >> >>- The 2.6.10-mm3 announcement was munched by the vger filters, sorry. One of >> the uml patches had an inopportune substring in its name (oh pee tee hyphen >> oh you tee). Nice trick if you meant it ;) >> >>- Big update to the ext3 extended attribute support. agruen, tridge and sct >> have been cooking this up for a while. samba4 proved to be a good >> stress test. >> >>- davej's "2.6 post-Halloween features" document has been added to -mm as >> Documentation/feature-list-2.6.txt in the hope that someone will review it >> and help keep it up-to-date. >> >>- Added FUSE (filesystem in userspace) for people to play with. 
Am agnostic >> as to whether it should be merged (haven't read it at all closely yet, >> either), but I am impressed by the amount of care which has obviously gone >> into it. Opinions sought. > > > i really believe fuse is a good thing to have merged, i use it, and it > works really really good. my vote is to get it in I like the idea, but I also like the practice of letting a feature like this sit in -mm for a few weeks or even a month until people have a chance to break^H^H^H^H^Htest it a bit. -- -bill davidsen (davidsen@tmr.com) "The secret to procrastination is to put things off until the last possible moment - but no longer" -me ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 8:23 2.6.11-rc1-mm1 Andrew Morton ` (2 preceding siblings ...) 2005-01-14 13:04 ` 2.6.11-rc1-mm1 Kasper Sandberg @ 2005-01-14 15:07 ` Barry K. Nathan 2005-01-14 16:56 ` 2.6.11-rc1-mm1 Dave Jones 2005-01-19 23:06 ` 2.6.11-rc1-mm1 Marcos D. Marado Torres 2005-01-14 15:35 ` 2.6.11-rc1-mm1 Zwane Mwaikambo ` (7 subsequent siblings) 11 siblings, 2 replies; 142+ messages in thread From: Barry K. Nathan @ 2005-01-14 15:07 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel This isn't new to 2.6.11-rc1-mm1, but it has the infamous (to Fedora users) "ACPI shutdown bug" -- poweroff hangs instead of actually turning the computer off, on some computers. Here's the RH Bugzilla report where most of the discussion took place: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=132761 In the Fedora kernels it turned out to be due to kexec. I'll see if I can narrow it down further. -Barry K. Nathan <barryn@pobox.com> ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 15:07 ` 2.6.11-rc1-mm1 Barry K. Nathan @ 2005-01-14 16:56 ` Dave Jones 2005-01-14 17:55 ` 2.6.11-rc1-mm1 Barry K. Nathan 2005-01-19 23:06 ` 2.6.11-rc1-mm1 Marcos D. Marado Torres 1 sibling, 1 reply; 142+ messages in thread From: Dave Jones @ 2005-01-14 16:56 UTC (permalink / raw) To: Barry K. Nathan; +Cc: Andrew Morton, linux-kernel On Fri, Jan 14, 2005 at 07:07:14AM -0800, Barry K. Nathan wrote: > This isn't new to 2.6.11-rc1-mm1, but it has the infamous (to Fedora > users) "ACPI shutdown bug" -- poweroff hangs instead of actually turning > the computer off, on some computers. Here's the RH Bugzilla report where > most of the discussion took place: > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=132761 > > In the Fedora kernels it turned out to be due to kexec. I'll see if I > can narrow it down further. For *some* users. It still affects others. My Compaq Evo showed the bug with 2.6.9 vanilla, went away with 2.6.10 vanilla. Dave ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 16:56 ` 2.6.11-rc1-mm1 Dave Jones @ 2005-01-14 17:55 ` Barry K. Nathan 0 siblings, 0 replies; 142+ messages in thread From: Barry K. Nathan @ 2005-01-14 17:55 UTC (permalink / raw) To: Dave Jones, Barry K. Nathan, Andrew Morton, linux-kernel On Fri, Jan 14, 2005 at 11:56:12AM -0500, Dave Jones wrote: > For *some* users. It still affects others. > My Compaq Evo showed the bug with 2.6.9 vanilla, went away with 2.6.10 > vanilla. Ok, I didn't know that. Anyway, I've dug a bit deeper into my particular case, and there's now some more information here: http://bugme.osdl.org/show_bug.cgi?id=4041 -Barry K. Nathan <barryn@pobox.com> ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 15:07 ` 2.6.11-rc1-mm1 Barry K. Nathan 2005-01-14 16:56 ` 2.6.11-rc1-mm1 Dave Jones @ 2005-01-19 23:06 ` Marcos D. Marado Torres 2005-01-19 23:54 ` 2.6.11-rc1-mm1 Barry K. Nathan 1 sibling, 1 reply; 142+ messages in thread From: Marcos D. Marado Torres @ 2005-01-19 23:06 UTC (permalink / raw) To: Barry K. Nathan; +Cc: Andrew Morton, linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 14 Jan 2005, Barry K. Nathan wrote: > This isn't new to 2.6.11-rc1-mm1, but it has the infamous (to Fedora > users) "ACPI shutdown bug" -- poweroff hangs instead of actually turning > the computer off, on some computers. Here's the RH Bugzilla report where > most of the discussion took place: > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=132761 This is the same bug I've talked here: http://lkml.org/lkml/2005/1/11/88 This only happens with -mm and not with vanilla sources. I'm reporting about this issue in an ASUS M3N laptop with Debian. Best regards, Mind Booster Noori > In the Fedora kernels it turned out to be due to kexec. I'll see if I > can narrow it down further. > > -Barry K. Nathan <barryn@pobox.com> > - -- /* *************************************************************** */ Marcos Daniel Marado Torres AKA Mind Booster Noori http://student.dei.uc.pt/~marado - marado@student.dei.uc.pt () Join the ASCII ribbon campaign against html email, Microsoft /\ attachments and Software patents. They endanger the World.
Sign a petition against patents: http://petition.eurolinux.org /* *************************************************************** */ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) Comment: Made with pgp4pine 1.76 iD8DBQFB7ufzmNlq8m+oD34RAmsIAKDM55tzy957YqEXtNkz9l2O3O7V1ACeKXQB v2LuSPMWch9A7NQApq6Bm8c= =F7on -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-19 23:06 ` 2.6.11-rc1-mm1 Marcos D. Marado Torres @ 2005-01-19 23:54 ` Barry K. Nathan 0 siblings, 0 replies; 142+ messages in thread
From: Barry K. Nathan @ 2005-01-19 23:54 UTC (permalink / raw)
To: Marcos D. Marado Torres; +Cc: Barry K. Nathan, Andrew Morton, linux-kernel

On Wed, Jan 19, 2005 at 11:06:10PM +0000, Marcos D. Marado Torres wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Fri, 14 Jan 2005, Barry K. Nathan wrote:
>
> >This isn't new to 2.6.11-rc1-mm1, but it has the infamous (to Fedora
> >users) "ACPI shutdown bug" -- poweroff hangs instead of actually turning
> >the computer off, on some computers. Here's the RH Bugzilla report where
> >most of the discussion took place:
> >
> >https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=132761
>
> This is the same bug I've talked here:
> http://lkml.org/lkml/2005/1/11/88

FWIW the RH Bugzilla bug is (unfortunately) discussing several different
similar but not identical bugs, as far as I can tell.

> This only happens with -mm and not with vanilla sources.
>
> I'm reporting about this issue in an ASUS M3N laptop with Debian.
>
> Best regards,
> Mind Booster Noori

FWIW my report against -mm (where I narrowed it down to one of the kexec
patches in particular) is here:
http://bugme.osdl.org/show_bug.cgi?id=4041

-Barry K. Nathan <barryn@pobox.com>

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 8:23 2.6.11-rc1-mm1 Andrew Morton ` (3 preceding siblings ...) 2005-01-14 15:07 ` 2.6.11-rc1-mm1 Barry K. Nathan @ 2005-01-14 15:35 ` Zwane Mwaikambo 2005-01-14 22:03 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-14 17:35 ` [patch] 2.6.11-rc1-mm1: ip_tables.c: ipt_find_target must be EXPORT_SYMBOL'ed Adrian Bunk ` (6 subsequent siblings) 11 siblings, 1 reply; 142+ messages in thread
From: Zwane Mwaikambo @ 2005-01-14 15:35 UTC (permalink / raw)
To: Karim Yaghmour; +Cc: Linux Kernel, Andrew Morton

On Fri, 14 Jan 2005, Andrew Morton wrote:

> - Added the Linux Trace Toolkit (and hence relayfs). Mainly because I
> haven't yet taken as close a look at LTT as I should have. Probably neither
> have you.

Just a few things from a quick look:

- What's with all the ltt_*_bit? Please use the ones provided by the
  kernel.
- I see cpu_has_tsc, can't you use sched_clock?
- ltt_log_event isn't preempt safe.
- num_cpus isn't hotplug cpu safe, and you should be using the
  for_each_online_cpu iterators.
- Code style: you have large hunks of code with blocks of the following
  form. You can save processor cycles by placing an
  if (incoming_process) branch earlier. This code is in _ltt_log_event,
  which I presume executes frequently.

	if (event_id == LTT_EV_SCHEDCHANGE)
		incoming_process = (struct task_struct *) (((ltt_schedchange *) event_struct)->in);

	if ((trace->tracing_gid == 1) && (current->egid != trace->traced_gid)) {
		if (incoming_process == NULL)
			return 0;
		else if (incoming_process->egid != trace->traced_gid)
			return 0;
	}
	...
	[ more of the same ]

	if ((trace->tracing_uid == 1) && (current->euid != trace->traced_uid)) {
		if (incoming_process == NULL)
			return 0;
		else if (incoming_process->euid != trace->traced_uid)
			return 0;
	}

^ permalink raw reply [flat|nested] 142+ messages in thread
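A minimal sketch of the restructuring suggested in the last review item
above: test incoming_process once, so the common case (no incoming task)
costs a single comparison per filter. The names task, trace_filter and
event_passes_filter are hypothetical stand-ins, not the LTT code, and the
logic assumes the semantics of the quoted excerpt: an event passes a
filter if either the current or the incoming task matches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named in the review. */
struct task {
	unsigned int egid;
	unsigned int euid;
};

struct trace_filter {
	int tracing_gid;          /* 1 = filter on group id */
	int tracing_uid;          /* 1 = filter on user id */
	unsigned int traced_gid;
	unsigned int traced_uid;
};

/* Returns 1 if the event should be logged, 0 if it is filtered out. */
static int event_passes_filter(const struct trace_filter *f,
			       const struct task *cur,
			       const struct task *incoming)
{
	assert(f != NULL && cur != NULL);

	if (incoming == NULL) {
		/* Common case: only the current task can match. */
		if (f->tracing_gid == 1 && cur->egid != f->traced_gid)
			return 0;
		if (f->tracing_uid == 1 && cur->euid != f->traced_uid)
			return 0;
		return 1;
	}

	/* Schedule-change case: either task may satisfy each filter. */
	if (f->tracing_gid == 1 && cur->egid != f->traced_gid &&
	    incoming->egid != f->traced_gid)
		return 0;
	if (f->tracing_uid == 1 && cur->euid != f->traced_uid &&
	    incoming->euid != f->traced_uid)
		return 0;
	return 1;
}
```

The NULL test is done once instead of once per filter block. Whether this
is measurable in the real log path would have to be checked against the
actual code.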
* Re: 2.6.11-rc1-mm1 2005-01-14 15:35 ` 2.6.11-rc1-mm1 Zwane Mwaikambo @ 2005-01-14 22:03 ` Karim Yaghmour 0 siblings, 0 replies; 142+ messages in thread
From: Karim Yaghmour @ 2005-01-14 22:03 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: Linux Kernel, Andrew Morton

Zwane Mwaikambo wrote:
> Just a few things from a quick look;

Thanks for the feedback. I've added your suggestions to my to-do list.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
* [patch] 2.6.11-rc1-mm1: ip_tables.c: ipt_find_target must be EXPORT_SYMBOL'ed 2005-01-14 8:23 2.6.11-rc1-mm1 Andrew Morton ` (4 preceding siblings ...) 2005-01-14 15:35 ` 2.6.11-rc1-mm1 Zwane Mwaikambo @ 2005-01-14 17:35 ` Adrian Bunk 2005-01-14 17:43 ` Patrick McHardy 2005-01-14 22:41 ` 2.6.11-rc1-mm1 Tim Bird ` (5 subsequent siblings) 11 siblings, 1 reply; 142+ messages in thread
From: Adrian Bunk @ 2005-01-14 17:35 UTC (permalink / raw)
To: Andrew Morton, Rusty Russell; +Cc: linux-kernel, coreteam, netdev

On Fri, Jan 14, 2005 at 12:23:52AM -0800, Andrew Morton wrote:
>...
> All 434 patches:
>...
> restore-net-sched-iptc-after-iptables-kmod-cleanup.patch
>   Restore net/sched/ipt.c After iptables Kmod Cleanup
>...

This causes the following error with CONFIG_NET_ACT_IPT=m:

<-- snip -->

if [ -r System.map ]; then /sbin/depmod -ae -F System.map 2.6.11-rc1-mm1; fi
WARNING: /lib/modules/2.6.11-rc1-mm1/kernel/net/sched/ipt.ko needs unknown symbol ipt_find_target

<-- snip -->

The fix is simple:

Signed-off-by: Adrian Bunk <bunk@stusta.de>

--- linux-2.6.11-rc1-mm1-modular/net/ipv4/netfilter/ip_tables.c.old	2005-01-14 18:03:18.000000000 +0100
+++ linux-2.6.11-rc1-mm1-modular/net/ipv4/netfilter/ip_tables.c	2005-01-14 18:04:17.000000000 +0100
@@ -488,6 +488,7 @@
 		return NULL;
 	return target;
 }
+EXPORT_SYMBOL(ipt_find_target);
 
 static int match_revfn(const char *name, u8 revision, int *bestp)
 {

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [patch] 2.6.11-rc1-mm1: ip_tables.c: ipt_find_target must be EXPORT_SYMBOL'ed 2005-01-14 17:35 ` [patch] 2.6.11-rc1-mm1: ip_tables.c: ipt_find_target must be EXPORT_SYMBOL'ed Adrian Bunk @ 2005-01-14 17:43 ` Patrick McHardy 0 siblings, 0 replies; 142+ messages in thread
From: Patrick McHardy @ 2005-01-14 17:43 UTC (permalink / raw)
To: Adrian Bunk; +Cc: Andrew Morton, Rusty Russell, linux-kernel, coreteam, netdev

Adrian Bunk wrote:

>On Fri, Jan 14, 2005 at 12:23:52AM -0800, Andrew Morton wrote:
>
>>...
>>All 434 patches:
>>...
>>restore-net-sched-iptc-after-iptables-kmod-cleanup.patch
>> Restore net/sched/ipt.c After iptables Kmod Cleanup
>>...
>>
>
>This causes the following error with CONFIG_NET_ACT_IPT=m:
>
><-- snip -->
>
>if [ -r System.map ]; then /sbin/depmod -ae -F System.map 2.6.11-rc1-mm1; fi
>WARNING: /lib/modules/2.6.11-rc1-mm1/kernel/net/sched/ipt.ko needs unknown symbol ipt_find_target
>
><-- snip -->
>

The fix is already in Dave's tree.

Regards
Patrick

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 8:23 2.6.11-rc1-mm1 Andrew Morton ` (5 preceding siblings ...) 2005-01-14 17:35 ` [patch] 2.6.11-rc1-mm1: ip_tables.c: ipt_find_target must be EXPORT_SYMBOL'ed Adrian Bunk @ 2005-01-14 22:41 ` Tim Bird 2005-01-14 22:46 ` 2.6.11-rc1-mm1 Thomas Gleixner ` (4 subsequent siblings) 11 siblings, 0 replies; 142+ messages in thread
From: Tim Bird @ 2005-01-14 22:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> - Added the Linux Trace Toolkit (and hence relayfs). Mainly because I
> haven't yet taken as close a look at LTT as I should have. Probably neither
> have you.
>
> It needs a bit of work on the kernel<->user periphery, which is not a big
> deal.
>
> As does relayfs, IMO. It seems to need some regularised way in which a
> userspace relayfs client can tell relayfs what file(s) to use. LTT is
> currently using some ghastly stick-a-pathname-in-/proc thing. Relayfs
> should provide this service.
>
> relayfs needs a closer look too. A lot of advanced instrumentation
> projects seem to require it, but none of them have been merged. Lots of
> people say "use netlink instead" and lots of other people say "err, we think
> relayfs is better". This is a discussion which needs to be had.

Thanks very much. I know lots of embedded folks who will be happy to
see this discussion take place. (As an aside, I'll try to encourage
some of our more shy members to speak up and participate in the
discussion as well. I know Hitachi has been doing some work on tracing,
and I'd hate to see duplicate effort.)

BTW - I agree with most of the relayfs comments. It seems like overkill
for the kernel developer doing a "casual", ad-hoc trace. I'll try to
work with Karim on the suggested improvements.

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 8:23 2.6.11-rc1-mm1 Andrew Morton ` (6 preceding siblings ...) 2005-01-14 22:41 ` 2.6.11-rc1-mm1 Tim Bird @ 2005-01-14 22:46 ` Thomas Gleixner 2005-01-14 23:22 ` 2.6.11-rc1-mm1 Tim Bird 2005-01-14 22:48 ` 2.6.11-rc1-mm1 Andre Eisenbach ` (3 subsequent siblings) 11 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2005-01-14 22:46 UTC (permalink / raw)
To: LKML

On Fri, 2005-01-14 at 00:23 -0800, Andrew Morton wrote:
> - Added the Linux Trace Toolkit (and hence relayfs). Mainly because I
> haven't yet taken as close a look at LTT as I should have. Probably neither
> have you.

I have. Maybe you should have. I really don't see a good argument to
include this code.

The "non-locking" claim is nice, but a do { } while loop in the slot
reservation for every event, including a do { } while loop in the slow
path, is just a replacement of locking without actually using a lock. I
don't care whether this is likely or unlikely to happen, it's just bogus
to add a non-constant-time path for debugging/tracing purposes.

Default timestamp measuring with do_gettimeofday is also contrary to the
non-locking argument. There is
a) a lock in there
b) it might loop because it's a sequential lock.

If you have no TSC you can do at least a jiffies + event-number based,
not so fine-grained tracing which gives you at least the timeline of the
events.

There is also no need to do time diff calculations / conversions; this
can be done in userspace postprocessing.

Adding 150k relayfs source in order to do tracing is scary. I don't see
any real advantage over a nicely implemented per-CPU ringbuffer, which is
lock free and does not add variable timed delays in the log path. Don't
tell me that a ringbuffer is not suitable, it's a question of size and
it is the same problem for relayfs. If you don't have enough buffers it
does not work. This applies to every implementation of tracebuffering
you do.
In space-constrained systems relayfs is even worse, as it needs more
memory than the plain ringbuffer.

The ringbuffer has a nice advantage. In case the system crashes you can
retrieve the last and therefore most interesting information from the
ringbuffer without any hassle via BDI or in the worst case via a serial
dump. You can even copy the tail of the buffer into permanent storage
like buffered SRAM so it can be retrieved after reboot.

Splitting the trace into different paths is nice to have, but I don't see
a single point which cannot be done by a userspace (hostside)
postprocessing tool. It adds another non-time-constant component to the
trace path. Even the per-CPU ringbuffers can be nicely synchronized by a
userspace postprocessing tool without adding complex synchronization
functions.

Replacing printk by a varargs print into an event buffer is a nice idea
to replace serial logging of long lasting debug features. Must we really
include 150k source for this, or can we just increase the log buffer size
or improve the printk itself? In case of time related tracing it's just
overkill. The printk information is mostly a string, which can be
replaced by the address on which the printk is happening. The arguments,
where available, can be dumped in binary form. All this information can
be converted into human readable form by postprocessing.

I wonder whether the various formatting options of the trace are really
of any value. I need neither strings, HEX strings nor XML formatted
information from the kernel.

Max. 8192 bytes of user information makes me frown. Tracing is not a
copy-to-userspace function, or am I missing something?

All tracepoints are unconditionally compiled into the kernel, whether
they are enabled or not. Why is it necessary to check the enabled bit for
information I'm not interested in? Why can't I compile this away by not
enabling the tracepoint at all?
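
A minimal sketch of what the last paragraph above asks for, under stated
assumptions: each tracepoint is wrapped in a macro that expands to nothing
unless its build-time switch is defined, so a disabled tracepoint costs
neither a test of an enabled bit nor an argument evaluation. The
CONFIG_TRACE_* symbols, trace_emit and the macro names are invented for
this illustration and are not taken from the LTT patch.

```c
#include <assert.h>

/* Hypothetical build-time switches; only the scheduler point is enabled. */
#define CONFIG_TRACE_SCHED 1
/* CONFIG_TRACE_SYSCALL is deliberately left undefined. */

static unsigned long trace_hits;   /* stand-in for the real log path */

static void trace_emit(int id, unsigned long arg)
{
	assert(id > 0);
	(void)arg;
	trace_hits++;
}

#ifdef CONFIG_TRACE_SCHED
#define trace_sched(arg)	trace_emit(1, (arg))
#else
#define trace_sched(arg)	do { } while (0)
#endif

#ifdef CONFIG_TRACE_SYSCALL
#define trace_syscall(arg)	trace_emit(2, (arg))
#else
#define trace_syscall(arg)	do { } while (0)
#endif

/* Runs a code path with two tracepoints, returns how many events fired. */
static unsigned long run_traced_path(void)
{
	unsigned long before = trace_hits;

	trace_sched(7UL);     /* compiled in */
	trace_syscall(9UL);   /* expands to an empty statement */

	return trace_hits - before;
}
```

The trade-off is that changing the set of tracepoints needs a rebuild,
which is why a runtime enable bit is attractive for distribution kernels.
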
I don't need to point out the various coding style issues again, but I
question whether

	atomic_set(&var, atomic_read(&var) | bit);

which can be found in several places, is really doing what it suggests
it does.

I did a short test on a 300MHz PIII box and the maximum time spent in
the log path (interrupts disabled during measurement) is about 30us.
Extrapolated to a 74MHz ARM SoC it will sum up to ~ 90-120us, which makes
it purely useless.

Summary:

1. The code is not doing what it claims to do.
2. The code adds unnecessary overhead.
3. It's not useful for low speed systems.

Question:
Why is the code included?

tglx

^ permalink raw reply [flat|nested] 142+ messages in thread
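A minimal sketch of why the atomic_set/atomic_read pattern questioned
above is suspect: each call is atomic on its own, but the read-modify-write
as a whole is not. The example uses C11 <stdatomic.h> as a userspace
stand-in for the kernel primitives and replays the bad interleaving by
hand in a single thread, so the outcome is deterministic. The function
names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Load-then-store, as in the questioned pattern: an update can be lost. */
static unsigned racy_pattern(void)
{
	atomic_uint flags;
	atomic_init(&flags, 0u);

	unsigned seen_a = atomic_load(&flags);   /* context A reads 0 */
	unsigned seen_b = atomic_load(&flags);   /* context B reads 0 */
	assert(seen_a == seen_b);                /* both hold the same stale value */

	atomic_store(&flags, seen_a | 0x1u);     /* A stores 0x1 */
	atomic_store(&flags, seen_b | 0x2u);     /* B stores 0x2, A's bit is gone */

	return atomic_load(&flags);
}

/* One atomic read-modify-write per update: no window between load and store. */
static unsigned rmw_pattern(void)
{
	atomic_uint flags;
	atomic_init(&flags, 0u);

	atomic_fetch_or(&flags, 0x1u);
	atomic_fetch_or(&flags, 0x2u);

	return atomic_load(&flags);
}
```

In the kernel the equivalent fix would be a single atomic bit operation
(set_bit() is one such primitive) rather than a separate read and set.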
* Re: 2.6.11-rc1-mm1 2005-01-14 22:46 ` 2.6.11-rc1-mm1 Thomas Gleixner @ 2005-01-14 23:22 ` Tim Bird 2005-01-15 0:24 ` 2.6.11-rc1-mm1 Thomas Gleixner 2005-01-15 13:08 ` [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) Thomas Gleixner 0 siblings, 2 replies; 142+ messages in thread
From: Tim Bird @ 2005-01-14 23:22 UTC (permalink / raw)
To: tglx; +Cc: LKML

Thomas Gleixner wrote:
> On Fri, 2005-01-14 at 00:23 -0800, Andrew Morton wrote:
>
>>- Added the Linux Trace Toolkit (and hence relayfs). Mainly because I
>> haven't yet taken as close a look at LTT as I should have. Probably neither
>> have you.
>
> I have. Maybe you should have. I really don't see a good argument to
> include this code.

[ Lots of excellent criticisms omitted.]

I don't want to be argumentative, but possibly (to answer your last
question first), there are twofold reasons to put this in -mm:
 - there's no tracing infrastructure in the kernel now (except for
   kprobes - which provides hooks for creating tracepoints dynamically,
   but not 1) supporting infrastructure for timestamping, managing event
   data, etc., and 2) a static list of generally useful tracepoints).
 - to generate this discussion.

>
> I did a short test on a 300MHz PIII box and the maximum time spent in
> the log path (interrupts disabled during measurement) is about 30us.
> Extrapolated to a 74MHz ARM SoC it will sum up to ~ 90-120us, what makes
> it purely useless.

I've used it for various tasks, and I know others who have. I wouldn't
recommend it in its present form for deep scheduling tweaks or debugging
kernel race conditions (which it is more likely to mask than
it is to find), but inapplicability there hardly makes it worthless for
other things.

>
> Summary:
>
> 1. The code is not doing what it claims to do.

I'm guessing the sense of this is in the micro-claims which are implied
(e.g. runs lockless and therefore avoids cache thrashing), rather than
the high-level claim of providing useful information in some situations.
It clearly does the latter. At least it has for me.

> 2. The code adds unnecessary overhead

I agree it could be improved. The threshold for "unnecessary" varies by
task.

> 3. It's not useful for low speed systems.

I've used it on low speed systems.

> Question:
> Why is the code included ?

See above.

By the way, don't think that your comments are not appreciated.
I'm not particularly glued to any specific part of the implementation.
I'm excited to see tracing discussed here, if only to avoid
duplicate efforts and point out danger areas, for multiple tracing
projects that I am aware of.

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 23:22 ` 2.6.11-rc1-mm1 Tim Bird @ 2005-01-15 0:24 ` Thomas Gleixner 2005-01-15 1:27 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-16 16:18 ` 2.6.11-rc1-mm1 Christoph Hellwig 2005-01-15 13:08 ` [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) Thomas Gleixner 1 sibling, 2 replies; 142+ messages in thread
From: Thomas Gleixner @ 2005-01-15 0:24 UTC (permalink / raw)
To: Tim Bird; +Cc: LKML

Hi Tim,

On Fri, 2005-01-14 at 15:22 -0800, Tim Bird wrote:
> [ Lots of excellent criticisms omitted.]

Thanks for the compliment :)

> I don't want to be argumentative, but possibly (to answer your last
> question first), there are twofold reasons to put this in -mm:
> - there's no tracing infrastructure in the kernel now (except for
> kprobes - which provides hooks for creating tracepoints dynamically,
> but not 1) supporting infrastructure for timestamping, managing event
> data, etc., and 2) a static list of generally useful tracepoints.
> - to generate this discussion.

I have no objection at all to putting instrumentation into the kernel.
Quite the contrary, I would appreciate it.

Putting tracepoints into the kernel is great. Providing a
trace/log/instrumentation framework is great. Adding the given overhead
is not.

> I've used it for various tasks, and I know others who have. I wouldn't
> recommend it in its present form for deep scheduling tweaks or debugging
> kernel race conditions (which it is more likely to mask than
> it is to find), but inapplicability there hardly makes it worthless for
> other things.

Putting a 200k patch into the kernel for limited usage, and maybe
blocking a simple, non-intrusive and more generic implementation by its
mere presence, makes it inapplicable enough.

Merging the instrumentation points from ltt, from other projects like
DSKI, and from the places where in-kernel instrumentation for specific
purposes is already available, and using a simple and effective
framework which moves the burden into postprocessing and provides a
simple postmortem dump interface, is the goal IMHO.

When this is available, trace tool developers can concentrate on
postprocessing improvement rather than moving postprocessing
incapabilities into the kernel.

> By the way, don't think that your comments are not appreciated.
> I'm not particularly glued to any specific part of the implementation.
> I'm excited to see tracing discussed here, if only to avoid
> duplicate efforts and point out danger areas, for multiple tracing
> projects that I am aware of.

So am I.

tglx

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-15 0:24 ` 2.6.11-rc1-mm1 Thomas Gleixner @ 2005-01-15 1:27 ` Karim Yaghmour 2005-01-16 16:18 ` 2.6.11-rc1-mm1 Christoph Hellwig 1 sibling, 0 replies; 142+ messages in thread
From: Karim Yaghmour @ 2005-01-15 1:27 UTC (permalink / raw)
To: tglx; +Cc: Tim Bird, LKML

Thomas Gleixner wrote:
> Putting a 200k patch into the kernel for limited usage and maybe
> restricting a generic simple non intrusive and more generic
> implementation by its mere presence is making it inapplicable enough.

I think you've missed the other thread where people are claiming that
it's so generic as to be arcane ...

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-15 0:24 ` 2.6.11-rc1-mm1 Thomas Gleixner 2005-01-15 1:27 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-16 16:18 ` Christoph Hellwig 1 sibling, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2005-01-16 16:18 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Tim Bird, LKML

On Sat, Jan 15, 2005 at 01:24:16AM +0100, Thomas Gleixner wrote:
> Putting a 200k patch into the kernel for limited usage and maybe
> restricting a generic simple non intrusive and more generic
> implementation by its mere presence is making it inapplicable enough.
>
> Merge the instrumentation points from ltt and other projects like DSKI
> and the places where in kernel instrumentation for specific purposes is
> already available and use a simple and effective framework which moves
> the burden into postprocessing and provides a simple postmortem dump
> interface, is the goal IMHO.
>
> When this is available, trace tool developers can concentrate on
> postprocessing improvement rather than moving postprocessing
> incapabilities into the kernel.

I completely agree with that statement. We've been working in most
areas of the kernel to move or keep complexity and policy in userspace.
The same should be true for a tracing framework.

^ permalink raw reply [flat|nested] 142+ messages in thread
* [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-14 23:22 ` 2.6.11-rc1-mm1 Tim Bird 2005-01-15 0:24 ` 2.6.11-rc1-mm1 Thomas Gleixner @ 2005-01-15 13:08 ` Thomas Gleixner 2005-01-16 2:09 ` Karim Yaghmour 1 sibling, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2005-01-15 13:08 UTC (permalink / raw)
To: Tim Bird; +Cc: LKML, Karim Yaghmour

</Flame off>

On Fri, 2005-01-14 at 15:22 -0800, Tim Bird wrote:
> but not 1) supporting infrastructure for timestamping, managing event
> data, etc., and 2) a static list of generally useful tracepoints.

Both points are well taken. That's the essential minimum that
instrumentation needs.

I'd like to see this infrastructure usable for all kinds of
instrumentation mechanisms which are built into the kernel already, or
functions which are used for similar purposes in experimental trees and
other instrumentation related projects.

This requires separating the backend from the infrastructure, so you can
choose from a set of backends which fit best for the intended use.

One of those backends is LTT+relayfs.

I really respect the work you have done there, but please accept that I
just see the limitations and try to figure out a way to make it more
generic and flexible before it is cemented into the kernel and makes it
hard to use for other interesting instrumentation aspects and maybe
enforces redundant implementation of infrastructure related
functionality.

E.g. tracking down timing related issues can make use of such
functionality if the infrastructure is provided separately.

I guess a lot of developers would be happy to use it when it is already
around in the kernel, and it can help testers give better information to
developers.

tglx

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-15 13:08 ` [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) Thomas Gleixner @ 2005-01-16 2:09 ` Karim Yaghmour 2005-01-16 3:11 ` Roman Zippel 0 siblings, 1 reply; 142+ messages in thread
From: Karim Yaghmour @ 2005-01-16 2:09 UTC (permalink / raw)
To: tglx; +Cc: Tim Bird, LKML, Andrew Morton

Hello Thomas,

I don't mind having a general discussion about instrumentation, but
it has to be understood that the topic is so general and means so
many different things to different people that we are unlikely to
reach any useful consensus. Believe me, it's not for the lack of
trying. More below.

Thomas Gleixner wrote:
> </Flame off>

:D

> One of those backends is LTT+relayfs.
> I really respect the work you have done there, but please accept that I
> just see the limitations and try to figure out a way to make it more
> generic and flexible before it is cemented into the kernel and makes it
> hard to use for other interesting instrumentation aspects and maybe
> enforces redundant implementation of infrastructure related
> functionality.
>
> E.g. tracking down timing related issues can make use from such
> functionality if the infrastructure is provided seperately.
> I guess a lot of developers would be happy to use it when it is already
> around in the kernel and it can help testers for giving better
> information to developers.

I would invite you to review the history behind LTT and the history
behind the efforts to get LTT integrated in the kernel (which are
two separate topics.) If you look back, you will see that I worked
very hard trying to get people to think about a common framework
and that I and others made numerous suggestions in this regard.
Here are a few examples:

- DProbes (kprobes ancestor):
  Shortly after dprobes came out in 2000, I was one of the first to
  suggest that there could be interfacing between both to allow
  dynamically added trace points. We worked with, and eventually joined
  forces with, the IBM team working on this and very early on, LTT and
  DProbes were interfacing:
  http://marc.theaimsgroup.com/?l=linux-kernel&m=97079714009328&w=2

- OProfile:
  When time came to integrate oprofile in the kernel, I tried to push
  for oprofile to use ltt as its logging engine (to John's utter
  horror.) relayfs didn't exist at the time, and obviously oprofile
  made it in without relying on ltt. Here's a posting from July 2002
  where I suggested oprofile rely on ltt. In that same posting I listed
  a number of drivers/subsystems that already contained tracing
  statements. Obviously I was pointing out that there was an
  opportunity to create a common, uniform infrastructure based on ltt:
  http://marc.theaimsgroup.com/?l=linux-kernel&m=102624656615567&w=2

- Syscalltrack:
  In replying to a posting of someone looking for tracing info, there
  was a brief discussion as to how syscalltrack could use ltt instead
  of: a) redirecting the syscall table, b) having its own buffering
  mechanism. Again, relayfs didn't exist at the time:
  http://marc.theaimsgroup.com/?l=linux-kernel&m=102822343523369&w=2

- Event logging:
  When there was discussion about event logging, there was a suggestion
  to use ltt's engine. Again, relayfs wasn't there:
  http://marc.theaimsgroup.com/?l=linux-kernel&m=101836133400796&w=2

And there are many other cases. As you can see, it's not as if
I didn't try to have this discussion before. Unfortunately,
interest in this was rather limited.

In addition, and this is a very important issue, quite a few
kernel developers mistook LTT for a kernel debugging tool, which
it was never meant to be. When, in fact, if you ask those who have
looked at using it for that purpose (try Marcelo or Andrea) you will
see that they didn't find it to be appropriate for them. And
rightly so, it was never meant for that purpose. Even lately, when
I suggested Ingo try using relayfs instead of his custom tracing
code for his preemption work, he looked at it and said that it
wasn't suited, but would consider reusing parts of it if it were
in the kernel.

So, in general, one thing I learned over the years is to not touch
the topic of kernel debugging even with a 10-foot pole when discussing
LTT. What you are hinting at here (mention of developers vs. testers,
for example), and your stated preference for the type of ring-buffer
you described earlier clearly goes in the direction I've learned to
avoid: buffering support for the general purpose of kernel debugging.

Let me say outright that I see the relevance of what you are looking
for, but let me also say that what we tried to achieve with relayfs
is to provide a general mechanism for kernel subsystems that need
to convey large amounts of data to user-space. We did not attempt
to solve the problem of providing a buffering framework for core
kernel debugging. As I mentioned to Ingo in the mail I referred to
earlier regarding the type of buffering you are looking for:

> The above tracer may indeed be very appropriate for kernel development,
> but it doesn't provide enough functionality for the requirements of
> mainstream users.

If there is interest for using either relayfs and/or ltt for that
purpose, then this is an entirely different mandate and a few things
would need to be added for that to happen. For starters, we could
add another mode to relayfs. Currently, it supports a locking and a
lockless buffering scheme. We could also have a ring-buffer mode which
would function very much as you, and Ingo before, have described.

But let me be crystal clear about this: don't count on me to make
a case for it on LKML. I've had enough flak as it is. If you believe
this is necessary, then you are welcome to make a case for it, and
obtain support from others on LKML.

Obviously, as the maintainers of relayfs, we see no reason to avoid
extending it for purposes others may find it useful for and/or
accepting patches to that end, if indeed such extensions don't
preclude its adoption in the mainline kernel.

Hope this helps clarify things a little,

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
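A minimal userspace sketch of the kind of ring-buffer mode discussed
above: a fixed-size buffer that overwrites its oldest entry, logs in
constant time with no loop or lock, orders events by an event number plus
a coarse time stamp, and can be dumped oldest-first after the fact. It
assumes a single writer per buffer (in the kernel, one buffer per CPU with
the caller handling interrupt nesting). All names and sizes are
hypothetical, and this is not relayfs or LTT code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8u   /* hypothetical size; must be a power of two */

struct trace_event {
	uint32_t seq;     /* event number, gives the ordering */
	uint32_t stamp;   /* coarse time base, e.g. a jiffies-like counter */
	uint16_t id;      /* event type */
	uint32_t arg;     /* raw payload, decoded in postprocessing */
};

struct trace_ring {
	uint32_t next;    /* total number of events logged so far */
	struct trace_event slot[RING_SLOTS];
};

/* Constant-time log path: fill the next slot, overwriting the oldest. */
static void ring_log(struct trace_ring *r, uint32_t stamp,
		     uint16_t id, uint32_t arg)
{
	struct trace_event *e = &r->slot[r->next & (RING_SLOTS - 1u)];

	e->seq = r->next;
	e->stamp = stamp;
	e->id = id;
	e->arg = arg;
	r->next++;
}

/*
 * Postmortem dump: copy the surviving events into out[], oldest first,
 * and return how many were copied. out[] must hold RING_SLOTS entries.
 * Wraparound of the 32-bit event counter is ignored in this sketch.
 */
static uint32_t ring_dump(const struct trace_ring *r, struct trace_event *out)
{
	assert(r != NULL && out != NULL);

	uint32_t n = r->next < RING_SLOTS ? r->next : RING_SLOTS;
	uint32_t first = r->next - n;

	for (uint32_t i = 0; i < n; i++)
		out[i] = r->slot[(first + i) & (RING_SLOTS - 1u)];
	return n;
}
```

Time conversion and merging of several per-CPU buffers are left to the
postprocessing side, as argued earlier in the thread.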
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-16 2:09 ` Karim Yaghmour @ 2005-01-16 3:11 ` Roman Zippel 2005-01-16 4:23 ` Karim Yaghmour 0 siblings, 1 reply; 142+ messages in thread
From: Roman Zippel @ 2005-01-16 3:11 UTC (permalink / raw)
To: Karim Yaghmour; +Cc: tglx, Tim Bird, LKML, Andrew Morton

Hi,

On Sat, 15 Jan 2005, Karim Yaghmour wrote:

> In addition, and this is a very important issue, quite a few
> kernel developers mistook LTT for a kernel debugging tool, which
> it was never meant to be. When, in fact, if you ask those who have
> looked at using it for that purpose (try Marcelo or Andrea) you will
> see that they didn't find it to be appropriate for them. And
> rightly so, it was never meant for that purpose. Even lately, when
> I suggested Ingo try using relayfs instead of his custom tracing
> code for his preemption work, he looked at it and said that it
> wasn't suited, but would consider reusing parts of it if it were
> in the kernel.

Well, that's really a core problem. We don't want to duplicate
infrastructure, which practically does the same. So if relayfs isn't
usable in this kind of situation, it really raises the question whether
relayfs is usable at all. We need to make relayfs generally usable,
otherwise it will join the fate of devfs.

bye, Roman

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-16 3:11 ` Roman Zippel @ 2005-01-16 4:23 ` Karim Yaghmour 2005-01-16 23:43 ` Thomas Gleixner 0 siblings, 1 reply; 142+ messages in thread
From: Karim Yaghmour @ 2005-01-16 4:23 UTC (permalink / raw)
To: Roman Zippel; +Cc: tglx, Tim Bird, LKML, Andrew Morton, Tom Zanussi

Hello Roman,

Roman Zippel wrote:
> On Sat, 15 Jan 2005, Karim Yaghmour wrote:
>>In addition, and this is a very important issue, quite a few
>>kernel developers mistook LTT for a kernel debugging tool, which
>>it was never meant to be. When, in fact, if you ask those who have
>>looked at using it for that purpose (try Marcelo or Andrea) you will
>>see that they didn't find it to be appropriate for them. And
>>rightly so, it was never meant for that purpose. Even lately, when
>>I suggested Ingo try using relayfs instead of his custom tracing
>>code for his preemption work, he looked at it and said that it
>>wasn't suited, but would consider reusing parts of it if it were
>>in the kernel.
>
> Well, that's really a core problem. We don't want to duplicate
> infrastructure, which practically does the same. So if relayfs isn't
> usable in this kind of situation, it really raises the question whether
> relayfs is usable at all. We need to make relayfs generally usable,
> otherwise it will join the fate of devfs.

Hmm, coming from you I will take this as a pretty strong endorsement
for what I was suggesting earlier: provide a basic buffering mode
in relayfs to be used in kernel debugging. However, it must be
understood that this is separate from the existing modes and ltt,
for example, could not use such a basic infrastructure. If this is
ok with you, and no one wants to complain too loudly about this, I
will go ahead and add this to our to-do list for relayfs.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-16 4:23 ` Karim Yaghmour @ 2005-01-16 23:43 ` Thomas Gleixner 2005-01-17 1:54 ` Karim Yaghmour 0 siblings, 1 reply; 142+ messages in thread
From: Thomas Gleixner @ 2005-01-16 23:43 UTC (permalink / raw)
To: Karim Yaghmour; +Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi

On Sat, 2005-01-15 at 23:23 -0500, Karim Yaghmour wrote:
> > Well, that's really a core problem. We don't want to duplicate
> > infrastructure, which practically does the same. So if relayfs isn't
> > usable in this kind of situation, it really raises the question whether
> > relayfs is usable at all. We need to make relayfs generally usable,
> > otherwise it will join the fate of devfs.
>
> Hmm, coming from you I will take this is a pretty strong endorsement
> for what I was suggesting earlier: provide a basic buffering mode
> in relayfs to be used in kernel debugging. However, it must be
> understood that this is separate from the existing modes and ltt,
> for example, could not use such a basic infrastructure. If this is
> ok with you, and no one wants to complain too loudly about this, I
> will go ahead and add this to our to-do list for relayfs.

This implies separating

- infrastructure
- event registration
- transport mechanism

tglx

^ permalink raw reply [flat|nested] 142+ messages in thread
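A minimal sketch of the three-way split listed above, assuming it means
roughly this: event types are registered in a table, a small core
validates and forwards events, and the transport is a pluggable backend
behind a function pointer, so the core never knows where the bytes go.
Every name here (trace_transport, trace_register_event, trace_log,
mem_backend) is hypothetical and only illustrates the layering. It is not
the LTT or relayfs API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Transport mechanism: how event bytes leave the tracer. */
struct trace_transport {
	const char *name;
	int (*write)(void *ctx, const void *data, size_t len);
	void *ctx;
};

/* Event registration: a table of known event types. */
#define MAX_EVENTS 16

struct trace_event_type {
	const char *name;
	size_t payload_len;
};

static struct trace_event_type event_table[MAX_EVENTS];
static int event_count;

/* Returns the new event id, or -1 if the table is full. */
static int trace_register_event(const char *name, size_t payload_len)
{
	assert(name != NULL);
	if (event_count == MAX_EVENTS)
		return -1;
	event_table[event_count].name = name;
	event_table[event_count].payload_len = payload_len;
	return event_count++;
}

/* Infrastructure: validates the event and hands it to the backend. */
static const struct trace_transport *active_transport;

static int trace_log(int event_id, const void *payload)
{
	if (active_transport == NULL || event_id < 0 || event_id >= event_count)
		return -1;
	return active_transport->write(active_transport->ctx, payload,
				       event_table[event_id].payload_len);
}

/* One possible backend: append into a flat memory buffer. */
struct mem_backend {
	unsigned char buf[64];
	size_t used;
};

static int mem_write(void *ctx, const void *data, size_t len)
{
	struct mem_backend *m = ctx;

	if (len > sizeof(m->buf) - m->used)
		return -1;
	memcpy(m->buf + m->used, data, len);
	m->used += len;
	return 0;
}
```

Swapping the transport then means pointing active_transport at a
different struct, leaving the tracepoints and the event table untouched.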
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-16 23:43 ` Thomas Gleixner @ 2005-01-17 1:54 ` Karim Yaghmour 2005-01-17 10:26 ` Thomas Gleixner 2005-01-19 7:13 ` Werner Almesberger 0 siblings, 2 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-17 1:54 UTC (permalink / raw) To: tglx Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi, Richard J Moore Thomas Gleixner wrote: > This implies separating > > - infrastructure > - event registration > - transport mechanism Like I said in my first response: we can't be everything for everybody, the requirements are just too broad. ISO tried it with OSI. Have a look at net/* for the result. Currently, LTT provides the first two in one piece, and relayfs provides the third. Like I acknowledged earlier, there is room for generalizing the transport mechanism, and I'm thinking of amending the relayfs API proposal further and renaming the modes to make them more straightforward: - Managed (locking or lockless.) - Ad-Hoc (which works like Ingo, yourself, and others have requested.) If you really want to define layers, then there are actually four layers: 1- hooking mechanism 2- event definition / registration 3- event management infrastructure 4- transport mechanism LTT currently does 1, 2 & 3. Clearly, as in the mail I referred to earlier, there is code in the kernel that already does 1, 2, 3, and 4 in a very hardwired/ad-hoc fashion and there isn't anyone asking for them to remove it. We're offering 4 separately and are putting LTT on top of it. If you want to get 1 & 2 separately, have a look at kernel hooks and genevent: http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/ http://www.listserv.shafik.org/pipermail/ltt-dev/2003-January/000408.html We'd gladly take a serious look at using the former if it was included, and there is work in progress on making the latter the standard way of declaring LTT events instead of using a static ltt-events.h. 
Five years ago, there was a discussion about integrating GKHI into the kernel (the kernel hooks ancestor). Have a look for yourself as to the response to this suggestion (basically people weren't ready to accept a generalized hooking mechanism without a defined set of hooks, and then others didn't like the idea at all because creating general hooks in the kernel which anybody can register to creates legal and maintenance problems ... basically it's a can of worms): http://marc.theaimsgroup.com/?l=linux-kernel&m=97371908916365&w=2 There's only so much we can push into the kernel at the same time. Not to mention that before you can be generic, you've got to have some specific implementation to start working from. I believe that what we've ironed out through the discussion of the past two days is a good basis. There is some irony in all this. For years, we were told that we couldn't make it into the kernel because we were perceived as providing a kernel debugging tool, and now that we're starting to get our things seriously reviewed we're being told that maybe it ain't really that useful because those who want to do kernel debugging can't use it as-is ... go figure. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-17 1:54 ` Karim Yaghmour @ 2005-01-17 10:26 ` Thomas Gleixner 2005-01-17 20:34 ` Karim Yaghmour 2005-01-19 7:13 ` Werner Almesberger 1 sibling, 1 reply; 142+ messages in thread From: Thomas Gleixner @ 2005-01-17 10:26 UTC (permalink / raw) To: Karim Yaghmour Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi, Richard J Moore On Sun, 2005-01-16 at 20:54 -0500, Karim Yaghmour wrote: > If you really want to define layers, then there are actually four > layers: > 1- hooking mechanism > 2- event definition / registration > 3- event management infrastructure > 4- transport mechanism > > LTT currently does 1, 2 & 3. Clearly, as in the mail I referred to > earlier, there is code in the kernel that already does 1, 2, 3, > and 4 in a very hardwired/ad-hoc fashion and there isn't anyone asking > for them to remove it. We're offering 4 separately and are putting > LTT on top of it. If you want to get 1 & 2 separately, have a look > at kernel hooks and genevent: I know that there is enough code which does x,y,z hardcoded/hardwired already. That's the point. Adding another hardwired implementation does not give us a possibility to solve the hardwired problem of the already available stuff. tglx ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-17 10:26 ` Thomas Gleixner @ 2005-01-17 20:34 ` Karim Yaghmour 2005-01-17 22:18 ` Thomas Gleixner 0 siblings, 1 reply; 142+ messages in thread From: Karim Yaghmour @ 2005-01-17 20:34 UTC (permalink / raw) To: tglx Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi, Richard J Moore Thomas Gleixner wrote: > That's the point. Adding another hardwired implementation does not give > us a possibility to solve the hardwired problem of the already available > stuff. Well then, like I said before, you know what you need to do: http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/ Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-17 20:34 ` Karim Yaghmour @ 2005-01-17 22:18 ` Thomas Gleixner 2005-01-17 23:57 ` Karim Yaghmour 0 siblings, 1 reply; 142+ messages in thread From: Thomas Gleixner @ 2005-01-17 22:18 UTC (permalink / raw) To: Karim Yaghmour Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi, Richard J Moore On Mon, 2005-01-17 at 15:34 -0500, Karim Yaghmour wrote: > Thomas Gleixner wrote: > > That's the point. Adding another hardwired implementation does not give > > us a possibility to solve the hardwired problem of the already available > > stuff. > > Well then, like I said before, you know what you need to do: > http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/ Oh, I guess my English must be really bad. I was talking about separation of layers, so why do I need kernelhooks ? The separation of layers makes it possible to actually reuse functionality and gives the possibility that existing hardwired stuff can be cleaned up to use the new functionality too. If we add another hardwired implementation then we do not have said benefits. tglx ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-17 22:18 ` Thomas Gleixner @ 2005-01-17 23:57 ` Karim Yaghmour 2005-01-18 8:46 ` Thomas Gleixner 0 siblings, 1 reply; 142+ messages in thread From: Karim Yaghmour @ 2005-01-17 23:57 UTC (permalink / raw) To: tglx Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi, Richard J Moore Thomas Gleixner wrote: > If we add another hardwired implementation then we do not have said > benefits. Please stop handwaving. Folks like Andrew, Christoph, Zwane, Roman, and others actually made specific requests for changes in the code. What makes you think you're so special that you think you are entitled to stay on the side and handwave about concepts? If there is a limitation with the code, please present actual snippets that need to be changed and suggest alternatives. That's what everyone else does on this list. If you want to clean up the existing tracing code in the kernel, then here are some ltt calls you may be interested in: int ltt_create_event(char *event_type, char *event_desc, int format_type, char *format_data); int ltt_log_raw_event(int event_id, int event_size, void *event_data); And here's an actual example: ... delta_id = ltt_create_event("Delta", NULL, CUSTOM_EVENT_FORMAT_TYPE_HEX, NULL); ... ltt_log_raw_event(delta_id, sizeof(a_delta_event), &a_delta_event); ... ltt_destroy_event(delta_id); You can then use LibLTT to read the trace and extract your custom events and format your binary data as it suits you. Save the bandwidth and start cleaning. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
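The create/log/destroy sequence in the message above can be fleshed out as a minimal user-space sketch. Only the two prototypes and the call order come from the message; everything else here is an assumption made so the fragment compiles on its own: the stub bodies, the value of CUSTOM_EVENT_FORMAT_TYPE_HEX, the signature of ltt_destroy_event() (the message calls it without declaring it), and the helper names trace_one_delta() and logged_bytes().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical value; the real constant is defined by the LTT headers. */
#define CUSTOM_EVENT_FORMAT_TYPE_HEX 2

#define MAX_EVENTS 8
#define LOG_SIZE   256

static int event_in_use[MAX_EVENTS];
static unsigned char log_buf[LOG_SIZE];  /* stands in for the trace buffer */
static size_t log_len;

/* Stub with the signature quoted in the message: hands out a free id. */
int ltt_create_event(char *event_type, char *event_desc,
                     int format_type, char *format_data)
{
    (void)event_type; (void)event_desc;
    (void)format_type; (void)format_data;
    for (int id = 0; id < MAX_EVENTS; id++) {
        if (!event_in_use[id]) {
            event_in_use[id] = 1;
            return id;
        }
    }
    return -1;  /* no free event id */
}

/* Stub with the signature quoted in the message: appends raw bytes. */
int ltt_log_raw_event(int event_id, int event_size, void *event_data)
{
    assert(event_data != NULL);
    if (event_id < 0 || event_id >= MAX_EVENTS || !event_in_use[event_id])
        return -1;
    if (event_size < 0 || log_len + (size_t)event_size > LOG_SIZE)
        return -1;
    memcpy(log_buf + log_len, event_data, (size_t)event_size);
    log_len += (size_t)event_size;
    return 0;
}

/* Assumed signature: releases an id previously handed out. */
int ltt_destroy_event(int event_id)
{
    if (event_id < 0 || event_id >= MAX_EVENTS || !event_in_use[event_id])
        return -1;
    event_in_use[event_id] = 0;
    return 0;
}

/* Hypothetical payload for the "Delta" event. */
struct delta_event {
    long before;
    long after;
};

/* Number of bytes logged so far (hypothetical helper). */
size_t logged_bytes(void)
{
    return log_len;
}

/* Create, log, destroy, in the order shown in the message. */
int trace_one_delta(long before, long after)
{
    struct delta_event a_delta_event = { before, after };
    int delta_id, ret;

    delta_id = ltt_create_event("Delta", NULL,
                                CUSTOM_EVENT_FORMAT_TYPE_HEX, NULL);
    if (delta_id < 0)
        return -1;
    ret = ltt_log_raw_event(delta_id, sizeof(a_delta_event), &a_delta_event);
    ltt_destroy_event(delta_id);
    return ret;
}
```

A real user would normally create the event once at init time and destroy it at exit rather than around every log call; the three calls are kept together here only to mirror the fragment.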
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-17 23:57 ` Karim Yaghmour @ 2005-01-18 8:46 ` Thomas Gleixner 2005-01-18 16:31 ` Karim Yaghmour 0 siblings, 1 reply; 142+ messages in thread From: Thomas Gleixner @ 2005-01-18 8:46 UTC (permalink / raw) To: Karim Yaghmour Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi, Richard J Moore On Mon, 2005-01-17 at 18:57 -0500, Karim Yaghmour wrote: > Thomas Gleixner wrote: > > If we add another hardwired implementation then we do not have said > > benefits. > > Please stop handwaving. Folks like Andrew, Christoph, Zwane, Roman, > and others actually made specific requests for changes in the code. > What makes you think you're so special that you think you are > entitled to stay on the side and handwave about concepts? So the points you added to your todo list which were brought up by me are worthless ? I'm not handwaving. I started this RFC to move the discussion into a general discussion about instrumentation. A couple of people are seriously interested in doing this. If you are not interested then ignore the thread, but you're in no position to tell me to shut up. You turned this thread into your LTT prayer wheel. Roman pointed out your unwillingness to create a common framework before. But I have to disagree with him on one point. It's not amazing, it's annoying. > If there is a limitation with the code, please present actual > snippets that need to be changed and suggest alternatives. That's > what everyone else does on this list. I pointed you to actually broken code and you accused me of throwing mud. > Save the bandwidth Please remove me from cc, it's a good start to save bandwidth. > and start cleaning. Yes, I did already start cleaning cat ../broken-out/ltt* | patch -p1 -R tglx ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-18 8:46 ` Thomas Gleixner @ 2005-01-18 16:31 ` Karim Yaghmour 0 siblings, 0 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-18 16:31 UTC (permalink / raw) To: tglx Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi, Richard J Moore Thomas, Thomas Gleixner wrote: > Yes, I did already start cleaning > > cat ../broken-out/ltt* | patch -p1 -R :D If it gives you a warm and fuzzy feeling to have the last cheap-shot, then I'm all for it, it is of no consequence anyway. And _please_ don't forget to answer this very email with something of the same substance. For my part I consider that I've invested a substantial amount of time in responding to both your conceptual and practical feedback, as the archives clearly show. That being said, I have to thank you for making sure that all the obvious questions have been asked. I now have more than a dozen archive links of my answers to those. They'll sure come in handy when writing an FAQ. Thanks again, Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-17 1:54 ` Karim Yaghmour 2005-01-17 10:26 ` Thomas Gleixner @ 2005-01-19 7:13 ` Werner Almesberger 2005-01-19 17:38 ` Karim Yaghmour 1 sibling, 1 reply; 142+ messages in thread From: Werner Almesberger @ 2005-01-19 7:13 UTC (permalink / raw) To: Karim Yaghmour Cc: tglx, Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi, Richard J Moore From all I've heard and seen of LTT (and I have to admit that most of it comes from reading this thread, not from reading the code), I have the impression that it may try to be a bit too specialized, and thus might miss opportunities for synergy. You must be getting tired of people trying to redesign things from scratch, but maybe you'll humor me anyway ;-) Karim Yaghmour wrote: > If you really want to define layers, then there are actually four > layers: > 1- hooking mechanism > 2- event definition / registration > 3- event management infrastructure > 4- transport mechanism For 1, kprobes would seem largely sufficient. In cases where you don't have a usable attachment point (e.g. in the middle of a function and you need access to variables with unknown location), you can add lightweight instrumentation that arranges the code flow suitably. [1, 2] 2 and 3 should be the main domain of LTT, with 2 sitting on top of kprobes. kprobes currently doesn't have a nice way for describing handlers, but that can be fixed [3]. But you probably don't need a "nice" interface right now, but might be satisfied with one that works and is fast (?) From the discussion, it seems that the management is partially done by relayfs. I find this a little strange. E.g. instead of filtering events, you may just not generate them in the first place, e.g. by not placing a probe, or by filtering in LTT, before submitting the event. Timestamps may be fine either way. Restoring sequence should be a task user-space can handle: in the worst case, you'd have to read and merge from #cpus streams. 
Seeking works in that context, too. Last but not least, 4 should be simple. Particularly since you're worried about extreme speeds, there should be as little processing as you can afford. If you need to seek efficiently (do you, really ?), you may not even want message boundaries at that level. Something that isn't entirely clear to me is if you also need to aggregate information in buffers. E.g. by updating a record until it has been retrieved by user space, or by updating a record when there is no space to create a new one. Such functionality would add complexity and needs tight synchronization with the transport. [1] I've seen the argument that kprobes aren't portable. This strikes me as highly questionable. Even if an architecture doesn't have a trap instruction (or equivalent code sequence) that is at least as short as the shortest instruction, you can always fall back to adding instrumentation [2]. Also, if you know where your basic blocks are, you may be able to use traps that span multiple instructions. I recall that things of this kind are already planned for kprobes. [2] See the "reliable markers" of umlsim from umlsim.sf.net. Implementation: cd umlsim/lib; make; tail -50 markers_kernel.h Examples: cd umlsim/sim/tests; cat sbug.marker They're basically extra-light markup in the source code. Works on ia32, but I haven't found a way to get the assembler to cooperate for amd64, yet. [3] I've already solved this problem in umlsim: there, I have a Perl/C-like scripting language that allows handlers to do pretty much anything they want. Of course, kprobes would want pre-compiled C code, not some scripts, but I think the design could be developed in a direction that would allow both. Will take a while, but since I'll eventually have to rewrite the "microcode" anyway, ... 
So my comments are basically as follows: 1) kprobes seems like a suitable and elegant mechanism for placing all the hooks LTT needs, so I think that it would be better to build on this basis, and extend it where necessary, than to build yet another specialized variant in parallel. 2) LTT should do what it is good at, and not have to worry about the rest (i.e. supporting infrastructure). 3) relayfs should be lean and fast, as you intend it to be, so that non-LTT tracing or fnord debugging fnord code may find it useful, too. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 142+ messages in thread
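The remark in the message above that user space can restore event order by reading and merging from #cpus streams can be sketched as a simple k-way merge. This is an illustration only, not LTT or relayfs code: the struct layouts, MAX_STREAMS, and the function name merge_streams() are hypothetical, and the sketch assumes each per-CPU stream is already sorted and that timestamps are comparable across CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STREAMS 64  /* hypothetical upper bound on the number of CPUs */

/* Hypothetical decoded event record. */
struct event {
    uint64_t timestamp;  /* assumed comparable across CPUs */
    uint32_t cpu;
    uint32_t id;
};

/* One per-CPU stream, already sorted by timestamp. */
struct stream {
    const struct event *events;
    size_t count;
};

/*
 * Merge nstreams sorted streams into out[] in timestamp order.
 * Returns the number of events written (at most out_cap).
 * On equal timestamps the lower stream index wins.
 */
size_t merge_streams(const struct stream *streams, size_t nstreams,
                     struct event *out, size_t out_cap)
{
    size_t pos[MAX_STREAMS] = { 0 };  /* read cursor per stream */
    size_t n = 0;

    assert(nstreams <= MAX_STREAMS);

    while (n < out_cap) {
        size_t best = nstreams;  /* sentinel: no candidate yet */

        /* Linear scan of the stream heads; fine for a small CPU count. */
        for (size_t i = 0; i < nstreams; i++) {
            if (pos[i] == streams[i].count)
                continue;  /* stream exhausted */
            if (best == nstreams ||
                streams[i].events[pos[i]].timestamp <
                streams[best].events[pos[best]].timestamp)
                best = i;
        }
        if (best == nstreams)
            break;  /* all streams exhausted */
        out[n++] = streams[best].events[pos[best]++];
    }
    return n;
}
```

For a large number of CPUs a min-heap over the stream heads would replace the linear scan; the ordering produced is the same.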
* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) 2005-01-19 7:13 ` Werner Almesberger @ 2005-01-19 17:38 ` Karim Yaghmour 0 siblings, 0 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-19 17:38 UTC (permalink / raw) To: Werner Almesberger Cc: tglx, Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi, Richard J Moore Werner Almesberger wrote: > From all I've heard and seen of LTT (and I have to admit that most > of it comes from reading this thread, not from reading the code), Might I add that this is part of the problem ... No personal offence intended, but there's been _A LOT_ of things said about LTT that were based on third-hand accounts and no direct contact with the toolset/code. And part of the problem is that _many_ people on this list, and elsewhere, have done some form of tracing or another as part of their development, so they all have their idea of how this is best done. Yet, while such experience can help provide additional ideas to LTT's development, it also often requires re-explaining to every new suggestor why we added features he couldn't imagine would be useful to any of his/her own tracing needs ... Sometimes I wish my interests lay in some arcane feature that few had ever played with ;) IOW, while I don't discount anybody else's experience with tracing, please give us at least the benefit of the doubt by actually: a) Looking at the code b) Looking at the mailing list archives c) Asking us questions directly related to the code > I have the impression that it may try to be a bit too specialized, > and thus might miss opportunities for synergy. Bear with me on this one ... > You must be getting tired of people trying to redesign things from > scratch, but maybe you'll humor me anyway ;-) Hey, from you Werner I'll take anything. 
It's always a pleasure talking with you :) > Karim Yaghmour wrote: > >>If you really want to define layers, then there are actually four >>layers: >>1- hooking mechanism >>2- event definition / registration >>3- event management infrastructure >>4- transport mechanism > > > For 1, kprobes would seem largely sufficient. In cases where you > don't have a usable attachment point (e.g. in the middle of a > function and you need access to variables with unknown location), > you can add lightweight instrumentation that arranges the code > flow suitably. [1, 2] Let me say outright, as I said to Andi early on in the sister thread, that I have no problems with having the trace points being fed by kprobes. In fact, in 2000, way back before kprobes even existed, LTT was already interfacing with DProbes for dynamic insertion of trace points. ... There I said it ... now watch me have to repeat this yet again later on ... :/ However, kprobes is not magic: a) Like I said to Andi: > As far as kprobes go, then you still need to have some form or another > of marking the code for key events, unless you keep maintaining a set > of kprobes-able points separately, which really makes it unusable for > the rest of us, as the users of LTT have discovered over time (having > to create a new patch for every new kernel that comes out.) b) Like I said to Andrew back in July: > I've double-checked what I already knew about kprobes and have looked again > at the site and the patch, and unless there's some feature of kprobes I don't > know about that allows using something else than the debug interrupt to add > hooks, ... > Generating new interrupts is simply unacceptable for LTT's functionality. > Not to mention that it breaks LTT because tracing something will generate > events of its own, which will generate tracing events of their own ... > recursion. Ok, you can argue about the recursion thing with an "if()", but you'll have to admit that like in the case I described to Roman: > ... 
Say you're getting > 2MB/s of data (which is not unrealistic on a loaded system.) That means > that if I'm tracing for 2 days, I've got 345GB of data (~7.5GB/hour). IOW, something like 200,000 events/s (average of 10 bytes/event). Do I really need to explain that 200,000 traps/interrupts per second is not something you want ... ? But don't despair, like I said to Andi: > So lately I've been thinking that there may be a middle-ground here > where everyone could be happy. Define three states for the hooks: > disabled, static, marker. The third one just adds some info into > System.map for allowing the automation of the insertion of kprobes > hooks (though you would still need the debugging info to find the > values of the variables that you want to log.) Hence, you get to > choose which type of poison you prefer. For my part, I think the > noop/early-check should be sufficient to get better performance from > the existing hook-set. I have received very little feedback on this suggestion, though I really think it's worth entertaining, especially with your mention of uml-sim markers further below. As for the location of ltt trace points, then they are very rarely at function boundaries. Here's a classic: prepare_arch_switch(rq, next); ltt_ev_schedchange(prev, next); prev = context_switch(rq, prev, next); > 2 and 3 should be the main domain of LTT, with 2 sitting on top > of kprobes. kprobes currently doesn't have a nice way for > describing handlers, but that can be fixed [3]. But you probably > don't need a "nice" interface right now, but might be satisfied > with one that works and is fast (?) The functions have been there for DProbes for 5 years: int ltt_create_event(char *event_type, char *event_desc, int format_type, char *format_data) int ltt_log_raw_event(int event_id, int event_size, void *event_data) > From the discussion, it seems that the management is partially > done by relayfs. I find this a little strange. E.g. 
instead of > filtering events, you may just not generate them in the first > place, e.g. by not placing a probe, or by filtering in LTT, > before submitting the event. Like I said to Andi: > ... For one thing, the current > ltt hooks aren't as fast as they should be (i.e. we check whether > the tracing is enabled for a certain event way too far in the code-path.) > This should be rather simple to fix. And I've already got the code snippet to fix this ready. > Timestamps may be fine either way. Restoring sequence should be > a task user-space can handle: in the worst case, you'd have to > read and merge from #cpus streams. Seeking works in that context, > too. > > Last but not least, 4 should be simple. Particularly since you're > worried about extreme speeds, there should be as little > processing as you can afford. If you need to seek efficiently > (do you, really ?), you may not even want message boundaries at > that level. Like I said to Roman: > Removing this data would require more data for each event to > be logged, and require parsing through the trace before reading it in > order to obtain markers allowing random access. This wouldn't be so > bad if we were expecting users to use LTT sporadically for very short > periods of time. However, given ltt's target audience (i.e. need to > run traces for hours, maybe days, weeks), traces would rapidly become > useless because while plowing through a few hundred KBs of data and > allocating RAM for building internal structures as you go is fine, > plowing through tens of GBs of data, possibly hundreds, requires that > you come up with a format that won't require unreasonable resources > from your system, while incurring negligible runtime costs for generating > it. We believe the format we currently have achieves the right balance > here. What we've agreed with Roman is that relayfs won't write anything at the boundaries. Its clients will provide it with callbacks to be invoked at buffer boundaries. 
When invoked, said callbacks can add whatever they feel is important to the buffer, relayfs doesn't care. > Something that isn't entirely clear to me is if you also need to > aggregate information in buffers. E.g. by updating a record until > it has been retrieved by user space, or by updating a record > when there is no space to create a new one. Such functionality > would add complexity and needs tight synchronization with the > transport. If I understand you correctly, you are talking about the fact that the transport layer's management of the buffers is synchronized with some user-space entity that consumes the buffers produced and talks back to relayfs (albeit indirectly) to let it know that said buffers are now available? If so, then that's why I suggested elsewhere that we have two modes for relayfs: managed and adhoc. In the former, you have the required mechanics for what I just described. In the latter, you have a very basic buffering scheme that cares nothing about user-space synchronization. > [1] I've seen the argument that kprobes aren't portable. This > strikes me as highly questionable. Even if an architecture > doesn't have a trap instruction (or equivalent code sequence) > that is at least as short as the shortest instruction, you > can always fall back to adding instrumentation [2]. Also, if > you know where your basic blocks are, you may be able to > use traps that span multiple instructions. I recall that > things of this kind are already planned for kprobes. I have nothing against kprobes. People keep referring to it as if it magically made all the related problems go away, and it doesn't. See above. > [2] See the "reliable markers" of umlsim from umlsim.sf.net. > Implementation: cd umlsim/lib; make; tail -50 markers_kernel.h > Examples: cd umlsim/sim/tests; cat sbug.marker > They're basically extra-light markup in the source code. > Works on ia32, but I haven't found a way to get the assembler > to cooperate for amd64, yet. 
Nothing precludes us from moving in this direction once something is in the kernel, it's all currently hidden away in a .h, and it would be the same with this. > [3] I've already solved this problem in umlsim: there, I have a > Perl/C-like scripting language that allows handlers to do > pretty much anything they want. Of course, kprobes would > want pre-compiled C code, not some scripts, but I think the > design could be developed in a direction that would allow > both. Will take a while, but since I'll eventually have to > rewrite the "microcode" anyway, ... Like I said, nothing precludes us ... > So my comments are basically as follows: > > 1) kprobes seems like a suitable and elegant mechanism for > placing all the hooks LTT needs, so I think that it would > be better to build on this basis, and extend it where > necessary, than to build yet another specialized variant > in parallel. Whichever way you look at this, you need to mark the code. What's in the .h is something we can tweak ad-nauseam. > 2) LTT should do what it is good at, and not have to worry > about the rest (i.e. supporting infrastructure). I'm guessing that when you're talking about "supporting infrastructure" you are referring to the trace statements. If so, please see above. Also note that without the existing marker set LTT is useless to its users (application developers, sysadmins, power users, etc.) > 3) relayfs should be lean and fast, as you intend it to be, so > that non-LTT tracing or fnord debugging fnord code may find > it useful, too. relayfs has already been used for many non-LTT related purposes. Ask Hubertus or Jamal, to name a few. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
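The data-volume figures quoted in the message above (2MB/s, two days of tracing, about 10 bytes per event) can be checked with a few lines of arithmetic. This is a small sketch, not code from LTT; the function and macro names are hypothetical, and it assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes).

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_MB 1000000ULL     /* decimal megabyte (assumption) */
#define BYTES_PER_GB 1000000000ULL  /* decimal gigabyte (assumption) */

/* Total trace size in bytes for a steady data rate over a duration. */
uint64_t trace_bytes(uint64_t bytes_per_sec, uint64_t seconds)
{
    return bytes_per_sec * seconds;
}

/* Event rate implied by a data rate and an average event size. */
uint64_t events_per_sec(uint64_t bytes_per_sec, uint64_t avg_event_bytes)
{
    assert(avg_event_bytes != 0);
    return bytes_per_sec / avg_event_bytes;
}
```

With these assumptions, trace_bytes(2 * BYTES_PER_MB, 2 * 24 * 3600) is 345,600,000,000 bytes, i.e. about 345GB for two days, one hour comes to 7.2GB (in line with the roughly 7.5GB/hour quoted), and events_per_sec(2 * BYTES_PER_MB, 10) gives the 200,000 events/s figure.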
* Re: 2.6.11-rc1-mm1 2005-01-14 8:23 2.6.11-rc1-mm1 Andrew Morton ` (7 preceding siblings ...) 2005-01-14 22:46 ` 2.6.11-rc1-mm1 Thomas Gleixner @ 2005-01-14 22:48 ` Andre Eisenbach 2005-01-15 8:42 ` 2.6.11-rc1-mm1 Miklos Szeredi 2005-01-15 8:45 ` 2.6.11-rc1-mm1 Miklos Szeredi [not found] ` <1105740276.8604.83.camel@tglx.tec.linutronix.de> ` (2 subsequent siblings) 11 siblings, 2 replies; 142+ messages in thread From: Andre Eisenbach @ 2005-01-14 22:48 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Fri, 14 Jan 2005 00:23:52 -0800, Andrew Morton <akpm@osdl.org> wrote: > - Added FUSE (filesystem in userspace) for people to play with. Am agnostic > as to whether it should be merged (haven't read it at all closely yet, > either), but I am impressed by the amount of care which has obviously gone > into it. Opinions sought. This is great news! As a long time user of KDE's kio-slaves, I was always missing the kio-slave functionality on the command line and in non-kde programs. FUSE provides a kio-slave interface, but hopefully the inclusion of FUSE in the mm-kernel will cause more "fuse native" filesystems to come out which provide the functionality of the various kio-slaves. Some things I'd like to see (as I am currently using the KIO equivalent) implemented as FUSE fs: - "fish", virtual file access over ssh - "audiocd", virtual audio cd filesystem which copies and encodes audio tracks on the fly - "ftp", virtual file system ftp server access etc.. Imagination is the limit, and since it can be implemented in userspace pretty easily with FUSE, I am looking forward to seeing what people can come up with and hope that FUSE is here to stay. Cheers, Andre ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 22:48 ` 2.6.11-rc1-mm1 Andre Eisenbach @ 2005-01-15 8:42 ` Miklos Szeredi 2005-01-15 8:45 ` 2.6.11-rc1-mm1 Miklos Szeredi 1 sibling, 0 replies; 142+ messages in thread From: Miklos Szeredi @ 2005-01-15 8:42 UTC (permalink / raw) To: int2str; +Cc: akpm, linux-kernel Some things I'd like to see (as I am currently using the KIO equivalent) implemented as FUSE fs: - "fish", virtual file access over ssh This is already available here: http://sourceforge.net/projects/fuse You need to download fuse-2.2-pre3 and sshfs-1.0. It should work on any kernel including the 2.6.11-rc1-mm1 with FUSE compiled in. Miklos ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 22:48 ` 2.6.11-rc1-mm1 Andre Eisenbach 2005-01-15 8:42 ` 2.6.11-rc1-mm1 Miklos Szeredi @ 2005-01-15 8:45 ` Miklos Szeredi 1 sibling, 0 replies; 142+ messages in thread From: Miklos Szeredi @ 2005-01-15 8:45 UTC (permalink / raw) To: int2str; +Cc: akpm, linux-kernel Sorry about the missing quotes. It should read: You wrote: > Some things I'd like to see (as I am currently using the KIO > equivalent) implemented as FUSE fs: > - "fish", virtual file access over ssh This is already available here: http://sourceforge.net/projects/fuse You need to download fuse-2.2-pre3 and sshfs-1.0. It should work on any kernel including the 2.6.11-rc1-mm1 with FUSE compiled in. Miklos ^ permalink raw reply [flat|nested] 142+ messages in thread
[parent not found: <1105740276.8604.83.camel@tglx.tec.linutronix.de>]
* Re: 2.6.11-rc1-mm1 [not found] ` <1105740276.8604.83.camel@tglx.tec.linutronix.de> @ 2005-01-14 23:09 ` Karim Yaghmour 2005-01-15 0:01 ` 2.6.11-rc1-mm1 Thomas Gleixner 2005-01-16 16:21 ` 2.6.11-rc1-mm1 Christoph Hellwig 0 siblings, 2 replies; 142+ messages in thread From: Karim Yaghmour @ 2005-01-14 23:09 UTC (permalink / raw) To: tglx; +Cc: Andrew Morton, linux-kernel [repost. first reply had wrong lkml CC.] Hello Thomas, First, thanks for the feedback, it's greatly appreciated. Lots of stuff in here. I don't mean to drop any of your arguments, but I'm going to reply to this in a way that makes this reply and further responses as useful as possible to outsiders. Let me know if you think I've dropped something important. Thomas Gleixner wrote: >> The "non-locking" claim is nice, but a do { } while loop in the slot >> reservation for every event including a do { } while loop in the slow >> path is just a replacement of locking without actually using a lock. I >> don't care whether this is likely or unlikely to happen, it's just bogus >> to add a non constant time path for debugging/tracing purposes. relayfs implements two schemes: lockless and locking. The latter uses standard linear locking mechanisms. If you need stringent constant time, you know what to do. >> Default timestamp measuring with do_gettimeofday is also contrary to the >> non locking argument. There is >> a) a lock in there >> b) it might loop because it's a sequential lock. That's true, but that's not a limitation of relayfs per se. We'd gladly use any timing facility available to us. We already use the TSC when available. >> If you have no TSC you can do at least a jiffies + event-number based, >> not so fine-grained tracing which gives you at least the timeline of the >> events. Interesting. I've added this to the to-do list. >> There is also no need to do time diff calculations / conversions, this >> can be done in userspace postprocessing. 
Ah yes, that's the kind of thing that you learn by getting bitten by it. The problem is the size of the data stream. Diffs are an easy and a rather inexpensive way of reducing trace sizes. Logging 2 or 4 more bytes per event when you've got tens of thousands of events occuring per second does have a noticeable impact. If this is really a sticking point, we could provide a way for writing full time-stamps. >> you do. In space constraint systems relayfs is even worse as it needs >> more memory than the plain ringbuffer. Don't get us wrong, we can strip this down to make this a stupid ring- buffer. But the fact of the matter is that in trying to use such a thing, you will find yourself reimplementing the exact things we did for the same purposes. >> The ringbuffer has a nice advantage. In case the system crashes you can >> retrieve the last and therefor most interesting information from the >> ringbuffer without any hassle via BDI or in the worstcase via a serial >> dump. You can even copy the tail of the buffer into a permanent storage >> like buffered SRAM so it can be retrieved after reboot. And there's a reason why you can't do that with relayfs? We've looked at this and interfacing between relayfs and crashdump is trivial. >> Splitting the trace into different paths is nice to have but I don't see >> a single point which cannot be done by a userspace (hostside) >> postprocessing tool. It adds another non time constant component to the >> trace path. Even the per CPU ringbuffers can be nicely synchronized by a >> userspace postprocessing tool without adding complex synchronization >> functions. Again life is a merciless teacher. LTT did initially start with a single eat-your-breakfeast-dinner-and-supper-in-one-place buffer. But that just doesn't scale. If you're doing flight-recording, for example, you need to have a separate channel which contains process creation/exit, otherwise you have a hard time interepreting the data. 
>> In case of time related tracing it's just overkill. The printk >> information is mostly a string, which can be replaced by the address on >> which the printk is happening. The maybe available arguments can be >> dumped in binary form. All this information can be converted into human >> readable form by postprocessing. I'm sorry, I don't understand your argument here. >> I wonder whether the various formatting options of the trace are really >> of any value. I need neither strings, HEX strings nor XML formatted >> information from the kernel. Max. 8192 Byte of user information makes me >> frown. Tracing is not a copy to userspace function or am I missing >> something ? Dynamically created custom events and events directed by the likes of DProbes need something to write to, and user-space utilities must have a way of determining what format this data was written in. That's all there is to see here. >> All tracepoints are unconditionally compiled into the kernel, whether >> they are enabled or not. Why is it neccecary to check the enabled bit >> for information I'm not interested in ? Why can't I compile this away by >> not enabling the tracepoint at all. But you can. 
Have a look at include/linux/ltt-events.h:

#else /* defined(CONFIG_LTT) */
#define ltt_ev(ID, DATA)
#define ltt_ev_trap_entry(ID, EIP)
#define ltt_ev_trap_exit()
#define ltt_ev_irq_entry(ID, KERNEL)
#define ltt_ev_irq_exit()
#define ltt_ev_schedchange(OUT, IN)
#define ltt_ev_soft_irq(ID, DATA)
#define ltt_ev_process(ID, DATA1, DATA2)
#define ltt_ev_process_exit(DATA1, DATA2)
#define ltt_ev_file_system(ID, DATA1, DATA2, FILE_NAME)
#define ltt_ev_timer(ID, SDATA, DATA1, DATA2)
#define ltt_ev_memory(ID, DATA)
#define ltt_ev_socket(ID, DATA1, DATA2)
#define ltt_ev_ipc(ID, DATA1, DATA2)
#define ltt_ev_network(ID, DATA)
#define ltt_ev_heartbeat()
#endif /* defined(CONFIG_LTT) */

>> I don't need to point out the various coding style issues again, but I
>> question if
>> atomic_set(&var), atomic_read(&var) | bit);
>> which can be found on several places is really doing what it's suggests
>> to do.

If there are actual code snippets you think are broken, we'll gladly fix them.

>> I did a short test on a 300MHz PIII box and the maximum time spent in
>> the log path (interrupts disabled during measurement) is about 30us.
>> Extrapolated to a 74MHz ARM SoC it will sum up to ~ 90-120us, what makes
>> it purely useless.

Granted tracing is not free, but please avoid spreading FUD without actually carrying out proper testing. We've done quite a large number of tests and we've demonstrated over and over that LTT, and ltt-over-relayfs, is actually very efficient. If you're interested in actual test data, then you may want to check out the following:

http://www.opersys.com/ftp/pub/LTT/Documentation/ltt-usenix.ps.gz
http://lwn.net/Articles/13870/

We are aware of the cost of the various tracing components, as you can see by my earlier posting about early-checking to minimize the cost of the tracing hooks for kernel compiled with them, and are open for any optimization.
If you have any concrete suggestions, save the scrap-everything-I-know-better (which is really unproductive as you would anyway have to go down the same path we have), we are more than willing to entertain them. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
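The `atomic_set(&var, atomic_read(&var) | bit)` pattern questioned in the message above consists of two separate atomic operations, so an update made by another CPU between the read and the set can be lost. The following user-space sketch uses C11 `<stdatomic.h>` as a stand-in for the kernel's atomic API. That substitution is an assumption made for illustration; this is not code from the LTT patch, and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: load, OR, then store. Each step is atomic on its
 * own, but another writer can slip in between the load and the store,
 * and its update is then overwritten. */
static void set_bit_racy(atomic_uint *var, unsigned int bit)
{
	unsigned int old = atomic_load(var);

	atomic_store(var, old | bit);
}

/* Hypothetical helper: one indivisible read-modify-write. */
static void set_bit_atomic(atomic_uint *var, unsigned int bit)
{
	atomic_fetch_or(var, bit);
}

/* Single-threaded replay of the problematic interleaving, so the lost
 * update can be observed deterministically. */
static unsigned int replay_lost_update(void)
{
	atomic_uint flags;
	unsigned int stale;

	atomic_init(&flags, 0u);
	stale = atomic_load(&flags);        /* writer A reads 0          */
	set_bit_atomic(&flags, 0x2u);       /* writer B sets bit 1       */
	atomic_store(&flags, stale | 0x1u); /* writer A stores stale | 1 */
	return atomic_load(&flags);         /* writer B's bit is gone    */
}
```

Whether the pattern is actually harmful in a given call site depends on whether concurrent writers to the same variable are possible there, which the thread does not establish.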
* Re: 2.6.11-rc1-mm1 2005-01-14 23:09 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-15 0:01 ` Thomas Gleixner 2005-01-15 0:26 ` 2.6.11-rc1-mm1 Andrew Morton 2005-01-15 1:14 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-16 16:21 ` 2.6.11-rc1-mm1 Christoph Hellwig 1 sibling, 2 replies; 142+ messages in thread From: Thomas Gleixner @ 2005-01-15 0:01 UTC (permalink / raw) To: karim; +Cc: Andrew Morton, linux-kernel Hi Karim, On Fri, 2005-01-14 at 18:09 -0500, Karim Yaghmour wrote: > >> The "non-locking" claim is nice, but a do { } while loop in the slot > >> reservation for every event including a do { } while loop in the slow > >> path is just a replacement of locking without actually using a lock. I > >> don't care whether this is likely or unlikely to happen, it's just bogus > >> to add a non constant time path for debugging/tracing purposes. > > relayfs implements two schemes: lockless and locking. The later uses > standard linear locking mechanisms. If you need stringent constant > time, you know what to do. It's not only me, who needs constant time. Everybody interested in tracing will need that. In my opinion its a principle of tracing. The "lockless" mechanism is _FAKE_ as I already pointed out. It replaces locks by do { } while loops. So what ? > >> Default timestamp measuring with do_gettimeofday is also contrary to the > >> non locking argument. There is > >> a) a lock in there > >> b) it might loop because it's a sequential lock. > > >> If you have no TSC you can do at least a jiffies + event-number based, > >> not so finegrained tracing which gives you at least the timeline of the > >> events. > > Interesting. I've added this to the to-do list. Interesting. I read this phrase more than once in the discussion of your patch. When will the to-do list be done ? > >> There is also no need to do time diff calculations / conversions, this > >> can be done in userspace postprocessing. > > Ah yes, that's the kind of thing that you learn by getting bitten by it. 
> The problem is the size of the data stream. Diffs are an easy and a > rather inexpensive way of reducing trace sizes. Logging 2 or 4 more bytes > per event when you've got tens of thousands of events occuring per second > does have a noticeable impact. If this is really a sticking point, we > could provide a way for writing full time-stamps. I'm impressed of your sudden time constraints awareness. Allowing 8192 bytes of user event size, string printing with varags and XML tracing is not biting you ? If you only store the low 32 bit of TSC you have a valid timeline when you are able to do the math in your postprocessor. Depending on the speed 16 bit are enough. > >> you do. In space constraint systems relayfs is even worse as it needs > >> more memory than the plain ringbuffer. > > Don't get us wrong, we can strip this down to make this a stupid ring- > buffer. But the fact of the matter is that in trying to use such a thing, > you will find yourself reimplementing the exact things we did for the > same purposes. A ring buffer is not stupid at all. I have implemented tracing with ring buffers already, so I know the limitations and the PITA. OTOH ringbuffers _ARE_ lockless, constant time comsuming and allow you to implement the splitting and related functionality in userspace postprocessing, which has to be done anyway. Do not tell me that streaming out data in a constant stream is worse than putting them into nodes of a filesystem and retrieving them from there. Setting up a simple /dev/proc/sys interface and do a cat /xxx/trace/cpuX >file/interface/whatever is not less efficient than the conversion of your data into a file. > >> The ringbuffer has a nice advantage. In case the system crashes you can > >> retrieve the last and therefor most interesting information from the > >> ringbuffer without any hassle via BDI or in the worstcase via a serial > >> dump. 
You can even copy the tail of the buffer into a permanent storage > >> like buffered SRAM so it can be retrieved after reboot. > > > And there's a reason why you can't do that with relayfs? We've looked at > this and interfacing between relayfs and crashdump is trivial. Sure, I have to grab stuff out of a filesystem instead of simply doing for (....) sendserial(buffer[i]); I know you can provide a nice function for doing so, but it will take another xxx kB of code instead of a 10 line simple solution. > >> Splitting the trace into different paths is nice to have but I don't see > >> a single point which cannot be done by a userspace (hostside) > >> postprocessing tool. It adds another non time constant component to the > >> trace path. Even the per CPU ringbuffers can be nicely synchronized by a > >> userspace postprocessing tool without adding complex synchronization > >> functions. > > > Again life is a merciless teacher. LTT did initially start with a single > eat-your-breakfeast-dinner-and-supper-in-one-place buffer. But that just > doesn't scale. If you're doing flight-recording, for example, you need > to have a separate channel which contains process creation/exit, > otherwise you have a hard time interepreting the data. Haha. If you have eventstamps and timestamps (even the jiffie + event based ones) nothing is hard to interpret. I guess the ethereal guys are rolling on the floor and laughing. The kernel is not the place to fix your postprocessing problems. Sure you have to do more complicated stuff, but you move the burden from kernel to a place where it does not hurt. What's hard on interpreting and filtering a stream of data ? > >> In case of time related tracing it's just overkill. The printk > >> information is mostly a string, which can be replaced by the address on > >> which the printk is happening. The maybe available arguments can be > >> dumped in binary form. All this information can be converted into human > >> readable form by postprocessing. 
> > I'm sorry, I don't understand your argument here. What's complicated ? In case I want to have timing related tracing which includes printks, then storing the address where the printk is coming from is enough instead of a various length string. Storing some args in binary form with this address should not be too hard to achieve. Again its a postprocessing problems. > >> I wonder whether the various formatting options of the trace are really > >> of any value. I need neither strings, HEX strings nor XML formatted > >> information from the kernel. Max. 8192 Byte of user information makes me > >> frown. Tracing is not a copy to userspace function or am I missing > >> something ? > Dynamically created custom events and events directed by the likes of > DProbes need something to write to, and user-space utilities must have > a way of determining what format this data was written in. That's all > there is to see here. And therefor I need strings, HEX strings, XML ? A simple number and the data behind gives you all you need. Again its a postprocessing problems. > >> All tracepoints are unconditionally compiled into the kernel, whether > >> they are enabled or not. Why is it neccecary to check the enabled bit > >> for information I'm not interested in ? Why can't I compile this away by > >> not enabling the tracepoint at all. > But you can. Have a look at include/linux/ltt-events.h: > #else /* defined(CONFIG_LTT) */ > #define ltt_ev(ID, DATA) > #define ltt_ev_trap_entry(ID, EIP) > #define ltt_ev_trap_exit() Sure I'm aware that I can switch off all, but I can not deselect specific tracepoints during compile time to reduce the overhead. If I want to have custom tracepoints for my specific problem, then why I need the overhead of the other stuff ? 
> >> I don't need to point out the various coding style issues again, but I > >> question if > >> atomic_set(&var), atomic_read(&var) | bit); > >> which can be found on several places is really doing what it's suggests > >> to do. > > If there are actual code snippets you think are broken, we'll gladly > fix them. If you consider the above example, which is taken of your code, as sane then we can stop talkin about this. > >> I did a short test on a 300MHz PIII box and the maximum time spent in > >> the log path (interrupts disabled during measurement) is about 30us. > >> Extrapolated to a 74MHz ARM SoC it will sum up to ~ 90-120us, what makes > >> it purely useless > > Granted tracing is not free, but please avoid spreading FUD without > actually carrying out proper testing. We've done quite a large number > of tests and we've demonstrated over and over that LTT, and ltt-over- > relayfs, is actually very efficient. If you're interested in actual > test data, then you may want to check out the following: > http://www.opersys.com/ftp/pub/LTT/Documentation/ltt-usenix.ps.gz > http://lwn.net/Articles/13870/ Karim, please do not use the FUD argument. I do not doubt that it is efficient from your point of view. But if short tests show this and I'm able to prove that numbers, you can barely deny that the scaling of 300MHZ PIII to ARM 74MHz SoC is wrong. It's simple math. > We are aware of the cost of the various tracing components, as you > can see by my earlier posting about early-checking to minimize the > cost of the tracing hooks for kernel compiled with them, and are > open for any optimization. If you have any concrete suggestions, save > the scrap-everything-I-know-better (which is really unproductive as > you would anyway have to go down the same path we have), we are more > than willing to entertain them. Yes, the "you would anyway have to go down the same path we have" argument really scares me away from doing so. I don't buy this kind of arguments. 
tglx ^ permalink raw reply [flat|nested] 142+ messages in thread
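The suggestion above to log only the low 32 bits of the TSC and "do the math in your postprocessor" could look roughly like this on the host side. This is a sketch under the assumption that consecutive events are always less than 2^32 ticks apart; the function name and layout are hypothetical and not part of LTT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rebuild monotonically increasing 64-bit tick counts from truncated
 * 32-bit stamps. Unsigned subtraction modulo 2^32 yields the correct
 * delta even when the 32-bit counter wrapped between two events. */
static void rebuild_timeline(const uint32_t *stamps, size_t n, uint64_t *out)
{
	uint64_t now = 0;
	uint32_t prev = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == 0)
			now = stamps[0];
		else
			now += (uint32_t)(stamps[i] - prev);
		prev = stamps[i];
		out[i] = now;
	}
}
```

If two events can be further apart than one counter period, the wrap is ambiguous and some periodic marker event is needed in the trace.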
* Re: 2.6.11-rc1-mm1 2005-01-15 0:01 ` 2.6.11-rc1-mm1 Thomas Gleixner @ 2005-01-15 0:26 ` Andrew Morton 2005-01-15 1:00 ` 2.6.11-rc1-mm1 Thomas Gleixner 2005-01-15 1:14 ` 2.6.11-rc1-mm1 Karim Yaghmour 1 sibling, 1 reply; 142+ messages in thread From: Andrew Morton @ 2005-01-15 0:26 UTC (permalink / raw) To: tglx; +Cc: karim, linux-kernel

Thomas Gleixner <tglx@linutronix.de> wrote:
>
> ...
> I'm impressed of your sudden time constraints awareness. Allowing 8192
> bytes of user event size, string printing with varags and XML tracing
> is not biting you ?

? I see no XML in there.

akpm:/usr/src/25> grep -i xml patches/ltt* patches/relayfs*
patches/ltt-core-headers.patch:+#define LTT_CUSTOM_EV_FORMAT_TYPE_XML 3
akpm:/usr/src/25>

>
> Haha. If you have eventstamps and timestamps (even the jiffie + event
> based ones) nothing is hard to interpret. I guess the ethereal guys are
> rolling on the floor and laughing.
>
> The kernel is not the place to fix your postprocessing problems. Sure
> you have to do more complicated stuff, but you move the burden from
> kernel to a place where it does not hurt.

I thought Karim said that this was a form of data compression.

>
> Yes, the "you would anyway have to go down the same path we have"
> argument really scares me away from doing so.
>
> I don't buy this kind of arguments.

I do. When someone has been working on a real-world project for several years we *need* to understand all the problems which that person encountered before we can competently review the implementation.

Surely you've been there before: you throw out all the old stuff, write a new one and once you've addressed all the warts and corner cases and weird-but-valid requirements it ends up with the same complexity as the original.

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-15 0:26 ` 2.6.11-rc1-mm1 Andrew Morton @ 2005-01-15 1:00 ` Thomas Gleixner 2005-01-15 1:25 ` 2.6.11-rc1-mm1 Karim Yaghmour 0 siblings, 1 reply; 142+ messages in thread From: Thomas Gleixner @ 2005-01-15 1:00 UTC (permalink / raw) To: Andrew Morton; +Cc: karim, LKML On Fri, 2005-01-14 at 16:26 -0800, Andrew Morton wrote: > ? I see no XML in there. > > akpm:/usr/src/25> grep -i xml patches/ltt* patches/relayfs* > patches/ltt-core-headers.patch:+#define LTT_CUSTOM_EV_FORMAT_TYPE_XML 3 > akpm:/usr/src/25> And what is this define for ? > > The kernel is not the place to fix your postprocessing problems. Sure > > you have to do more complicated stuff, but you move the burden from > > kernel to a place where it does not hurt. > > I thought Karim said that this was a form of data compression. Adding data compression in form of an additional computation is really inventive. Provide the information in a way that postprocessing tools can do the job without adding computations to the kernel is the goal. I pointed out a couple of those possibilities in my previous mail. > > > > Yes, the "you would anyway have to go down the same path we have" > > argument really scares me away from doing so. > > > > I don't buy this kind of arguments. > > I do. When someone has been working on a real-world project for several > years we *need* to understand all the problems which that person > encountered before we can competently review the implementation. I'm working on real world problems for quite a long time and your argument should apply the other way too. I have implemented instrumentation in different flavours before, so I know exactly what I'm talking about. I'm well aware of the worthiness of someones experience and I'm not going to throw it away, but I don't see the reverse, that accepting this is forcing me to blindly agree with arguments from those persons. 
> Surely you've been there before: you throw out all the old stuff,
> write a new one and once you've addressed all the warts and corner
> cases and weird-but-valid requirements it ends up with the same
> complexity as the original.

I disagree at this point. Accepting the maturity of an implementation just from the argument that somebody has done this for some time and therefore gained experience is quite a weak argument, if one can point out the opposite by just reading the code and making a short real life test.

If the goal is to provide some "cool to have" instrumentation in the kernel, then I stop arguing immediately.

But this can not be the goal. If we introduce instrumentation facilities into the kernel, then they must be for general use, optimized for non-intrusiveness, and replace all the other "[] provide measurement X" config options instead of introducing parallel mechanisms.

I do not accept unnecessary complexity in the kernel, when you are able to achieve the same goal by putting more thought into the postprocessing. The kernel code is responsible for providing a simple and fast interface for those tasks and nothing more. I don't see the point why we need 150k of additional code with limitations/problems, which are obvious even without running it, instead of a simple interface to userland where different postprocessors can compete to do the job more or less perfectly.

As I pointed out in my reply to Tim, I would be happy to have instrumentation in the kernel, but I'm not willing to pay the price which is requested by the currently discussed implementation.

tglx

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-15 1:00 ` 2.6.11-rc1-mm1 Thomas Gleixner @ 2005-01-15 1:25 ` Karim Yaghmour 2005-01-15 10:20 ` 2.6.11-rc1-mm1 Thomas Gleixner 0 siblings, 1 reply; 142+ messages in thread From: Karim Yaghmour @ 2005-01-15 1:25 UTC (permalink / raw) To: tglx; +Cc: Andrew Morton, LKML

Thomas Gleixner wrote:
> I do not accept unnecessary complexity in the kernel, when you are able
> to achieve the same goal by putting more thoughts into the
> postprocessing. The kernel code is responsible to provide a simple and
> fast interface for those tasks and nothing more. I don't see the point
> why we need 150k additional code with limitations/problems, which are
> even obvious without running it, instead of a simple interface to
> userland where different postprocessors can compete to do the job more
> or less perfect.

You have previously demonstrated that you do not understand the implementation you are criticizing. You keep repeating the size of the patch like a mantra, yet when pressed for actual bits of code that need fixing, you use a circular argument to slip away.

If you feel that there is some unnecessary processing being done in the kernel, please show me the piece of code affected so that it can be fixed if it is broken.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-15 1:25 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-15 10:20 ` Thomas Gleixner 2005-01-16 4:13 ` 2.6.11-rc1-mm1 Karim Yaghmour 0 siblings, 1 reply; 142+ messages in thread From: Thomas Gleixner @ 2005-01-15 10:20 UTC (permalink / raw) To: karim; +Cc: Andrew Morton, LKML On Fri, 2005-01-14 at 20:25 -0500, Karim Yaghmour wrote: > Thomas Gleixner wrote: > > You have previously demonstrated that you do not understand the > implementation you are criticizing. You keep repeating the size > of the patch like a mantra, yet when pressed for actual bits of > code that need fixing, you use a circular argument to slip away. Yeah, did you answer one of my arguments except claiming that I'm to stupid to understand how it works ? I completely understand what this code does and I don't beat on the patch size. I beat on the timing burden and restrictions which are given by the implementation. I have no objection against relayfs itself. I can just leave the config switch off, so it does not affect me. Adding instrumentation to the kernel is a good thing. I just dont like the idea, that instrumentation is bound on relayfs and adds a feature to the kernel which fits for a restricted set of problems rather than providing a generic optimized instrumentation framework, where one can use relayfs as a backend, if it fits his needs. Making this less glued together leaves the possibility to use other backends. > If you feel that there is some unncessary processing being done > in the kernel, please show me the piece of code affected so that > it can be fixed if it is broken. Just doing codepath analysis shows me: There is a loop in ltt_log_event, which enforces the processing of each event twice. Spliting traces is postprocessing and can be done elsewhere. In _ltt_log_event lives quite a bunch of if(...) processing decisions which have to be evaluated for _each_ event. 
The relay_reserve code can loop in the do { } while() and even go into a slow path where another do { } while() is found. So it can not be used in fast paths and for timing related problem tracking, because it adds variable time overhead.

Because the ltt_log_event path is not preempt safe, you can actually hit an additional pass through the do { } while() loop.

I pointed out before that it is not possible to select the events which I'm interested in at compile time. I get either nothing or everything. If I want to use instrumentation for a particular problem, why must I process a loop of _ltt_log_event calls for stuff I do not need instead of just compiling it away ?

If I compile an event in, then adding a couple of checks into the instrumentation macro itself does not hurt as much as leaving the straight code path for a disabled event.

tglx

^ permalink raw reply [flat|nested] 142+ messages in thread
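The objection above concerns retry loops in a lockless reservation path. A generic compare-and-swap reservation, sketched here with C11 atomics, shows where the variable-time retry comes from. This is not the relayfs `relay_reserve` code; the structure, the buffer size, and the names are hypothetical and chosen only to illustrate the technique.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define BUF_SIZE 64u

struct trace_buf {
	atomic_size_t head;            /* next free offset in data[] */
	unsigned char data[BUF_SIZE];
};

/* Reserve len bytes and return the start offset, or -1 if the request
 * does not fit. The loop repeats only when another writer moved head
 * between our load and our compare-and-swap (or on a spurious failure
 * of the weak CAS). */
static long reserve_slot(struct trace_buf *b, size_t len)
{
	size_t old = atomic_load(&b->head);
	size_t new_head;

	do {
		if (len > BUF_SIZE - old)
			return -1;
		new_head = old + len;
		/* On failure, old is refreshed with the current head. */
	} while (!atomic_compare_exchange_weak(&b->head, &old, new_head));

	return (long)old;
}
```

With a single writer the loop body normally runs once. Under contention the number of iterations depends on how often other writers interleave, which is the non-constant-time behaviour being debated.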
* Re: 2.6.11-rc1-mm1 2005-01-15 10:20 ` 2.6.11-rc1-mm1 Thomas Gleixner @ 2005-01-16 4:13 ` Karim Yaghmour 2005-01-16 15:19 ` 2.6.11-rc1-mm1 Robert Wisniewski 0 siblings, 1 reply; 142+ messages in thread From: Karim Yaghmour @ 2005-01-16 4:13 UTC (permalink / raw) To: tglx; +Cc: Andrew Morton, LKML, Robert Wisniewski

Hello Thomas,

In the interest of not spreading the thread too thin, I'm replying to both emails at the same time.

Thomas Gleixner wrote:
>>relayfs is a generalized buffering mechanism. Tracing is one application
>>it serves. Check out the web site: "high-speed data-relay filesystem."
>>Fancy name huh ...
>
>
> I do not doubt that.
>
> But hardwiring an instrumentation framework on it is also hardwiring
> implicit restrictions on the usability of the instrumentation for
> certain purposes.

To a certain extent this is true. Please refer to my reply to your RFC for a discussion of this.

>>Well for one thing, a portion of code running in user-context won't
>>disable interrupts while it's attempting to get buffer space, and
>>therefore won't impact on interrupt delivery.
>
>
> The do {} while loops are in the fast ltt_log_event path

You mean that it would impact on interrupt delivery? This code's behavior has actually been carefully studied, and what has been seen is that the code almost never loops, and when it does, it very rarely does it more than twice. In the case of an interrupt, you'd have to receive an interrupt while reserving space for logging the current interrupt's occurrence for the loop to be done twice. I've CC'ed Bob Wisniewski on this as he's the one that implemented this code and studied its behavior in depth.

> Yeah, did you answer one of my arguments except claiming that I'm to
> stupid to understand how it works ?

If I misspoke, then I apologize. For one thing, I've never thought of you as stupid. I'm just trying to get specifics here.
> I just dont like the idea, that instrumentation is bound on relayfs and
> adds a feature to the kernel which fits for a restricted set of problems
> rather than providing a generic optimized instrumentation framework,
> where one can use relayfs as a backend, if it fits his needs. Making
> this less glued together leaves the possibility to use other backends.

Yes, I understand and I hope my other mail properly addresses this issue.

> There is a loop in ltt_log_event, which enforces the processing of each
> event twice. Spliting traces is postprocessing and can be done
> elsewhere.

Sorry, this is not postprocessing. Let me explain:

Basically, the ltt framework allows only one tracing session to be active at any time. IOW, if you were planning on starting a 2 week trace and after doing so wanted to trace a short 10s on an application then you are screwed, LTT won't allow you to do that. Currently this is a limitation which we haven't heard any complaints about, so we're not going to generalize it until there is proof that people really need this.

However, there are cases where you want to have tracing running at _all_ times in what is referred to as flight-recorder mode and only dump the content of the buffers when something special happens. Yet, those who are interested in having this 24x7 mode also know enough about tracing that they do need to actually trace other things for short periods without disrupting their flight-recording. That's why there's a loop. An event will be processed twice only if you're tracing AND flight-recording at the same time. There is no way to do an equivalent of what I just described with any form of postprocessing.

Here's the proper snippet from include/linux/ltt-events.h:

/* We currently support 2 traces, normal trace and flight recorder */
#define NR_TRACES 2
#define TRACE_HANDLE 0
#define FLIGHT_HANDLE 1

> In _ltt_log_event lives quite a bunch of if(...) processing decisions
> which have to be evaluated for _each_ event.
Correct, and I'm honest enough with myself to admit that this is the bit of code that I think needs the most reviewing. So, in order to help you help me, here are the various code snippets and things I can think of which would help make the code faster/simpler:

Here's the preamble where we make some basic sanity checks:

	if (!trace)
		return -ENOMEDIUM;
	if (trace->paused)
		return -EBUSY;
	tracer_handle = trace->trace_handle;
	if (!trace->flight_recorder && (trace->daemon_task_struct == NULL))
		return -ENODEV;
	channel_handle = trace_channel_handle(tracer_handle, cpu_id);
	if ((trace->tracer_started == 1) || (event_id == LTT_EV_START) || (event_id == LTT_EV_BUFFER_START))
		goto trace_event;
	return -EBUSY;
trace_event:
	if (!ltt_test_bit(event_id, &trace->traced_events))
		return 0;

Basically, unless we've succeeded in all those if's, we're not going to write anything. I think we could get rid of the first four by simply maintaining a state-machine for the tracer. Then we could either have a single if or even use function pointers (though I think this costs more) to call or not call _ltt_log_event.

As for checking whether the event has a certain ID (EV_START or EV_BUFFER_START and ltt_test_bit), we could do the testing at the event's occurrence (i.e. as soon as the event occurs, check whether it's being monitored right there and drop it otherwise.)
Here's the part where we check if some basic filtering requirements have been made:

	if ((event_id != LTT_EV_START) && (event_id != LTT_EV_BUFFER_START)) {
		if (event_id == LTT_EV_SCHEDCHANGE)
			incoming_process = (struct task_struct *) (((ltt_schedchange *) event_struct)->in);
		if ((trace->tracing_pid == 1) && (current->pid != trace->traced_pid)) {
			if (incoming_process == NULL)
				return 0;
			else if (incoming_process->pid != trace->traced_pid)
				return 0;
		}
		if ((trace->tracing_pgrp == 1) && (process_group(current) != trace->traced_pgrp)) {
			if (incoming_process == NULL)
				return 0;
			else if (process_group(incoming_process) != trace->traced_pgrp)
				return 0;
		}
		if ((trace->tracing_gid == 1) && (current->egid != trace->traced_gid)) {
			if (incoming_process == NULL)
				return 0;
			else if (incoming_process->egid != trace->traced_gid)
				return 0;
		}
		if ((trace->tracing_uid == 1) && (current->euid != trace->traced_uid)) {
			if (incoming_process == NULL)
				return 0;
			else if (incoming_process->euid != trace->traced_uid)
				return 0;
		}
		if (event_id == LTT_EV_SCHEDCHANGE)
			(((ltt_schedchange *) event_struct)->in) = incoming_process->pid;
	}

First, the first inner if (LTT_EV_SCHEDCHANGE) really ought to be outside. Instead we should modify ltt_log_event from:

	int ltt_log_event(u8 event_id, void *event_struct)

to:

	int ltt_log_event(u8 event_id, void *event_struct, void *data, int data_len)

where data is used to pass the pointer to the incoming process' task struct, and reused below in conjunction with data_len for other purposes, and have something like this instead in the code:

	if ((any_filtering) && !(ltt_filter(event_id, event_struct, data)))
		return -EINVAL;

where ltt_filter is the filtering function, called only when there is any sort of filtering being done.
Then we calculate the size of this event:

	data_size = sizeof(event_id) + sizeof(time_delta) + sizeof(data_size);
	if (ltt_test_bit(event_id, &trace->log_event_details_mask)) {
		data_size += event_struct_size[event_id];
		switch (event_id) {
		case LTT_EV_FILE_SYSTEM:
			if ((((ltt_file_system *) event_struct)->event_sub_id == LTT_EV_FILE_SYSTEM_EXEC)
			    || (((ltt_file_system *) event_struct)->event_sub_id == LTT_EV_FILE_SYSTEM_OPEN)) {
				var_data_beg = ((ltt_file_system *) event_struct)->file_name;
				var_data_len = ((ltt_file_system *) event_struct)->event_data2 + 1;
				data_size += (uint16_t) var_data_len;
			}
			break;
		case LTT_EV_CUSTOM:
			var_data_beg = ((ltt_custom *) event_struct)->data;
			var_data_len = ((ltt_custom *) event_struct)->data_size;
			data_size += (uint16_t) var_data_len;
			break;
		}
	}

Here we reuse data and data_len, and remove the checking for whether the user wants to log event details or not in order to remove this if/switch altogether. The log_event_details_mask was a feature I added early on in LTT's life and I don't know of anyone for whom this was really crucial. We could revive it later if it became important.

Then we check whether we should be logging the CPU-ID:

	if ((trace->log_cpuid == 1) && (event_id != LTT_EV_START) && (event_id != LTT_EV_BUFFER_START))
		data_size += sizeof(cpu_id);

Frankly this is legacy code for when ltt only supported one trace buffer, and I don't know that we need to keep it. Clearly if you've got many CPUs you don't want to be using one buffer. So this code can go.

Now we do the relayfs part:

	rchan = rchan_get(channel_handle);
	if (rchan == NULL)
		return -ENODEV;
	relay_lock_channel(rchan, flags); /* nop for lockless */
	reserved = relay_reserve(rchan, data_size, &time_stamp, &time_delta, &reserve_code, &interrupting);
	if (reserve_code & RELAY_WRITE_DISCARD) {
		events_lost(trace->trace_handle, cpu_id)++;
		bytes_written = 0;
		goto check_buffer_switch;
	}

First, the rchan_get() really ought to go.
As Roman suggested, relayfs should not be handing out IDs, it should be
handing out pointers. Once this is changed in relayfs, this piece of code
will go and be replaced by something like:
    atomic_inc(&rchan->refcount);

The rest is ok.

At this point we actually write to the buffer:
    if ((trace->log_cpuid == 1) && (event_id != LTT_EV_START)
        && (event_id != LTT_EV_BUFFER_START))
        relay_write_direct(reserved, &cpu_id, sizeof(cpu_id));

    relay_write_direct(reserved, &event_id, sizeof(event_id));

    relay_write_direct(reserved, &time_delta, sizeof(time_delta));

    if (ltt_test_bit(event_id, &trace->log_event_details_mask)) {
        relay_write_direct(reserved, event_struct, event_struct_size[event_id]);
        if (var_data_len)
            relay_write_direct(reserved, var_data_beg, var_data_len);
    }

    relay_write_direct(reserved, &data_size, sizeof(data_size));

    bytes_written = data_size;

As above, the CPU-Id and the check for log_event_details_mask should go.
And the details snippet should look something like this:

    relay_write_direct(reserved, event_struct, event_struct_size[event_id]);
    if (data_len)
        relay_write_direct(reserved, data, data_len);

Finally, we complete the relayfs management:

check_buffer_switch:
    if ((event_id == LTT_EV_SCHEDCHANGE) && (tracer_handle == TRACE_HANDLE) && current_traces[FLIGHT_HANDLE].active)
        (((ltt_schedchange *) event_struct)->in) = (u32)incoming_process;

    /* We need to commit even if we didn't write anything because
       that's how the deliver callback is invoked. */
    relay_commit(rchan, reserved, bytes_written, reserve_code, interrupting);

    relay_unlock_channel(rchan, flags);
    rchan_put(rchan);

For this bit, it's the if() that ought to go now that we would be using
data and data_len. Also, the rchan_put() should be replaced with the
following once relayfs is changed:
    atomic_dec(&rchan->refcount);

Let me know if you have additional suggestions.

> The relay_reserve code can loop in the do { } while() and even go into a
> slow path where another do { } while() is found.
> So it can not be used in fast paths and for timing related problem > tracking, because it adds variable time overhead. True. But remember what I said earlier, if timing is an issue you need to be using the locking scheme. > Due to the fact, that the ltt_log_event path is not preempt safe you can > actually hit the additional go in the do { } while() loop. Yes, we should have something like this instead: u32 cpu; preempt_disable(); cpu = smp_processor_id(); for (i = 0; i < NR_TRACES; i++) { trace = current_traces[i].active; err[i] = _ltt_log_event(trace, event_id, event_struct, cpu); } preempt_enable(); This better? > I pointed out before, that it is not possible to selectively select the > events which I'm interested in during compile time. I get either nothing > or everything. If I want to use instrumentation for a particular > problem, why must I process a loop of _ltt_log_event calls for stuff I > do not need instead of just compiling it away ? Like I said, that's an easy hack in Kconfig. > If I compile a event in, then adding a couple of checks into the > instrumentation macro itself does not hurt as much as leaving the > straight code path for a disabled event. Right, like I said above, the instrumentation macros should check for the event's logging as early as possible. As you can see, I am open to your feedback. The above improvements will go in the ltt code. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
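The point made above, that an instrumentation macro should test whether an event is being logged before doing any other work, can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions: `struct tracer`, `event_enabled()`, `TRACE_EVENT()` and `log_event_slow()` are hypothetical names and not LTT's API, the tracer state is collapsed into one enum as the "state-machine" remark suggests, and the special handling of LTT_EV_START and LTT_EV_BUFFER_START is left out.

```c
#include <stdint.h>

/* Hypothetical tracer state, standing in for the separate
 * trace/paused/started checks discussed in the thread. */
enum tracer_state { TRACER_OFF, TRACER_PAUSED, TRACER_RUNNING };

struct tracer {
    enum tracer_state state;
    uint64_t traced_events;   /* bit n set => event id n is logged */
};

/* Counts calls that reach the (stubbed) slow path. */
static unsigned long events_logged;

/* Stand-in for the real logging work (sizing, reserve, write, commit). */
static int log_event_slow(struct tracer *t, unsigned event_id)
{
    (void)t;
    (void)event_id;
    events_logged++;
    return 0;
}

/* Cheap test done at the instrumentation site: one state comparison
 * and one bit test, nothing else. */
static inline int event_enabled(const struct tracer *t, unsigned event_id)
{
    return t->state == TRACER_RUNNING &&
           event_id < 64 &&
           ((t->traced_events >> event_id) & 1u) != 0;
}

/* The slow path is only entered for events that are actually traced. */
#define TRACE_EVENT(t, id) \
    (event_enabled((t), (id)) ? log_event_slow((t), (id)) : 0)
```

A disabled event then costs a comparison and a bit test at the call site. Compiling individual events out entirely, as requested in the thread, would be a separate build-time switch on top of this.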
* Re: 2.6.11-rc1-mm1 2005-01-16 4:13 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-16 15:19 ` Robert Wisniewski 0 siblings, 0 replies; 142+ messages in thread From: Robert Wisniewski @ 2005-01-16 15:19 UTC (permalink / raw) To: karim; +Cc: tglx, Andrew Morton, LKML, Robert Wisniewski Karim Yaghmour writes: > > Hello Thomas, > > In the interest of avoiding expanding the thread too thin, I'm replying to > both emails in the same time. > > Thomas Gleixner wrote: > >>relayfs is a generalized buffering mechanism. Tracing is one application > >>it serves. Check out the web site: "high-speed data-relay filesystem." > >>Fancy name huh ... > > > > > > I do not doubt that. > > > > But hardwiring an instrumentation framework on it is also hardwiring > > implicit restrictions on the usability of the instrumentation for > > certain purposes. > > To a certain extent this is true. Please refer to my reply to your RFC > for a discussion of this. > > >>Well for one thing, a portion of code running in user-context won't > >>disable interrupts while it's attempting to get buffer space, and > >>therefore won't impact on interrupt delivery. > > > > > > The do {} while loops are in the fast ltt_log_event path As Greg's comments implicitly involved this issue as well, maybe it's worth expanding on what is going on here. The idea behind the lockless tracing is for each process/thread to atomically reserve space in the buffer, then write in the events. Also note that buffers are per-processor. So the do {} while loop loads the current index, does a calculation and attempts to use the calculated value (which is the old index + length of current event) to atomically compare_and_swap with the actual index pointer. 
As Karim correctly notes, the only way this will fail is if an interrupt occurred during the couple of instruction calculation, i.e., between when the old value was loaded and when we do the CAS, so it's unlikely, but even much more unlikely that, as he notes, this process would be woken up only for a couple of instructions and re-interrupted. Back to Greg's volatile issue: The reason the index needs to be volatile (or as was originally coded the reason we clobbered the registers) is to make sure the compiler knows the index value needs to get reloaded from memory each time around the loop. Hope this helps. I'm certainly happy to discuss in more length if there's any concerns/questions. -bob Robert Wisniewski The K42 MP OS Project http://www.research.ibm.com/K42/ bob@watson.ibm.com > > You mean that it would impact on interrupt deliver? This code's behavior > has actually been carefully studied, and what has been seen is that > there code almost never loops, and when it does, it very rarely does > it more than twice. In the case of an interrupt, you'd have to receive > an interrupt while reserving space for logging a current's interrupt > occurrence for the loop to be done twice. I've CC'ed Bob Wisniewski > on this as he's the one that implemented this code and studied its > behavior in depth. > > > Yeah, did you answer one of my arguments except claiming that I'm to > > stupid to understand how it works ? > > If I miss-spoke, then I appologize. For one thing, I've never thought > of you as stupid. I'm just trying to get specifics here. > > > I just dont like the idea, that instrumentation is bound on relayfs and > > adds a feature to the kernel which fits for a restricted set of problems > > rather than providing a generic optimized instrumentation framework, > > where one can use relayfs as a backend, if it fits his needs. Making > > this less glued together leaves the possibility to use other backends. 
> > Yes, I understand and I hope my other mail properly addresses this issue. > > > There is a loop in ltt_log_event, which enforces the processing of each > > event twice. Spliting traces is postprocessing and can be done > > elsewhere. > > Sorry, this is not postprocessing. Let me explain: > > Basically, the ltt framework allows only one tracing session to be active > at all times. IOW, if you were planning on starting a 2 week trace and > after doing so wanted to trace a short 10s on an application then you are > screwed, LTT won't allow you to do that. Currently this is a limitation > which we haven't heard any complaints about, so we're not going to > generalize it until there is proof that people really need this. > > However, there are cases where you want to have tracing running at _all_ > times in what is refered to as flight-recorder mode and only dump the > content of the buffers when something special happens. Yet, those who > are interested in having this 24x7 mode also know enough about tracing > that they do need to actually trace other things for short periods > without disrupting their flight-recording. That's why there's a loop. > An event will be processed twice only if you're tracing AND flight- > recording in the same time. > > There is no way to do an equivalent of what I just described with any > form of postprocessing. > > Here's the proper snippet from include/linux/ltt-events.h: > /* We currently support 2 traces, normal trace and flight recorder */ > #define NR_TRACES 2 > #define TRACE_HANDLE 0 > #define FLIGHT_HANDLE 1 > > > In _ltt_log_event lives quite a bunch of if(...) processing decisions > > which have to be evaluated for _each_ event. > > Correct, and I'm honest enough with myself to admit that this is the bit > of code that I think needs the most reviewing. 
So, in order to help
> you help me, here are the various code snippets and things I can think
> of which would help make the code faster/simpler:
>
> Here's the preamble where we make some basic sanity checks:
>
>     if (!trace)
>         return -ENOMEDIUM;
>
>     if (trace->paused)
>         return -EBUSY;
>
>     tracer_handle = trace->trace_handle;
>
>     if (!trace->flight_recorder && (trace->daemon_task_struct == NULL))
>         return -ENODEV;
>
>     channel_handle = trace_channel_handle(tracer_handle, cpu_id);
>
>     if ((trace->tracer_started == 1) || (event_id == LTT_EV_START) || (event_id == LTT_EV_BUFFER_START))
>         goto trace_event;
>
>     return -EBUSY;
>
> trace_event:
>     if (!ltt_test_bit(event_id, &trace->traced_events))
>         return 0;
>
> Basically, unless we've succeeded in all those if's, we're not going to
> write anything. I think we could get rid of the first four by simply
> maintaining a state-machine for the tracer. Then we could either have
> a single if or even use function pointers (though I think this costs
> more) to call or not call _ltt_log_event. As for checking whether the
> event has a certain ID (EV_START or EV_BUFFER_START and ltt_test_bit),
> we could do the testing at the event's occurrence (i.e. as soon as the
> event occurs, check whether it's being monitored right there and drop
> it otherwise.)
> > Here's the part where we check if some basic filtering requirements > have been made: > > if ((event_id != LTT_EV_START) && (event_id != LTT_EV_BUFFER_START)) { > if (event_id == LTT_EV_SCHEDCHANGE) > incoming_process = (struct task_struct *) (((ltt_schedchange *) event_struct)->in); > if ((trace->tracing_pid == 1) && (current->pid != trace->traced_pid)) { > if (incoming_process == NULL) > return 0; > else if (incoming_process->pid != trace->traced_pid) > return 0; > } > if ((trace->tracing_pgrp == 1) && (process_group(current) != trace->traced_pgrp)) { > if (incoming_process == NULL) > return 0; > else if (process_group(incoming_process) != trace->traced_pgrp) > return 0; > } > if ((trace->tracing_gid == 1) && (current->egid != trace->traced_gid)) { > if (incoming_process == NULL) > return 0; > else if (incoming_process->egid != trace->traced_gid) > return 0; > } > if ((trace->tracing_uid == 1) && (current->euid != trace->traced_uid)) { > if (incoming_process == NULL) > return 0; > else if (incoming_process->euid != trace->traced_uid) > return 0; > } > if (event_id == LTT_EV_SCHEDCHANGE) > (((ltt_schedchange *) event_struct)->in) = incoming_process->pid; > } > > First, the first inner if (LTT_EV_SCHEDCHANGE) really ought to be outside. > Instead we should modify ltt_log_event from: > int ltt_log_event(u8 event_id, > void *event_struct) > to: > int ltt_log_event(u8 event_id, > void *event_struct, > void *data, > int data_len) > > where data is used to pass the pointer to the incoming process' task struct, > and reused below in conjunction with data_len for other purposes. > > and have something like this instead in the code: > if ((any_filtering) && !(ltt_filter(event_id, event_struct, data))) > return -EINVAL; > > where ltt_filter is the filtering function, called only when there is any > sort of filtering being done. 
> > The we calculate the size of this event: > data_size = sizeof(event_id) + sizeof(time_delta) + sizeof(data_size); > > > if (ltt_test_bit(event_id, &trace->log_event_details_mask)) { > data_size += event_struct_size[event_id]; > switch (event_id) { > case LTT_EV_FILE_SYSTEM: > if ((((ltt_file_system *) event_struct)->event_sub_id == LTT_EV_FILE_SYSTEM_EXEC) > || (((ltt_file_system *) event_struct)->event_sub_id == LTT_EV_FILE_SYSTEM_OPEN)) { > var_data_beg = ((ltt_file_system *) event_struct)->file_name; > var_data_len = ((ltt_file_system *) event_struct)->event_data2 + 1; > data_size += (uint16_t) var_data_len; > } > break; > case LTT_EV_CUSTOM: > var_data_beg = ((ltt_custom *) event_struct)->data; > var_data_len = ((ltt_custom *) event_struct)->data_size; > data_size += (uint16_t) var_data_len; > break; > } > } > > Here we reuse data and data_len, and remove the checking for whether the > user wants to log event details or not in order to remove this if/switch > altogether. The log_event_details_mask was a feature I added early on > in LTT's life and I don't know of anyone for whom this was really crucial. > We could revive it later if it became important. > > Then we check whether we should be logging the CPU-ID: > if ((trace->log_cpuid == 1) && (event_id != LTT_EV_START) && (event_id != LTT_EV_BUFFER_START)) > data_size += sizeof(cpu_id); > > Frankly this is legacy code for when ltt only supported one trace buffer, > and I don't know that we need to keep it. Clearly if you've got many > CPUs you don't want to be using one buffer. So this code can go. 
> > Now we do the relayfs part: > rchan = rchan_get(channel_handle); > if (rchan == NULL) > return -ENODEV; > > relay_lock_channel(rchan, flags); /* nop for lockless */ > reserved = relay_reserve(rchan, data_size, &time_stamp, &time_delta, &reserve_code, &interrupting); > > if (reserve_code & RELAY_WRITE_DISCARD) { > events_lost(trace->trace_handle, cpu_id)++; > bytes_written = 0; > goto check_buffer_switch; > } > > First, the rchan_get() really ought to go. As Roman suggested, relayfs > should be handing out IDs, it should be handing out pointers. Once this > is changed in relayfs, this piece of code will go and be replaced by > something like: > atomic_inc(&rchan->refcount); > > The rest is ok. > > At this point we actually write to the buffer: > if ((trace->log_cpuid == 1) && (event_id != LTT_EV_START) > && (event_id != LTT_EV_BUFFER_START)) > relay_write_direct(reserved, > &cpu_id, > sizeof(cpu_id)); > > relay_write_direct(reserved, > &event_id, > sizeof(event_id)); > > relay_write_direct(reserved, > &time_delta, > sizeof(time_delta)); > > if (ltt_test_bit(event_id, &trace->log_event_details_mask)) { > relay_write_direct(reserved, > event_struct, > event_struct_size[event_id]); > if (var_data_len) > relay_write_direct(reserved, > var_data_beg, > var_data_len); > } > > relay_write_direct(reserved, > &data_size, > sizeof(data_size)); > > bytes_written = data_size; > > As above, the CPU-Id and the check for log_event_details_mask should > go. 
And the details snippet should look something like this:
>
>     relay_write_direct(reserved,
>                        event_struct,
>                        event_struct_size[event_id]);
>     if (data_len)
>         relay_write_direct(reserved,
>                            data,
>                            data_len);
>
> Finally, we complete the relayfs management:
>
> check_buffer_switch:
>     if ((event_id == LTT_EV_SCHEDCHANGE) && (tracer_handle == TRACE_HANDLE) && current_traces[FLIGHT_HANDLE].active)
>         (((ltt_schedchange *) event_struct)->in) = (u32)incoming_process;
>
>     /* We need to commit even if we didn't write anything because
>        that's how the deliver callback is invoked. */
>     relay_commit(rchan, reserved, bytes_written, reserve_code, interrupting);
>
>     relay_unlock_channel(rchan, flags);
>     rchan_put(rchan);
>
> For this bit, it's the if() that ought to go now that we would be using
> data and data_len. Also, the rchan_put() should be replaced with the
> following once relayfs is changed:
>     atomic_dec(&rchan->refcount);
>
> Let me know if you have additional suggestions.
>
> > The relay_reserve code can loop in the do { } while() and even go into a
> > slow path where another do { } while() is found.
> > So it can not be used in fast paths and for timing related problem
> > tracking, because it adds variable time overhead.
>
> True. But remember what I said earlier, if timing is an issue you need to
> be using the locking scheme.
>
> > Due to the fact, that the ltt_log_event path is not preempt safe you can
> > actually hit the additional go in the do { } while() loop.
>
> Yes, we should have something like this instead:
>     u32 cpu;
>
>     preempt_disable();
>     cpu = smp_processor_id();
>     for (i = 0; i < NR_TRACES; i++) {
>         trace = current_traces[i].active;
>         err[i] = _ltt_log_event(trace, event_id, event_struct, cpu);
>     }
>     preempt_enable();
>
> This better?
>
> > I pointed out before, that it is not possible to selectively select the
> > events which I'm interested in during compile time. I get either nothing
> > or everything.
If I want to use instrumentation for a particular > > problem, why must I process a loop of _ltt_log_event calls for stuff I > > do not need instead of just compiling it away ? > > Like I said, that's an easy hack in Kconfig. > > > If I compile a event in, then adding a couple of checks into the > > instrumentation macro itself does not hurt as much as leaving the > > straight code path for a disabled event. > > Right, like I said above, the instrumentation macros should check for > the event's logging as early as possible. > > As you can see, I am open to your feedback. The above improvements > will go in the ltt code. > > Karim > -- > Author, Speaker, Developer, Consultant > Pushing Embedded and Real-Time Linux Systems Beyond the Limits > http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
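The reserve step described in the message above (load the current index, add the event length, compare-and-swap the result into the index, and retry on failure) can be sketched with C11 atomics. This is a user-space illustration and not the relayfs code: `struct trace_buf`, `buf_reserve()` and `RESERVE_FAILED` are made-up names, and buffer switching, timestamps and the slow path are ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-CPU buffer; field names are illustrative only. */
struct trace_buf {
    _Atomic size_t index;       /* next free byte offset */
    size_t size;                /* usable capacity in bytes */
    unsigned char data[4096];
};

#define RESERVE_FAILED ((size_t)-1)

/*
 * Reserve len bytes. Returns the offset at which the caller may write,
 * or RESERVE_FAILED if the event does not fit in the remaining space.
 */
static size_t buf_reserve(struct trace_buf *buf, size_t len)
{
    size_t old_idx, new_idx;

    assert(buf != NULL);
    old_idx = atomic_load(&buf->index);
    do {
        if (len > buf->size - old_idx)
            return RESERVE_FAILED;
        new_idx = old_idx + len;
        /* A failed compare-exchange stores the current index back into
         * old_idx, so every retry recomputes from a fresh value. */
    } while (!atomic_compare_exchange_weak(&buf->index, &old_idx, new_idx));

    return old_idx;
}
```

The loop only repeats when something else advanced the index between the load and the compare-exchange, which matches the claim in the thread that retries need an interruption inside that short window. The refresh of `old_idx` on failure plays the role that the volatile qualifier (or register clobber) plays in the code under discussion.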
* Re: 2.6.11-rc1-mm1 2005-01-15 0:01 ` 2.6.11-rc1-mm1 Thomas Gleixner 2005-01-15 0:26 ` 2.6.11-rc1-mm1 Andrew Morton @ 2005-01-15 1:14 ` Karim Yaghmour 2005-01-15 9:57 ` 2.6.11-rc1-mm1 Thomas Gleixner 1 sibling, 1 reply; 142+ messages in thread From: Karim Yaghmour @ 2005-01-15 1:14 UTC (permalink / raw) To: tglx; +Cc: Andrew Morton, linux-kernel Hello Thomas, Gee Thomas, I guess you really want to take this one until the last man is standing. Feel free to use the ad-hominem tone if it suits you. Don't hold it against me though if I don't bite :) Thomas Gleixner wrote: > It's not only me, who needs constant time. Everybody interested in > tracing will need that. In my opinion its a principle of tracing. relayfs is a generalized buffering mechanism. Tracing is one application it serves. Check out the web site: "high-speed data-relay filesystem." Fancy name huh ... > The "lockless" mechanism is _FAKE_ as I already pointed out. It replaces > locks by do { } while loops. So what ? Well for one thing, a portion of code running in user-context won't disable interrupts while it's attempting to get buffer space, and therefore won't impact on interrupt delivery. > Interesting. I read this phrase more than once in the discussion of your > patch. When will the to-do list be done ? Well of course you hear it more than once, we are getting _a lot_ of interesting feedback. Forgive me if I actually take the time to wait a day or two for most everyone's feedback to come in and carry out recommendations properly. Don't worry, I won't hold the changes too long :) > I'm impressed of your sudden time constraints awareness. Allowing 8192 > bytes of user event size, string printing with varags and XML tracing > is not biting you ? Use of these is by definition lacking performance. It's there because some people actually need it. Again, if you have some concrete advice as to what needs to be changed, we'll gladly hear it. 
> If you only store the low 32 bit of TSC you have a valid timeline when > you are able to do the math in your postprocessor. Depending on the > speed 16 bit are enough. We're already storing the low 32 bit of the TSC where available. > A ring buffer is not stupid at all. I have implemented tracing with ring > buffers already, so I know the limitations and the PITA. > > OTOH ringbuffers _ARE_ lockless, constant time comsuming and allow you > to implement the splitting and related functionality in userspace > postprocessing, which has to be done anyway. We've had this debate before if you're interested to dig in the archives. Here's a suggested implementation by Ingo: http://marc.theaimsgroup.com/?l=linux-kernel&m=103273730326318&w=2 And here are some reasons why this is incomplete: http://marc.theaimsgroup.com/?l=linux-kernel&m=103273967727564&w=2 > Do not tell me that streaming out data in a constant stream is worse > than putting them into nodes of a filesystem and retrieving them from > there. > > Setting up a simple /dev/proc/sys interface and do a > cat /xxx/trace/cpuX >file/interface/whatever > is not less efficient than the conversion of your data into a file. Clearly you haven't read the implementation and/or aren't familiar with its use. Usually, what you want to do is open(), mmap(), write(), there is no "conversion" to a file. The filesystem abstraction is just a namespace holder for us. > Sure, I have to grab stuff out of a filesystem instead of simply doing > for (....) > sendserial(buffer[i]); > > I know you can provide a nice function for doing so, but it will take > another xxx kB of code instead of a 10 line simple solution. Again, you haven't read the implementation and aren't familiar with its mechanics. Basically, you should just need to provide the pointer to the begining of the relayfs buffer and do what you suggest above. > Haha. If you have eventstamps and timestamps (even the jiffie + event > based ones) nothing is hard to interpret. 
I guess the ethereal guys are > rolling on the floor and laughing. > > The kernel is not the place to fix your postprocessing problems. Sure > you have to do more complicated stuff, but you move the burden from > kernel to a place where it does not hurt. > > What's hard on interpreting and filtering a stream of data ? Umm, not having enough information in order for interpreting the data? There is no postprocessing done in the kernel, please stop making false claims. What is done is provide enough information to allow simpler post-processing later. Spliting the stream on a per-cpu basis is certainly not without merit. Plus, there is no cost in doing this, each channel has a different ID and logging to it does not require any form of string lookup (currently we're just checking a table to make sure the ID is valid, but Roman suggested we dump this for pure pointers instead and we've added this to our list.) > What's complicated ? In case I want to have timing related tracing which > includes printks, then storing the address where the printk is coming > from is enough instead of a various length string. Storing some args in > binary form with this address should not be too hard to achieve. > > Again its a postprocessing problems. Sorry, I don't see how this is relevant to either relayfs or LTT. > And therefor I need strings, HEX strings, XML ? A simple number and the > data behind gives you all you need. > > Again its a postprocessing problems. But that's exactly what we got already. Here's from include/linux/ltt-events.h: /* Custom declared events */ /* ***WARNING*** These structures should never be used as is, use the provided custom event creation and logging functions. 
*/ typedef struct _ltt_new_event { /* Basics */ u32 id; /* Custom event ID */ char type[LTT_CUSTOM_EV_TYPE_STR_LEN]; /* Event type description */ char desc[LTT_CUSTOM_EV_DESC_STR_LEN]; /* Detailed event description */ /* Custom formatting */ u32 format_type; /* Type of formatting */ char form[LTT_CUSTOM_EV_FORM_STR_LEN]; /* Data specific to format */ } LTT_PACKED_STRUCT ltt_new_event; typedef struct _ltt_custom { u32 id; /* Event ID */ u32 data_size; /* Size of data recorded by event */ void *data; /* Data recorded by event */ } LTT_PACKED_STRUCT ltt_custom; The ltt_new_event struct is only used once when the event is created. Everything afterwards goes through an ltt_custom struct. > Sure I'm aware that I can switch off all, but I can not deselect > specific tracepoints during compile time to reduce the overhead. > > If I want to have custom tracepoints for my specific problem, then why I > need the overhead of the other stuff ? Ah, ok, you weren't as clear earlier. I don't see anything that precludes us from adding the appropriate kconfig/#ifdef machinery to allow this. I'll gladly take a patch from you. > If you consider the above example, which is taken of your code, as sane > then we can stop talkin about this. That's not the point. You're bending backwards as far as you can reach trying to raise as much mud as you can, but when pressed for actual constructive input you hide behind a strawman argument. If you don't have anything to say, then stop whining. > Karim, please do not use the FUD argument. > > I do not doubt that it is efficient from your point of view. > > But if short tests show this and I'm able to prove that numbers, you can > barely deny that the scaling of 300MHZ PIII to ARM 74MHz SoC is wrong. > It's simple math. I like calling things by their name. 
You can say what you will but I will bet on the casual observer's sense of reality to differentiate between your "short tests" and the rounds of benchmarks we ran and the results that we documented. > Yes, the "you would anyway have to go down the same path we have" > argument really scares me away from doing so. > > I don't buy this kind of arguments. You have every right to contest what I'm saying. But if you do wish to enforce that right, it seems to me that I have the right to not have my time wasted by having to parse through your unnecessary ad-hominem attacks. There are justifications for our choices, and I will do my best to present them to you. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-15 1:14 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-15 9:57 ` Thomas Gleixner 0 siblings, 0 replies; 142+ messages in thread From: Thomas Gleixner @ 2005-01-15 9:57 UTC (permalink / raw) To: karim; +Cc: Andrew Morton, linux-kernel Hi Karim, On Fri, 2005-01-14 at 20:14 -0500, Karim Yaghmour wrote: > Gee Thomas, I guess you really want to take this one until the last > man is standing. Feel free to use the ad-hominem tone if it suits > you. Don't hold it against me though if I don't bite :) No personal offence was intended. > Thomas Gleixner wrote: > > It's not only me, who needs constant time. Everybody interested in > > tracing will need that. In my opinion its a principle of tracing. > > relayfs is a generalized buffering mechanism. Tracing is one application > it serves. Check out the web site: "high-speed data-relay filesystem." > Fancy name huh ... I do not doubt that. But hardwiring an instrumentation framework on it is also hardwiring implicit restrictions on the usability of the instrumentation for certain purposes. > > The "lockless" mechanism is _FAKE_ as I already pointed out. It replaces > > locks by do { } while loops. So what ? > > Well for one thing, a portion of code running in user-context won't > disable interrupts while it's attempting to get buffer space, and > therefore won't impact on interrupt delivery. The do {} while loops are in the fast ltt_log_event path > Clearly you haven't read the implementation and/or aren't familiar with > its use. Usually, what you want to do is open(), mmap(), write(), there > is no "conversion" to a file. The filesystem abstraction is just a > namespace holder for us. I have read it and tried it. I don't see a point why I can't map a ringbuffer into user space. I'm not beating on the ringbuffer, but I'm using it as an example. :) > That's not the point. 
You're bending backwards as far as you can reach > trying to raise as much mud as you can, but when pressed for actual > constructive input you hide behind a strawman argument. If you don't > have anything to say, then stop whining. I gave constructive criticism, along with points where I just point out the restrictions and weaknesses of the implementation. tglx ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-14 23:09 ` 2.6.11-rc1-mm1 Karim Yaghmour 2005-01-15 0:01 ` 2.6.11-rc1-mm1 Thomas Gleixner @ 2005-01-16 16:21 ` Christoph Hellwig 2005-01-16 19:49 ` 2.6.11-rc1-mm1 Karim Yaghmour 1 sibling, 1 reply; 142+ messages in thread From: Christoph Hellwig @ 2005-01-16 16:21 UTC (permalink / raw) To: Karim Yaghmour; +Cc: tglx, Andrew Morton, linux-kernel On Fri, Jan 14, 2005 at 06:09:23PM -0500, Karim Yaghmour wrote: > relayfs implements two schemes: lockless and locking. The latter uses > standard linear locking mechanisms. If you need stringent constant > time, you know what to do. the lockless mode is really just loops around cmpxchg. It's spinlocks reinvented poorly. ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-16 16:21 ` 2.6.11-rc1-mm1 Christoph Hellwig @ 2005-01-16 19:49 ` Karim Yaghmour 2005-01-16 20:11 ` 2.6.11-rc1-mm1 Robert Wisniewski 0 siblings, 1 reply; 142+ messages in thread From: Karim Yaghmour @ 2005-01-16 19:49 UTC (permalink / raw) To: Christoph Hellwig; +Cc: tglx, Andrew Morton, linux-kernel, Robert Wisniewski Christoph Hellwig wrote: > the lockless mode is really just loops around cmpxchg. It's spinlocks > reinvented poorly. I beg to differ. You have to use different spinlocks depending on where you are: - serving user-space - bh-derivatives - irq lockless is the same primitive regardless of your current state, it's not the same as spinlocks. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || karim@opersys.com || 1-866-677-4546 ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-16 19:49 ` 2.6.11-rc1-mm1 Karim Yaghmour @ 2005-01-16 20:11 ` Robert Wisniewski 2005-01-16 20:32 ` 2.6.11-rc1-mm1 Andrew Morton 2005-01-16 20:39 ` 2.6.11-rc1-mm1 Christoph Hellwig 0 siblings, 2 replies; 142+ messages in thread
From: Robert Wisniewski @ 2005-01-16 20:11 UTC (permalink / raw)
To: karim
Cc: Christoph Hellwig, tglx, Andrew Morton, linux-kernel, Robert Wisniewski

Karim Yaghmour writes:
>
> Christoph Hellwig wrote:
> > the lockless mode is really just loops around cmpxchg. It's spinlocks
> > reinvented poorly.

Christoph,
Sadly they're not the same: atomic operations provide a set of
functionality that simple spin locks do not give you. Consider two
different processes each executing the following code:

int global_val;

modify_val_spin()
{
    acquire_spin_lock()
    // calculate some_value based on global_val
    // for example c=global_val; if (c%2) some_value=10; else some_value=20;
    global_val = global_val + some_value
    release_spin_lock()
}

modify_val_atomic()
{
    do
        // calculate some_value based on global_val
        // for example c=global_val; if (c%2) some_value=10; else some_value=20;
        global_val = global_val + some_value
    while (compare_and_store(global_val, , ))
}

What's the difference? The deal is if two processes execute this code
simultaneously and one gets interrupted in the middle of modify_val_spin,
then the other wastes its entire quantum spinning for the lock. In the
modify_val_atomic if one process gets interrupted, no problem, the other
process can proceed through, then when the first one runs again the CAS
will fail, and it will go around the loop again. Now imagine it was the
kernel involved...

I don't claim to have all the answers and am happy to have discussion on
something, but the attitude expressed by "It's spinlocks reinvented
poorly." is not conducive to a useful exchange even if you were correct.

>
> I beg to differ. You have to use different spinlocks depending on
> where you are:
> - serving user-space
> - bh-derivatives
> - irq
>
> lockless is the same primitive regardless of your current state,
> it's not the same as spinlocks.
>
> Karim
> --
> Author, Speaker, Developer, Consultant
> Pushing Embedded and Real-Time Linux Systems Beyond the Limits
> http://www.opersys.com || karim@opersys.com || 1-866-677-4546

Robert Wisniewski
The K42 MP OS Project
http://www.research.ibm.com/K42/
bob@watson.ibm.com

^ permalink raw reply [flat|nested] 142+ messages in thread
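For readers who want to try the two variants from the message above, here is a minimal user-space sketch. It assumes C11 <stdatomic.h> rather than the kernel's own primitives, and every name in it (delta_for, val_lock, locked_val, atomic_val) is hypothetical. The pseudocode's compare_and_store() is stood in for by atomic_compare_exchange_weak(), and the value-dependent step follows the "for example" comment with a parity test. It is an illustration of the general technique, not the LTT or relayfs code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Value-dependent step, mirroring the "for example" comment in the
 * pseudocode: odd values advance by 10, even values by 20. */
int delta_for(int c)
{
    return (c % 2) ? 10 : 20;
}

/* Lock-based variant: a simple test-and-set spin lock on an atomic_flag. */
atomic_flag val_lock = ATOMIC_FLAG_INIT;
int locked_val;

void modify_val_spin(void)
{
    /* Spin until the flag was previously clear. If the holder is
     * preempted, every other caller keeps spinning here. */
    while (atomic_flag_test_and_set_explicit(&val_lock, memory_order_acquire))
        ;
    locked_val += delta_for(locked_val);
    atomic_flag_clear_explicit(&val_lock, memory_order_release);
}

/* Lockless variant: compute from a snapshot, then publish with CAS. */
atomic_int atomic_val;

void modify_val_atomic(void)
{
    int old = atomic_load(&atomic_val);
    int new_val;

    do {
        new_val = old + delta_for(old);
        /* On failure the CAS stores the current value into 'old',
         * so the next pass recomputes from fresh data. */
    } while (!atomic_compare_exchange_weak(&atomic_val, &old, new_val));
}
```

Calling each function from several threads should give the same totals either way. The difference discussed in the thread is what happens to the other callers when one of them is interrupted partway through.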
* Re: 2.6.11-rc1-mm1 2005-01-16 20:11 ` 2.6.11-rc1-mm1 Robert Wisniewski @ 2005-01-16 20:32 ` Andrew Morton 2005-01-16 21:06 ` 2.6.11-rc1-mm1 Robert Wisniewski 2005-01-16 20:39 ` 2.6.11-rc1-mm1 Christoph Hellwig 1 sibling, 1 reply; 142+ messages in thread
From: Andrew Morton @ 2005-01-16 20:32 UTC (permalink / raw)
To: Robert Wisniewski; +Cc: karim, hch, tglx, linux-kernel, bob

Robert Wisniewski <bob@watson.ibm.com> wrote:
>
> modify_val_spin()
> {
>     acquire_spin_lock()
>     // calculate some_value based on global_val
>     // for example c=global_val; if (c%2) some_value=10; else some_value=20;
>     global_val = global_val + some_value
>     release_spin_lock()
> }
>
> modify_val_atomic()
> {
>     do
>         // calculate some_value based on global_val
>         // for example c=global_val; if (c%2) some_value=10; else some_value=20;
>         global_val = global_val + some_value
>     while (compare_and_store(global_val, , ))
> }
>
> What's the difference? The deal is if two processes execute this code
> simultaneously and one gets interrupted in the middle of modify_val_spin,
> then the other wastes its entire quantum spinning for the lock. In the
> modify_val_atomic if one process gets interrupted, no problem, the other
> process can proceed through, then when the first one runs again the CAS
> will fail, and it will go around the loop again.

One could use spin_lock_irq(). The performance would be similar.

> Now imagine it was the kernel involved...

Or are you saying that userspace does the above as well?

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-16 20:32 ` 2.6.11-rc1-mm1 Andrew Morton @ 2005-01-16 21:06 ` Robert Wisniewski 2005-01-16 21:40 ` 2.6.11-rc1-mm1 Arjan van de Ven 0 siblings, 1 reply; 142+ messages in thread
From: Robert Wisniewski @ 2005-01-16 21:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: Robert Wisniewski, karim, hch, tglx, linux-kernel

Andrew Morton writes:
> Robert Wisniewski <bob@watson.ibm.com> wrote:
> >
> > modify_val_spin()
> > {
> >     acquire_spin_lock()
> >     // calculate some_value based on global_val
> >     // for example c=global_val; if (c%2) some_value=10; else some_value=20;
> >     global_val = global_val + some_value
> >     release_spin_lock()
> > }
> >
> > modify_val_atomic()
> > {
> >     do
> >         // calculate some_value based on global_val
> >         // for example c=global_val; if (c%2) some_value=10; else some_value=20;
> >         global_val = global_val + some_value
> >     while (compare_and_store(global_val, , ))
> > }
> >
> > What's the difference? The deal is if two processes execute this code
> > simultaneously and one gets interrupted in the middle of modify_val_spin,
> > then the other wastes its entire quantum spinning for the lock. In the
> > modify_val_atomic if one process gets interrupted, no problem, the other
> > process can proceed through, then when the first one runs again the CAS
> > will fail, and it will go around the loop again.
>
> One could use spin_lock_irq(). The performance would be similar.

Yes, on some architectures I think you are right (on some archs though
I'm not so sure) - Ingo and I had that debate a bit ago. But as you
astutely noted or asked below, the original intent was to be able to use a
single shared buffer for user and kernel space. In fact, the lockless
design of tracing in K42, which motivated this design, does that. For a
couple of reasons we have not (yet?) done that for LTT. But, for example,
NPTL could have made use of it when they were investigating a tracing
facility. Recently, another company using LTT for device driver and video
debugging is very interested in cheap user space tracing in conjunction
with kernel tracing because they need both sets of events to understand
what is up. The debate is still open for the best way to get cheap user
space logging, but there seems to be an increasing need for it by the
community.

>
> > Now imagine it was the kernel involved...
>
> Or are you saying that userspace does the above as well?

:-) - as above. Furthermore, it seems that reducing the places where
interrupts are disabled would be a good thing? By not introducing
additional interrupt-disabled sections, tracing has less of an impact.

I was also pointing out that Christoph's statement that spin locks and
atomic ops are the same is not accurate (except for perhaps limited cases,
but then you must make such arguments - not necessarily good), and we had
good reasons for using an atomic op.

Thanks.

-bob

Robert Wisniewski
The K42 MP OS Project
http://www.research.ibm.com/K42/
bob@watson.ibm.com

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-16 21:06 ` 2.6.11-rc1-mm1 Robert Wisniewski @ 2005-01-16 21:40 ` Arjan van de Ven 2005-01-17 15:48 ` 2.6.11-rc1-mm1 Robert Wisniewski 0 siblings, 1 reply; 142+ messages in thread
From: Arjan van de Ven @ 2005-01-16 21:40 UTC (permalink / raw)
To: Robert Wisniewski; +Cc: Andrew Morton, karim, hch, tglx, linux-kernel

On Sun, 2005-01-16 at 16:06 -0500, Robert Wisniewski wrote:

> :-) - as above. Furthermore, it seems that reducing the places where
> interrupts are disabled would be a good thing?

depends on the price. On several cpus, disabling interrupts is hundreds
of times cheaper than doing an atomic op.

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-16 21:40 ` 2.6.11-rc1-mm1 Arjan van de Ven @ 2005-01-17 15:48 ` Robert Wisniewski 2005-01-17 16:13 ` 2.6.11-rc1-mm1 Christoph Hellwig 0 siblings, 1 reply; 142+ messages in thread
From: Robert Wisniewski @ 2005-01-17 15:48 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Robert Wisniewski, Andrew Morton, karim, hch, tglx, linux-kernel

Arjan van de Ven writes:
> On Sun, 2005-01-16 at 16:06 -0500, Robert Wisniewski wrote:
>
> > :-) - as above. Furthermore, it seems that reducing the places where
> > interrupts are disabled would be a good thing?
>
> depends on the price. On several cpus, disabling interrupts is hundreds
> of times cheaper than doing an atomic op.

Wow - disabling interrupts is handfuls to tens of cycles, so that means
some architectures take thousands of cycles to do atomic operations. Then
I would definitely agree we should not be using atomic operations on those.
FWIW, out of curiosity, what archs make atomic ops so expensive?

Andrew, on the broader note: if the community feels disabling interrupts
is the better way to go for the variables (I think it's index and count) we
were protecting with atomic ops, then as the code stands things should be
fine with that approach and we can make that change.

Thanks for your attention to looking through this.

-bob

Robert Wisniewski
The K42 MP OS Project
http://www.research.ibm.com/K42/
bob@watson.ibm.com

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-17 15:48 ` 2.6.11-rc1-mm1 Robert Wisniewski @ 2005-01-17 16:13 ` Christoph Hellwig 2005-01-17 21:38 ` 2.6.11-rc1-mm1 Karim Yaghmour 0 siblings, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2005-01-17 16:13 UTC (permalink / raw)
To: Robert Wisniewski
Cc: Arjan van de Ven, Andrew Morton, karim, hch, tglx, linux-kernel

On Mon, Jan 17, 2005 at 10:48:52AM -0500, Robert Wisniewski wrote:
> Wow - disabling interrupts is handfuls to tens of cycles, so that means
> some architectures take thousands of cycles to do atomic operations. Then
> I would definitely agree we should not be using atomic operations on those.
> FWIW, out of curiosity, what archs make atomic ops so expensive?
>
> Andrew, on the broader note: if the community feels disabling interrupts
> is the better way to go for the variables (I think it's index and count) we
> were protecting with atomic ops, then as the code stands things should be
> fine with that approach and we can make that change.

The thing I'm unhappy with is what the code does currently. I haven't
looked at the code enough nor thought about the problem enough to tell
you what's the right thing to do. Knowing that will involve review of
the architecture and serious benchmarking on a few platforms.

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-17 16:13 ` 2.6.11-rc1-mm1 Christoph Hellwig @ 2005-01-17 21:38 ` Karim Yaghmour 0 siblings, 0 replies; 142+ messages in thread
From: Karim Yaghmour @ 2005-01-17 21:38 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Robert Wisniewski, Arjan van de Ven, Andrew Morton, tglx, linux-kernel

Hello Christoph,

Christoph Hellwig wrote:
> The thing I'm unhappy with is what the code does currently. I haven't
> looked at the code enough nor thought about the problem enough to tell
> you what's the right thing to do. Knowing that will involve review of
> the architecture and serious benchmarking on a few platforms.

Like I was saying elsewhere, we are likely going to drop the lockless
code for now (i.e. the code that does the cmpxchg). Instead we will
depend on normal cli/sti abstractions.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply [flat|nested] 142+ messages in thread
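The cli/sti approach mentioned above protects a read-modify-write by keeping the interrupting context out for its duration. Kernel interrupt masking cannot be run from user space, so the following is only a loose analogy, assuming POSIX signals: SIGUSR1 plays the role of the interrupt and sigprocmask() plays the role of disabling and re-enabling it. All names (trace_index, on_signal, setup_handler, update_with_signal_blocked) are hypothetical, and the program is assumed to be single-threaded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

volatile sig_atomic_t handler_runs;  /* how many times the handler ran */
volatile sig_atomic_t trace_index;   /* state shared with the handler */

/* Stand-in for an interrupt handler that also bumps the shared index. */
void on_signal(int signo)
{
    (void)signo;
    handler_runs++;
    trace_index++;
}

/* Install the handler for SIGUSR1; returns 0 on success. */
int setup_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Update trace_index with SIGUSR1 blocked. A signal is raised in the
 * middle of the update to show that it stays pending until unblocked.
 * Returns how many handler runs were observed inside the section. */
int update_with_signal_blocked(void)
{
    sigset_t block, saved;
    int before, seen_inside;
    sig_atomic_t snapshot;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &saved);   /* analogue of "cli" */

    before = handler_runs;
    snapshot = trace_index;
    raise(SIGUSR1);                 /* arrives mid-update, stays pending */
    trace_index = snapshot + 1;     /* update completes undisturbed */
    seen_inside = handler_runs - before;

    sigprocmask(SIG_SETMASK, &saved, NULL);   /* analogue of "sti" */
    return seen_inside;
}
```

After setup_handler(), one call to update_with_signal_blocked() leaves the handler's increment applied on top of the completed update rather than lost in the middle of it. The cost trade-off against an atomic operation, which is what the thread is debating, is not something this sketch measures.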
* Re: 2.6.11-rc1-mm1 2005-01-16 20:11 ` 2.6.11-rc1-mm1 Robert Wisniewski 2005-01-16 20:32 ` 2.6.11-rc1-mm1 Andrew Morton @ 2005-01-16 20:39 ` Christoph Hellwig 2005-01-16 21:14 ` 2.6.11-rc1-mm1 Robert Wisniewski 1 sibling, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2005-01-16 20:39 UTC (permalink / raw)
To: Robert Wisniewski
Cc: karim, Christoph Hellwig, tglx, Andrew Morton, linux-kernel

On Sun, Jan 16, 2005 at 03:11:00PM -0500, Robert Wisniewski wrote:
> int global_val;
>
> modify_val_spin()
> {
>     acquire_spin_lock()
>     // calculate some_value based on global_val
>     // for example c=global_val; if (c%2) some_value=10; else some_value=20;
>     global_val = global_val + some_value
>     release_spin_lock()
> }
>
> modify_val_atomic()
> {
>     do
>         // calculate some_value based on global_val
>         // for example c=global_val; if (c%2) some_value=10; else some_value=20;
>         global_val = global_val + some_value
>     while (compare_and_store(global_val, , ))
> }
>
> What's the difference? The deal is if two processes execute this code
> simultaneously and one gets interrupted in the middle of modify_val_spin,
> then the other wastes its entire quantum spinning for the lock. In the
> modify_val_atomic if one process gets interrupted, no problem, the other
> process can proceed through, then when the first one runs again the CAS
> will fail, and it will go around the loop again. Now imagine it was the
> kernel involved...

Just prevent that with spin_lock_irq. But anyway this example doesn't
fit the ltt code. cmpxchg loops can make lots of sense for such simple
loops, but as soon as you need to do significant work in the loop it
starts to get problematic. Your example would btw be better off using
atomic_t and its primitives so you abstract away the actual implementation
and the architecture can choose the most efficient implementation.

^ permalink raw reply [flat|nested] 142+ messages in thread
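The suggestion above is about the kernel's atomic_t, whose operations each architecture implements in its own way. As a user-space stand-in, assuming C11 <stdatomic.h>, the same idea of letting the platform pick the implementation looks roughly like this. The counter events_logged and the helper count_events() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_long events_logged;   /* hypothetical shared counter, starts at 0 */

/* Add n to the counter and return the value it held before the add.
 * The compiler and target decide how the read-modify-write is done
 * (a single locked instruction, an LL/SC loop, and so on). */
long count_events(long n)
{
    return atomic_fetch_add(&events_logged, n);
}
```

This form only fits updates whose delta does not depend on the current value. An update like the parity-based one in the quoted pseudocode still needs a compare-and-exchange loop or a lock.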
* Re: 2.6.11-rc1-mm1 2005-01-16 20:39 ` 2.6.11-rc1-mm1 Christoph Hellwig @ 2005-01-16 21:14 ` Robert Wisniewski 0 siblings, 0 replies; 142+ messages in thread
From: Robert Wisniewski @ 2005-01-16 21:14 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Robert Wisniewski, karim, tglx, Andrew Morton, linux-kernel

Christoph Hellwig writes:
> On Sun, Jan 16, 2005 at 03:11:00PM -0500, Robert Wisniewski wrote:
> > int global_val;
> >
> > modify_val_spin()
> > {
> >     acquire_spin_lock()
> >     // calculate some_value based on global_val
> >     // for example c=global_val; if (c%2) some_value=10; else some_value=20;
> >     global_val = global_val + some_value
> >     release_spin_lock()
> > }
> >
> > modify_val_atomic()
> > {
> >     do
> >         // calculate some_value based on global_val
> >         // for example c=global_val; if (c%2) some_value=10; else some_value=20;
> >         global_val = global_val + some_value
> >     while (compare_and_store(global_val, , ))
> > }
> >
> > What's the difference? The deal is if two processes execute this code
> > simultaneously and one gets interrupted in the middle of modify_val_spin,
> > then the other wastes its entire quantum spinning for the lock. In the
> > modify_val_atomic if one process gets interrupted, no problem, the other
> > process can proceed through, then when the first one runs again the CAS
> > will fail, and it will go around the loop again. Now imagine it was the
> > kernel involved...
>
> Just prevent that with spin_lock_irq. But anyway this example doesn't
> fit the ltt code. cmpxchg loops can make lots of sense for such simple
> loops, but as soon as you need to do significant work in the loop it
> starts to get problematic. Your example would btw be better off using

The loop in question is where we grab the current (old) index and perform
a computation (or three). The only expensive operation is the timestamp
acquisition, which has been modified to use the cheaper rdtsc, so I still
think that's within the realm of a reasonably simple loop. I think what
you want to avoid is starting to walk a (potentially indeterminate) data
structure in such an atomic op loop.

> atomic_t and its primitives so you abstract away the actual implementation
> and the architecture can choose the most efficient implementation.
>

That's an interesting thought because it might address Andrew's concern.
We'll investigate. Thanks.

-bob

^ permalink raw reply [flat|nested] 142+ messages in thread
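The loop described above (read the old index, do a small computation, publish with an atomic op) can be sketched in user space as a slot reservation on a fixed buffer. This is an illustration under assumptions, not the LTT or relayfs implementation: it uses C11 atomics, leaves out the timestamp step, and the names struct trace_buf, TRACE_BUF_SIZE and trace_reserve() are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TRACE_BUF_SIZE 64u   /* hypothetical buffer size in bytes */

struct trace_buf {
    _Atomic uint32_t index;               /* offset of the next free byte */
    unsigned char data[TRACE_BUF_SIZE];
};

/* Reserve len bytes in the buffer. Returns the start offset of the
 * reserved region, or -1 if the request does not fit. */
long trace_reserve(struct trace_buf *b, uint32_t len)
{
    uint32_t old = atomic_load(&b->index);
    uint32_t new_index;

    do {
        /* Bounded work only: one comparison and one addition per pass.
         * index never exceeds TRACE_BUF_SIZE, so the subtraction is safe. */
        if (len > TRACE_BUF_SIZE - old)
            return -1;
        new_index = old + len;
        /* If another writer moved index first, the CAS fails, 'old' is
         * refreshed with the current index, and the loop recomputes. */
    } while (!atomic_compare_exchange_weak(&b->index, &old, new_index));

    return (long)old;
}
```

Each successful caller owns the bytes from the returned offset up to offset + len and can fill them without further synchronization. Keeping the loop body this small is the property the message above argues for; walking a data structure inside it would not be.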
* Re: 2.6.11-rc1-mm1 2005-01-14 8:23 2.6.11-rc1-mm1 Andrew Morton ` (9 preceding siblings ...) [not found] ` <1105740276.8604.83.camel@tglx.tec.linutronix.de> @ 2005-01-15 2:58 ` William Lee Irwin III 2005-01-17 22:19 ` 2.6.11-rc1-mm1 William Lee Irwin III 2005-01-16 0:59 ` 2.6.11-rc1-mm1 Joseph Fannin 11 siblings, 1 reply; 142+ messages in thread From: William Lee Irwin III @ 2005-01-15 2:58 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Fri, Jan 14, 2005 at 12:23:52AM -0800, Andrew Morton wrote: > - Added bk-xfs to the -mm "external trees" lineup. > - Added the Linux Trace Toolkit (and hence relayfs). Mainly because I > haven't yet taken as close a look at LTT as I should have. Probably neither > have you. > It needs a bit of work on the kernel<->user periphery, which is not a big > deal. [...] No idea what hit me just yet. x86-64 doesn't boot. Still going through the various architectures. The same system (including the initrd FPOS bullcrap, though, of course, I'm using an initrd built just for this kernel) boots various 2.6.x up to 2.6.10-mm1. There are vague indications something in/around SCSI and/or initrd's has violently exploded in my face. 
-- wli Booting '2.6.11-rc1-mm1' kernel (hd0,0)/vmlinuz-2.6.11-rc1-mm1 early_printk=serial root=/dev/sda2 consol e=ttyS0,9600 profile=1 debug initcall_debug nmi_watchdog=2 elevator=cfq splash= silent showopts resume=/dev/sda3 desktop [Linux-bzImage, setup=0x1600, size=0x1c4711] initrd (hd0,0)/initrd-2.6.11-rc1-mm1 [Linux-initrd @ 0x37ceb000, 0x304d0d bytes] Bootdata ok (command line is early_printk=serial root=/dev/sda2 console=ttyS0,9600 profile=1 debug initcall_debug nmi_watchdog=2 elevator=cfq splash=silent showopts resume=/dev/sda3 desktop) Linux version 2.6.11-rc1-mm1 (wli@residue) (gcc version 3.3.3 (SuSE Linux)) #2 SMP Fri Jan 14 18:00:33 PST 2005 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000ebbd0 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000007ffd0000 (usable) BIOS-e820: 000000007ffd0000 - 000000007ffdf000 (ACPI data) BIOS-e820: 000000007ffdf000 - 0000000080000000 (ACPI NVS) BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved) BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved) BIOS-e820: 0000000100000000 - 0000000180000000 (usable) ACPI: RSDP (v000 ACPIAM ) @ 0x00000000000f6710 ACPI: RSDT (v001 A M I OEMRSDT 0x05000427 MSFT 0x00000097) @ 0x000000007ffd0000 ACPI: FADT (v002 A M I OEMFACP 0x05000427 MSFT 0x00000097) @ 0x000000007ffd0200 ACPI: MADT (v001 A M I OEMAPIC 0x05000427 MSFT 0x00000097) @ 0x000000007ffd0390 ACPI: MCFG (v001 Intel Cayuse 0x00000001 MSFT 0x00000001) @ 0x000000007ffd0420 ACPI: OEMB (v001 A M I AMI_OEM 0x05000427 MSFT 0x00000097) @ 0x000000007ffdf040 ACPI: HPET (v001 A M I OEMHPET 0x05000427 MSFT 0x00000097) @ 0x000000007ffd7460 ACPI: DSDT (v001 CYCRB CYCRB039 0x00000039 INTL 0x02002026) @ 0x0000000000000000 No NUMA configuration found Faking a node at 0000000000000000-0000000180000000 Bootmem setup node 0 0000000000000000-0000000180000000 On node 0 totalpages: 1572864 DMA zone: 4096 
pages, LIFO batch:1 Normal zone: 1568768 pages, LIFO batch:16 HighMem zone: 0 pages, LIFO batch:1 ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) Processor #0 15:3 APIC version 16 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled) Processor #6 15:3 APIC version 16 ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled) Processor #1 15:3 APIC version 16 ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] enabled) Processor #7 15:3 APIC version 16 ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 ACPI: IOAPIC (id[0x09] address[0xfec81000] gsi_base[24]) IOAPIC[1]: apic_id 9, version 32, address 0xfec81000, GSI 24-47 ACPI: IOAPIC (id[0x0a] address[0xfec81400] gsi_base[48]) IOAPIC[2]: apic_id 10, version 32, address 0xfec81400, GSI 48-71 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Setting APIC routing to flat ACPI: HPET id: 0x8086a202 base: 0xfed00000 Using ACPI (MADT) for SMP configuration information Checking aperture... Built 1 zonelists Initializing CPU#0 Kernel command line: early_printk=serial root=/dev/sda2 console=ttyS0,9600 profile=1 debug initcall_debug nmi_watchdog=2 elevator=cfq splash=silent showopts resume=/dev/sda3 desktop kernel profiling enabled (shift: 1) PID hash table entries: 4096 (order: 12, 131072 bytes) time.c: Using 14.318180 MHz HPET timer. time.c: Detected 3400.235 MHz processor. Console: colour VGA+ 80x25 Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) Placing software IO TLB between 0x7528000 - 0x9528000 Memory: 4048952k/6291456k available (2395k kernel code, 0k reserved, 1484k data, 224k init) Calibrating delay loop... 
6750.20 BogoMIPS (lpj=3375104) Security Framework v1.0.0 initialized SELinux: Initializing. SELinux: Starting in permissive mode selinux_register_security: Registering secondary module capability Capability LSM initialized as secondary Mount-cache hash table entries: 256 (order: 0, 4096 bytes) CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K using mwait in idle threads. CPU: Physical Processor ID: 0 CPU0: Thermal monitoring enabled (TM1) CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU: Physical Processor ID: 0 CPU0: Intel(R) Xeon(TM) CPU 3.40GHz stepping 04 per-CPU timeslice cutoff: 1023.90 usecs. task migration cache decay timeout: 2 msecs. Booting processor 1/6 rip 6000 rsp ffff81007ff95f58 Initializing CPU#1 Calibrating delay loop... 6782.97 BogoMIPS (lpj=3391488) CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU: Physical Processor ID: 3 CPU1: Thermal monitoring enabled (TM1) Intel(R) Xeon(TM) CPU 3.40GHz stepping 04 Booting processor 2/1 rip 6000 rsp ffff810037c8df58 Initializing CPU#2 Calibrating delay loop... 6782.97 BogoMIPS (lpj=3391488) CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU: Physical Processor ID: 0 CPU2: Thermal monitoring enabled (TM1) Intel(R) Xeon(TM) CPU 3.40GHz stepping 04 Booting processor 3/7 rip 6000 rsp ffff81007ff03f58 Initializing CPU#3 Calibrating delay loop... 6782.97 BogoMIPS (lpj=3391488) CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU: Physical Processor ID: 3 CPU3: Thermal monitoring enabled (TM1) Intel(R) Xeon(TM) CPU 3.40GHz stepping 04 Total of 4 processors activated (27099.13 BogoMIPS). Using local APIC timer interrupts. Detected 12.500 MHz APIC timer. checking TSC synchronization across 4 CPUs: passed. time.c: Using HPET based timekeeping. 
Brought up 4 CPUs CPU0 attaching sched-domain: domain 0: span 05 groups: 01 04 domain 1: span 0f groups: 05 0a domain 2: span 0f groups: 0f CPU1 attaching sched-domain: domain 0: span 0a groups: 02 08 domain 1: span 0f groups: 0a 05 domain 2: span 0f groups: 0f CPU2 attaching sched-domain: domain 0: span 05 groups: 04 01 domain 1: span 0f groups: 05 0a domain 2: span 0f groups: 0f CPU3 attaching sched-domain: domain 0: span 0a groups: 08 02 domain 1: span 0f groups: 0a 05 domain 2: span 0f groups: 0f checking if image is initramfs...it isn't (no cpio magic); looks like an initrd Calling initcall 0xffffffff805633a0: cpufreq_tsc+0x0/0x90() Calling initcall 0xffffffff8056e390: init_elf32_binfmt+0x0/0x10() Calling initcall 0xffffffff80570180: helper_init+0x0/0x40() Calling initcall 0xffffffff80570280: pm_init+0x0/0x30() Calling initcall 0xffffffff80570400: ksysfs_init+0x0/0x30() Losing some ticks... checking if CPU frequency changed. Calling initcall 0xffffffff80572510: filelock_init+0x0/0x40() Calling initcall 0xffffffff80572ce0: init_script_binfmt+0x0/0x10() Calling initcall 0xffffffff80572cf0: init_elf_binfmt+0x0/0x10() Calling initcall 0xffffffff805809f0: netlink_proto_init+0x0/0x200() NET: Registered protocol family 16 Calling initcall 0xffffffff805744c0: kobject_uevent_init+0x0/0x40() Calling initcall 0xffffffff805745a0: pcibus_class_init+0x0/0x10() Calling initcall 0xffffffff80574c20: pci_driver_init+0x0/0x10() Calling initcall 0xffffffff80578520: tty_class_init+0x0/0x30() Calling initcall 0xffffffff8057ac90: register_node_type+0x0/0x10() Calling initcall 0xffffffff80566490: mtrr_if_init+0x0/0x80() Calling initcall 0xffffffff8057f100: pci_direct_init+0x0/0x1b0() PCI: Using configuration type 1 Calling initcall 0xffffffff8057fe30: pci_mmcfg_init+0x0/0x90() PCI: Using MMCONFIG at e0000000 Calling initcall 0xffffffff805662f0: mtrr_init+0x0/0x1a0() mtrr: v2.0 (20020519) Calling initcall 0xffffffff8056d290: topology_init+0x0/0x70() Calling initcall 
0xffffffff801541a0: pm_sysrq_init+0x0/0x20() Calling initcall 0xffffffff80572240: init_bio+0x0/0x190() Calling initcall 0xffffffff805754d0: fbmem_init+0x0/0xb0() Calling initcall 0xffffffff80577622: acpi_init+0x0/0x1f1() ACPI: Subsystem revision 20041203 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing Calling initcall 0xffffffff8057792c: acpi_ec_init+0x0/0x5e() Calling initcall 0xffffffff80577d09: acpi_pci_root_init+0x0/0x20() Calling initcall 0xffffffff80577e85: acpi_pci_link_init+0x0/0x42() Calling initcall 0xffffffff80577ec7: acpi_power_init+0x0/0x74() Calling initcall 0xffffffff80577f3b: acpi_system_init+0x0/0xc7() Calling initcall 0xffffffff80578002: acpi_event_init+0x0/0x3e() Calling initcall 0xffffffff80578040: acpi_scan_init+0x0/0xc4() ACPI: PCI Root Bridge [PCI0] (00:00) PCI: Probing PCI hardware (bus 00) PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.2 PCI: Transparent bridge - 0000:00:1e.0 ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2.P2P3._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2.P2P4._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P6._PRT] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 *5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *10 11 12 14 15) ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *10 11 12 14 15) ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled. 
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 *7 10 11 12 14 15) Calling initcall 0xffffffff80578e60: misc_init+0x0/0x90() Calling initcall 0xffffffff8057ae10: device_init+0x0/0x40() Calling initcall 0xffffffff8057e850: input_init+0x0/0x170() Calling initcall 0xffffffff8057f320: pci_acpi_init+0x0/0x130() PCI: Using ACPI for IRQ routing ** PCI interrupts are no longer routed automatically. If this ** causes a device to stop working, it is probably because the ** driver failed to call pci_enable_device(). As a temporary ** workaround, the "pci=routeirq" argument restores the old ** behavior. If this argument makes the device work again, ** please email the output of "lspci" to bjorn.helgaas@hp.com ** so I can fix the driver. Calling initcall 0xffffffff8057f450: pci_legacy_init+0x0/0x100() Calling initcall 0xffffffff8057f970: pcibios_irq_init+0x0/0x450() Calling initcall 0xffffffff8057fdc0: pcibios_init+0x0/0x70() Calling initcall 0xffffffff80580360: net_dev_init+0x0/0x200() Calling initcall 0xffffffff805808f0: pktsched_init+0x0/0xc0() Calling initcall 0xffffffff805809b0: tc_filter_init+0x0/0x40() Calling initcall 0xffffffff80563430: late_hpet_init+0x0/0xc0() hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 hpet0: 69ns tick, 3 64-bit timers Calling initcall 0xffffffff8056c1f0: pci_iommu_init+0x0/0x610() PCI-DMA: Using software bounce buffering for IO (SWIOTLB) Calling initcall 0xffffffff80572490: init_pipe_fs+0x0/0x50() Calling initcall 0xffffffff80578104: acpi_motherboard_init+0x0/0x1bc() Calling initcall 0xffffffff805782c0: chr_dev_init+0x0/0x90() Calling initcall 0xffffffff8057ed20: cpufreq_gov_performance_init+0x0/0x10() Calling initcall 0xffffffff8057ed30: pcibios_assign_resources+0x0/0xf0() Calling initcall 0xffffffff8057fec0: fill_mp_bus_to_cpumask+0x0/0x100() Calling initcall 0xffffffff80112800: time_init_device+0x0/0x30() Calling initcall 0xffffffff80564a80: init_timer_sysfs+0x0/0x30() Calling initcall 0xffffffff80564a50: i8259A_init_sysfs+0x0/0x30() Calling initcall 
0xffffffff80564f50: vsyscall_init+0x0/0x90() Calling initcall 0xffffffff805654b0: sbf_init+0x0/0xd0() Calling initcall 0xffffffff80566040: mce_init_device+0x0/0xf0() Calling initcall 0xffffffff80565fd0: periodic_mcheck_init+0x0/0x30() Calling initcall 0xffffffff80568300: init_lapic_sysfs+0x0/0x40() Calling initcall 0xffffffff805697d0: ioapic_init_sysfs+0x0/0xd0() Calling initcall 0xffffffff8056d330: x8664_sysctl_init+0x0/0x20() Calling initcall 0xffffffff8056e370: ia32_init+0x0/0x20() IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $ Calling initcall 0xffffffff8056e3a0: ia32_binfmt_init+0x0/0x20() Calling initcall 0xffffffff8056e3c0: init_syscall32+0x0/0x120() Calling initcall 0xffffffff8056e4e0: init_aout_binfmt+0x0/0x10() Calling initcall 0xffffffff8056f450: create_proc_profile+0x0/0x410() Calling initcall 0xffffffff8056f930: ioresources_init+0x0/0x50() Calling initcall 0xffffffff8056fae0: uid_cache_init+0x0/0xb0() Calling initcall 0xffffffff8056fed0: param_sysfs_init+0x0/0x200() Calling initcall 0xffffffff805700d0: init_posix_timers+0x0/0xb0() Calling initcall 0xffffffff805701c0: init+0x0/0x60() Calling initcall 0xffffffff80570220: proc_dma_init+0x0/0x30() Calling initcall 0xffffffff8014f870: percpu_modinit+0x0/0x90() Calling initcall 0xffffffff80570250: kallsyms_init+0x0/0x30() Calling initcall 0xffffffff805702b0: ikconfig_init+0x0/0x40() Calling initcall 0xffffffff80570370: audit_init+0x0/0x90() audit: initializing netlink socket (disabled) audit(1105757136.391:0): initialized Calling initcall 0xffffffff80570fb0: init_per_zone_pages_min+0x0/0x50() Calling initcall 0xffffffff80571ae0: pdflush_init+0x0/0x20() Calling initcall 0xffffffff80571b00: cpucache_init+0x0/0x30() Calling initcall 0xffffffff80571e80: kswapd_init+0x0/0x60() Calling initcall 0xffffffff80571f20: procswaps_init+0x0/0x30() Calling initcall 0xffffffff80571f50: hugetlb_init+0x0/0xb0() Total HugeTLB memory allocated, 0 Calling initcall 0xffffffff805720d0: init_tmpfs+0x0/0xe0() 
Calling initcall 0xffffffff805724e0: fasync_init+0x0/0x30() Calling initcall 0xffffffff80572b20: aio_setup+0x0/0x70() Calling initcall 0xffffffff80572b90: eventpoll_init+0x0/0xf0() Calling initcall 0xffffffff80572c80: init_sys32_ioctl+0x0/0x60() Calling initcall 0xffffffff80572d00: init_mbcache+0x0/0x30() Calling initcall 0xffffffff80572d30: dquot_init+0x0/0x100() VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) Calling initcall 0xffffffff80572e30: dnotify_init+0x0/0x30() Calling initcall 0xffffffff805732b0: init_devpts_fs+0x0/0x40() Calling initcall 0xffffffff805732f0: init_ext2_fs+0x0/0x70() Calling initcall 0xffffffff805733a0: init_ramfs_fs+0x0/0x10() Calling initcall 0xffffffff805733c0: init_hugetlbfs_fs+0x0/0x80() Calling initcall 0xffffffff80573440: init_minix_fs+0x0/0x60() Calling initcall 0xffffffff805734a0: init_iso9660_fs+0x0/0x70() Calling initcall 0xffffffff805735a0: init_nfs_fs+0x0/0xa0() Calling initcall 0xffffffff80573d00: init_nlm+0x0/0x30() Calling initcall 0xffffffff80573d30: ipc_init+0x0/0x20() Calling initcall 0xffffffff80573ed0: init_mqueue_fs+0x0/0xe0() Calling initcall 0xffffffff80574100: selinux_nf_ip_init+0x0/0x60() SELinux: Registering netfilter hooks Calling initcall 0xffffffff80574240: init_sel_fs+0x0/0x70() Calling initcall 0xffffffff805742b0: selnl_init+0x0/0x50() Calling initcall 0xffffffff80574300: sel_netif_init+0x0/0x80() Calling initcall 0xffffffff80574420: init_crypto+0x0/0x20() Initializing Cryptographic API Calling initcall 0xffffffff80574470: init+0x0/0x10() Calling initcall 0xffffffff80574480: init+0x0/0x40() Calling initcall 0xffffffff80228860: pci_init+0x0/0x30() Intel E7520/7320/7525 detected.<7>Calling initcall 0xffffffff80574c30: pci_sysfs_init+0x0/0x40() Calling initcall 0xffffffff80574c70: pci_proc_init+0x0/0x70() Calling initcall 0xffffffff805751a0: fb_console_init+0x0/0x70() Calling initcall 0xffffffff80576b50: vesafb_init+0x0/0x68() Calling initcall 0xffffffff80577e50: 
irqrouter_init_sysfs+0x0/0x35() Calling initcall 0xffffffff80578350: rand_initialize+0x0/0x1b0() Calling initcall 0xffffffff80578550: tty_init+0x0/0x1e0() Calling initcall 0xffffffff80578770: inotify_init+0x0/0x100() inotify device minor=63 Calling initcall 0xffffffff80578870: pty_init+0x0/0x5f0() Calling initcall 0xffffffff80579430: rtc_init+0x0/0x200() Real Time Clock Driver v1.12 Calling initcall 0xffffffff80579630: hpet_init+0x0/0x70() hpet_acpi_add: no address or irqs in _CRS Calling initcall 0xffffffff805796a0: nvram_init+0x0/0x90() Non-volatile memory driver v1.2 Calling initcall 0xffffffff80579790: agp_init+0x0/0x30() Linux agpgart interface v0.101 (c) Dave Jones Calling initcall 0xffffffff805798a0: serio_init+0x0/0x60() Calling initcall 0xffffffff80579990: i8042_init+0x0/0x650() Calling initcall 0xffffffff8057a3d0: serial8250_init+0x0/0x110() Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A Calling initcall 0xffffffff8057a5b0: serial8250_pci_init+0x0/0x10() Calling initcall 0xffffffff80286e90: elevator_global_init+0x0/0x10() Calling initcall 0xffffffff8057ae50: noop_init+0x0/0x10() io scheduler noop registered Calling initcall 0xffffffff8057ae60: as_init+0x0/0x60() io scheduler anticipatory registered Calling initcall 0xffffffff8057aec0: deadline_init+0x0/0x60() io scheduler deadline registered Calling initcall 0xffffffff80294810: cfq_init+0x0/0xb0() io scheduler cfq registered (default) Calling initcall 0xffffffff8057af20: rd_init+0x0/0x1c0() RAMDISK driver initialized: 16 RAM disks of 128000K size 1024 blocksize Calling initcall 0xffffffff8057b150: loop_init+0x0/0x340() loop: loaded (max 8 devices) Calling initcall 0xffffffff8057b500: net_olddevs_init+0x0/0xe0() Calling initcall 0xffffffff80296e30: aec62xx_ide_init+0x0/0x10() Calling initcall 0xffffffff80297540: ali15x3_ide_init+0x0/0x10() Calling initcall 0xffffffff80298630: 
amd74xx_ide_init+0x0/0x10() Calling initcall 0xffffffff80299820: atiixp_ide_init+0x0/0x10() Calling initcall 0xffffffff80299dd0: cmd64x_ide_init+0x0/0x10() Calling initcall 0xffffffff8029b200: sc1200_ide_init+0x0/0x10() Calling initcall 0xffffffff8029bd20: cy82c693_ide_init+0x0/0x10() Calling initcall 0xffffffff8029c050: hpt34x_ide_init+0x0/0x10() Calling initcall 0xffffffff8029c730: hpt366_ide_init+0x0/0x10() Calling initcall 0xffffffff8029e640: ns87415_ide_init+0x0/0x10() Calling initcall 0xffffffff8029ea00: pdc202xx_ide_init+0x0/0x10() Calling initcall 0xffffffff8029fac0: pdc202new_ide_init+0x0/0x10() Calling initcall 0xffffffff8057c3c0: piix_ide_init+0x0/0xd0() Calling initcall 0xffffffff802a1050: rz1000_ide_init+0x0/0x10() Calling initcall 0xffffffff802a1130: svwks_ide_init+0x0/0x10() Calling initcall 0xffffffff802a1ac0: siimage_ide_init+0x0/0x10() Calling initcall 0xffffffff802a31f0: sis5513_ide_init+0x0/0x10() Calling initcall 0xffffffff802a4280: slc90e66_ide_init+0x0/0x10() Calling initcall 0xffffffff802a47e0: triflex_ide_init+0x0/0x10() Calling initcall 0xffffffff802a4cb0: via_ide_init+0x0/0x10() Calling initcall 0xffffffff802a5f40: generic_ide_init+0x0/0x10() Calling initcall 0xffffffff8057df40: ide_init+0x0/0x80() Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx Calling initcall 0xffffffff8057e820: ide_generic_init+0x0/0x20() Probing IDE interface ide0... hda: TEAC DW-548D, ATAPI CD/DVD-ROM drive ide1: I/O resource 0x170-0x177 not free. ide1: ports already in use, skipping probe Probing IDE interface ide2... ide2: Wait for ready failed before probe ! Probing IDE interface ide3... ide3: Wait for ready failed before probe ! Probing IDE interface ide4... ide4: Wait for ready failed before probe ! Probing IDE interface ide5... ide5: Wait for ready failed before probe ! 
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 Calling initcall 0xffffffff802b3bf0: idedisk_init+0x0/0x10() Calling initcall 0xffffffff802b5cb0: ide_cdrom_init+0x0/0x20() hda: ATAPI 48X DVD-ROM CD-R/RW drive, 2048kB Cache Uniform CD-ROM driver Revision: 3.20 Calling initcall 0xffffffff802ba550: idefloppy_init+0x0/0x30() ide-floppy driver 0.99.newide Calling initcall 0xffffffff8057e840: cdrom_init+0x0/0x10() Calling initcall 0xffffffff8057e9c0: mousedev_init+0x0/0xe0() mice: PS/2 mouse device common for all mice Calling initcall 0xffffffff8057eaa0: atkbd_init+0x0/0x20() Calling initcall 0xffffffff8057eac0: psmouse_init+0x0/0xb0() Calling initcall 0xffffffff8057eb70: pcspkr_init+0x0/0x80() input: PC Speaker Calling initcall 0xffffffff8057ebf0: md_init+0x0/0x130() md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27 Calling initcall 0xffffffff80580140: flow_cache_init+0x0/0x220() Calling initcall 0xffffffff805807c0: llc_init+0x0/0x70() Calling initcall 0xffffffff80580830: snap_init+0x0/0x40() Calling initcall 0xffffffff80580870: rif_init+0x0/0x80() Calling initcall 0xffffffff805816b0: inet_init+0x0/0x3f0() NET: Registered protocol family 2 IP: routing cache hash table of 32768 buckets, 512Kbytes TCP established hash table entries: 262144 (order: 10, 4194304 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 262144 bind 65536) Calling initcall 0xffffffff80583970: tcpdiag_init+0x0/0x30() Calling initcall 0xffffffff80583b70: af_unix_init+0x0/0x80() NET: Registered protocol family 1 Calling initcall 0xffffffff80583bf0: init_sunrpc+0x0/0x50() Calling initcall 0xffffffff80583c40: init_rpcsec_gss+0x0/0x40() Calling initcall 0xffffffff80583c80: init_kerberos_module+0x0/0x25() Calling initcall 0xffffffff80568c60: init_lapic_nmi_sysfs+0x0/0x40() Calling initcall 0xffffffff8025480c: acpi_poweroff_init+0x0/0x3a() Calling initcall 0xffffffff80577450: acpi_wakeup_device_init+0x0/0xec() ACPI wakeup devices: P0P1 MC97 USB1 USB2 USB3 USB4 
EUSB P2P3 P2P4 Calling initcall 0xffffffff8057755d: acpi_sleep_init+0x0/0xc5() ACPI: (supports S0 S1 S3 S4 S5) Calling initcall 0xffffffff80254d18: acpi_sleep_proc_init+0x0/0x94() Calling initcall 0xffffffff80578500: seqgen_init+0x0/0x20() Calling initcall 0xffffffff8057a3a0: serial8250_late_console_init+0x0/0x30() Calling initcall 0xffffffff8057aab0: early_uart_console_switch+0x0/0x90() Calling initcall 0xffffffff802e6090: net_random_reseed+0x0/0x50() Calling initcall 0xffffffff80582820: ip_auto_config+0x0/0xf00() md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. RAMDISK: Compressed image found at block 0 VFS: Waiting 19sec for root device... VFS: Waiting 18sec for root device... VFS: Waiting 17sec for root device... VFS: Waiting 16sec for root device... VFS: Waiting 15sec for root device... VFS: Waiting 14sec for root device... VFS: Waiting 13sec for root device... VFS: Waiting 12sec for root device... VFS: Waiting 11sec for root device... VFS: Waiting 10sec for root device... VFS: Waiting 9sec for root device... VFS: Waiting 8sec for root device... VFS: Waiting 7sec for root device... VFS: Waiting 6sec for root device... VFS: Waiting 5sec for root device... VFS: Waiting 4sec for root device... VFS: Waiting 3sec for root device... VFS: Waiting 2sec for root device... VFS: Waiting 1sec for root device... VFS: Cannot open root device "sda2" or unknown-block(0,0) Please append a correct "root=" boot option Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-15  2:58 ` 2.6.11-rc1-mm1 William Lee Irwin III
@ 2005-01-17 22:19   ` William Lee Irwin III
  0 siblings, 0 replies; 142+ messages in thread
From: William Lee Irwin III @ 2005-01-17 22:19 UTC
To: Andrew Morton; +Cc: linux-kernel

On Fri, Jan 14, 2005 at 06:58:10PM -0800, William Lee Irwin III wrote:
> No idea what hit me just yet. x86-64 doesn't boot. Still going through
> the various architectures. The same system (including the initrd FPOS
> bullcrap, though, of course, I'm using an initrd built just for this
> kernel) boots various 2.6.x up to 2.6.10-mm1. There are vague indications
> something in/around SCSI and/or initrd's has violently exploded in my face.

With the waiting 10s patch backed out, things seem to be going well:

$ ssh analyticity
Last login: Mon Jan 17 14:03:13 2005 from meromorphy
Linux analyticity 2.6.11-rc1-mm1 #5 SMP Sat Jan 15 01:25:23 PST 2005 sparc64 GNU/Linux
$ uptime
 14:10:55 up 10 min, 7 users, load average: 0.10, 0.40, 0.31

Now I just have to remember to set up
	ip route add 192.168.1.0/24 dev eth3 via 192.168.1.1
instead of just
	ip route add 192.168.1.0/24 dev eth3
so I can tftpboot the thing (well, it took all of 10s to figure out,
but it may not next time). Routing changes are painful.

-- wli
* Re: 2.6.11-rc1-mm1
  2005-01-14  8:23 2.6.11-rc1-mm1 Andrew Morton
                   ` (10 preceding siblings ...)
  2005-01-15  2:58 ` 2.6.11-rc1-mm1 William Lee Irwin III
@ 2005-01-16  0:59 ` Joseph Fannin
  2005-01-16 19:09   ` 2.6.11-rc1-mm1 Daniel Drake
                     ` (2 more replies)
  11 siblings, 3 replies; 142+ messages in thread
From: Joseph Fannin @ 2005-01-16  0:59 UTC
To: Andrew Morton; +Cc: linux-kernel, Neil Brown, Daniel Drake, William Park

On Fri, Jan 14, 2005 at 12:23:52AM -0800, Andrew Morton wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc1/2.6.11-rc1-mm1/

> waiting-10s-before-mounting-root-filesystem.patch
>   retry mounting the root filesystem at boot time

With this patch, initrds seem to get 'skipped'. I think this is
probably the cause for the reports of problems with RAID too.

Just after loading the initrd (RAMDISK: Loading 5284KiB [1 disk] into
ram disk...) the kernel tries to mount the real root fs -- if the
necessary drivers are built-in, it proceeds from there; if not, not.

I'm guessing that when the initrd code calls mount_block_root() to
mount the ramdisk, this bit makes it decide to try to mount the real
root instead:

	if (!ROOT_DEV) {
		ROOT_DEV = name_to_dev_t(saved_root_name);
		create_dev(name, ROOT_DEV, root_device_name);
	}

Perhaps this should not be done until after the first attempt to
mount fails?

Sorry, I haven't had nearly enough coffee today to attempt to make
a patch. :-)

--
Joseph Fannin
jhf@rivenstone.net

"Bull in pure form is rare; there is usually some contamination by data."
    -- William Graves Perry Jr.
* Re: 2.6.11-rc1-mm1 2005-01-16 0:59 ` 2.6.11-rc1-mm1 Joseph Fannin @ 2005-01-16 19:09 ` Daniel Drake 2005-01-16 19:20 ` 2.6.11-rc1-mm1 William Lee Irwin III 2005-01-16 21:09 ` 2.6.11-rc1-mm1 Daniel Drake 2005-01-18 2:54 ` [PATCH] Wait and retry mounting root device (revised) Daniel Drake 2 siblings, 1 reply; 142+ messages in thread From: Daniel Drake @ 2005-01-16 19:09 UTC (permalink / raw) To: Joseph Fannin; +Cc: Andrew Morton, linux-kernel, Neil Brown, William Park, wli Joseph Fannin wrote: > On Fri, Jan 14, 2005 at 12:23:52AM -0800, Andrew Morton wrote: > >>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc1/2.6.11-rc1-mm1/ > > >>waiting-10s-before-mounting-root-filesystem.patch >> retry mounting the root filesystem at boot time > > > With this patch, initrds seem to get 'skipped'. I think this is > probably the cause for the reports of problems with RAID too. This seems likely and is probably also the cause of wli's problems mentioned elsewhere in this thread. I had overlooked the way that initrd's work in that part of the boot sequence. Will investigate. Daniel ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1 2005-01-16 19:09 ` 2.6.11-rc1-mm1 Daniel Drake @ 2005-01-16 19:20 ` William Lee Irwin III 0 siblings, 0 replies; 142+ messages in thread From: William Lee Irwin III @ 2005-01-16 19:20 UTC (permalink / raw) To: Daniel Drake Cc: Joseph Fannin, Andrew Morton, linux-kernel, Neil Brown, William Park Joseph Fannin wrote: >> With this patch, initrds seem to get 'skipped'. I think this is >> probably the cause for the reports of problems with RAID too. On Sun, Jan 16, 2005 at 07:09:31PM +0000, Daniel Drake wrote: > This seems likely and is probably also the cause of wli's problems > mentioned elsewhere in this thread. > I had overlooked the way that initrd's work in that part of the boot > sequence. Will investigate. akpm suspected this immediately, and my tests confirmed it. I should probably do the work to make the box boot with CONFIG_MODULES=n as I don't like initrd's or modules anyway (new points of failure). -- wli ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-16  0:59 ` 2.6.11-rc1-mm1 Joseph Fannin
  2005-01-16 19:09   ` 2.6.11-rc1-mm1 Daniel Drake
@ 2005-01-16 21:09   ` Daniel Drake
  2005-01-17 23:31     ` 2.6.11-rc1-mm1 J.A. Magallon
  2005-01-18  2:54   ` [PATCH] Wait and retry mounting root device (revised) Daniel Drake
  2 siblings, 1 reply; 142+ messages in thread
From: Daniel Drake @ 2005-01-16 21:09 UTC
To: Joseph Fannin; +Cc: Andrew Morton, linux-kernel, Neil Brown, William Park, wli

[-- Attachment #1: Type: text/plain, Size: 567 bytes --]

Hi,

Joseph Fannin wrote:
> On Fri, Jan 14, 2005 at 12:23:52AM -0800, Andrew Morton wrote:
>
>>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc1/2.6.11-rc1-mm1/
>
>
>>waiting-10s-before-mounting-root-filesystem.patch
>>  retry mounting the root filesystem at boot time
>
>
> With this patch, initrds seem to get 'skipped'. I think this is
> probably the cause for the reports of problems with RAID too.

This patch should do the job. Replaces the existing
waiting-10s-before-mounting-root-filesystem.patch in 2.6.11-rc1-mm1.

Daniel

[-- Attachment #2: waiting-10s-before-mounting-root-filesystem.patch --]
[-- Type: text/x-patch, Size: 2910 bytes --]

Retry up to 20 times if mounting the root device fails. This fixes booting
from usb-storage devices, which no longer make their partitions immediately
available. Also cleans up the mount_block_root() function.

Based on an earlier patch from William Park <opengeometry@yahoo.ca>

Signed-off-by: Daniel Drake <dsd@gentoo.org>

--- linux-2.6.10/init/do_mounts.c.orig	2005-01-16 19:18:57.000000000 +0000
+++ linux-2.6.10/init/do_mounts.c	2005-01-16 21:04:29.198471440 +0000
@@ -6,6 +6,7 @@
 #include <linux/suspend.h>
 #include <linux/root_dev.h>
 #include <linux/security.h>
+#include <linux/delay.h>

 #include <linux/nfs_fs.h>
 #include <linux/nfs_fs_sb.h>
@@ -261,6 +262,9 @@ static void __init get_fs_names(char *pa
 static int __init do_mount_root(char *name, char *fs, int flags, void *data)
 {
 	int err = sys_mount(name, "/root", fs, flags, data);
+	if (err == -EACCES && (flags | MS_RDONLY) == 0)
+		err = sys_mount(name, "/root", fs, flags | MS_RDONLY, data);
+
 	if (err)
 		return err;

@@ -273,38 +277,57 @@ static int __init do_mount_root(char *na
 	return 0;
 }

+static int __init mount_root_try_all_fs(char *name, char *fs_names, int flags, void *data)
+{
+	char *p;
+	int err = -EFAULT;
+
+	for (p = fs_names; *p; p += strlen(p)+1) {
+		err = do_mount_root(name, p, flags, root_mount_data);
+		if (err != -EINVAL)
+			break;
+	}
+
+	return err;
+}
+
 void __init mount_block_root(char *name, int flags)
 {
 	char *fs_names = __getname();
-	char *p;
 	char b[BDEVNAME_SIZE];
+	int tryagain = 20;

 	get_fs_names(fs_names);
-retry:
-	for (p = fs_names; *p; p += strlen(p)+1) {
-		int err = do_mount_root(name, p, flags, root_mount_data);
-		switch (err) {
-			case 0:
-				goto out;
-			case -EACCES:
-				flags |= MS_RDONLY;
-				goto retry;
-			case -EINVAL:
-				continue;
+
+	while (1) {
+		int err = mount_root_try_all_fs(name, fs_names, flags, root_mount_data);
+		if (err == 0)
+			break;
+
+		/*
+		 * The root device may not be ready yet, so we retry a number of times
+		 */
+		if (--tryagain) {
+			printk(KERN_WARNING "VFS: Waiting %dsec for root device...\n",
+				tryagain);
+			ssleep(1);
+			if (!ROOT_DEV) {
+				ROOT_DEV = name_to_dev_t(saved_root_name);
+				create_dev(name, ROOT_DEV, root_device_name);
+			}
+			continue;
 		}
-	        /*
+
+		/*
 		 * Allow the user to distinguish between failed sys_open
 		 * and bad superblock on root device.
 		 */
 		__bdevname(ROOT_DEV, b);
-		printk("VFS: Cannot open root device \"%s\" or %s\n",
-				root_device_name, b);
-		printk("Please append a correct \"root=\" boot option\n");
-
+		printk(KERN_CRIT "VFS: Cannot open root device \"%s\" or %s\n",
+			root_device_name, b);
+		printk(KERN_CRIT "Please append a correct \"root=\" boot option\n");
 		panic("VFS: Unable to mount root fs on %s", b);
 	}
-	panic("VFS: Unable to mount root fs on %s", __bdevname(ROOT_DEV, b));
-out:
 	putname(fs_names);
 }
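One detail in the do_mount_root() hunk above is worth checking in isolation: the read-only fallback is guarded by (flags | MS_RDONLY) == 0. The following is a minimal userspace sketch, not kernel code. It assumes MS_RDONLY is a nonzero flag bit (1 on Linux), and the names SKETCH_MS_RDONLY, retry_ro_as_posted and retry_ro_masked are invented for illustration. Under that assumption the test as posted can never be true, so the fallback would never run; a mask with & is presumably what was intended.

```c
#include <stdbool.h>

/* Assumed value: MS_RDONLY is bit 0 on Linux. Renamed to avoid clashing with system headers. */
#define SKETCH_MS_RDONLY 1UL

/* The guard as written in the patch: OR-ing in a nonzero bit never yields 0. */
static bool retry_ro_as_posted(unsigned long flags)
{
	return (flags | SKETCH_MS_RDONLY) == 0;
}

/* The guard with a mask: true exactly when the read-only bit is not yet set. */
static bool retry_ro_masked(unsigned long flags)
{
	return (flags & SKETCH_MS_RDONLY) == 0;
}
```

With a read-write mount request (flags without the read-only bit), retry_ro_masked() reports that a read-only retry is worthwhile, while retry_ro_as_posted() returns false for every possible flags value.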
* Re: 2.6.11-rc1-mm1 2005-01-16 21:09 ` 2.6.11-rc1-mm1 Daniel Drake @ 2005-01-17 23:31 ` J.A. Magallon 2005-01-18 2:35 ` 2.6.11-rc1-mm1 Daniel Drake 0 siblings, 1 reply; 142+ messages in thread From: J.A. Magallon @ 2005-01-17 23:31 UTC (permalink / raw) To: Daniel Drake; +Cc: Lista Linux-Kernel [-- Attachment #1: Type: text/plain, Size: 1492 bytes --] On 2005.01.16, Daniel Drake wrote: > Hi, > > Joseph Fannin wrote: > > On Fri, Jan 14, 2005 at 12:23:52AM -0800, Andrew Morton wrote: > > > >>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc1/2.6.11-rc1-mm1/ > > > > > >>waiting-10s-before-mounting-root-filesystem.patch > >> retry mounting the root filesystem at boot time > > > > > > With this patch, initrds seem to get 'skipped'. I think this is > > probably the cause for the reports of problems with RAID too. > > This patch should do the job. Replaces the existing > waiting-10s-before-mounting-root-filesystem.patch in 2.6.11-rc1-mm1. > > Daniel > > Retry up to 20 times if mounting the root device fails. This fixes booting > from usb-storage devices, which no longer make their partitions immediately > available. Also cleans up the mount_block_root() function. > > Based on an earlier patch from William Park <opengeometry@yahoo.ca> > > Signed-off-by: Daniel Drake <dsd@gentoo.org> > This does not patch against -mm1. -mm1 looks like a mix of plain 2.6.10 and your code. Could you revamp it against -mm1, please ? I looked at it but seems out of my understanding... TIA -- J.A. Magallon <jamagallon()able!es> \ Software is like sex: werewolf!able!es \ It's better when it's free Mandrakelinux release 10.2 (Cooker) for i586 Linux 2.6.10-jam4 (gcc 3.4.3 (Mandrakelinux 10.2 3.4.3-3mdk)) #2 [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
  2005-01-17 23:31     ` 2.6.11-rc1-mm1 J.A. Magallon
@ 2005-01-18  2:35       ` Daniel Drake
  0 siblings, 0 replies; 142+ messages in thread
From: Daniel Drake @ 2005-01-18  2:35 UTC
To: J.A. Magallon; +Cc: Lista Linux-Kernel

J.A. Magallon wrote:
> This does not patch against -mm1. -mm1 looks like a mix of plain 2.6.10
> and your code.
> Could you revamp it against -mm1, please ? I looked at it but seems out
> of my understanding...

My patch replaces the one in -mm1. Just revert the waiting-10s-... patch
that is in 2.6.11-rc1-mm1 using

	patch -p1 -R

Then apply the one I attached to the last mail normally.

I'll also be sending in a cleaner version of the patch shortly.

Daniel
* [PATCH] Wait and retry mounting root device (revised)
  2005-01-16  0:59 ` 2.6.11-rc1-mm1 Joseph Fannin
  2005-01-16 19:09   ` 2.6.11-rc1-mm1 Daniel Drake
  2005-01-16 21:09   ` 2.6.11-rc1-mm1 Daniel Drake
@ 2005-01-18  2:54   ` Daniel Drake
  2005-01-18  0:34     ` Al Viro
  2005-01-18  8:02     ` Andries Brouwer
  2 siblings, 2 replies; 142+ messages in thread
From: Daniel Drake @ 2005-01-18  2:54 UTC
To: Andrew Morton; +Cc: Joseph Fannin, linux-kernel, Neil Brown, William Park

[-- Attachment #1: Type: text/plain, Size: 567 bytes --]

Retry up to 20 times if mounting the root device fails. This fixes booting
from usb-storage devices, which no longer make their partitions immediately
available.

This should allow booting from root=/dev/sda1 and root=8:1 style parameters,
whilst not breaking booting from RAID or initrd :)
I have also cleaned up the mount_block_root() function a bit.

Based on an earlier patch from William Park <opengeometry@yahoo.ca>

Replaces the existing waiting-10s-before-mounting-root-filesystem.patch patch
in 2.6.11-rc1-mm1

Signed-off-by: Daniel Drake <dsd@gentoo.org>

[-- Attachment #2: boot-delay-retry-v3.patch --]
[-- Type: text/x-patch, Size: 2738 bytes --]

--- linux-2.6.10/init/do_mounts.c.orig	2005-01-16 19:18:57.000000000 +0000
+++ linux-2.6.10/init/do_mounts.c	2005-01-17 01:42:25.000000000 +0000
@@ -6,6 +6,7 @@
 #include <linux/suspend.h>
 #include <linux/root_dev.h>
 #include <linux/security.h>
+#include <linux/delay.h>

 #include <linux/nfs_fs.h>
 #include <linux/nfs_fs_sb.h>
@@ -261,6 +262,9 @@ static void __init get_fs_names(char *pa
 static int __init do_mount_root(char *name, char *fs, int flags, void *data)
 {
 	int err = sys_mount(name, "/root", fs, flags, data);
+	if (err == -EACCES && (flags | MS_RDONLY) == 0)
+		err = sys_mount(name, "/root", fs, flags | MS_RDONLY, data);
+
 	if (err)
 		return err;

@@ -273,38 +277,56 @@ static int __init do_mount_root(char *na
 	return 0;
 }

+static int __init mount_root_try_all_fs(char *name, char *fs_names, int flags, void *data)
+{
+	char *p;
+	int err = -EFAULT;
+
+	for (p = fs_names; *p; p += strlen(p)+1) {
+		err = do_mount_root(name, p, flags, root_mount_data);
+		if (err != -EINVAL)
+			break;
+	}
+
+	return err;
+}
+
 void __init mount_block_root(char *name, int flags)
 {
 	char *fs_names = __getname();
-	char *p;
 	char b[BDEVNAME_SIZE];
+	int tryagain = 20;

 	get_fs_names(fs_names);
-retry:
-	for (p = fs_names; *p; p += strlen(p)+1) {
-		int err = do_mount_root(name, p, flags, root_mount_data);
-		switch (err) {
-			case 0:
-				goto out;
-			case -EACCES:
-				flags |= MS_RDONLY;
-				goto retry;
-			case -EINVAL:
-				continue;
-		}
-	        /*
-		 * Allow the user to distinguish between failed sys_open
-		 * and bad superblock on root device.
-		 */
-		__bdevname(ROOT_DEV, b);
-		printk("VFS: Cannot open root device \"%s\" or %s\n",
-				root_device_name, b);
-		printk("Please append a correct \"root=\" boot option\n");

-		panic("VFS: Unable to mount root fs on %s", b);
+	while (--tryagain) {
+		int err = mount_root_try_all_fs(name, fs_names, flags, root_mount_data);
+		if (err == 0)
+			goto out;
+
+		/*
+		 * The root device may not be ready yet, so we retry a number of times
+		 */
+		printk(KERN_WARNING "VFS: Waiting %dsec for root device...\n",
+			tryagain);
+		ssleep(1);
+		if (!ROOT_DEV) {
+			ROOT_DEV = name_to_dev_t(saved_root_name);
+			create_dev(name, ROOT_DEV, root_device_name);
+		}
 	}
-	panic("VFS: Unable to mount root fs on %s", __bdevname(ROOT_DEV, b));
-out:
+
+	/*
+	 * Allow the user to distinguish between failed sys_open
+	 * and bad superblock on root device.
+	 */
+	__bdevname(ROOT_DEV, b);
+	printk(KERN_CRIT "VFS: Cannot open root device \"%s\" or %s\n",
+		root_device_name, b);
+	printk(KERN_CRIT "Please append a correct \"root=\" boot option\n");
+	panic("VFS: Unable to mount root fs on %s", b);
+
+ out:
 	putname(fs_names);
 }
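The two postings of the retry patch use different loop shapes with the same starting value of 20, and reading them literally they do not make the same number of mount attempts. The following is a small userspace simulation, offered as a sketch only: fake_mount(), struct retry_stats and both simulate_* functions are hypothetical names, and the mount routine is replaced by a stub that succeeds on a chosen attempt (or never). It counts attempts and one-second waits for each shape.

```c
/* Counters collected while simulating one retry loop. */
struct retry_stats {
	int attempts;	/* calls to the mount routine */
	int sleeps;	/* one-second waits */
};

/*
 * Hypothetical stand-in for mount_root_try_all_fs(): succeeds on attempt
 * number ready_at, or never when ready_at is 0.
 */
static int fake_mount(int attempt, int ready_at)
{
	return (ready_at != 0 && attempt >= ready_at) ? 0 : -1;
}

/* Loop shape of the 2005-01-16 posting: try first, then decide whether to wait. */
static struct retry_stats simulate_try_then_wait(int tryagain, int ready_at)
{
	struct retry_stats s = { 0, 0 };

	while (1) {
		s.attempts++;
		if (fake_mount(s.attempts, ready_at) == 0)
			break;
		if (--tryagain) {
			s.sleeps++;	/* ssleep(1) in the patch */
			continue;
		}
		break;			/* the patch panics here */
	}
	return s;
}

/* Loop shape of the revised posting: the counter is decremented before each try. */
static struct retry_stats simulate_decrement_first(int tryagain, int ready_at)
{
	struct retry_stats s = { 0, 0 };

	while (--tryagain) {
		s.attempts++;
		if (fake_mount(s.attempts, ready_at) == 0)
			return s;
		s.sleeps++;		/* ssleep(1) in the patch */
	}
	return s;			/* the patch panics here */
}
```

If the device never appears, the first shape makes 20 attempts with 19 waits, while the revised shape makes 19 attempts with 19 waits, so its final one-second sleep is not followed by another try. When the device does show up in time, both shapes behave the same.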
* Re: [PATCH] Wait and retry mounting root device (revised)
  2005-01-18  2:54   ` [PATCH] Wait and retry mounting root device (revised) Daniel Drake
@ 2005-01-18  0:34     ` Al Viro
  2005-01-18  0:02       ` Randy.Dunlap
  2005-01-18  1:03       ` [PATCH] Wait and retry mounting root device (revised) William Park
  2005-01-18  8:02     ` Andries Brouwer
  1 sibling, 2 replies; 142+ messages in thread
From: Al Viro @ 2005-01-18  0:34 UTC
To: Daniel Drake
Cc: Andrew Morton, Joseph Fannin, linux-kernel, Neil Brown, William Park

On Tue, Jan 18, 2005 at 02:54:24AM +0000, Daniel Drake wrote:
> Retry up to 20 times if mounting the root device fails. This fixes booting
> from usb-storage devices, which no longer make their partitions immediately
> available.

Sigh... So we can very well get device coming up in the middle of a loop
and get the actual attempts to mount the sucker in wrong order. How nice...

Folks, that's not a solution. And kludges like that really have no
business being there - they only hide the problem and make it harder
to reproduce.
* Re: [PATCH] Wait and retry mounting root device (revised) 2005-01-18 0:34 ` Al Viro @ 2005-01-18 0:02 ` Randy.Dunlap 2005-01-18 8:05 ` Andries Brouwer 2005-01-18 8:28 ` Helge Hafting 2005-01-18 1:03 ` [PATCH] Wait and retry mounting root device (revised) William Park 1 sibling, 2 replies; 142+ messages in thread From: Randy.Dunlap @ 2005-01-18 0:02 UTC (permalink / raw) To: Al Viro Cc: Daniel Drake, Andrew Morton, Joseph Fannin, linux-kernel, Neil Brown, William Park Al Viro wrote: > On Tue, Jan 18, 2005 at 02:54:24AM +0000, Daniel Drake wrote: > >>Retry up to 20 times if mounting the root device fails. This fixes booting >>from usb-storage devices, which no longer make their partitions immediately >>available. > > > Sigh... So we can very well get device coming up in the middle of a loop > and get the actual attempts to mount the sucker in wrong order. How nice... > > Folks, that's not a solution. And kludges like that really have no > business being there - they only hide the problem and make it harder > to reproduce. Is there a solution other than initrd/initramfs ? Thanks, -- ~Randy ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH] Wait and retry mounting root device (revised) 2005-01-18 0:02 ` Randy.Dunlap @ 2005-01-18 8:05 ` Andries Brouwer 2005-01-18 8:28 ` Helge Hafting 1 sibling, 0 replies; 142+ messages in thread From: Andries Brouwer @ 2005-01-18 8:05 UTC (permalink / raw) To: Randy.Dunlap Cc: Al Viro, Daniel Drake, Andrew Morton, Joseph Fannin, linux-kernel, Neil Brown, William Park On Mon, Jan 17, 2005 at 04:02:15PM -0800, Randy.Dunlap wrote: > Al Viro wrote: > >On Tue, Jan 18, 2005 at 02:54:24AM +0000, Daniel Drake wrote: > > > >>Retry up to 20 times if mounting the root device fails. This fixes > >>booting > >>from usb-storage devices, which no longer make their partitions > >>immediately > >>available. > > > > > >Sigh... So we can very well get device coming up in the middle of a loop > >and get the actual attempts to mount the sucker in wrong order. How > >nice... > > > >Folks, that's not a solution. And kludges like that really have no > >business being there - they only hide the problem and make it harder > >to reproduce. > > Is there a solution other than initrd/initramfs ? On the one hand, I entirely agree with Al - this guessing business is a bad kludge, and building complications on top of it makes things worse. On the other hand, we do already have the rootfstype= option, so one can avoid trying things in the "wrong" order. Andries ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH] Wait and retry mounting root device (revised)
  2005-01-18  0:02       ` Randy.Dunlap
  2005-01-18  8:05         ` Andries Brouwer
@ 2005-01-18  8:28         ` Helge Hafting
  2005-01-18  8:49           ` Andrew Morton
  1 sibling, 1 reply; 142+ messages in thread
From: Helge Hafting @ 2005-01-18  8:28 UTC
To: Randy.Dunlap
Cc: Al Viro, Daniel Drake, Andrew Morton, Joseph Fannin, linux-kernel,
	Neil Brown, William Park

Randy.Dunlap wrote:

> Al Viro wrote:
>
>> On Tue, Jan 18, 2005 at 02:54:24AM +0000, Daniel Drake wrote:
>>
>>> Retry up to 20 times if mounting the root device fails. This fixes
>>> booting
>>> from usb-storage devices, which no longer make their partitions
>>> immediately
>>> available.
>>
>>
>>
>> Sigh... So we can very well get device coming up in the middle of a
>> loop
>> and get the actual attempts to mount the sucker in wrong order. How
>> nice...
>>
>> Folks, that's not a solution. And kludges like that really have no
>> business being there - they only hide the problem and make it harder
>> to reproduce.
>
>
> Is there a solution other than initrd/initramfs ?

There is a solution that seems obvious to me, so obvious that it ought
to be the first solution to try. And it is guaranteed to not mess up
raid or anything else too. So perhaps there is something wrong with it,
or someone would have done this already? Here it is:

Apparently, USB devices don't appear immediately (after powerup? after
USB bus initialization?) We know this - therefore the USB block driver
should know this.

The USB block driver should know that 10s (or whatever) hasn't yet
passed, and simply block any attempt to access block devices (or scan
for them) knowing that it will not work yet, but any device will be
there after the pause. A root mount on USB will then succeed at the
_first_ try everytime, so no need for retries.

This solution is guaranteed to not mess up raid or anything else,
because the fix is done in the driver for the "odd" devices, not in the
upper layer trying to use the device as a root fs.

Surely someone must have thought of this before - is there any reason
why this won't work well? The only thing I can think of is that
partition scanning will cause a delay on every system with USB block
devices compiled-in, but this could be postponed when root isn't on
usb. Partition scanning is moving to "early userspace" anyway, isn't
it? In the meantime, people without USB root that don't want a bootup
delay can use modular usb and load the module later in some boot
script.

Helge Hafting
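The proposal above amounts to a gate inside the driver: any access before the settle period has elapsed waits for the remainder, and later accesses go straight through. A minimal sketch of just that arithmetic follows, in plain userspace C. settle_remaining() and its parameters are hypothetical and do not correspond to anything in the USB stack; all times are assumed to be seconds on a single monotonic counter.

```c
/*
 * Seconds a driver would still have to wait before touching the bus.
 * now and bus_init_time come from the same monotonic counter; the
 * unsigned subtraction stays correct if that counter wraps around.
 */
static unsigned int settle_remaining(unsigned int now,
				     unsigned int bus_init_time,
				     unsigned int settle_secs)
{
	unsigned int elapsed = now - bus_init_time;

	return elapsed >= settle_secs ? 0 : settle_secs - elapsed;
}
```

A driver following this idea would sleep for the returned number of seconds before its first scan. A request arriving 3 seconds after bus initialisation with a 10 second settle period would wait 7 more seconds, and one arriving after 10 seconds or later would not wait at all.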
* Re: [PATCH] Wait and retry mounting root device (revised)
  2005-01-18  8:28         ` Helge Hafting
@ 2005-01-18  8:49           ` Andrew Morton
  2005-01-18 13:20             ` Helge Hafting
  2005-01-20 20:55             ` [PATCH] Configurable delay before mounting root device Daniel Drake
  0 siblings, 2 replies; 142+ messages in thread
From: Andrew Morton @ 2005-01-18  8:49 UTC
To: Helge Hafting; +Cc: rddunlap, viro, dsd, jhf, linux-kernel, neilb, opengeometry

Helge Hafting <helge.hafting@hist.no> wrote:
>
> The USB block driver should know that 10s (or whatever) hasn't yet
> passed, and simply
> block any attempt to access block devices (or scan for them) knowing
> that it will
> not work yet, but any device will be there after the pause. A root mount
> on USB will
> then succeed at the _first_ try everytime, so no need for retries.

Maybe a simple delay somewhere in the boot sequence would suffice? Boot
with `mount_delay=10'.

But it sure would be nice to simply get this stuff right somehow. If the
USB block driver knows that discovery is still in progress it should wait
until it has completed. (I suggested that before, but wasn't 100% convinced
by the answer).
* Re: [PATCH] Wait and retry mounting root device (revised) 2005-01-18 8:49 ` Andrew Morton @ 2005-01-18 13:20 ` Helge Hafting 2005-01-20 20:55 ` [PATCH] Configurable delay before mounting root device Daniel Drake 1 sibling, 0 replies; 142+ messages in thread From: Helge Hafting @ 2005-01-18 13:20 UTC (permalink / raw) To: Andrew Morton; +Cc: rddunlap, viro, dsd, jhf, linux-kernel, neilb, opengeometry Andrew Morton wrote: >Helge Hafting <helge.hafting@hist.no> wrote: > > >>The USB block driver should know that 10s (or whatever) hasn't yet >> passed, and simply >> block any attempt to access block devices (or scan for them) knowing >> that it will >> not work yet, but any device will be there after the pause. A root mount >> on USB will >> then succeed at the _first_ try everytime, so no need for retries. >> >> > >Maybe a simple delay somewhere in the boot sequence would suffice? Boot >with `mount_delay=10'. > > > Certainly the simplest solution, and it also solves a related but rare problem: People booting linux from ROM long before the disks have time to spin up. There seems to be a disadvantage in that one must specify this pause manually, but the admin have to select the root fs somewhere anyway (lilo.conf) and may specify the delay at the same time. >But it sure would be nice to simply get this stuff right somehow. If the >USB block driver knows that discovery is still in progress it should wait >until it has completed. (I suggested that before, but wasn't 100% convinced >by the answer). > > Sure, if the USB core can know, then it should use the knowledge. Or utilize a simple timeout if all it knows is that "common storage devices appear on the bus up to 10s after powerup/reset". Helge Hafting ^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH] Configurable delay before mounting root device
  2005-01-18  8:49           ` Andrew Morton
  2005-01-18 13:20             ` Helge Hafting
@ 2005-01-20 20:55             ` Daniel Drake
  2005-01-20 20:24               ` Andrew Morton
  2005-01-20 22:49               ` William Park
  1 sibling, 2 replies; 142+ messages in thread
From: Daniel Drake @ 2005-01-20 20:55 UTC
To: Andrew Morton
Cc: Helge Hafting, rddunlap, viro, jhf, linux-kernel, neilb, opengeometry

[-- Attachment #1: Type: text/plain, Size: 384 bytes --]

Adds a boot parameter which can be used to specify a delay (in seconds)
before the root device is decoded/discovered/mounted.

Example usage for 10 second delay:
	rootdelay=10

Useful for usb-storage devices which no longer make their partitions
immediately available, and for other storage devices which require some
"spin-up" time.

Signed-off-by: Daniel Drake <dsd@gentoo.org>

[-- Attachment #2: rootdelay-boot-param.patch --]
[-- Type: text/x-patch, Size: 932 bytes --]

--- linux-2.6.10/init/do_mounts.c.orig	2005-01-20 20:37:01.000000000 +0000
+++ linux-2.6.10/init/do_mounts.c	2005-01-20 20:44:47.190899080 +0000
@@ -6,6 +6,7 @@
 #include <linux/suspend.h>
 #include <linux/root_dev.h>
 #include <linux/security.h>
+#include <linux/delay.h>

 #include <linux/nfs_fs.h>
 #include <linux/nfs_fs_sb.h>
@@ -228,8 +229,16 @@
 	return 1;
 }

+static unsigned int __initdata root_delay;
+static int __init root_delay_setup(char *str)
+{
+	root_delay = simple_strtoul(str, NULL, 0);
+	return 1;
+}
+
 __setup("rootflags=", root_data_setup);
 __setup("rootfstype=", fs_names_setup);
+__setup("rootdelay=", root_delay_setup);

 static void __init get_fs_names(char *page)
 {
@@ -387,6 +396,12 @@

 	mount_devfs();

+	if (root_delay) {
+		printk(KERN_INFO "Waiting %dsec before mounting root device...\n",
+			root_delay);
+		ssleep(root_delay);
+	}
+
 	md_run_setup();

 	if (saved_root_name[0]) {
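To make the effect of the new parameter concrete, here is a userspace sketch of how a rootdelay= value could be picked out of a boot command line. It is an illustration under stated assumptions, not the kernel's __setup() machinery: parse_rootdelay() is a hypothetical helper, strtoul() stands in for the kernel's simple_strtoul(), tokens are assumed to be separated by single spaces, and a later occurrence is allowed to override an earlier one. Base 0 is kept from the patch, so hex and octal spellings are accepted as well.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return the delay in seconds requested by "rootdelay=" on a
 * space-separated command line, or 0 when the parameter is absent.
 */
static unsigned int parse_rootdelay(const char *cmdline)
{
	static const char key[] = "rootdelay=";
	const size_t klen = sizeof(key) - 1;
	unsigned int delay = 0;
	const char *p;

	for (p = cmdline; (p = strstr(p, key)) != NULL; p += klen) {
		/* Only accept the key at the start of a token. */
		if (p == cmdline || p[-1] == ' ')
			delay = (unsigned int)strtoul(p + klen, NULL, 0);
	}
	return delay;
}
```

For a command line such as "root=/dev/sda1 rootdelay=10 ro" the sketch yields 10, which in the patch above would translate into a single ssleep(10) before md_run_setup() and the root mount; without the parameter the value stays 0 and no delay is taken.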
* Re: [PATCH] Configurable delay before mounting root device
2005-01-20 20:55 ` [PATCH] Configurable delay before mounting root device Daniel Drake
@ 2005-01-20 20:24 ` Andrew Morton
2005-01-21 18:15 ` Daniel Drake
2005-01-20 22:49 ` William Park
1 sibling, 1 reply; 142+ messages in thread
From: Andrew Morton @ 2005-01-20 20:24 UTC (permalink / raw)
To: Daniel Drake
Cc: helge.hafting, rddunlap, viro, jhf, linux-kernel, neilb, opengeometry

Daniel Drake <dsd@gentoo.org> wrote:
>
> +	if (root_delay) {
> +		printk(KERN_INFO "Waiting %dsec before mounting root device...\n",
> +		       root_delay);
> +		ssleep(root_delay);
> +	}

Totally sad, but it's hard to see how that could break anything.

You owe me an update to Documentation/kernel-parameters.txt ;)

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH] Configurable delay before mounting root device
2005-01-20 20:24 ` Andrew Morton
@ 2005-01-21 18:15 ` Daniel Drake
0 siblings, 0 replies; 142+ messages in thread
From: Daniel Drake @ 2005-01-21 18:15 UTC (permalink / raw)
To: Andrew Morton
Cc: helge.hafting, rddunlap, viro, jhf, linux-kernel, neilb, opengeometry

[-- Attachment #1: Type: text/plain, Size: 86 bytes --]

Andrew Morton wrote:
> You owe me an update to Documentation/kernel-parameters.txt ;)

[-- Attachment #2: rootdelay-boot-param.patch --]
[-- Type: text/x-patch, Size: 1794 bytes --]

Adds a boot parameter which can be used to specify a delay (in seconds)
before the root device is decoded/discovered/mounted.

Example usage for 10 second delay:

	rootdelay=10

Useful for usb-storage devices which no longer make their partitions
immediately available, and for other storage devices which require some
"spin-up" time.

Signed-off-by: Daniel Drake <dsd@gentoo.org>

--- linux-2.6.10/init/do_mounts.c.orig	2005-01-20 20:37:01.000000000 +0000
+++ linux-2.6.10/init/do_mounts.c	2005-01-20 20:44:47.190899080 +0000
@@ -6,6 +6,7 @@
 #include <linux/suspend.h>
 #include <linux/root_dev.h>
 #include <linux/security.h>
+#include <linux/delay.h>
 
 #include <linux/nfs_fs.h>
 #include <linux/nfs_fs_sb.h>
@@ -228,8 +229,16 @@
 	return 1;
 }
 
+static unsigned int __initdata root_delay;
+static int __init root_delay_setup(char *str)
+{
+	root_delay = simple_strtoul(str, NULL, 0);
+	return 1;
+}
+
 __setup("rootflags=", root_data_setup);
 __setup("rootfstype=", fs_names_setup);
+__setup("rootdelay=", root_delay_setup);
 
 static void __init get_fs_names(char *page)
 {
@@ -387,6 +396,12 @@
 
 	mount_devfs();
 
+	if (root_delay) {
+		printk(KERN_INFO "Waiting %dsec before mounting root device...\n",
+		       root_delay);
+		ssleep(root_delay);
+	}
+
 	md_run_setup();
 
 	if (saved_root_name[0]) {
--- linux-2.6.10/Documentation/kernel-parameters.txt.orig	2005-01-21 17:18:20.000000000 +0000
+++ linux-2.6.10/Documentation/kernel-parameters.txt	2005-01-21 17:22:29.000000000 +0000
@@ -1072,6 +1072,9 @@
 			running once the system is up.
 
 	root=		[KNL] Root filesystem
+	rootdelay=	[KNL] Delay (in seconds) to pause before attempting to
+			mount the root filesystem
+
 	rootflags=	[KNL] Set root filesystem mount option string
 
 	rootfstype=	[KNL] Set root filesystem type

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH] Configurable delay before mounting root device
2005-01-20 20:55 ` [PATCH] Configurable delay before mounting root device Daniel Drake
2005-01-20 20:24 ` Andrew Morton
@ 2005-01-20 22:49 ` William Park
1 sibling, 0 replies; 142+ messages in thread
From: William Park @ 2005-01-20 22:49 UTC (permalink / raw)
To: Daniel Drake
Cc: Andrew Morton, Helge Hafting, rddunlap, viro, jhf, linux-kernel, neilb

On Thu, Jan 20, 2005 at 08:55:54PM +0000, Daniel Drake wrote:
> Adds a boot parameter which can be used to specify a delay (in seconds)
> before the root device is decoded/discovered/mounted.
>
> Example usage for 10 second delay:
>
> 	rootdelay=10
>
> Useful for usb-storage devices which no longer make their partitions
> immediately available, and for other storage devices which require some
> "spin-up" time.
>
> Signed-off-by: Daniel Drake <dsd@gentoo.org>

Very concise. It's much better than 2.4 patch or its 2.6 adaptation (my
patch)...

--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
Slackware Linux -- because I can type.

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH] Wait and retry mounting root device (revised)
2005-01-18 0:34 ` Al Viro
2005-01-18 0:02 ` Randy.Dunlap
@ 2005-01-18 1:03 ` William Park
2005-01-19 0:43 ` Werner Almesberger
1 sibling, 1 reply; 142+ messages in thread
From: William Park @ 2005-01-18 1:03 UTC (permalink / raw)
To: Al Viro
Cc: Daniel Drake, Andrew Morton, Joseph Fannin, linux-kernel, Neil Brown

On Tue, Jan 18, 2005 at 12:34:13AM +0000, Al Viro wrote:
> On Tue, Jan 18, 2005 at 02:54:24AM +0000, Daniel Drake wrote:
> > Retry up to 20 times if mounting the root device fails. This fixes
> > booting from usb-storage devices, which no longer make their
> > partitions immediately available.
>
> Sigh... So we can very well get device coming up in the middle of a
> loop and get the actual attempts to mount the sucker in wrong order.
> How nice...
>
> Folks, that's not a solution. And kludges like that really have no
> business being there - they only hide the problem and make it harder
> to reproduce.

The problem at hand is that USB key drive (which is my immediate
concern) takes 5sec to show up. So, it's much better approach than
'initrd'.

--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
Slackware Linux -- because I can type.

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH] Wait and retry mounting root device (revised)
2005-01-18 1:03 ` [PATCH] Wait and retry mounting root device (revised) William Park
@ 2005-01-19 0:43 ` Werner Almesberger
0 siblings, 0 replies; 142+ messages in thread
From: Werner Almesberger @ 2005-01-19 0:43 UTC (permalink / raw)
To: Al Viro, Daniel Drake, Andrew Morton, Joseph Fannin, linux-kernel, Neil Brown

William Park wrote:
> The problem at hand is that USB key drive (which is my immediate
> concern) takes 5sec to show up. So, it's much better approach than
> 'initrd'.

I'm a little biased, but I disagree ;-) The main problems with
initrd seem to be that it adds at least one more moving part,
and that most initrd-making procedures give you something
non-interactive that hardly interacts with the outside world.

Lo and behold, nobody likes sudden silent failure of a complex
and opaque subsystem, particularly if it happens to be vitally
important.

I think initrds could be greatly improved by including a BusyBox
in their failure paths (plus a way to manually enter the BusyBox,
in case apparent success still means failure). That way, you can
actually try to fix things if there are problems.

Another issue is configuration data that has to exist in the
initrd, yielding a possibly complex initrd construction process
that has to follow each configuration change. Also there, an
initrd could be able to try to access the regular file system to
access such information, possibly combined with caching and
heuristics. (I realize that this isn't trivial and bears a high
risk of intractable failure paths, but I also think that it's
worth exploring this direction.)

Regarding the delayed mount problem, I think some retry procedure
may be the best possible band-aid for a while. While it would be
desirable for the USB subsystem (etc.) to just block until the
device is ready, this doesn't work so well if the presence of the
device can't be predicted at that point, e.g. if a "devfs" (udev,
etc.) name has to be looked up first.

I'm not sure I understand Al's concern with devices popping up in
the middle of the loop. For all practical purposes, mounting the
root file system has a single target anyway, so it can't really
compete with anything else. Automatically selected alternative
roots can make sense, but that's sufficiently policy-ish that I
think it would be better kept in an initrd, where instrumentation
is more naturally added than in the kernel.

- Werner

--
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

^ permalink raw reply [flat|nested] 142+ messages in thread
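The retry band-aid discussed above (try the mount, pause, try again, up to a fixed number of attempts) can be sketched in plain C. This is an illustration under stated assumptions, not the code from Daniel Drake's retry patch: the callback types, sketch_mount_with_retry() and the fake mount function are all hypothetical, and the real kernel path calls its own mount routines instead of a callback.

```c
#include <stddef.h>

/* Hypothetical callback: one mount attempt, 0 on success, non-zero on failure. */
typedef int (*sketch_mount_fn)(void *ctx);

/* Hypothetical callback: pause between attempts (may be NULL to skip pausing). */
typedef void (*sketch_pause_fn)(unsigned int seconds);

/*
 * Try the mount up to max_tries times, pausing between failed attempts.
 * Returns 0 as soon as one attempt succeeds, otherwise the last error
 * (or -1 when max_tries is 0 and nothing was attempted).
 */
int sketch_mount_with_retry(sketch_mount_fn try_mount, void *ctx,
                            unsigned int max_tries, unsigned int pause_secs,
                            sketch_pause_fn pause)
{
    int err = -1;
    unsigned int i;

    for (i = 0; i < max_tries; i++) {
        err = try_mount(ctx);
        if (err == 0)
            break;
        /* No point in pausing after the final failed attempt. */
        if (pause != NULL && i + 1 < max_tries)
            pause(pause_secs);
    }
    return err;
}

/*
 * Demo stand-in for a slow device: ctx points to an int countdown, and the
 * "mount" fails until that countdown has reached zero.
 */
int sketch_fake_mount(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -1;
    }
    return 0;
}
```

With a countdown of 3 and a limit of 20 attempts, the fourth attempt succeeds. A device that never appears exhausts all attempts and the caller sees the failure, which is the point where the existing code would panic.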
* Re: [PATCH] Wait and retry mounting root device (revised)
2005-01-18 2:54 ` [PATCH] Wait and retry mounting root device (revised) Daniel Drake
2005-01-18 0:34 ` Al Viro
@ 2005-01-18 8:02 ` Andries Brouwer
2005-01-19 20:11 ` Frank van Maarseveen
1 sibling, 1 reply; 142+ messages in thread
From: Andries Brouwer @ 2005-01-18 8:02 UTC (permalink / raw)
To: Daniel Drake
Cc: Andrew Morton, Joseph Fannin, linux-kernel, Neil Brown, William Park

On Tue, Jan 18, 2005 at 02:54:24AM +0000, Daniel Drake wrote:

> Retry up to 20 times if mounting the root device fails. This fixes booting
> from usb-storage devices, which no longer make their partitions immediately
> available.
>
> This should allow booting from root=/dev/sda1 and root=8:1 style
> parameters, whilst not breaking booting from RAID or initrd :)
> I have also cleaned up the mount_block_root() function a bit.

+	if (err == -EACCES && (flags | MS_RDONLY) == 0)
+		err = sys_mount(name, "/root", fs, flags | MS_RDONLY, data);
+

It is rather unlikely that (flags | MS_RDONLY) == 0 ...

I don't like the 20 - so arbitrary.
And since we are going to panic anyway, why not wait indefinitely?

Suppose we have kernel command line options
rootdev=, rootpttype=, root=, rootfstype=, rootwait=
telling the kernel what device is the root device,
what type of partition table it has,
on which partition the root filesystem lives,
what type of filesystem it has,
and whether we want to wait until it becomes available
instead of panicking.

If we wait, possibly after the first failure to mount,
do a printk to tell the user: waiting for device to become available.

rootwait can have several values: for example, with a boot/root floppy
combo, we want the user to hit enter or so before accessing the device.

Andries

^ permalink raw reply [flat|nested] 142+ messages in thread
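Andries's remark about (flags | MS_RDONLY) == 0 can be checked in isolation: OR-ing a non-zero bit into a value always gives a non-zero result, so the quoted test can never be true. The sketch below assumes the intended test was "the read-only bit is not set yet", which would use & instead of |. That reading is an assumption, since the thread does not spell out the fix. SKETCH_MS_RDONLY is a local stand-in for the kernel's MS_RDONLY flag, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Local stand-in for the kernel's MS_RDONLY mount flag (value assumed). */
#define SKETCH_MS_RDONLY 1UL

/*
 * The test as quoted: setting the bit first makes the left-hand side
 * non-zero for every input, so this returns false for all flags.
 */
bool sketch_not_readonly_buggy(unsigned long flags)
{
    return (flags | SKETCH_MS_RDONLY) == 0;
}

/*
 * Assumed intent: true only when the read-only bit is still clear,
 * i.e. when a read-only retry would actually change something.
 */
bool sketch_not_readonly_fixed(unsigned long flags)
{
    return (flags & SKETCH_MS_RDONLY) == 0;
}
```

With the | form, the read-only retry guarded by that condition would never run. With the & form, it runs exactly when the first attempt was a read-write mount.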
* Re: [PATCH] Wait and retry mounting root device (revised)
2005-01-18 8:02 ` Andries Brouwer
@ 2005-01-19 20:11 ` Frank van Maarseveen
0 siblings, 0 replies; 142+ messages in thread
From: Frank van Maarseveen @ 2005-01-19 20:11 UTC (permalink / raw)
To: Andries Brouwer
Cc: Daniel Drake, Andrew Morton, Joseph Fannin, linux-kernel, Neil Brown, William Park

On Tue, Jan 18, 2005 at 09:02:14AM +0100, Andries Brouwer wrote:
>
> Suppose we have kernel command line options
> rootdev=, rootpttype=, root=, rootfstype=, rootwait=
> telling the kernel what device is the root device,
> what type of partition table it has,
> on which partition the root filesystem lives,
> what type of filesystem it has,

might as well add rootuuid= for those fs which support it.

--
Frank

^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: 2.6.11-rc1-mm1
@ 2005-01-17 6:49 Prasanna S Panchamukhi
0 siblings, 0 replies; 142+ messages in thread
From: Prasanna S Panchamukhi @ 2005-01-17 6:49 UTC (permalink / raw)
To: karim; +Cc: trz, ak, maneesh, linux-kernel
Hi Karim,
> Thomas Gleixner wrote:
>> It's not only me, who needs constant time. Everybody interested in
>> tracing will need that. In my opinion its a principle of tracing.
>
> relayfs is a generalized buffering mechanism. Tracing is one application
> it serves. Check out the web site: "high-speed data-relay filesystem."
> Fancy name huh ...
>
>> The "lockless" mechanism is _FAKE_ as I already pointed out. It replaces
>> locks by do { } while loops. So what ?
>
How about combining "buffering mechanism of relayfs" and
"kernel -> user space transport by debugfs"?
This will also remove lots of complicated code from relayfs.
Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Ph: 91-80-25044636
<prasanna@in.ibm.com>
^ permalink raw reply [flat|nested] 142+ messages in thread