linux-kernel.vger.kernel.org archive mirror
* 2.6.17-mm5
@ 2006-07-01 10:35 ` Andrew Morton
From: Andrew Morton @ 2006-07-01 10:35 UTC
  To: linux-kernel


ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm5/


Nothing very exciting here - a few buggy patches were fixed or dropped.


Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

        echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.
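
As a rough illustration of why the bisection search mentioned above takes
around ten rebuilds: halving a series of 791 patches (the size of this
patchset) needs at most ceil(log2(791)) = 10 test builds. The sketch below is
not the procedure from bisecting-mm-trees.txt. `bisect_first_bad` and the
`shows_bug` callback are hypothetical names, and the build-and-boot step is
simulated rather than real.

```python
import math


def bisect_first_bad(num_patches, shows_bug):
    """Return (1-based number of the first bad patch, builds tested).

    shows_bug(n) is a hypothetical callback: in real use it would build and
    boot a tree with the first n patches applied and return True if the bug
    appears. Assumes the bug is absent with 0 patches applied and present
    with all of them applied.
    """
    lo, hi = 0, num_patches  # invariant: bug absent at lo, present at hi
    builds = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        builds += 1  # one rebuild and reboot per probe
        if shows_bug(mid):
            hi = mid
        else:
            lo = mid
    return hi, builds


# Simulated run: pretend patch number 500 introduced the bug.
culprit = 500
first_bad, builds = bisect_first_bad(791, lambda n: n >= culprit)
print(first_bad, builds, math.ceil(math.log2(791)))
```

Each probe halves the range of candidate patches, so the number of rebuilds
grows with the logarithm of the series length rather than with the length
itself.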



Changes since 2.6.17-mm4:


 origin.patch
 git-acpi.patch
 git-cpufreq.patch
 git-geode.patch
 git-gfs2.patch
 git-ia64.patch
 git-infiniband.patch
 git-jfs.patch
 git-klibc.patch
 git-hdrinstall2.patch
 git-libata-all.patch
 git-mtd.patch
 git-netdev-all.patch
 git-nfs.patch
 git-ocfs2.patch
 git-pcmcia-fixup.patch
 git-sas.patch
 git-scsi-misc.patch
 git-scsi-target.patch
 git-supertrak.patch
 git-watchdog.patch
 git-wireless.patch
 git-cryptodev.patch

 git trees.

-fix-sgivwfb-compile.patch
-generic_file_buffered_write-handle-zero-length-iovec-segments-stable.patch
-solve-config-broken-undefined-reference-to-online_page.patch
-sparc-register_cpu-build-fix.patch
-acpi-add-ibm-r60e-laptop-to-proc-idle-blacklist.patch
-drivers-acpi-scanc-make-acpi_bus_type-static.patch
-acpi_srat-needs-acpi.patch
-acpi-identify-which-device-is-not-power-manageable.patch
-the-scheduled-unexport-of-insert_resource.patch
-videocodec-make-1-bit-fields-unsigned.patch
-i2c-801-64bit-resource-fix.patch
-fs-jffs2-make-2-functions-static.patch
-mtd-fix-all-kernel-doc-warnings.patch
-mtd-kernel-doc-fixes-additions.patch
-af_unix-datagram-getpeersec.patch
-drivers-net-irda-mcs7780c-make-struct-mcs_driver-static.patch
-irda-fix-rcu-lock-pairing-on-error-path.patch
-kill-open-coded-offsetof-in-cm4000_csc-zero_dev.patch
-com20020_cs-more-device-support.patch
-git-pcmcia-xirc2ps_cs-fix-ooops-not-a-creditcard.patch
-git-powerpc.patch
-powerpc-fix-idr-locking-in-init_new_context.patch
-gregkh-pci-64bit-resource-c99-changes-for-struct-resource-declarations.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-sound-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-networks-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-pci-core-and-hotplug-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-mtd-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-ide-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-video-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-pcmcia-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-arch-and-core-code.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-misc-drivers.patch
-gregkh-pci-64bit-resource-introduce-resource_size_t-for-the-start-and-end-of-struct-resource.patch
-gregkh-pci-64bit-resource-change-resource-core-to-use-resource_size_t.patch
-gregkh-pci-64bit-resource-change-pci-core-and-arch-code-to-use-resource_size_t.patch
-gregkh-pci-64bit-resource-change-pnp-core-to-use-resource_size_t.patch
-gregkh-pci-64bit-resource-convert-a-few-remaining-drivers-to-use-resource_size_t-where-needed.patch
-gregkh-pci-64bit-resource-finally-enable-64bit-resource-sizes.patch
-gregkh-pci-i386-export-memory-more-than-4g-through-proc-iomem.patch
-gregkh-pci-pci-legacy-i-o-port-free-driver-changes-to-generic-pci-code.patch
-gregkh-pci-pci-legacy-i-o-port-free-driver-update-documentation-pci_txt.patch
-gregkh-pci-pci-legacy-i-o-port-free-driver-make-intel-e1000-driver-legacy-i-o-port-free.patch
-gregkh-pci-pci-legacy-i-o-port-free-driver-make-emulex-lpfc-driver-legacy-i-o-port-free.patch
-64bit-resource-convert-a-few-remaining-drivers-to-use-resource_size_t-where-needed-8139cp.patch
-bugfix-pci-legacy-i-o-port-free-driver.patch
-insert-identical-resources-above-existing-resources.patch
-clear-abnormal-poweroff-flag-on-via-southbridges-fix-resume.patch
-clear-abnormal-poweroff-flag-on-via-southbridges-fix-resume-fix.patch
-small-whitespace-cleanup-for-qlogic-driver.patch
-mpt_interrupt-should-return-irq_none-when.patch
-qla1280-fix-section-mismatch-warnings.patch
-ehci-fix-bogus-alteration-of-a-local-variable.patch
-ipaqc-bugfixes.patch
-ipaqc-timing-parameters.patch
-if-0-drivers-usb-input-hid-corechid_find_field_by_usage.patch
-usb-remove-empty-destructor-from-drivers-usb-mon-mon_textc.patch
-zoned-vm-counters-create-vmstatc-h-from-page_allocc-h.patch
-zoned-vm-counters-create-vmstatc-h-from-page_allocc-h-s390-fix.patch
-zoned-vm-counters-create-vmstatc-h-from-page_allocc-h-fix.patch
-zoned-vm-counters-create-vmstatc-h-from-page_allocc-h-fix-2.patch
-zoned-vm-counters-basic-zvc-zoned-vm-counter-implementation.patch
-zoned-vm-counters-basic-zvc-zoned-vm-counter-implementation-tidy.patch
-zoned-vm-counters-basic-zvc-zoned-vm-counter-implementation-speedup.patch
-zoned-vm-counters-basic-zvc-zoned-vm-counter-implementation-speedup-fix.patch
-zoned-vm-counters-basic-zvc-zoned-vm-counter-implementation-export-vm_stat.patch
-zoned-vm-counters-convert-nr_mapped-to-per-zone-counter.patch
-zoned-vm-counters-convert-nr_mapped-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_pagecache-to-per-zone-counter.patch
-zoned-vm-counters-remove-nr_file_mapped-from-scan-control-structure.patch
-zoned-vm-counters-remove-nr_file_mapped-from-scan-control-structure-fix.patch
-zoned-vm-counters-split-nr_anon_pages-off-from-nr_file_mapped.patch
-zoned-vm-counters-zone_reclaim-remove-proc-sys-vm-zone_reclaim_interval.patch
-zoned-vm-counters-conversion-of-nr_slab-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_slab-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_slab-to-per-zone-counter-fix-2.patch
-zoned-vm-counters-conversion-of-nr_pagetables-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_pagetables-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_dirty-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_dirty-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_writeback-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_writeback-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_unstable-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_unstable-to-per-zone-counter-nfs-fix.patch
-zoned-vm-counters-conversion-of-nr_unstable-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_bounce-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_bounce-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_bounce-to-per-zone-counter-fix-2.patch
-zoned-vm-counters-remove-useless-struct-wbs.patch
-zoned-vm-counters-remove-read_page_state.patch
-use-zoned-vm-counters-for-numa-statistics-v3.patch
-light-weight-event-counters-v5.patch
-slab-consolidate-code-to-free-slabs-from-freelist.patch
-slab-consolidate-code-to-free-slabs-from-freelist-fix.patch
-selinux-extend-task_kill-hook-to-handle-signals-sent.patch
-selinux-add-security-hook-call-to-kill_proc_info_as_uid.patch
-selinux-update-usb-code-with-new-kill_proc_info_as_uid.patch
-add-smp_setup_processor_id.patch
-x86-dont-print-out-smp-info-on-up-kernels.patch
-keys-allow-in-kernel-key-requestor-to-pass-auxiliary-data-to-upcaller.patch
-keys-allow-in-kernel-key-requestor-to-pass-auxiliary-data-to-upcaller-try-2.patch
-cond_resched-fix.patch
-ufs-printk-fix.patch
-arch-i386-mach-visws-setupc-remove-dummy-function-calls.patch
-re-add-config_sound_sscape.patch
-remove-devinit-from-ioc4-pci_driver.patch
-deref-in-drivers-block-paride-pfc.patch
-chardev-gpio-for-scx200-pc-8736x-add-proper-kconfig-makefile-entries.patch
-edac-pci-device-to-device-cleanup.patch
-edac-mc-numbers-refactor-1-of-2.patch
-edac-mc-numbers-refactor-2-of-2.patch
-edac-probe1-cleanup-1-of-2.patch
-edac-probe1-cleanup-2-of-2.patch
-edac-maintainers-update.patch
-i4l-remove-unneeded-include-linux-isdn-tpamh.patch
-skb-leak-in-drivers-isdn-i4l-isdn_x25ifacec.patch
-knfsd-improve-the-test-for-cross-device-rename-in-nfsd.patch
-knfsd-fixing-missing-expkey-support-for-fsid-type-3.patch
-knfsd-remove-noise-about-filehandle-being-uptodate.patch
-knfsd-ignore-ref_fh-when-crossing-a-mountpoint.patch
-knfsd-nfsd4-fix-open_confirm-locking.patch
-knfsd-nfsd-call-nfsd_setuser-on-fh_compose-fix-nfsd4-permissions-problem.patch
-knfsd-nfsd4-remove-superfluous-grace-period-checks.patch
-knfsd-nfsd-fix-misplaced-fh_unlock-in-nfsd_link.patch
-knfsd-svcrpc-gss-simplify-rsc_parse.patch
-knfsd-nfsd4-fix-some-open-argument-tests.patch
-knfsd-nfsd4-fix-open-flag-passing.patch
-knfsd-svcrpc-simplify-nfsd-rpcsec_gss-integrity-code.patch
-knfsd-nfsd-mark-rqstp-to-prevent-use-of-sendfile-in-privacy-case.patch
-knfsd-svcrpc-gss-server-side-implementation-of-rpcsec_gss-privacy.patch
-drivers-md-raid5c-remove-an-unused-variable.patch
-genirq-rename-desc-handler-to-desc-chip.patch
-genirq-rename-desc-handler-to-desc-chip-power-fix.patch
-genirq-rename-desc-handler-to-desc-chip-ia64-fix.patch
-genirq-rename-desc-handler-to-desc-chip-ia64-fix-2.patch
-genirq-rename-desc-handler-to-desc-chip-terminate_irqs-fix.patch
-genirq-rename-desc-handler-to-desc-chip-sparc64-fix.patch
-genirq-sem2mutex-probe_sem-probing_active.patch
-genirq-cleanup-merge-irq_affinity-into-irq_desc.patch
-genirq-cleanup-merge-irq_affinity-into-irq_desc-sparc64-fix.patch
-genirq-cleanup-remove-irq_descp.patch
-genirq-cleanup-remove-irq_descp-fix.patch
-genirq-cleanup-remove-fastcall.patch
-genirq-cleanup-misc-code-cleanups.patch
-genirq-cleanup-reduce-irq_desc_t-use-mark-it-obsolete.patch
-genirq-cleanup-include-linux-irqh.patch
-genirq-cleanup-merge-irq_dir-smp_affinity_entry-into-irq_desc.patch
-genirq-cleanup-merge-pending_irq_cpumask-into-irq_desc.patch
-genirq-cleanup-turn-arch_has_irq_per_cpu-into-config_irq_per_cpu.patch
-genirq-debug-better-debug-printout-in-enable_irq.patch
-genirq-add-retrigger-irq-op-to-consolidate-hw_irq_resend.patch
-genirq-doc-comment-include-linux-irqh-structures.patch
-genirq-doc-handle_irq_event-and-__do_irq-comments.patch
-genirq-cleanup-no_irq_type-cleanups.patch
-genirq-doc-add-design-documentation.patch
-genirq-add-genirq-sw-irq-retrigger.patch
-genirq-add-irq_noprobe-support.patch
-genirq-add-irq_norequest-support.patch
-genirq-add-irq_noautoen-support.patch
-genirq-update-copyrights.patch
-genirq-core.patch
-genirq-core-revert-noisiness-on-spurious-interrupts.patch
-genirq-msi-fixes-2.patch
-genirq-add-irq-chip-support.patch
-genirq-add-irq-chip-support-fix.patch
-genirq-add-irq-chip-support-misroute-irq-dont-call-desc-chip-end.patch
-genirq-add-handle_bad_irq.patch
-genirq-add-irq-wake-power-management-support.patch
-genirq-add-sa_trigger-support.patch
-genirq-cleanup-no_irq_type-no_irq_chip-rename.patch
-genirq-more-verbose-debugging-on-unexpected-irq-vectors.patch
-genirq-ia64-build-fix.patch
-genirq-add-irq_type_sense_mask.patch
-genirq-add-irq-chip-support-fasteoi-handler-handle-interrupt-disabling.patch
-genirq-irq-document-what-an-irq-is.patch
-genirq-add-chip-eoi-fastack-fasteoi-core.patch
-genirq-add-chip-eoi-fastack-fasteoi-fix.patch

 Merged into mainline or a subsystem tree.

+pi-futex-fix-mm_struct-memory-leak.patch
+irq-use-sa_percpu_irq-not-irq_per_cpu-for-irqactionflags.patch
+irq-warning-message-cleanup.patch
+edac-bug-fix-module-names-quoted-in-sysfs.patch
+pi-futex-futex_wake-lockup-fix.patch
+acpi-identify-which-device-is-not-power-manageable.patch

 2.6.17-rc1 queue

-git-acpi-fixup.patch

 Unneeded.

-cpu_relax-use-in-acpi-lock.patch
-cpu_relax-use-in-acpi-lock-fix.patch

 Dropped.

+pnpacpi-support-shareable-interrupts.patch
+serial-allow-shared-8250_pnp-interrupts.patch

 pnpacpi fixes

-git-agpgart-fixup.patch

 Unneeded.

+gregkh-driver-driver-core-bus.c-cleanups.patch
+gregkh-driver-remove-kernel-power-pm.c-pm_unregister_all.patch
+gregkh-driver-the-scheduled-unexport-of-insert_resource.patch
+gregkh-driver-suspend-infrastructure-cleanup-and-extension.patch
+gregkh-driver-suspend-pci.patch

 Driver tree updates.

+gregkh-i2c-w1-fix-idle-check-loop-in-ds2482.patch
+gregkh-i2c-w1-remove-drivers-w1-w1.h.patch

 I2C tree updates

+ib-ipath-name-zero-counter-offsets-so-its-clear.patch
+ib-ipath-update-copyrights-and-other-strings-to.patch
+ib-ipath-share-more-common-code-between-rc-and-uc.patch
+ib-ipath-fix-an-indenting-problem.patch
+ib-ipath-fix-shared-receive-queues-for-rc.patch
+ib-ipath-allow-diags-on-any-unit.patch
+ib-ipath-update-some-comments-and-fix-typos.patch
+ib-ipath-remove-some-duplicate-code.patch
+ib-ipath-dont-allow-resources-to-be-created-with.patch
+ib-ipath-fix-some-memory-leaks-on-failure-paths.patch
+ib-ipath-return-an-error-for-unknown-multicast-gid.patch
+ib-ipath-report-correct-device-identification.patch
+ib-ipath-enforce-device-resource-limits.patch
+ib-ipath-removed-unused-field-ipath_kregvirt-from.patch
+ib-ipath-print-better-debug-info-when-handling.patch
+ib-ipath-enable-freeze-mode-when-shutting-down.patch
+ib-ipath-use-more-appropriate-gfp-flags.patch
+ib-ipath-use-vmalloc-to-allocate-struct.patch
+ib-ipath-memory-management-cleanups.patch
+ib-ipath-reduce-overhead-on-receive-interrupts.patch
+ib-ipath-fixed-bug-9776.patch
+ib-ipath-fix-lost-interrupts-on-ht-400.patch
+ib-ipath-disallow-send-of-invalid-packet-sizes.patch
+ib-ipath-dont-confuse-the-max-message-size-with.patch
+ib-ipath-removed-redundant-statements.patch
+ib-ipath-check-for-valid-lid-and-multicast-lids.patch
+ib-ipath-fixes-to-performance-get-counters-for-ib.patch
+ib-ipath-rc-receive-interrupt-performance-changes.patch
+ib-ipath-purge-sps_lid-and-sps_mlid-arrays.patch
+ib-ipath-drop-the-stats-sysfs-attribute-group.patch
+ib-ipath-support-more-models-of-infinipath-hardware.patch
+ib-ipath-read-write-correct-sizes-through-diag.patch
+ib-ipath-fix-a-bug-that-results-in-addresses-near.patch
+ib-ipath-remove-some-if-0-code-related-to.patch
+ib-ipath-ignore-receive-queue-size-if-srq-is.patch
+ib-ipath-namespace-cleanup-replace-ips-with-ipath.patch

 Infiniband updates

+ib-ipath-fixes-a-bug-where-our-delay-for-eeprom-no.patch

 Unpopular infiniband update

-revert-input-atkbd-fix-hangeul-hanja-keys.patch

 Dropped.

+if-0-drivers-usb-input-hid-corechid_find_field_by_usage.patch

 USB cleanup.

+ia64-kbuild-fix.patch

 Fix kbuild for ia64

-revert-ignore-makes-built-in-rules-variables.patch

 Unneeded.

+git-netdev-all-fixup.patch

 Fix reject due to git-netdev-all.patch

+8139cp-printk-fix.patch

 Fix printk warning

-ni5010-netcard-cleanup-fix.patch

 Folded into ni5010-netcard-cleanup.patch

+ixgb-add-pci-error-recovery-callbacks.patch
+e100-disable-device-on-pci-error.patch
+e1000-disable-device-on-pci-error.patch

 netdev updates

+fix-a-warning-in-ioatdma.patch
+ioat-fix-header-file-kernel-doc.patch
+ioat-fix-kernel-doc-in-source-files.patch

 IOAT driver fixlets

+fs-nfs-make-2-functions-static.patch

 NFS cleanup

+fix-implicit-declaration-on-cell.patch

 powerpc fix

-git-sas-sas_discover-build-fix.patch

 Dropped.

-serial-add-tsi108-8250-serial-support-fix.patch

 Folded into serial-add-tsi108-8250-serial-support.patch

+gregkh-pci-pci-poper-prototype-for-arch-i386-pci-pcbios.c-pcibios_sort.patch
+gregkh-pci-pci-clear-abnormal-poweroff-flag-on-via-southbridges-fix-resume.patch
+gregkh-pci-msi-merge-existing-msi-disabling-quirks.patch
+gregkh-pci-msi-rename-pci_cap_id_ht_irqconf-into-pci_cap_id_ht.patch
+gregkh-pci-msi-blacklist-pci-e-chipsets-depending-on-hypertransport-msi-capabality.patch
+gregkh-pci-msi-factorize-common-msi-detection-code-from-pci_enable_msi-and-msix.patch
+gregkh-pci-msi-stop-inheriting-bus-flags-and-check-root-chipset-bus-flags-instead.patch
+gregkh-pci-msi-drop-pci_msi_quirk.patch
+gregkh-pci-resources-insert-identical-resources-above-existing-resources.patch

 PCI tree updates

-drivers-scsi-qla2xxx-make-more-some-functions-static.patch

 Folded into drivers-scsi-qla2xxx-make-some-functions-static.patch

+stc-improve-sense-output.patch
+my-name-is-ingo-molnar-you-killed-my-make-allyesconfig-prepare-to-die.patch

 scsi fixes.

+gregkh-usb-usb-unusual_devs-entry-for-samsung-mp3-player.patch
+gregkh-usb-usbcore-fixes-for-hub_port_resume.patch
+gregkh-usb-usb-storage-us_fl_max_sectors_64-flag.patch
+gregkh-usb-usb-storage-uname-in-pr-sc-unneeded-message.patch
+gregkh-usb-usb-serial-visor-fix-race-in-open-close.patch
+gregkh-usb-usb-serial-ftdi_sio-prevent-userspace-dos.patch
+gregkh-usb-usb-kill-compiler-warning-in-quirk_usb_handoff_ohci.patch
+gregkh-usb-usb-fix-pointer-dereference-in-drivers-usb-misc-usblcd.patch
+gregkh-usb-usb-add-driver-for-non-composite-sierra-wireless-devices.patch
+gregkh-usb-usb-ehci-fix-bogus-alteration-of-a-local-variable.patch
+gregkh-usb-usb-ipaq.c-bugfixes.patch
+gregkh-usb-usb-ipaq.c-timing-parameters.patch
+gregkh-usb-usb-remove-empty-destructor-from-drivers-usb-mon-mon_text.c.patch
+gregkh-usb-usb-ohci-s3c2410.c-clock-now-usb-bus-host.patch
+gregkh-usb-usb-at91-udc-updates-mostly-power-management.patch
+gregkh-usb-usb-at91-ohci-updates-mostly-power-management.patch
+gregkh-usb-usb-ohci-controller-support-for-pnx4008.patch
+gregkh-usb-usb-move-linux-usb_otg.h-to-linux-usb-otg.h.patch
+gregkh-usb-usb-pxa2xx_udc-understands-gpio-based-vbus-sensing.patch
+gregkh-usb-usb-allow-compile-in-g_ether-fix-typo.patch

 USB updates

+kill-usb-kconfig-warning.patch

 Fix it.

-bcm43xx-opencoded-locking-fix.patch

 Folded into bcm43xx-opencoded-locking.patch

+x86_64-mm-defconfig-update.patch
+x86_64-mm-i386-up-generic-arch.patch
+x86_64-mm-i386-numa-summit-check.patch
+x86_64-mm-temp-revert-arch-perfmon.patch
+x86_64-mm-add-performance-counter-reservation-framework-for-up-kernels.patch
+x86_64-mm-utilize-performance-counter-reservation-framework-in-oprofile.patch
+x86_64-mm-add-smp-support-on-x86_64-to-reservation-framework.patch
+x86_64-mm-add-smp-support-on-i386-to-reservation-framework.patch
+x86_64-mm-cleanup-nmi-interrupt-path.patch
+x86_64-mm-rdtscp-macros.patch
+x86_64-mm-init-rdtscp.patch
+x86_64-mm-mce-amd-fix.patch

 x86-64 tree updates (partial - I dropped all the NMI changes because they
 don't apply and look like they wouldn't build if I fixed them all).

+zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o.patch
+zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o-tunable.patch
+zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o-tunable-rename.patch

 NUMA memory reclaim tweak.

+mm-fixup-do_wp_page.patch

 MM fix

+mm-msync-cleanup-fix.patch

 Fix mm-msync-cleanup.patch

+mm-make-functions-static.patch

 MM cleanup

-lockdep-add-disable-enable_irq_lockdep-api-fix.patch

 Folded into lockdep-add-disable-enable_irq_lockdep-api.patch

-lockdep-stacktrace-subsystem-s390-support-fix.patch

 Folded into lockdep-stacktrace-subsystem-s390-support.patch

-lockdep-irqtrace-subsystem-x86_64-support-fix.patch
-lockdep-irqtrace-subsystem-x86_64-support-fix-2.patch

 Folded into lockdep-irqtrace-subsystem-x86_64-support.patch

-lockdep-core-improve-non-static-key-warning-message.patch
-lockdep-core-cleanups.patch
-lockdep-core-cleanups-2.patch

 Folded into lockdep-core.patch

-lockdep-annotate-vlan-net-device-as-being-a-special-class-fix.patch

 Folded into lockdep-annotate-vlan-net-device-as-being-a-special-class.patch

+lockdep-core-improve-bug-messages.patch
+lockdep-core-add-set_class_and_name.patch
+lockdep-core-add-set_class_and_name-fix.patch
+lockdep-annotate-blkdev-nesting-fix.patch
+lockdep-annotate-sk_locks.patch
+lockdep-annotate-sk_locks-fix.patch

 lockdep updates

+smp-alternatives-skip-with-up-kernels.patch

 x86 alternatives cleanup

-hpet-rtc-emulation-add-watchdog-timer.patch
+hpet-rtc-emulation-add-watchdog-timer-2.patch

 Updated version of this ancient patch.

-destroy-the-dentries-contributed-by-a-superblock-on-unmounting.patch
-destroy-the-dentries-contributed-by-a-superblock-on-unmounting-fix.patch

 Dropped.

+fix-is_err-threshold-value.patch
+rtc-class-driver-for-samsung-s3c-series-soc.patch
+rtc-class-driver-for-samsung-s3c-series-soc-tidy.patch
+hotcpu_notifier-fixes.patch
+add-___rodata-sections-to-asm-generic-sectionsh.patch
+add-___rodata-sections-to-asm-generic-sectionsh-fix.patch
+s390-put-sys_call_table-into-rodata-section-and-write-protect-it.patch
+reiserfs-update-ctime-and-mtime-on-expanding-truncate.patch
+kernel-doc-consistent-text-man-mode-output.patch
+fix-problem-with-atapi-dma-on-it8212-in-linux.patch
+kernel-doc-make-man-text-mode-function-output-same.patch
+fix-and-enable-edac-sysfs-operation.patch
+edac-new-opteron-athlon64-memory-controller-driver.patch
+edac-new-opteron-athlon64-memory-controller-driver-tidy.patch
+drivers-block-nbdc-compile-fix.patch
+pnp-suppress-request_irq-warning.patch

 Misc patches.

+per-task-delay-accounting-taskstats-interface-tidy.patch

 Tweak per-task-delay-accounting-taskstats-interface.patch

+jmicron-pci-identifiers.patch

 PCI IDs for IDE drivers

+fbdev-add-framebuffer-and-display-update-module-support.patch
+vt-remove-vt-specific-declarations-and-definitions-from.patch
+tty-remove-include-of-screen_infoh-from-ttyh.patch

 fbdev updates

+statistics-infrastructure-update-6.patch
+statistics-infrastructure-update-7.patch
+statistics-infrastructure-update-8.patch

 statistics updates

+genirq-convert-the-x86_64-architecture-to-irq-chips.patch
+genirq-add-chip-eoi-fastack-fasteoi-x86_64.patch
+genirq-convert-the-i386-architecture-to-irq-chips.patch
+genirq-convert-the-i386-architecture-to-irq-chips-fix-2.patch
+genirq-add-chip-eoi-fastack-fasteoi-x86.patch
+genirq-irq-convert-the-move_irq-flag-from-a-32bit-word-to-a-single-bit.patch
+genirq-irq-add-moved_masked_irq.patch
+genirq-x86_64-irq-reenable-migrating-irqs-to-other-cpus.patch
+genirq-x86_64-irq-reenable-migrating-irqs-to-other-cpus-fix.patch
+genirq-msi-simplify-msi-enable-and-disable.patch
+genirq-msi-simplify-msi-enable-and-disable-fix.patch
+genirq-msi-make-the-msi-boolean-tests-return-either-0-or-1.patch
+genirq-msi-implement-helper-functions-read_msi_msg-and-write_msi_msg.patch
+genirq-msi-refactor-the-msi_ops.patch
+genirq-msi-simplify-the-msi-irq-limit-policy.patch
+genirq-irq-add-a-dynamic-irq-creation-api.patch
+genirq-ia64-irq-dynamic-irq-support.patch
+genirq-ia64-irq-dynamic-irq-support-fix.patch
+genirq-i386-irq-dynamic-irq-support.patch
+genirq-i386-irq-dynamic-irq-support-fix.patch
+genirq-x86_64-irq-dynamic-irq-support.patch
+genirq-msi-make-the-msi-code-irq-based-and-not-vector-based.patch
+genirq-x86_64-irq-move-msi-message-composition-into-io_apicc.patch
+genirq-i386-irq-move-msi-message-composition-into-io_apicc.patch
+genirq-msi-only-build-msi-apicc-on-ia64.patch
+genirq-x86_64-irq-remove-the-msi-assumption-that-irq-==-vector.patch
+genirq-i386-irq-remove-the-msi-assumption-that-irq-==-vector.patch
+genirq-i386-irq-remove-the-msi-assumption-that-irq-==-vector-fix.patch
+genirq-i386-irq-remove-the-msi-assumption-that-irq-==-vector-fix-tidies.patch
+genirq-irq-remove-msi-hacks.patch
+genirq-irq-generalize-the-check-for-hardirq_bits.patch
+genirq-x86_64-irq-make-the-external-irq-handlers-report-their-vector-not-the-irq-number.patch
+genirq-x86_64-irq-make-vector_irq-per-cpu.patch
+genirq-x86_64-irq-kill-gsi_irq_sharing.patch
+genirq-x86_64-irq-kill-irq-compression.patch

 Restore the genirq implementation for various architectures.

-ro-bind-mounts-prepare-for-write-access-checks-collapse-if.patch
-ro-bind-mounts-r-o-bind-mount-prepwork-move-open_nameis-vfs_create.patch
-ro-bind-mounts-add-vfsmount-writer-count.patch
-ro-bind-mounts-elevate-mnt-writers-for-callers-of-vfs_mkdir.patch
-ro-bind-mounts-elevate-write-count-during-entire-ncp_ioctl.patch
-ro-bind-mounts-elevate-write-count-during-entire-ncp_ioctl-tidy.patch
-ro-bind-mounts-sys_symlinkat-elevate-write-count-around-vfs_symlink.patch
-ro-bind-mounts-elevate-mount-count-for-extended-attributes.patch
-ro-bind-mounts-sys_linkat-elevate-write-count-around-vfs_link.patch
-ro-bind-mounts-mount_is_safe-add-comment.patch
-ro-bind-mounts-unix_find_other-elevate-write-count-for-touch_atime.patch
-ro-bind-mounts-elevate-write-count-over-calls-to-vfs_rename.patch
-ro-bind-mounts-tricky-elevate-write-count-files-are-opened.patch
-ro-bind-mounts-elevate-writer-count-for-do_sys_truncate.patch
-ro-bind-mounts-elevate-write-count-for-do_utimes.patch
-ro-bind-mounts-elevate-write-count-for-do_sys_utime-and-touch_atime.patch
-ro-bind-mounts-sys_mknodat-elevate-write-count-for-vfs_mknod-create.patch
-ro-bind-mounts-elevate-mnt-writers-for-vfs_unlink-callers.patch
-ro-bind-mounts-do_rmdir-elevate-write-count.patch
-ro-bind-mounts-elevate-writer-count-for-custom-struct-file.patch
-ro-bind-mounts-honor-r-w-changes-at-do_remount-time.patch

 Dropped.

+the-scheduled-removal-of-some-oss-drivers.patch

 Remove lots of OSS drivers

+make-more-file_operation-structs-static.patch
+make-more-file_operation-structs-static-fix.patch

 constify some file_operations structs.

-slab-leak-detector.patch

 Dropped.

+kernel-printkc-export_symbol_unused.patch
+mm-bootmemc-export_unused_symbol.patch
+mm-memoryc-export_unused_symbol.patch
+mm-mmzonec-export_unused_symbol.patch
+fs-read_writec-export_unused_symbol.patch
+export_unused_symbolgpl-unregister_die_notifier.patch
+kernel-softirqc-export_unused_symbol.patch

 Fiddle with exports.



All 791 patches:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm5/patch-list





* Re: 2.6.17-mm5
@ 2006-07-01 11:08   ` Reuben Farrelly
From: Reuben Farrelly @ 2006-07-01 11:08 UTC
  To: Andrew Morton; +Cc: linux-kernel



On 1/07/2006 10:35 p.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm5/
> 
> 
> Nothing very exciting here - a few buggy patches were fixed or dropped.

Ouch:

Bootdata ok (command line is ro root=/dev/md0 panic=60 console=ttyS0,57600 single)
Linux version 2.6.17-mm5 (root@tornado.reub.net) (gcc version 4.1.1 20060629 (Red Hat 4.1.1-6)) #1 SMP Sat Jul 1 22:59:00 NZST 2006
BIOS-provided physical RAM map:
  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
  BIOS-e820: 0000000000100000 - 000000003f670000 (usable)
  BIOS-e820: 000000003f670000 - 000000003f6e9000 (ACPI NVS)
  BIOS-e820: 000000003f6e9000 - 000000003f6ec000 (usable)
  BIOS-e820: 000000003f6ec000 - 000000003f6ff000 (ACPI data)
  BIOS-e820: 000000003f6ff000 - 000000003f700000 (usable)
DMI 2.3 present.
ACPI: PM-Timer IO Port: 0x408
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Setting APIC routing to flat
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 40000000 (gap: 3f700000:c0900000)
Built 1 zonelists.  Total pages: 254547
Kernel command line: ro root=/dev/md0 panic=60 console=ttyS0,57600 single
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Checking aperture...
Memory: 1015044k/1039360k available (2569k kernel code, 22788k reserved, 1660k data, 216k init)
Calibrating delay using timer specific routine.. 6006.40 BogoMIPS (lpj=12012800)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 2048K
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM1)
Freeing SMP alternatives: 28k freed
ACPI: Core revision 20060623
Using local APIC timer interrupts.
result 12500450
Detected 12.500 MHz APIC timer.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 5999.87 BogoMIPS (lpj=11999755)
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 2048K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU1: Thermal monitoring enabled (TM1)
               Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 03
Brought up 2 CPUs
testing NMI watchdog ... OK.
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 3000.123 MHz processor.
migration_cost=4
checking if image is initramfs... it is
Freeing initrd memory: 877k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
PCI: Not using MMCONFIG.
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 7 9 10 *11 12)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 7 9 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 7 9 10 *11 12)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 7 9 *10 11 12)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 7 9 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 7 9 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 7 9 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 7 *9 10 11 12)
Intel 82802 RNG detected
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
hpet0: at MMIO 0xfed00000 (virtual 0xffffffffff5fe000), IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz
PCI-GART: No AMD northbridge found.
PCI: Ignore bogus resource 6 [0:0] of 0000:00:02.0
PCI: Bridge: 0000:00:1c.0
   IO window: 2000-2fff
   MEM window: 48000000-480fffff
   PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.2
   IO window: disabled.
   MEM window: disabled.
   PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.3
   IO window: disabled.
   MEM window: disabled.
   PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.4
   IO window: disabled.
   MEM window: disabled.
   PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.5
   IO window: disabled.
   MEM window: disabled.
   PREFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
   IO window: 1000-1fff
   MEM window: disabled.
   PREFETCH window: disabled.
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
audit: initializing netlink socket (disabled)
audit(1151751831.012:1): initialized
SELinux:  Registering netfilter hooks
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered (default)
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
ACPI: Power Button (FF) [PWRF]
ACPI: Sleep Button (CM) [SLPB]
ACPI: Getting cpuindex for acpiid 0x3
ACPI: Getting cpuindex for acpiid 0x4
Real Time Clock Driver v1.12ac
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ACPI: PCI Interrupt 0000:06:03.0[A] -> GSI 19 (level, low) -> IRQ 19
0000:06:03.0: ttyS1 at I/O 0x1000 (irq = 19) is a 16550A
0000:06:03.0: ttyS2 at I/O 0x1008 (irq = 19) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 4 RAM disks of 16384K size 1024 blocksize
Intel(R) PRO/1000 Network Driver - version 7.0.38-k4-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
e1000: 0000:01:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:13:20:60:b4:23
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH7: IDE controller at PCI slot 0000:00:1f.1
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
ICH7: chipset revision 1
ICH7: not 100% native mode: will probe irqs later
     ide0: BM-DMA at 0x30b0-0x30b7, BIOS settings: hda:DMA, hdb:pio
hda: PIONEER DVD-RW DVR-111D, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Unable to handle kernel NULL pointer dereference at 00000000000000ce RIP:
  [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
PGD 0
Oops: 0000 [1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.17-mm5 #1
RIP: 0010:[<ffffffff80363a96>]  [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
RSP: 0000:ffff81003f601b88  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff81003ec659c8 RCX: 00000000481a0000
RDX: 00000000481a03ff RSI: ffff810037f9aa80 RDI: ffff81003ec65800
RBP: ffff81003f601b88 R08: 0000000000000000 R09: 0000000000000000
R10: ffff810037f9aa80 R11: 0000000000000040 R12: ffff81003ec65800
R13: 0000000000000000 R14: ffffffff805a0620 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff80685000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000ce CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81003f600000, task ffff810001fb8740)
Stack:  ffff81003f601bf8 ffffffff80364909 ffff81003f601bc8 ffffffff8035dbee
  0000000000000000 0000000000000005 ffffffff804c8166 ffff81003ec65800
  ffff81003f601bf8 ffff81003ec659c8 ffff81003ec65800 0000000000000000
Call Trace:
  [<ffffffff80364909>] pci_enable_msi+0x19/0x2f2
  [<ffffffff8035dbee>] pci_request_region+0xce/0x180
  [<ffffffff803e8867>] ahci_init_one+0x88/0x93a
  [<ffffffff8026311d>] wait_for_completion+0xb2/0x112
  [<ffffffff80280b4f>] default_wake_function+0x0/0xf
  [<ffffffff80290dcc>] call_usermodehelper_keys+0xd4/0xe8
  [<ffffffff80290de0>] __call_usermodehelper+0x0/0x64
  [<ffffffff8025affa>] kobject_get+0x1a/0x24
  [<ffffffff8035ff1c>] pci_device_probe+0x4d/0x78
  [<ffffffff803aaa8f>] driver_probe_device+0x5c/0xb4
  [<ffffffff803aabc9>] __driver_attach+0x67/0xb9
  [<ffffffff803aab62>] __driver_attach+0x0/0xb9
  [<ffffffff803aa44f>] bus_for_each_dev+0x4f/0x79
  [<ffffffff803aa9bc>] driver_attach+0x1c/0x1e
  [<ffffffff803aa01a>] bus_add_driver+0x7a/0x143
  [<ffffffff803aae63>] driver_register+0x9f/0xa6
  [<ffffffff80280b6e>] wake_up_process+0x10/0x12
  [<ffffffff80360107>] __pci_register_driver+0x59/0x7e
  [<ffffffff806b7799>] ahci_init+0x12/0x14
  [<ffffffff80267ece>] init+0x14e/0x2c2
  [<ffffffff80227b67>] schedule_tail+0x37/0x9e
  [<ffffffff80260972>] child_rip+0x8/0x12
  [<ffffffff80267d80>] init+0x0/0x2c2
  [<ffffffff8026096a>] child_rip+0x0/0x12


Code: f6 80 ce 00 00 00 01 75 04 31 c0 eb 05 b8 ff ff ff ff 5d c3
RIP  [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
  RSP <ffff81003f601b88>
CR2: 00000000000000ce
  <0>Kernel panic - not syncing: Attempted to kill init!
  <0>Rebooting in 60 seconds..

Hardware is listed at http://www.reub.net/files/kernel/system-hardware

Reuben


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 11:08   ` 2.6.17-mm5 Reuben Farrelly
@ 2006-07-01 11:51     ` Andrew Morton
  2006-07-01 12:31       ` 2.6.17-mm5 Reuben Farrelly
  0 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2006-07-01 11:51 UTC (permalink / raw)
  To: Reuben Farrelly; +Cc: linux-kernel, Brice Goglin, Greg KH

On Sat, 01 Jul 2006 23:08:40 +1200
Reuben Farrelly <reuben-lkml@reub.net> wrote:

> 
> 
> On 1/07/2006 10:35 p.m., Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm5/
> > 
> > 
> > Nothing very exciting here - a few buggy patches were fixed or dropped.
> 
> Ouch:

Well I didn't say that new buggy patches weren't added.

>      ide0: BM-DMA at 0x30b0-0x30b7, BIOS settings: hda:DMA, hdb:pio
> hda: PIONEER DVD-RW DVR-111D, ATAPI CD/DVD-ROM drive
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> Unable to handle kernel NULL pointer dereference at 00000000000000ce RIP:
>   [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
> PGD 0
> Oops: 0000 [1] SMP
> last sysfs file:
> CPU 0
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.17-mm5 #1
> RIP: 0010:[<ffffffff80363a96>]  [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
> RSP: 0000:ffff81003f601b88  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff81003ec659c8 RCX: 00000000481a0000
> RDX: 00000000481a03ff RSI: ffff810037f9aa80 RDI: ffff81003ec65800
> RBP: ffff81003f601b88 R08: 0000000000000000 R09: 0000000000000000
> R10: ffff810037f9aa80 R11: 0000000000000040 R12: ffff81003ec65800
> R13: 0000000000000000 R14: ffffffff805a0620 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffffffff80685000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000000000000ce CR3: 0000000000201000 CR4: 00000000000006e0
> Process swapper (pid: 1, threadinfo ffff81003f600000, task ffff810001fb8740)
> Stack:  ffff81003f601bf8 ffffffff80364909 ffff81003f601bc8 ffffffff8035dbee
>   0000000000000000 0000000000000005 ffffffff804c8166 ffff81003ec65800
>   ffff81003f601bf8 ffff81003ec659c8 ffff81003ec65800 0000000000000000
> Call Trace:
>   [<ffffffff80364909>] pci_enable_msi+0x19/0x2f2
>   [<ffffffff8035dbee>] pci_request_region+0xce/0x180
>   [<ffffffff803e8867>] ahci_init_one+0x88/0x93a
>   [<ffffffff8026311d>] wait_for_completion+0xb2/0x112
>   [<ffffffff80280b4f>] default_wake_function+0x0/0xf
>   [<ffffffff80290dcc>] call_usermodehelper_keys+0xd4/0xe8
>   [<ffffffff80290de0>] __call_usermodehelper+0x0/0x64
>   [<ffffffff8025affa>] kobject_get+0x1a/0x24
>   [<ffffffff8035ff1c>] pci_device_probe+0x4d/0x78
>   [<ffffffff803aaa8f>] driver_probe_device+0x5c/0xb4
>   [<ffffffff803aabc9>] __driver_attach+0x67/0xb9
>   [<ffffffff803aab62>] __driver_attach+0x0/0xb9
>   [<ffffffff803aa44f>] bus_for_each_dev+0x4f/0x79
>   [<ffffffff803aa9bc>] driver_attach+0x1c/0x1e
>   [<ffffffff803aa01a>] bus_add_driver+0x7a/0x143
>   [<ffffffff803aae63>] driver_register+0x9f/0xa6
>   [<ffffffff80280b6e>] wake_up_process+0x10/0x12
>   [<ffffffff80360107>] __pci_register_driver+0x59/0x7e
>   [<ffffffff806b7799>] ahci_init+0x12/0x14
>   [<ffffffff80267ece>] init+0x14e/0x2c2
>   [<ffffffff80227b67>] schedule_tail+0x37/0x9e
>   [<ffffffff80260972>] child_rip+0x8/0x12
>   [<ffffffff80267d80>] init+0x0/0x2c2
>   [<ffffffff8026096a>] child_rip+0x0/0x12
> 
> 
> Code: f6 80 ce 00 00 00 01 75 04 31 c0 eb 05 b8 ff ff ff ff 5d c3

It oopsed here:

static
int pci_msi_supported(struct pci_dev * dev)
{
	struct pci_dev *pdev;

	if (!pci_msi_enable || !dev || dev->no_msi)
		return -1;

	/* find root complex for our device */
	pdev = dev;
	while (pdev->bus && pdev->bus->self)
		pdev = pdev->bus->self;

	/* check its bus flags */
	if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
		return -1;

	return 0;
}

pdev->subordinate is NULL.
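
The fault address can be cross-checked against the "Code:" line. The sketch below is illustrative only and assumes the Code: bytes begin at the faulting RIP (the 20 bytes shown, added to offset 0x37, end exactly at the function length 0x4b, which is consistent with that). `modrm_disp()` and `fault_address()` are hypothetical helper names, not kernel functions. The first instruction, f6 80 ce 00 00 00 01, is TEST r/m8, imm8 with a [rax + disp32] operand, i.e. testb $0x1,0xce(%rax). With RAX = 0 in the register dump, that touches address 0xce, which matches CR2 and presumably is the offset of bus_flags inside struct pci_bus in this build.

```c
#include <stdint.h>

/*
 * Extract the displacement from an x86-64 ModRM-encoded memory operand.
 * Handles only the two simple forms needed here: mod=01 (disp8) and
 * mod=10 (disp32) with no SIB byte. Returns 0 on success, -1 otherwise.
 */
static int modrm_disp(const uint8_t *modrm, int64_t *disp)
{
	uint8_t mod = modrm[0] >> 6;
	uint8_t rm  = modrm[0] & 7;

	if (rm == 4)            /* SIB byte follows: not handled in this sketch */
		return -1;

	if (mod == 1) {         /* 8-bit signed displacement */
		*disp = (int8_t)modrm[1];
		return 0;
	}
	if (mod == 2) {         /* 32-bit signed displacement, little-endian */
		uint32_t d = (uint32_t)modrm[1] | (uint32_t)modrm[2] << 8 |
			     (uint32_t)modrm[3] << 16 | (uint32_t)modrm[4] << 24;
		*disp = (int32_t)d;
		return 0;
	}
	return -1;
}

/* First instruction of the "Code:" line: f6 /0 ib is TEST r/m8, imm8,
 * so these bytes are testb $0x1,0xce(%rax). */
static const uint8_t oops_code[] = { 0xf6, 0x80, 0xce, 0x00, 0x00, 0x00, 0x01 };

/* Address the instruction touches for a given value of RAX, or -1 if the
 * operand form is not one this sketch can decode. */
static int64_t fault_address(uint64_t rax)
{
	int64_t disp = 0;

	if (modrm_disp(&oops_code[1], &disp) != 0)
		return -1;
	return (int64_t)rax + disp;
}
```

Calling fault_address(0) reproduces the CR2 value from the report. The remaining bytes (75 04, 31 c0, eb 05, b8 ff ff ff ff, 5d, c3) decode to the jne / return 0 / return -1 tail of the function.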

Two patch series touch that file.  The generic-irq wire-up and a couple of new
ones in Greg's tree.  I'd be suspecting
gregkh-pci-msi-stop-inheriting-bus-flags-and-check-root-chipset-bus-flags-instead.patch.


To confirm that, could you please test 2.6.17 plus
http://www.zip.com.au/~akpm/linux/patches/stuff/rf.bz2 with the same
.config?  That's everything up to but not including the genirq changes.


You may find that this gets things going again:

--- a/drivers/pci/msi.c~a
+++ a/drivers/pci/msi.c
@@ -913,6 +913,9 @@ int pci_msi_supported(struct pci_dev * d
 	while (pdev->bus && pdev->bus->self)
 		pdev = pdev->bus->self;
 
+	if (!pdev->subordinate)
+		return -1;
+
 	/* check its bus flags */
 	if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
 		return -1;
_

Or disable CONFIG_PCI_MSI.
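
For anyone who wants to poke at the logic outside the kernel, here is a minimal user-space sketch of the patched function. Everything in it is a stand-in: `mock_dev` and `mock_bus` are hypothetical types carrying only the fields the function reads, `MOCK_NO_MSI` replaces PCI_BUS_FLAGS_NO_MSI, and the global pci_msi_enable check is left out. It is not the kernel code, just the same control flow.

```c
#include <stddef.h>

struct mock_dev;

/* Hypothetical stand-in for struct pci_bus. */
struct mock_bus {
	struct mock_dev *self;      /* bridge leading to this bus, NULL on a root bus */
	unsigned int bus_flags;
};

/* Hypothetical stand-in for struct pci_dev. */
struct mock_dev {
	struct mock_bus *bus;         /* bus the device sits on */
	struct mock_bus *subordinate; /* bus behind the device, NULL if not a bridge */
	unsigned int no_msi;
};

#define MOCK_NO_MSI 1u  /* stand-in for PCI_BUS_FLAGS_NO_MSI */

/* Same control flow as the patched function: 0 means MSI may be used. */
static int mock_msi_supported(struct mock_dev *dev)
{
	struct mock_dev *pdev;

	if (!dev || dev->no_msi)
		return -1;

	/* walk up until we reach a device whose bus has no parent bridge */
	pdev = dev;
	while (pdev->bus && pdev->bus->self)
		pdev = pdev->bus->self;

	/* the added guard: the topmost device may have no subordinate bus */
	if (!pdev->subordinate)
		return -1;

	/* check its bus flags */
	if (pdev->subordinate->bus_flags & MOCK_NO_MSI)
		return -1;

	return 0;
}
```

One case that reaches the guard in this model is a non-bridge device sitting directly on a root bus: the loop never advances, pdev stays equal to dev, and dev has no subordinate bus. Whether that is what happened on the reporting machine is not established here.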

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 11:51     ` 2.6.17-mm5 Andrew Morton
@ 2006-07-01 12:31       ` Reuben Farrelly
  2006-07-01 13:06         ` 2.6.17-mm5 Brice Goglin
  0 siblings, 1 reply; 35+ messages in thread
From: Reuben Farrelly @ 2006-07-01 12:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Brice Goglin, Greg KH



On 1/07/2006 11:51 p.m., Andrew Morton wrote:
> On Sat, 01 Jul 2006 23:08:40 +1200
> Reuben Farrelly <reuben-lkml@reub.net> wrote:
> 
>>
>> On 1/07/2006 10:35 p.m., Andrew Morton wrote:
>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm5/
>>>
>>>
>>> Nothing very exciting here - a few buggy patches were fixed or dropped.
>> Ouch:
> 
> Well I didn't say that new buggy patches weren't added.
> 
>>      ide0: BM-DMA at 0x30b0-0x30b7, BIOS settings: hda:DMA, hdb:pio
>> hda: PIONEER DVD-RW DVR-111D, ATAPI CD/DVD-ROM drive
>> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
>> Unable to handle kernel NULL pointer dereference at 00000000000000ce RIP:
>>   [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
>> PGD 0
>> Oops: 0000 [1] SMP
>> last sysfs file:
>> CPU 0
>> Modules linked in:
>> Pid: 1, comm: swapper Not tainted 2.6.17-mm5 #1
>> RIP: 0010:[<ffffffff80363a96>]  [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
>> RSP: 0000:ffff81003f601b88  EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: ffff81003ec659c8 RCX: 00000000481a0000
>> RDX: 00000000481a03ff RSI: ffff810037f9aa80 RDI: ffff81003ec65800
>> RBP: ffff81003f601b88 R08: 0000000000000000 R09: 0000000000000000
>> R10: ffff810037f9aa80 R11: 0000000000000040 R12: ffff81003ec65800
>> R13: 0000000000000000 R14: ffffffff805a0620 R15: 0000000000000000
>> FS:  0000000000000000(0000) GS:ffffffff80685000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> CR2: 00000000000000ce CR3: 0000000000201000 CR4: 00000000000006e0
>> Process swapper (pid: 1, threadinfo ffff81003f600000, task ffff810001fb8740)
>> Stack:  ffff81003f601bf8 ffffffff80364909 ffff81003f601bc8 ffffffff8035dbee
>>   0000000000000000 0000000000000005 ffffffff804c8166 ffff81003ec65800
>>   ffff81003f601bf8 ffff81003ec659c8 ffff81003ec65800 0000000000000000
>> Call Trace:
>>   [<ffffffff80364909>] pci_enable_msi+0x19/0x2f2
>>   [<ffffffff8035dbee>] pci_request_region+0xce/0x180
>>   [<ffffffff803e8867>] ahci_init_one+0x88/0x93a
>>   [<ffffffff8026311d>] wait_for_completion+0xb2/0x112
>>   [<ffffffff80280b4f>] default_wake_function+0x0/0xf
>>   [<ffffffff80290dcc>] call_usermodehelper_keys+0xd4/0xe8
>>   [<ffffffff80290de0>] __call_usermodehelper+0x0/0x64
>>   [<ffffffff8025affa>] kobject_get+0x1a/0x24
>>   [<ffffffff8035ff1c>] pci_device_probe+0x4d/0x78
>>   [<ffffffff803aaa8f>] driver_probe_device+0x5c/0xb4
>>   [<ffffffff803aabc9>] __driver_attach+0x67/0xb9
>>   [<ffffffff803aab62>] __driver_attach+0x0/0xb9
>>   [<ffffffff803aa44f>] bus_for_each_dev+0x4f/0x79
>>   [<ffffffff803aa9bc>] driver_attach+0x1c/0x1e
>>   [<ffffffff803aa01a>] bus_add_driver+0x7a/0x143
>>   [<ffffffff803aae63>] driver_register+0x9f/0xa6
>>   [<ffffffff80280b6e>] wake_up_process+0x10/0x12
>>   [<ffffffff80360107>] __pci_register_driver+0x59/0x7e
>>   [<ffffffff806b7799>] ahci_init+0x12/0x14
>>   [<ffffffff80267ece>] init+0x14e/0x2c2
>>   [<ffffffff80227b67>] schedule_tail+0x37/0x9e
>>   [<ffffffff80260972>] child_rip+0x8/0x12
>>   [<ffffffff80267d80>] init+0x0/0x2c2
>>   [<ffffffff8026096a>] child_rip+0x0/0x12
>>
>>
>> Code: f6 80 ce 00 00 00 01 75 04 31 c0 eb 05 b8 ff ff ff ff 5d c3
> 
> It oopsed here:
> 
> static
> int pci_msi_supported(struct pci_dev * dev)
> {
> 	struct pci_dev *pdev;
> 
> 	if (!pci_msi_enable || !dev || dev->no_msi)
> 		return -1;
> 
> 	/* find root complex for our device */
> 	pdev = dev;
> 	while (pdev->bus && pdev->bus->self)
> 		pdev = pdev->bus->self;
> 
> 	/* check its bus flags */
> 	if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
> 		return -1;
> 
> 	return 0;
> }
> 
> pdev->subordinate is NULL.
> 
> Two patch series touch that file.  The generic-irq wire-up and a couple of new
> ones in Greg's tree.  I'd be suspecting
> gregkh-pci-msi-stop-inheriting-bus-flags-and-check-root-chipset-bus-flags-instead.patch.
> 
> 
> To confirm that, could you please test 2.6.17 plus
> http://www.zip.com.au/~akpm/linux/patches/stuff/rf.bz2 with the same
> .config?  That's everything up to but not including the genirq changes.

   CC      arch/x86_64/kernel/smp.o
   CC      arch/x86_64/kernel/smpboot.o
   AS      arch/x86_64/kernel/trampoline.o
   CC      arch/x86_64/kernel/apic.o
   CC      arch/x86_64/kernel/nmi.o
   CC      arch/x86_64/kernel/io_apic.o
arch/x86_64/kernel/io_apic.c: In function 'ioapic_register_intr':
arch/x86_64/kernel/io_apic.c:887: error: 'handle_fastack_irq' undeclared (first use in this function)
arch/x86_64/kernel/io_apic.c:887: error: (Each undeclared identifier is reported only once
arch/x86_64/kernel/io_apic.c:887: error: for each function it appears in.)
arch/x86_64/kernel/io_apic.c: In function 'setup_ExtINT_IRQ0_pin':
arch/x86_64/kernel/io_apic.c:992: error: 'handle_fastack_irq' undeclared (first use in this function)
arch/x86_64/kernel/io_apic.c: In function 'check_timer':
arch/x86_64/kernel/io_apic.c:1830: error: 'handle_fastack_irq' undeclared (first use in this function)
make[1]: *** [arch/x86_64/kernel/io_apic.o] Error 1
make: *** [arch/x86_64/kernel] Error 2
[root@tornado linux-2.6-mm-temp-mm5tester]#

No go :(

> You may find that this gets things going again:
> 
> --- a/drivers/pci/msi.c~a
> +++ a/drivers/pci/msi.c
> @@ -913,6 +913,9 @@ int pci_msi_supported(struct pci_dev * d
>  	while (pdev->bus && pdev->bus->self)
>  		pdev = pdev->bus->self;
>  
> +	if (!pdev->subordinate)
> +		return -1;
> +
>  	/* check its bus flags */
>  	if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
>  		return -1;
> _
Yes it does.  (Until I then notice that my raid-1 is still broken, but that's 
another story, and to be expected...)

reuben


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 12:31       ` 2.6.17-mm5 Reuben Farrelly
@ 2006-07-01 13:06         ` Brice Goglin
  2006-07-01 17:00           ` 2.6.17-mm5 Greg KH
  0 siblings, 1 reply; 35+ messages in thread
From: Brice Goglin @ 2006-07-01 13:06 UTC (permalink / raw)
  To: Reuben Farrelly; +Cc: Andrew Morton, linux-kernel, Greg KH

Reuben Farrelly wrote:
>>
>> It oopsed here:
>>
>> static
>> int pci_msi_supported(struct pci_dev * dev)
>> {
>>     struct pci_dev *pdev;
>>
>>     if (!pci_msi_enable || !dev || dev->no_msi)
>>         return -1;
>>
>>     /* find root complex for our device */
>>     pdev = dev;
>>     while (pdev->bus && pdev->bus->self)
>>         pdev = pdev->bus->self;
>>
>>     /* check its bus flags */
>>     if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
>>         return -1;
>>
>>     return 0;
>> }
>>
>> pdev->subordinate is NULL.
>>
>
>> You may find that this gets things going again:
>>
>> --- a/drivers/pci/msi.c~a
>> +++ a/drivers/pci/msi.c
>> @@ -913,6 +913,9 @@ int pci_msi_supported(struct pci_dev * d
>>      while (pdev->bus && pdev->bus->self)
>>          pdev = pdev->bus->self;
>>  
>> +    if (!pdev->subordinate)
>> +        return -1;
>> +
>>      /* check its bus flags */
>>      if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
>>          return -1;
>> _
> Yes it does.

I was not expecting a root chipset without a subordinate bus... Maybe we
should store the NO_MSI flags in the device itself instead of in its
subordinate bus (I would have to rework all my patches then). After all,
we don't inherit bus flags anymore, and I don't see why bus flags would
have been chosen initially except to help flag inheritance.
I am still convinced that checking only the root chipset (bus) flags is a
good idea, since the root chipset is where MSIs are translated from PCI
messages into DMA (we don't care about MSI support in the bridges
between the chipset and the devices since they only forward PCI messages).

Brice


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 13:06         ` 2.6.17-mm5 Brice Goglin
@ 2006-07-01 17:00           ` Greg KH
  0 siblings, 0 replies; 35+ messages in thread
From: Greg KH @ 2006-07-01 17:00 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Reuben Farrelly, Andrew Morton, linux-kernel

On Sat, Jul 01, 2006 at 09:06:14AM -0400, Brice Goglin wrote:
> Reuben Farrelly wrote:
> >>
> >> It oopsed here:
> >>
> >> static
> >> int pci_msi_supported(struct pci_dev * dev)
> >> {
> >>     struct pci_dev *pdev;
> >>
> >>     if (!pci_msi_enable || !dev || dev->no_msi)
> >>         return -1;
> >>
> >>     /* find root complex for our device */
> >>     pdev = dev;
> >>     while (pdev->bus && pdev->bus->self)
> >>         pdev = pdev->bus->self;
> >>
> >>     /* check its bus flags */
> >>     if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
> >>         return -1;
> >>
> >>     return 0;
> >> }
> >>
> >> pdev->subordinate is NULL.
> >>
> >
> >> You may find that this gets things going again:
> >>
> >> --- a/drivers/pci/msi.c~a
> >> +++ a/drivers/pci/msi.c
> >> @@ -913,6 +913,9 @@ int pci_msi_supported(struct pci_dev * d
> >>      while (pdev->bus && pdev->bus->self)
> >>          pdev = pdev->bus->self;
> >>  
> >> +    if (!pdev->subordinate)
> >> +        return -1;
> >> +
> >>      /* check its bus flags */
> >>      if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
> >>          return -1;
> >> _
> > Yes it does.
> 
> I was not expecting a root chipset without a subordinate bus... Maybe we
> should store the NO_MSI flags in the device itself instead of in its
> subordinate bus (I would have to rework all my patches then).

If that solves this issue, I guess so.

> After all,
> we don't inherit bus flags anymore, and I don't see why bus flags would
> have been chosen initially except to help flag inheritance.
> I am still convinced that checking only the root chipset (bus) flags is a
> good idea, since the root chipset is where MSIs are translated from PCI
> messages into DMA (we don't care about MSI support in the bridges
> between the chipset and the devices since they only forward PCI messages).

Yes, I agree with that, just be able to handle the above issue too :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
  2006-07-01 11:08   ` 2.6.17-mm5 Reuben Farrelly
@ 2006-07-01 18:03   ` Ralf Hildebrandt
  2006-07-01 18:14   ` 2.6.17-mm5 dislikes raid-1, just like mm4 Helge Hafting
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 35+ messages in thread
From: Ralf Hildebrandt @ 2006-07-01 18:03 UTC (permalink / raw)
  To: linux-kernel


Starting with -mm4 and now with -mm5 I see:

> Jul  1 19:54:29 knarzkiste kernel: powernow-k8: Found 1 AMD Turion(tm) 64 Mobile Technology ML-30 processors (version 2.00.00)
> Jul  1 19:54:29 knarzkiste kernel: ACPI: Invalid package argument
> Jul  1 19:54:29 knarzkiste kernel: ACPI Exception (acpi_processor-0272): AE_BAD_PARAMETER, Invalid _PSS data [20060623]
> Jul  1 19:54:29 knarzkiste kernel: powernow-k8:    0 : fid 0x0 (800 MHz), vid 0x12
> Jul  1 19:54:29 knarzkiste kernel: powernow-k8:    1 : fid 0x8 (1600 MHz), vid 0x4
> Jul  1 19:54:29 knarzkiste kernel: powernow-k8: ph2 null fid transition 0x8

I'm not sure whether the "ACPI: Invalid package argument" and "ACPI Exception" messages are indicative of a real problem.

> Jul  1 19:54:15 knarzkiste kernel: CPU: AMD Turion(tm) 64 Mobile Technology ML-30 stepping 02
> Jul  1 19:54:15 knarzkiste kernel: Checking 'hlt' instruction... OK.
> Jul  1 19:54:15 knarzkiste kernel: ACPI: Core revision 20060623
> Jul  1 19:54:15 knarzkiste kernel: ENABLING IO-APIC IRQs

-- 
Ralf Hildebrandt (i.A. des IT-Zentrums)         Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin            Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin    Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF                 send no mail to spamtrap@charite.de

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
  2006-07-01 11:08   ` 2.6.17-mm5 Reuben Farrelly
  2006-07-01 18:03   ` 2.6.17-mm5 Ralf Hildebrandt
@ 2006-07-01 18:14   ` Helge Hafting
  2006-07-01 22:22     ` Andrew Morton
  2006-07-02  3:51     ` Tejun Heo
       [not found]   ` <20060701142419.GB28750@tlg.swandive.local>
                     ` (3 subsequent siblings)
  6 siblings, 2 replies; 35+ messages in thread
From: Helge Hafting @ 2006-07-01 18:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

I just got mm5 up, and it has the same problem as mm4.
Raid-1 does not work. I used 2.6.16 to resync my raids,
and booted into 2.6.17-mm5.

[...]
md:  adding sda2 ...
md: created md0
md: bind<sda2>
md: bind<sdb1>
md: running: <sdb1><sda2>
raid1: raid set md0 active with 2 out of 2 mirrors
md: ... autorun DONE.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
  Vendor: USB2.0    Model:       HS-CF       Rev: 1.64
  Type:   Direct-Access                      ANSI SCSI revision: 00
sd 3:0:0:0: Attached scsi removable disk sdf
sd 3:0:0:0: Attached scsi generic sg5 type 0
  Vendor: USB2.0    Model:       HS-MS       Rev: 1.64
  Type:   Direct-Access                      ANSI SCSI revision: 00
sd 3:0:0:1: Attached scsi removable disk sdg
sd 3:0:0:1: Attached scsi generic sg6 type 0
  Vendor: USB2.0    Model:       HS-SM       Rev: 1.64
  Type:   Direct-Access                      ANSI SCSI revision: 00
sd 3:0:0:2: Attached scsi removable disk sdh
sd 3:0:0:2: Attached scsi generic sg7 type 0
  Vendor: USB2.0    Model:       HS-SD/MMC   Rev: 1.64
  Type:   Direct-Access                      ANSI SCSI revision: 00
sd 3:0:0:3: Attached scsi removable disk sdi
sd 3:0:0:3: Attached scsi generic sg8 type 0
usb-storage: device scan complete
loadkeys[2214]: segfault at 00000000000005a0 rip 00002b22e169feea rsp 00007fffc973c478 error 4
Adding 1000424k swap on /dev/sda6.  Priority:1 extents:1 across:1000424k
EXT3 FS on sdd1, internal journal
raid1: Disk failure on sda2, disabling device. 
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sda2
 disk 1, wo:0, o:1, dev:sdb1
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sdb1
raid1: Disk failure on sda5, disabling device. 
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sda5
 disk 1, wo:0, o:1, dev:sdb5
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sdb5
raid1: Disk failure on sde2, disabling device. 
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sde2
 disk 1, wo:0, o:1, dev:sdd2
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sdd2
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda8, internal journal
EXT3-fs: mounted filesystem with writeback data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md0, internal journal
EXT3-fs: mounted filesystem with journal data mode.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with writeback data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
PM: Writing back config space on device 0000:00:0b.0 at offset b (was 165314e4, writing 13001462)
PM: Writing back config space on device 0000:00:0b.0 at offset 3 (was 0, writing 2008)
PM: Writing back config space on device 0000:00:0b.0 at offset 2 (was 2000000, writing 2000003)
PM: Writing back config space on device 0000:00:0b.0 at offset 1 (was 2b00000, writing 2b00006)
ADDRCONF(NETDEV_UP): eth0: link is not ready
[...]

As we see, the md devices are assembled, then the filesystems are
mounted and swap turned on.  Then all three md devices fail a 
partition at the same time.  Somehow, I don't believe that
is correct. ;-)

Nothing else was logged.

Helge Hafting

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
       [not found]   ` <20060701142419.GB28750@tlg.swandive.local>
@ 2006-07-01 21:30     ` Andrew Morton
  2006-07-01 22:26       ` 2.6.17-mm5 James Bottomley
                         ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Andrew Morton @ 2006-07-01 21:30 UTC (permalink / raw)
  To: Grant Wilson; +Cc: linux-kernel, Neil Brown, linux-scsi

On Sat, 1 Jul 2006 15:24:19 +0100
Grant Wilson <grant.wilson@zen.co.uk> wrote:

> More RAID1 problems - OOPS on shutdown.

Thanks.  Please copy the mailing lists on these reports - I'm not an MD,
SCSI or SATA developer, and this is in their area.

> [   37.482699] md: Autodetecting RAID arrays.
> [   37.547908] md: autorun ...
> [   37.566449] md: considering sdb2 ...
> [   37.589664] md:  adding sdb2 ...
> [   37.610757] md:  adding sda2 ...
> [   37.632116] md: created md1
> [   37.650587] md: bind<sda2>
> [   37.668571] md: bind<sdb2>
> [   37.686541] md: running: <sdb2><sda2>
> [   37.710807] raid1: raid set md1 active with 2 out of 2 mirrors
> [   37.747557] md: ... autorun DONE.
> [   37.784444] EXT3-fs: INFO: recovery required on readonly filesystem.
> [   37.824275] EXT3-fs: write access will be enabled during recovery.
> [   38.814113] kjournald starting.  Commit interval 5 seconds
> [   38.848761] EXT3-fs: sdc1: orphan cleanup on readonly fs
> [   38.985436] EXT3-fs: sdc1: 7 orphan inodes deleted
> [   39.015845] EXT3-fs: recovery complete.
> [   39.072168] EXT3-fs: mounted filesystem with ordered data mode.
> [   44.693986] Adding 995988k swap on /dev/sda1.  Priority:-1 extents:1 across:995988k
> [   44.744558] Adding 995988k swap on /dev/sdb1.  Priority:-2 extents:1 across:995988k
> [   44.966034] EXT3 FS on sdc1, internal journal
> [   49.305350] device-mapper: ioctl: 4.8.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
> [   64.091331] raid1: Disk failure on sdb2, disabling device. 
> [   64.091333] 	Operation continuing on 1 devices
> [   64.212624] RAID1 conf printout:
> [   64.233951]  --- wd:1 rd:2
> [   64.252195]  disk 0, wo:0, o:1, dev:sda2
> [   64.277712]  disk 1, wo:1, o:0, dev:sdb2
> [   64.305627] RAID1 conf printout:
> [   64.326977]  --- wd:1 rd:2
> [   64.345220]  disk 0, wo:0, o:1, dev:sda2
> [

Which device drivers are being used for these disks?

> [  155.123022] Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP: 
> [  155.155867]  [<ffffffff8047157a>] md_error+0x45/0x91
> [  155.200353] PGD 77954067 PUD 726e5067 PMD 0 
> [  155.226233] Oops: 0000 [1] PREEMPT SMP 
> [  155.249516] last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> [  155.292808] CPU 0 
> [  155.304968] Modules linked in: dm_mod evdev
> [  155.330331] Pid: 0, comm: swapper Not tainted 2.6.17-mm5 #1
> [  155.363697] RIP: 0010:[<ffffffff8047157a>]  [<ffffffff8047157a>] md_error+0x45/0x91
> [  155.409638] RSP: 0018:ffffffff807a0c50  EFLAGS: 00010046
> [  155.441445] RAX: 0000000000000000 RBX: ffff81007aa34708 RCX: 000000000000003f
> [  155.484216] RDX: 00000000fffffffb RSI: ffff81007a821d28 RDI: ffff81007aa34708
> [  155.526989] RBP: ffffffff807a0c60 R08: 0000000000000000 R09: ffff81007aac43b0
> [  155.569759] R10: ffffffff804221e5 R11: 0000000000000058 R12: ffff81007aac4ab0
> [  155.612533] R13: ffff81007aac43b0 R14: ffff81007aac4ab0 R15: 00000000fffffffb
> [  155.655303] FS:  00002aeb361606d0(0000) GS:ffffffff80a46000(0000) knlGS:0000000000000000
> [  155.703791] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [  155.738195] CR2: 0000000000000048 CR3: 0000000070997000 CR4: 00000000000006e0
> [  155.780969] Process swapper (pid: 0, threadinfo ffffffff80a64000, task ffffffff80696a00)
> [  155.829404] Stack:  ffff81007a821d28 ffff81007aa34708 ffffffff807a0c80 ffffffff804728d9
> [  155.877840]  ffff81007a821d28 ffff81007aa34708 ffffffff807a0cc0 ffffffff8047409c
> [  155.922535]  00001000807a0d00 ffff81007aac4ab0 00000000fffffffb ffff81007aac4ab0
> [  155.966085] Call Trace:
> [  155.982416]  [<ffffffff804728d9>] super_written+0x30/0x65
> [  156.015292]  [<ffffffff8047409c>] super_written_barrier+0xc4/0xd1
> [  156.052297]  [<ffffffff8023a5a5>] bio_endio+0x56/0x5b
> [  156.082688]  [<ffffffff8022d21b>] __end_that_request_first+0x1c9/0x4c9
> [  156.122068]  [<ffffffff8024a0d6>] end_that_request_first+0xc/0xe
> [  156.158343]  [<ffffffff8036a692>] blk_ordered_complete_seq+0x7c/0x8b
> [  156.196705]  [<ffffffff8036a6d1>] post_flush_end_io+0x30/0x35
> [  156.231419]  [<ffffffff8036a5b5>] end_that_request_last+0xd9/0xf6
> [  156.268215]  [<ffffffff80422204>] scsi_end_request+0xad/0xd7
> [  156.302573]  [<ffffffff80422637>] scsi_io_completion+0x3e1/0x3f0
> [  156.339004]  [<ffffffff8042266c>] scsi_blk_pc_done+0x26/0x28
> [  156.373357]  [<ffffffff8041d11e>] scsi_finish_command+0xa9/0xb2
> [  156.409264]  [<ffffffff804229f9>] scsi_softirq_done+0xf4/0xfd
> [  156.444143]  [<ffffffff80237f66>] blk_done_softirq+0x70/0x7f
> [  156.478323]  [<ffffffff80211366>] __do_softirq+0x67/0xf4
> [  156.510224]  [<ffffffff8025f95e>] call_softirq+0x1e/0x28
> [  156.542083] 
> [  156.542083] Code: 48 8b 40 48 48 85 c0 74 3f ff d0 f0 0f ba ab e0 01 00 00 03 

The barrier code is in there again.

mddev->pers is NULL in md_error(), so the test of
!mddev->pers->error_handler oopsed.  Perhaps this is a real MD bug which is
now being exposed by the new barrier-handling problem.


This should get you further, but...

From: Andrew Morton <akpm@osdl.org>

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/md/md.c |    2 ++
 1 file changed, 2 insertions(+)

diff -puN drivers/md/md.c~md-oops-workaround drivers/md/md.c
--- a/drivers/md/md.c~md-oops-workaround
+++ a/drivers/md/md.c
@@ -4586,6 +4586,8 @@ void md_error(mddev_t *mddev, mdk_rdev_t
 		__builtin_return_address(0),__builtin_return_address(1),
 		__builtin_return_address(2),__builtin_return_address(3));
 */
+	if (!mddev->pers)
+		return;
 	if (!mddev->pers->error_handler)
 		return;
 	mddev->pers->error_handler(mddev,rdev);
_



* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-01 18:14   ` 2.6.17-mm5 dislikes raid-1, just like mm4 Helge Hafting
@ 2006-07-01 22:22     ` Andrew Morton
  2006-07-01 22:52       ` Jeff Garzik
                         ` (2 more replies)
  2006-07-02  3:51     ` Tejun Heo
  1 sibling, 3 replies; 35+ messages in thread
From: Andrew Morton @ 2006-07-01 22:22 UTC (permalink / raw)
  To: Helge Hafting
  Cc: linux-kernel, linux-scsi, Neil Brown, Reuben Farrelly, Grant Wilson

On Sat, 1 Jul 2006 20:14:55 +0200
Helge Hafting <helgehaf@aitel.hist.no> wrote:

> I  just got mm5 up, and it has the same problem as mm4.
> Raid-1 does not work. I used 2.6.16 to resync my raids,
> and booted into 2.6.17-mm5.
> 
> [...]
> md:  adding sda2 ...
> md: created md0
> md: bind<sda2>
> md: bind<sdb1>
> md: running: <sdb1><sda2>
> raid1: raid set md0 active with 2 out of 2 mirrors
> md: ... autorun DONE.
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs: mounted filesystem with ordered data mode.
>   Vendor: USB2.0    Model:       HS-CF       Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:0: Attached scsi removable disk sdf
> sd 3:0:0:0: Attached scsi generic sg5 type 0
>   Vendor: USB2.0    Model:       HS-MS       Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:1: Attached scsi removable disk sdg
> sd 3:0:0:1: Attached scsi generic sg6 type 0
>   Vendor: USB2.0    Model:       HS-SM       Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:2: Attached scsi removable disk sdh
> sd 3:0:0:2: Attached scsi generic sg7 type 0
>   Vendor: USB2.0    Model:       HS-SD/MMC   Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:3: Attached scsi removable disk sdi
> sd 3:0:0:3: Attached scsi generic sg8 type 0
> usb-storage: device scan complete
> loadkeys[2214]: segfault at 00000000000005a0 rip 00002b22e169feea rsp 00007fffc973c478 error 4
> Adding 1000424k swap on /dev/sda6.  Priority:1 extents:1 across:1000424k
> EXT3 FS on sdd1, internal journal
> raid1: Disk failure on sda2, disabling device. 
>         Operation continuing on 1 devices
> RAID1 conf printout:
>  --- wd:1 rd:2
>  disk 0, wo:1, o:0, dev:sda2
>  disk 1, wo:0, o:1, dev:sdb1
> RAID1 conf printout:
>  --- wd:1 rd:2
> 
> ...
>
> As we see, the md devices are assembled, then the filesystems are
> mounted and swap turned on.  Then all three md devices fail a 
> partition at the same time.  Somehow, I don't believe that
> is correct. ;-)
> 

I assume this is still the broken-barriers bug.  Thanks for all the help on
this, guys.  More is to be asked for, I'm afraid.

I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
and git-libata-all trees have been omitted.  It's at 

http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-sata-scsi.bz2

(That's a diff against 2.6.17)

If that kernel works, then the next step is to test

http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-scsi.bz2

which is 2.6.17-mm5 without git-scsi-misc, but with git-libata-all.

Thanks.


* Re: 2.6.17-mm5
  2006-07-01 21:30     ` 2.6.17-mm5 Andrew Morton
@ 2006-07-01 22:26       ` James Bottomley
  2006-07-01 22:32         ` 2.6.17-mm5 Neil Brown
  2006-07-01 22:29       ` More RAID / SATA / barrier problems [ Re: 2.6.17-mm5 ] Neil Brown
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: James Bottomley @ 2006-07-01 22:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grant Wilson, linux-kernel, Neil Brown, linux-scsi

On Sat, 2006-07-01 at 14:30 -0700, Andrew Morton wrote:
> On Sat, 1 Jul 2006 15:24:19 +0100
> Grant Wilson <grant.wilson@zen.co.uk> wrote:
> 
> > More RAID1 problems - OOPS on shutdown.

Actually, is there any more of the trace, like what was going on just
before the oops?

It looks very like a lifetime issue (i.e. md thinks the array is dead
and has torn it down, but there's still an outstanding command).  It
would be nice to know what the outstanding command might have been.

James


> Thanks.  Please copy the mailing lists on these reports - I'm not an MD,
> SCSI or SATA developer, and this is in their area.
> 
> > [   37.482699] md: Autodetecting RAID arrays.
> > [   37.547908] md: autorun ...
> > [   37.566449] md: considering sdb2 ...
> > [   37.589664] md:  adding sdb2 ...
> > [   37.610757] md:  adding sda2 ...
> > [   37.632116] md: created md1
> > [   37.650587] md: bind<sda2>
> > [   37.668571] md: bind<sdb2>
> > [   37.686541] md: running: <sdb2><sda2>
> > [   37.710807] raid1: raid set md1 active with 2 out of 2 mirrors
> > [   37.747557] md: ... autorun DONE.
> > [   37.784444] EXT3-fs: INFO: recovery required on readonly filesystem.
> > [   37.824275] EXT3-fs: write access will be enabled during recovery.
> > [   38.814113] kjournald starting.  Commit interval 5 seconds
> > [   38.848761] EXT3-fs: sdc1: orphan cleanup on readonly fs
> > [   38.985436] EXT3-fs: sdc1: 7 orphan inodes deleted
> > [   39.015845] EXT3-fs: recovery complete.
> > [   39.072168] EXT3-fs: mounted filesystem with ordered data mode.
> > [   44.693986] Adding 995988k swap on /dev/sda1.  Priority:-1 extents:1 across:995988k
> > [   44.744558] Adding 995988k swap on /dev/sdb1.  Priority:-2 extents:1 across:995988k
> > [   44.966034] EXT3 FS on sdc1, internal journal
> > [   49.305350] device-mapper: ioctl: 4.8.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
> > [   64.091331] raid1: Disk failure on sdb2, disabling device. 
> > [   64.091333] 	Operation continuing on 1 devices
> > [   64.212624] RAID1 conf printout:
> > [   64.233951]  --- wd:1 rd:2
> > [   64.252195]  disk 0, wo:0, o:1, dev:sda2
> > [   64.277712]  disk 1, wo:1, o:0, dev:sdb2
> > [   64.305627] RAID1 conf printout:
> > [   64.326977]  --- wd:1 rd:2
> > [   64.345220]  disk 0, wo:0, o:1, dev:sda2
> > [
> 
> Which device drivers are being used for these disks?
> 
> > [  155.123022] Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP: 
> > [  155.155867]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.200353] PGD 77954067 PUD 726e5067 PMD 0 
> > [  155.226233] Oops: 0000 [1] PREEMPT SMP 
> > [  155.249516] last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> > [  155.292808] CPU 0 
> > [  155.304968] Modules linked in: dm_mod evdev
> > [  155.330331] Pid: 0, comm: swapper Not tainted 2.6.17-mm5 #1
> > [  155.363697] RIP: 0010:[<ffffffff8047157a>]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.409638] RSP: 0018:ffffffff807a0c50  EFLAGS: 00010046
> > [  155.441445] RAX: 0000000000000000 RBX: ffff81007aa34708 RCX: 000000000000003f
> > [  155.484216] RDX: 00000000fffffffb RSI: ffff81007a821d28 RDI: ffff81007aa34708
> > [  155.526989] RBP: ffffffff807a0c60 R08: 0000000000000000 R09: ffff81007aac43b0
> > [  155.569759] R10: ffffffff804221e5 R11: 0000000000000058 R12: ffff81007aac4ab0
> > [  155.612533] R13: ffff81007aac43b0 R14: ffff81007aac4ab0 R15: 00000000fffffffb
> > [  155.655303] FS:  00002aeb361606d0(0000) GS:ffffffff80a46000(0000) knlGS:0000000000000000
> > [  155.703791] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > [  155.738195] CR2: 0000000000000048 CR3: 0000000070997000 CR4: 00000000000006e0
> > [  155.780969] Process swapper (pid: 0, threadinfo ffffffff80a64000, task ffffffff80696a00)
> > [  155.829404] Stack:  ffff81007a821d28 ffff81007aa34708 ffffffff807a0c80 ffffffff804728d9
> > [  155.877840]  ffff81007a821d28 ffff81007aa34708 ffffffff807a0cc0 ffffffff8047409c
> > [  155.922535]  00001000807a0d00 ffff81007aac4ab0 00000000fffffffb ffff81007aac4ab0
> > [  155.966085] Call Trace:
> > [  155.982416]  [<ffffffff804728d9>] super_written+0x30/0x65
> > [  156.015292]  [<ffffffff8047409c>] super_written_barrier+0xc4/0xd1
> > [  156.052297]  [<ffffffff8023a5a5>] bio_endio+0x56/0x5b
> > [  156.082688]  [<ffffffff8022d21b>] __end_that_request_first+0x1c9/0x4c9
> > [  156.122068]  [<ffffffff8024a0d6>] end_that_request_first+0xc/0xe
> > [  156.158343]  [<ffffffff8036a692>] blk_ordered_complete_seq+0x7c/0x8b
> > [  156.196705]  [<ffffffff8036a6d1>] post_flush_end_io+0x30/0x35
> > [  156.231419]  [<ffffffff8036a5b5>] end_that_request_last+0xd9/0xf6
> > [  156.268215]  [<ffffffff80422204>] scsi_end_request+0xad/0xd7
> > [  156.302573]  [<ffffffff80422637>] scsi_io_completion+0x3e1/0x3f0
> > [  156.339004]  [<ffffffff8042266c>] scsi_blk_pc_done+0x26/0x28
> > [  156.373357]  [<ffffffff8041d11e>] scsi_finish_command+0xa9/0xb2
> > [  156.409264]  [<ffffffff804229f9>] scsi_softirq_done+0xf4/0xfd
> > [  156.444143]  [<ffffffff80237f66>] blk_done_softirq+0x70/0x7f
> > [  156.478323]  [<ffffffff80211366>] __do_softirq+0x67/0xf4
> > [  156.510224]  [<ffffffff8025f95e>] call_softirq+0x1e/0x28
> > [  156.542083] 
> > [  156.542083] Code: 48 8b 40 48 48 85 c0 74 3f ff d0 f0 0f ba ab e0 01 00 00 03 
> 
> The barrier code is in there again.
> 
> mddev->pers is NULL in md_error(), so the test of
> !mddev->pers->error_handler oopsed.  Perhaps this is a real MD bug which is
> now being exposed by the new barrier-handling problem.
> 
> 
> This should get you further, but...
> 
> From: Andrew Morton <akpm@osdl.org>
> 
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
> 
>  drivers/md/md.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff -puN drivers/md/md.c~md-oops-workaround drivers/md/md.c
> --- a/drivers/md/md.c~md-oops-workaround
> +++ a/drivers/md/md.c
> @@ -4586,6 +4586,8 @@ void md_error(mddev_t *mddev, mdk_rdev_t
>  		__builtin_return_address(0),__builtin_return_address(1),
>  		__builtin_return_address(2),__builtin_return_address(3));
>  */
> +	if (!mddev->pers)
> +		return;
>  	if (!mddev->pers->error_handler)
>  		return;
>  	mddev->pers->error_handler(mddev,rdev);
> _
> 



* More RAID / SATA / barrier problems [ Re: 2.6.17-mm5 ]
  2006-07-01 21:30     ` 2.6.17-mm5 Andrew Morton
  2006-07-01 22:26       ` 2.6.17-mm5 James Bottomley
@ 2006-07-01 22:29       ` Neil Brown
  2006-07-01 22:54       ` 2.6.17-mm5 Jeff Garzik
  2006-07-27 21:02       ` 2.6.17-mm5 Ming Zhang
  3 siblings, 0 replies; 35+ messages in thread
From: Neil Brown @ 2006-07-01 22:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grant Wilson, linux-kernel, linux-scsi

On Saturday July 1, akpm@osdl.org wrote:
> 
> mddev->pers is NULL in md_error(), so the test of
> !mddev->pers->error_handler oopsed.  Perhaps this is a real MD bug which is
> now being exposed by the new barrier-handling problem.
> 

Yes, this is a real MD bug which would hit whenever writing a
superblock fails during array-shutdown.  I guess that has never
happened before!  The workaround you propose is probably as good as
any, but I'll think through it some more and see.

It seems that super block writes are always failing in some
configurations at the moment!

I wonder what we *should* do when writing to the superblock on the
last device of a raid1 fails... maybe switch the array to read-only?
I'll have a think about that too.

But we need to find out why barrier-writes are returning EIO.
Hopefully Reuben's testing will shed some light.

NeilBrown


* Re: 2.6.17-mm5
  2006-07-01 22:26       ` 2.6.17-mm5 James Bottomley
@ 2006-07-01 22:32         ` Neil Brown
  2006-07-01 22:56           ` 2.6.17-mm5 Jeff Garzik
  0 siblings, 1 reply; 35+ messages in thread
From: Neil Brown @ 2006-07-01 22:32 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, Grant Wilson, linux-kernel, linux-scsi

On Saturday July 1, James.Bottomley@SteelEye.com wrote:
> On Sat, 2006-07-01 at 14:30 -0700, Andrew Morton wrote:
> > On Sat, 1 Jul 2006 15:24:19 +0100
> > Grant Wilson <grant.wilson@zen.co.uk> wrote:
> > 
> > > More RAID1 problems - OOPS on shutdown.
> 
> Actually, is there any more of the trace, like what was going on just
> before the oops?
> 
> It looks very like a lifetime issue (i.e. md thinks the array is dead
> and has torn it down, but there's still an outstanding command).  It
> would be nice to know what the outstanding command might have been.

md writes the superblock after tearing down the array, which is
admittedly a bit careless.

The problem seems to be simply that on some hardware at least,
BIO_RW_BARRIER writes result in an EIO.  Don't know why yet.

NeilBrown
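
For what it's worth, the ordering described above (array torn down first,
superblock write completing with an error afterwards) can be modelled in
userspace.  This is only a sketch: struct array, struct personality and the
helper names are hypothetical stand-ins, not the md code, and the guard simply
mirrors the shape of the md_error() workaround quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the md structures. */
struct personality {
	void (*error_handler)(void *mddev, void *rdev);
};

struct array {
	struct personality *pers;	/* NULL once the array is stopped */
	int errors_reported;
};

static void count_error(void *mddev, void *rdev)
{
	struct array *a = mddev;

	(void)rdev;
	assert(a != NULL);
	a->errors_reported++;
}

static struct personality raid1_like = { .error_handler = count_error };

/* Same shape as the workaround: tolerate an array that is already stopped. */
static void report_error(struct array *a, void *rdev)
{
	if (!a->pers)
		return;
	if (!a->pers->error_handler)
		return;
	a->pers->error_handler(a, rdev);
}

/* Completion of a superblock write; err is 0 or a negative errno. */
static void superblock_write_done(struct array *a, void *rdev, int err)
{
	if (err)
		report_error(a, rdev);
}

/* Stop the array, then let the write complete late. */
static int stop_then_complete(int err)
{
	struct array a = { .pers = &raid1_like, .errors_reported = 0 };

	a.pers = NULL;				/* teardown happens first */
	superblock_write_done(&a, NULL, err);	/* completion arrives after */
	return a.errors_reported;
}

/* Same completion while the array is still running. */
static int complete_while_running(int err)
{
	struct array a = { .pers = &raid1_like, .errors_reported = 0 };

	superblock_write_done(&a, NULL, err);
	return a.errors_reported;
}
```

Without the first check in report_error(), stop_then_complete() with a
non-zero err would dereference a NULL pers, which is the userspace analogue of
the oops.  With it, the late error is silently dropped, so the question of
what should happen to the array in that case stays open.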


* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-01 22:22     ` Andrew Morton
@ 2006-07-01 22:52       ` Jeff Garzik
  2006-07-01 22:58         ` Andrew Morton
  2006-07-02  4:43       ` Reuben Farrelly
  2006-07-02  5:13       ` Reuben Farrelly
  2 siblings, 1 reply; 35+ messages in thread
From: Jeff Garzik @ 2006-07-01 22:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Helge Hafting, linux-kernel, linux-scsi, Neil Brown,
	Reuben Farrelly, Grant Wilson

On Sat, Jul 01, 2006 at 03:22:58PM -0700, Andrew Morton wrote:
> Helge Hafting <helgehaf@aitel.hist.no> wrote:
> > kjournald starting.  Commit interval 5 seconds
> > EXT3-fs: mounted filesystem with ordered data mode.
> >   Vendor: USB2.0    Model:       HS-CF       Rev: 1.64
> >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > sd 3:0:0:0: Attached scsi removable disk sdf
> > sd 3:0:0:0: Attached scsi generic sg5 type 0
> >   Vendor: USB2.0    Model:       HS-MS       Rev: 1.64
> >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > sd 3:0:0:1: Attached scsi removable disk sdg
> > sd 3:0:0:1: Attached scsi generic sg6 type 0
> >   Vendor: USB2.0    Model:       HS-SM       Rev: 1.64
> >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > sd 3:0:0:2: Attached scsi removable disk sdh
> > sd 3:0:0:2: Attached scsi generic sg7 type 0
> >   Vendor: USB2.0    Model:       HS-SD/MMC   Rev: 1.64
> >   Type:   Direct-Access                      ANSI SCSI revision: 00

> I assume this is still the broken-barriers bug.  Thanks for all the help on
> this, guys.  More is to be asked for, I'm afraid.
> 
> I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
> and git-libata-all trees have been omitted.  It's at 

What does USB storage have to do with SATA?

	Jeff





* Re: 2.6.17-mm5
  2006-07-01 21:30     ` 2.6.17-mm5 Andrew Morton
  2006-07-01 22:26       ` 2.6.17-mm5 James Bottomley
  2006-07-01 22:29       ` More RAID / SATA / barrier problems [ Re: 2.6.17-mm5 ] Neil Brown
@ 2006-07-01 22:54       ` Jeff Garzik
  2006-07-27 21:02       ` 2.6.17-mm5 Ming Zhang
  3 siblings, 0 replies; 35+ messages in thread
From: Jeff Garzik @ 2006-07-01 22:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grant Wilson, linux-kernel, Neil Brown, linux-scsi

On Sat, Jul 01, 2006 at 02:30:47PM -0700, Andrew Morton wrote:
> Grant Wilson <grant.wilson@zen.co.uk> wrote:
> > [  155.226233] Oops: 0000 [1] PREEMPT SMP 

Also, would be nice to re-test without preempt.

Disabling preempt _continues_ to fix (bandaid?) problems...

	Jeff





* Re: 2.6.17-mm5
  2006-07-01 22:32         ` 2.6.17-mm5 Neil Brown
@ 2006-07-01 22:56           ` Jeff Garzik
  2006-07-02  0:10             ` 2.6.17-mm5 James Bottomley
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Garzik @ 2006-07-01 22:56 UTC (permalink / raw)
  To: Neil Brown
  Cc: James Bottomley, Andrew Morton, Grant Wilson, linux-kernel, linux-scsi

On Sun, Jul 02, 2006 at 08:32:28AM +1000, Neil Brown wrote:
> The problem seems to be simply that on some hardware at least,
> BIO_RW_BARRIER writes result in an EIO.  Don't know why yet.

Could be that <whatever device> is choking on FLUSH CACHE (ATA)
or SYNCHRONIZE CACHE (SCSI).

That's one possible reason why EIO may result from a barrier...

	Jeff





* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-01 22:52       ` Jeff Garzik
@ 2006-07-01 22:58         ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2006-07-01 22:58 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: helgehaf, linux-kernel, linux-scsi, neilb, reuben-lkml, grant.wilson

On Sat, 1 Jul 2006 18:52:12 -0400
Jeff Garzik <jeff@garzik.org> wrote:

> On Sat, Jul 01, 2006 at 03:22:58PM -0700, Andrew Morton wrote:
> > Helge Hafting <helgehaf@aitel.hist.no> wrote:
> > > kjournald starting.  Commit interval 5 seconds
> > > EXT3-fs: mounted filesystem with ordered data mode.
> > >   Vendor: USB2.0    Model:       HS-CF       Rev: 1.64
> > >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > > sd 3:0:0:0: Attached scsi removable disk sdf
> > > sd 3:0:0:0: Attached scsi generic sg5 type 0
> > >   Vendor: USB2.0    Model:       HS-MS       Rev: 1.64
> > >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > > sd 3:0:0:1: Attached scsi removable disk sdg
> > > sd 3:0:0:1: Attached scsi generic sg6 type 0
> > >   Vendor: USB2.0    Model:       HS-SM       Rev: 1.64
> > >   Type:   Direct-Access                      ANSI SCSI revision: 00
> > > sd 3:0:0:2: Attached scsi removable disk sdh
> > > sd 3:0:0:2: Attached scsi generic sg7 type 0
> > >   Vendor: USB2.0    Model:       HS-SD/MMC   Rev: 1.64
> > >   Type:   Direct-Access                      ANSI SCSI revision: 00
> 
> > I assume this is still the broken-barriers bug.  Thanks for all the help on
> > this, guys.  More is to be asked for, I'm afraid.
> > 
> > I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
> > and git-libata-all trees have been omitted.  It's at 
> 
> What does USB storage have to do with SATA?
> 

Please read the mailing list - several of these reports have been with
sata.

Yes, thank you.  As this report is against usb-storage then the bug most
probably lies in git-scsi-misc.



* Re: 2.6.17-mm5
  2006-07-01 22:56           ` 2.6.17-mm5 Jeff Garzik
@ 2006-07-02  0:10             ` James Bottomley
  0 siblings, 0 replies; 35+ messages in thread
From: James Bottomley @ 2006-07-02  0:10 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Neil Brown, Andrew Morton, Grant Wilson, linux-kernel, linux-scsi

On Sat, 2006-07-01 at 18:56 -0400, Jeff Garzik wrote:
> On Sun, Jul 02, 2006 at 08:32:28AM +1000, Neil Brown wrote:
> > The problem seems to be simply that on some hardware at least,
> > BIO_RW_BARRIER writes result in an EIO.  Don't know why yet.
> 
> Could be that <whatever device> is choking on FLUSH CACHE (ATA)
> or SYNCHRONIZE CACHE (SCSI).
> 
> That's one possible reason why EIO may result from a barrier...

There is no barrier implementation on SCSI (basically you can't maintain
barriers in the face of TCQ, so only depth one devices can do it and
hence all the scsi drivers turn it off), so it must be a FLUSH CACHE.

This one looks like it went down via prepare_flush rather than
issue_flush, so the normal error printing that issue_flush does is
skipped.  This patch should tell us what the error was on the command.

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3d04a9f..3e3e3b7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1162,7 +1162,20 @@ static int scsi_issue_flush_fn(request_q
 
 static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
 {
+	int res = cmd->result;
+	struct scsi_sense_hdr sshdr;
+
 	BUG_ON(!blk_pc_request(cmd->request));
+	if (res) {
+		printk(KERN_ERR "REQ_BLOCK_PC FAILED for ");
+		__scsi_print_command(cmd->cmnd);
+		printk(KERN_ERR "FAILED\n  status = %x, message = %02x, "
+		       "host = %d, driver = %02x\n  ",
+		       status_byte(res), msg_byte(res),
+		       host_byte(res), driver_byte(res));
+		if (scsi_command_normalize_sense(cmd, &sshdr))
+			scsi_print_sense_hdr("sd", &sshdr);
+	}
 	/*
 	 * This will complete the whole command with uptodate=1 so
 	 * as far as the block layer is concerned the command completed


James
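
As an aside on what the debug output in the patch above would show: the
result word is split into four one-byte fields before printing.  A minimal
userspace sketch of that split follows.  The helper names are invented here,
and the layout (driver, host, message, status from high byte to low byte) is
stated as an assumption; I believe the kernel's own status_byte() macro also
shifts the status field right by one bit, so its printed value can differ
from the plain low byte used below.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Assumed layout of the 32-bit SCSI result word, one field per byte:
 * driver (31..24), host (23..16), message (15..8), status (7..0).
 * Illustrative helpers only, not the kernel macros.
 */
struct scsi_result_fields {
	unsigned int status, msg, host, driver;
};

static struct scsi_result_fields split_result(unsigned int res)
{
	struct scsi_result_fields f;

	f.status = res & 0xff;
	f.msg    = (res >> 8) & 0xff;
	f.host   = (res >> 16) & 0xff;
	f.driver = (res >> 24) & 0xff;
	return f;
}

/* Format the fields roughly the way the patch's printk does. */
static int format_result(char *buf, size_t len, unsigned int res)
{
	struct scsi_result_fields f = split_result(res);

	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"status = %x, message = %02x, host = %u, driver = %02x",
			f.status, f.msg, f.host, f.driver);
}
```

A zero result means the command completed cleanly, so output of this kind is
only interesting when the result word is non-zero.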




* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-01 18:14   ` 2.6.17-mm5 dislikes raid-1, just like mm4 Helge Hafting
  2006-07-01 22:22     ` Andrew Morton
@ 2006-07-02  3:51     ` Tejun Heo
  1 sibling, 0 replies; 35+ messages in thread
From: Tejun Heo @ 2006-07-02  3:51 UTC (permalink / raw)
  To: Helge Hafting; +Cc: Andrew Morton, linux-kernel, Jeff Garzik

Helge Hafting wrote:
> md:  adding sda2 ...
> md: created md0
> md: bind<sda2>
> md: bind<sdb1>
> md: running: <sdb1><sda2>
> raid1: raid set md0 active with 2 out of 2 mirrors
> md: ... autorun DONE.
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs: mounted filesystem with ordered data mode.
>   Vendor: USB2.0    Model:       HS-CF       Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:0: Attached scsi removable disk sdf
> sd 3:0:0:0: Attached scsi generic sg5 type 0
>   Vendor: USB2.0    Model:       HS-MS       Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:1: Attached scsi removable disk sdg
> sd 3:0:0:1: Attached scsi generic sg6 type 0
>   Vendor: USB2.0    Model:       HS-SM       Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:2: Attached scsi removable disk sdh
> sd 3:0:0:2: Attached scsi generic sg7 type 0
>   Vendor: USB2.0    Model:       HS-SD/MMC   Rev: 1.64
>   Type:   Direct-Access                      ANSI SCSI revision: 00
> sd 3:0:0:3: Attached scsi removable disk sdi
> sd 3:0:0:3: Attached scsi generic sg8 type 0
> usb-storage: device scan complete
> loadkeys[2214]: segfault at 00000000000005a0 rip 00002b22e169feea rsp 00007fffc973c478 error 4
> Adding 1000424k swap on /dev/sda6.  Priority:1 extents:1 across:1000424k
> EXT3 FS on sdd1, internal journal
> raid1: Disk failure on sda2, disabling device. 
>         Operation continuing on 1 devices
> RAID1 conf printout:
>  --- wd:1 rd:2
>  disk 0, wo:1, o:0, dev:sda2
>  disk 1, wo:0, o:1, dev:sdb1
> RAID1 conf printout:
>  --- wd:1 rd:2
>  disk 1, wo:0, o:1, dev:sdb1
> raid1: Disk failure on sda5, disabling device. 
>         Operation continuing on 1 devices
> RAID1 conf printout:
>  --- wd:1 rd:2
>  disk 0, wo:1, o:0, dev:sda5
>  disk 1, wo:0, o:1, dev:sdb5
> RAID1 conf printout:
>  --- wd:1 rd:2
>  disk 1, wo:0, o:1, dev:sdb5
> raid1: Disk failure on sde2, disabling device. 
>         Operation continuing on 1 devices
> RAID1 conf printout:
>  --- wd:1 rd:2
>  disk 0, wo:1, o:0, dev:sde2
>  disk 1, wo:0, o:1, dev:sdd2
> RAID1 conf printout:
>  --- wd:1 rd:2
>  disk 1, wo:0, o:1, dev:sdd2
> kjournald starting.  Commit interval 5 seconds
> EXT3 FS on md3, internal journal
> EXT3-fs: mounted filesystem with ordered data mode.
> kjournald starting.  Commit interval 5 seconds
> EXT3 FS on sda8, internal journal
> EXT3-fs: mounted filesystem with writeback data mode.
> kjournald starting.  Commit interval 5 seconds
> EXT3 FS on md2, internal journal
> EXT3-fs: mounted filesystem with ordered data mode.
> kjournald starting.  Commit interval 5 seconds
> EXT3 FS on md0, internal journal
> EXT3-fs: mounted filesystem with journal data mode.
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs: mounted filesystem with writeback data mode.
> kjournald starting.  Commit interval 5 seconds
> EXT3 FS on sda7, internal journal
> EXT3-fs: mounted filesystem with ordered data mode.
> PM: Writing back config space on device 0000:00:0b.0 at offset b (was 165314e4, writing 13001462)
> PM: Writing back config space on device 0000:00:0b.0 at offset 3 (was 0, writing 2008)
> PM: Writing back config space on device 0000:00:0b.0 at offset 2 (was 2000000, writing 2000003)
> PM: Writing back config space on device 0000:00:0b.0 at offset 1 (was 2b00000, writing 2b00006)
> ADDRCONF(NETDEV_UP): eth0: link is not ready
> [...]
> 
> As we see, the md devices are assembled, then the filesystems are
> mounted and swap turned on.  Then all three md devices fail a 
> partition at the same time.  Somehow, I don't believe that
> is correct. ;-)

Hello, all,

I've just tested both libata-dev#upstream and 2.6.17-mm5 and both work
fine on my machine w/ sata_sil24.  It doesn't seem to be a libata
problem ATM.  libata would have complained loud & clear if it had
reported an error to the upper layer and so caused the degraded raid
array event above.  Can you please post the full boot dmesg, preferably
w/ timestamps?

Thanks.

-- 
tejun


* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-01 22:22     ` Andrew Morton
  2006-07-01 22:52       ` Jeff Garzik
@ 2006-07-02  4:43       ` Reuben Farrelly
  2006-07-02  6:09         ` Andrew Morton
  2006-07-02  5:13       ` Reuben Farrelly
  2 siblings, 1 reply; 35+ messages in thread
From: Reuben Farrelly @ 2006-07-02  4:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Helge Hafting, linux-kernel, linux-scsi, Neil Brown, Grant Wilson



On 2/07/2006 10:22 a.m., Andrew Morton wrote:
> On Sat, 1 Jul 2006 20:14:55 +0200
> Helge Hafting <helgehaf@aitel.hist.no> wrote:
> 
>> I  just got mm5 up, and it has the same problem as mm4.
>> Raid-1 does not work. I used 2.6.16 to resync my raids,
>> and booted into 2.6.17-mm5.
<snip>
>> As we see, the md devices are assembled, then the filesystems are
>> mounted and swap turned on.  Then all three md devices fail a 
>> partition at the same time.  Somehow, I don't believe that
>> is correct. ;-)
>>
> 
> I assume this is still the broken-barriers bug.  Thanks for all the help on
> this, guys.  More is to be asked for, I'm afraid.
> 
> I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
> and git-libata-all trees have been omitted.  It's at 
> 
> http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-sata-scsi.bz2
> 
> (That's a diff against 2.6.17)

Works.

> If that kernel works, then the next step is to test
> 
> http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-scsi.bz2
> 
> which is 2.6.17-mm5 without git-scsi-misc, but with git-libata-all.

Works.  I'm running it now and it looks to be all fine (including the 
workaround/fix for MSI)

In both cases I rebooted twice with each kernel to be sure it wasn't a one-off.

This then must point to git-scsi-misc being implicated, if not the source.......

Reuben


* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-01 22:22     ` Andrew Morton
  2006-07-01 22:52       ` Jeff Garzik
  2006-07-02  4:43       ` Reuben Farrelly
@ 2006-07-02  5:13       ` Reuben Farrelly
  2006-07-02 13:53         ` James Bottomley
  2 siblings, 1 reply; 35+ messages in thread
From: Reuben Farrelly @ 2006-07-02  5:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Helge Hafting, linux-kernel, linux-scsi, Neil Brown, Grant Wilson


On 2/07/2006 10:22 a.m., Andrew Morton wrote:
> I assume this is still the broken-barriers bug.  Thanks for all the help on
> this, guys.  More is to be asked for, I'm afraid.
> 
> I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
> and git-libata-all trees have been omitted.  It's at 
> 
> http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-sata-scsi.bz2
> 
> (That's a diff against 2.6.17)
> 
> If that kernel works, then the next step is to test
> 
> http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-scsi.bz2
> 
> which is 2.6.17-mm5 without git-scsi-misc, but with git-libata-all.

Just for kicks, after testing those two trees (see previous email) I took my 
2.6.17-mm5 without git-scsi-misc and then patched git-scsi-misc.patch back in, 
rebuilt and rebooted and noted that RAID broke again.  Reverted the patch and it 
all worked.

So I can conclude, definitely and reproducibly, that git-scsi-misc is the one.

reuben
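
The elimination above can be written down as a small decision function.  This
is just a sketch of the reasoning, assuming each kernel is judged only by
whether RAID-1 stayed intact on it; the enum and function names are invented
here.

```c
#include <assert.h>
#include <stdbool.h>

enum suspect {
	SUSPECT_NONE,		/* nothing is broken */
	SUSPECT_OTHER,		/* something outside the two trees */
	SUSPECT_LIBATA,		/* git-libata-all */
	SUSPECT_SCSI_MISC	/* git-scsi-misc */
};

/*
 * Each flag is true when RAID-1 worked on that kernel:
 *   full_works     complete 2.6.17-mm5
 *   no_both_works  mm5 without git-scsi-misc and git-libata-all
 *   no_scsi_works  mm5 without git-scsi-misc, but with git-libata-all
 */
static enum suspect narrow_down(bool full_works, bool no_both_works,
				bool no_scsi_works)
{
	if (full_works)
		return SUSPECT_NONE;
	if (!no_both_works)
		return SUSPECT_OTHER;	/* still broken without either tree */
	if (no_scsi_works)
		return SUSPECT_SCSI_MISC; /* libata alone does not break it */
	return SUSPECT_LIBATA;		/* libata alone is enough to break it */
}

static const char *suspect_name(enum suspect s)
{
	static const char *const names[] = {
		"none", "other", "git-libata-all", "git-scsi-misc"
	};

	assert((int)s >= 0 && (int)s <= (int)SUSPECT_SCSI_MISC);
	return names[s];
}
```

The results reported here (full mm5 broken, both reduced trees working) take
the git-scsi-misc branch, and re-applying git-scsi-misc.patch on top of the
working tree is the cross-check that the function cannot express.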




* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02  4:43       ` Reuben Farrelly
@ 2006-07-02  6:09         ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2006-07-02  6:09 UTC (permalink / raw)
  To: Reuben Farrelly, James Bottomley
  Cc: helgehaf, linux-kernel, linux-scsi, neilb, grant.wilson

On Sun, 02 Jul 2006 16:43:56 +1200
Reuben Farrelly <reuben-lkml@reub.net> wrote:

> 
> 
> On 2/07/2006 10:22 a.m., Andrew Morton wrote:
> > On Sat, 1 Jul 2006 20:14:55 +0200
> > Helge Hafting <helgehaf@aitel.hist.no> wrote:
> > 
> >> I  just got mm5 up, and it has the same problem as mm4.
> >> Raid-1 does not work. I used 2.6.16 to resync my raids,
> >> and booted into 2.6.17-mm5.
> <snip>
> >> As we see, the md devices are assembled, then the filesystems are
> >> mounted and swap turned on.  Then all three md devices fail a 
> >> partition at the same time.  Somehow, I don't believe that
> >> is correct. ;-)
> >>
> > 
> > I assume this is still the broken-barriers bug.  Thanks for all the help on
> > this, guys.  More is to be asked for, I'm afraid.
> > 
> > I've prepared a tree which is basically 2.6.17-mm5, only the git-scsi-misc
> > and git-libata-all trees have been omitted.  It's at 
> > 
> > http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-sata-scsi.bz2
> > 
> > (That's a diff against 2.6.17)
> 
> Works.
> 
> > If that kernel works, then the next step is to test
> > 
> > http://www.zip.com.au/~akpm/linux/patches/stuff/2.6.17-mm5-no-scsi.bz2
> > 
> > which is 2.6.17-mm5 without git-scsi-misc, but with git-libata-all.
> 
> Works.  I'm running it now and it looks to be all fine (including the 
> workaround/fix for MSI)
> 
> In both cases I rebooted twice with each kernel to be sure it wasn't a one-off.
> 
> This then must point to git-scsi-misc being implicated, if not the source.......
> 

Yep, everything points to that, thanks.


* Re: 2.6.17-mm5
  2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
                     ` (3 preceding siblings ...)
       [not found]   ` <20060701142419.GB28750@tlg.swandive.local>
@ 2006-07-02 10:03   ` Andy Whitcroft
  2006-07-02 10:14     ` 2.6.17-mm5 Andrew Morton
  2006-07-03  0:47   ` 2.6.17-mm5 Theodore Tso
  2006-07-03  7:32   ` 2.6.17-mm5 Heiko Carstens
  6 siblings, 1 reply; 35+ messages in thread
From: Andy Whitcroft @ 2006-07-02 10:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Seems that we have some kind of scheduler balance panic; I want to say
it is back, as this seems very familiar.  Seems to be affecting the multi-node
NUMA-Q systems here.  The single node ones appear unaffected.

Nothing jumps out of the patch list.  Any suggestions as to what to rip
out :)

-apw

divide error: 0000 [#1]
8K_STACKS SMP
last sysfs file:
Modules linked in:
CPU:    3
EIP:    0060:[<c0112b6e>]    Not tainted VLI
EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1)
EIP is at find_busiest_group+0x1a3/0x47c
eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
esi: 00000000   edi: e7677264   ebp: e74a3ec8   esp: e74a3e58
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, ti=e74a2000 task=e7485030 task.ti=e74a2000)
Stack: e7677264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000
       ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000
       00000000 00000200 00000020 00000080 00000000 00000000 e7677260 c13dc960
Call Trace:
 [<c0119020>] vprintk+0x5f/0x213
 [<c0112efb>] load_balance+0x54/0x1d6
 [<c011332d>] rebalance_tick+0xc5/0xe3
 [<c01137a3>] scheduler_tick+0x2cb/0x2d3
 [<c01215b4>] update_process_times+0x51/0x5d
 [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
 [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
 [<c01006c0>] default_idle+0x0/0x59
 [<c01006f1>] default_idle+0x31/0x59
 [<c0100791>] cpu_idle+0x64/0x79
Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45
dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1
83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b
EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e74a3e58
 <0>Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-02 10:03   ` 2.6.17-mm5 Andy Whitcroft
@ 2006-07-02 10:14     ` Andrew Morton
  2006-07-02 10:40       ` 2.6.17-mm5 Andy Whitcroft
  0 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2006-07-02 10:14 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: linux-kernel

On Sun, 02 Jul 2006 11:03:16 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> Seems that we have some kind of scheduler balance panic; I want to say
> it's back, as this seems very familiar.  Seems to be affecting the multi-node
> NUMA-Q systems here.  The single node ones appear unaffected.
> 
> Nothing jumps out of the patch list.  Any suggestions as to what to rip
> out :)
> 
> -apw
> 
> divide error: 0000 [#1]
> 8K_STACKS SMP
> last sysfs file:
> Modules linked in:
> CPU:    3
> EIP:    0060:[<c0112b6e>]    Not tainted VLI
> EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1)
> EIP is at find_busiest_group+0x1a3/0x47c
> eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
> esi: 00000000   edi: e7677264   ebp: e74a3ec8   esp: e74a3e58
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, ti=e74a2000 task=e7485030 task.ti=e74a2000)
> Stack: e7677264 00000010 c0119020 00000000 00000000 00000000 00000000
> 00000000
>        ffffffff 00000000 00000000 00000001 00000001 00000001 00000080
> 00000000
>        00000000 00000200 00000020 00000080 00000000 00000000 e7677260
> c13dc960
> Call Trace:
>  [<c0119020>] vprintk+0x5f/0x213
>  [<c0112efb>] load_balance+0x54/0x1d6
>  [<c011332d>] rebalance_tick+0xc5/0xe3
>  [<c01137a3>] scheduler_tick+0x2cb/0x2d3
>  [<c01215b4>] update_process_times+0x51/0x5d
>  [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>  [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>  [<c01006c0>] default_idle+0x0/0x59
>  [<c01006f1>] default_idle+0x31/0x59
>  [<c0100791>] cpu_idle+0x64/0x79
> Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45
> dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1
> 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b
> EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e74a3e58
>  <0>Kernel panic - not syncing: Fatal exception in interrupt

Well there are only a handful of divides in find_busiest_group().  Wanna
have a poke around in gdb and work out which one you're hitting?
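
Part of the answer can be read off the oops itself, as a rough sketch
rather than a substitute for the gdb session.  The byte marked <f7> in
the Code: line is an x86 opcode, and the f1 that follows it is its ModRM
byte.  The small decoder below (decode_f7 is a hypothetical helper, not
kernel code) decodes that pair as "div %ecx", and the register dump
shows ecx: 00000000, so the divisor really is zero.  Which C expression
supplies that divisor still has to be confirmed against the source.

```c
#include <stdio.h>
#include <stddef.h>

/* Opcode 0xF7 is an x86 "group 3" instruction: bits 5..3 of the ModRM
 * byte select the operation.  Slot /1 is shown as reserved here. */
static const char *const f7_ops[8] = {
	"test", "(reserved)", "not", "neg", "mul", "imul", "div", "idiv"
};

static const char *const reg32[8] = {
	"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Decode the ModRM byte that follows 0xF7 into text such as "div %ecx".
 * Only register-direct operands (mod == 3) are handled; anything else
 * returns -1 because it would need the addressing bytes as well. */
int decode_f7(unsigned char modrm, char *buf, size_t len)
{
	unsigned int mod = modrm >> 6;
	unsigned int op = (modrm >> 3) & 7;
	unsigned int rm = modrm & 7;

	if (mod != 3)
		return -1;
	snprintf(buf, len, "%s %%%s", f7_ops[op], reg32[rm]);
	return 0;
}
```

Calling decode_f7(0xf1, buf, sizeof(buf)) fills buf with the decoded
instruction for the faulting bytes in this report.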

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-02 10:14     ` 2.6.17-mm5 Andrew Morton
@ 2006-07-02 10:40       ` Andy Whitcroft
  2006-07-02 11:14         ` 2.6.17-mm5 Andrew Morton
  0 siblings, 1 reply; 35+ messages in thread
From: Andy Whitcroft @ 2006-07-02 10:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> On Sun, 02 Jul 2006 11:03:16 +0100
> Andy Whitcroft <apw@shadowen.org> wrote:
> 
> 
>>Seems that we have some kind of scheduler balance panic; I want to say
>>it's back, as this seems very familiar.  Seems to be affecting the multi-node
>>NUMA-Q systems here.  The single node ones appear unaffected.
>>
>>Nothing jumps out of the patch list.  Any suggestions as to what to rip
>>out :)
>>
>>-apw
>>
>>divide error: 0000 [#1]
>>8K_STACKS SMP
>>last sysfs file:
>>Modules linked in:
>>CPU:    3
>>EIP:    0060:[<c0112b6e>]    Not tainted VLI
>>EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1)
>>EIP is at find_busiest_group+0x1a3/0x47c
>>eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
>>esi: 00000000   edi: e7677264   ebp: e74a3ec8   esp: e74a3e58
>>ds: 007b   es: 007b   ss: 0068
>>Process swapper (pid: 0, ti=e74a2000 task=e7485030 task.ti=e74a2000)
>>Stack: e7677264 00000010 c0119020 00000000 00000000 00000000 00000000
>>00000000
>>       ffffffff 00000000 00000000 00000001 00000001 00000001 00000080
>>00000000
>>       00000000 00000200 00000020 00000080 00000000 00000000 e7677260
>>c13dc960
>>Call Trace:
>> [<c0119020>] vprintk+0x5f/0x213
>> [<c0112efb>] load_balance+0x54/0x1d6
>> [<c011332d>] rebalance_tick+0xc5/0xe3
>> [<c01137a3>] scheduler_tick+0x2cb/0x2d3
>> [<c01215b4>] update_process_times+0x51/0x5d
>> [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>> [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>> [<c01006c0>] default_idle+0x0/0x59
>> [<c01006f1>] default_idle+0x31/0x59
>> [<c0100791>] cpu_idle+0x64/0x79
>>Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45
>>dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1
>>83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b
>>EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e74a3e58
>> <0>Kernel panic - not syncing: Fatal exception in interrupt
> 
> 
> Well there are only a handful of divides in find_busiest_group().  Wanna
> have a poke around in gdb and work out which one you're hitting?

Sure I'll see what information I can get on this one.

-apw

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-02 10:40       ` 2.6.17-mm5 Andy Whitcroft
@ 2006-07-02 11:14         ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2006-07-02 11:14 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: linux-kernel

On Sun, 02 Jul 2006 11:40:26 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> > Well there are only a handful of divides in find_busiest_group().  Wanna
> > have a poke around in gdb and work out which one you're hitting?
> 
> Sure I'll see what information I can get on this one.

Easy way:

Set CONFIG_DEBUG_INFO, do:

make kernel/sched.o
gdb kernel/sched.o
(gdb) p find_busiest_group
$1 = {struct sched_group *(struct sched_domain *, int, long unsigned int *, 
    enum idle_type, int *)} 0xff0 <find_busiest_group>
(gdb) l *(0xff0 + 0x1a3)
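
The arithmetic behind that last gdb command can be sketched in C.  The
oops prints the fault location as symbol+0xOFFSET/0xLENGTH, and the
address to list is the symbol's address in the object file plus that
offset.  parse_oops_location and list_address are hypothetical helper
names, and 0xff0 is only the value gdb printed for the build shown
above; another build will place the function elsewhere.

```c
#include <stdio.h>

/* Split "symbol+0xOFF/0xLEN" as printed in an oops.  sym must have room
 * for at least 64 bytes.  Returns 0 on success, -1 if the string does
 * not match or the offset lies outside the function. */
int parse_oops_location(const char *loc, char *sym,
			unsigned long *off, unsigned long *len)
{
	if (sscanf(loc, "%63[^+]+%lx/%lx", sym, off, len) != 3)
		return -1;
	return (*off < *len) ? 0 : -1;
}

/* Address to hand to gdb's "l *ADDR": symbol base plus oops offset. */
unsigned long list_address(unsigned long sym_base, unsigned long off)
{
	return sym_base + off;
}
```

For this report the offset is 0x1a3 and the function length 0x47c, so
with a base of 0xff0 the listing address works out to 0x1193.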


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02  5:13       ` Reuben Farrelly
@ 2006-07-02 13:53         ` James Bottomley
  2006-07-02 14:28           ` Grant Wilson
  0 siblings, 1 reply; 35+ messages in thread
From: James Bottomley @ 2006-07-02 13:53 UTC (permalink / raw)
  To: Reuben Farrelly
  Cc: Andrew Morton, Helge Hafting, linux-kernel, linux-scsi,
	Neil Brown, Grant Wilson

On Sun, 2006-07-02 at 17:13 +1200, Reuben Farrelly wrote:
> Just for kicks, after testing those two trees (see previous email) I
> took my 
> 2.6.17-mm5 without git-scsi-misc and then patched git-scsi-misc.patch
> back in, 
> rebuilt and rebooted and noted that RAID broke again.  Reverted the
> patch and it 
> all worked.
> 
> So I can conclude that definitely and reproducibly that's the
> one.........

OK, I have a theory.  I think 

[SCSI] sd/scsi_lib simplify sd_rw_intr and scsi_io_completion

Failed to take into account completion of zero length commands (which is
what a flush is).  Could you try the whole of -mm with this patch?

Thanks,

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 4c4add5..3d04a9f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -855,7 +855,8 @@ static void scsi_release_buffers(struct 
  *		b) We can just use scsi_requeue_command() here.  This would
  *		   be used if we just wanted to retry, for example.
  */
-void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
+void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes,
+			unsigned int block_bytes)
 {
 	int result = cmd->result;
 	int this_count = cmd->bufflen;
@@ -920,72 +921,87 @@ void scsi_io_completion(struct scsi_cmnd
 	 * Next deal with any sectors which we were able to correctly
 	 * handle.
 	 */
-	if (good_bytes > 0) {
-		SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
-					      "%d bytes done.\n",
+	if (good_bytes >= 0) {
+		SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, %d bytes done.\n",
 					      req->nr_sectors, good_bytes));
 		SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
 
 		if (clear_errors)
 			req->errors = 0;
+		/*
+		 * If multiple sectors are requested in one buffer, then
+		 * they will have been finished off by the first command.
+		 * If not, then we have a multi-buffer command.
+		 *
+		 * If block_bytes != 0, it means we had a medium error
+		 * of some sort, and that we want to mark some number of
+		 * sectors as not uptodate.  Thus we want to inhibit
+		 * requeueing right here - we will requeue down below
+		 * when we handle the bad sectors.
+		 */
 
-		/* A number of bytes were successfully read.  If there
-		 * is leftovers and there is some kind of error
-		 * (result != 0), retry the rest.
+		/*
+		 * If the command completed without error, then either
+		 * finish off the rest of the command, or start a new one.
 		 */
-		if (scsi_end_request(cmd, 1, good_bytes, !!result) == NULL)
+		if (scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL)
 			return;
 	}
-
-	/* good_bytes = 0, or (inclusive) there were leftovers and
-	 * result = 0, so scsi_end_request couldn't retry.
+	/*
+	 * Now, if we were good little boys and girls, Santa left us a request
+	 * sense buffer.  We can extract information from this, so we
+	 * can choose a block to remap, etc.
 	 */
 	if (sense_valid && !sense_deferred) {
 		switch (sshdr.sense_key) {
 		case UNIT_ATTENTION:
 			if (cmd->device->removable) {
-				/* Detected disc change.  Set a bit
+				/* detected disc change.  set a bit 
 				 * and quietly refuse further access.
 				 */
 				cmd->device->changed = 1;
-				scsi_end_request(cmd, 0, this_count, 1);
+				scsi_end_request(cmd, 0,
+						this_count, 1);
 				return;
 			} else {
-				/* Must have been a power glitch, or a
-				 * bus reset.  Could not have been a
-				 * media change, so we just retry the
-				 * request and see what happens.
-				 */
+				/*
+				* Must have been a power glitch, or a
+				* bus reset.  Could not have been a
+				* media change, so we just retry the
+				* request and see what happens.  
+				*/
 				scsi_requeue_command(q, cmd);
 				return;
 			}
 			break;
 		case ILLEGAL_REQUEST:
-			/* If we had an ILLEGAL REQUEST returned, then
-			 * we may have performed an unsupported
-			 * command.  The only thing this should be
-			 * would be a ten byte read where only a six
-			 * byte read was supported.  Also, on a system
-			 * where READ CAPACITY failed, we may have
-			 * read past the end of the disk.
-			 */
+			/*
+		 	* If we had an ILLEGAL REQUEST returned, then we may
+		 	* have performed an unsupported command.  The only
+		 	* thing this should be would be a ten byte read where
+			* only a six byte read was supported.  Also, on a
+			* system where READ CAPACITY failed, we may have read
+			* past the end of the disk.
+		 	*/
 			if ((cmd->device->use_10_for_rw &&
 			    sshdr.asc == 0x20 && sshdr.ascq == 0x00) &&
 			    (cmd->cmnd[0] == READ_10 ||
 			     cmd->cmnd[0] == WRITE_10)) {
 				cmd->device->use_10_for_rw = 0;
-				/* This will cause a retry with a
-				 * 6-byte command.
+				/*
+				 * This will cause a retry with a 6-byte
+				 * command.
 				 */
 				scsi_requeue_command(q, cmd);
-				return;
+				result = 0;
 			} else {
 				scsi_end_request(cmd, 0, this_count, 1);
 				return;
 			}
 			break;
 		case NOT_READY:
-			/* If the device is in the process of becoming
+			/*
+			 * If the device is in the process of becoming
 			 * ready, or has a temporary blockage, retry.
 			 */
 			if (sshdr.asc == 0x04) {
@@ -1005,7 +1021,7 @@ void scsi_io_completion(struct scsi_cmnd
 			}
 			if (!(req->flags & REQ_QUIET)) {
 				scmd_printk(KERN_INFO, cmd,
-					    "Device not ready: ");
+					   "Device not ready: ");
 				scsi_print_sense_hdr("", &sshdr);
 			}
 			scsi_end_request(cmd, 0, this_count, 1);
@@ -1013,21 +1029,21 @@ void scsi_io_completion(struct scsi_cmnd
 		case VOLUME_OVERFLOW:
 			if (!(req->flags & REQ_QUIET)) {
 				scmd_printk(KERN_INFO, cmd,
-					    "Volume overflow, CDB: ");
+					   "Volume overflow, CDB: ");
 				__scsi_print_command(cmd->data_cmnd);
 				scsi_print_sense("", cmd);
 			}
-			/* See SSC3rXX or current. */
-			scsi_end_request(cmd, 0, this_count, 1);
+			scsi_end_request(cmd, 0, block_bytes, 1);
 			return;
 		default:
 			break;
 		}
-	}
+	}			/* driver byte != 0 */
 	if (host_byte(result) == DID_RESET) {
-		/* Third party bus reset or reset for error recovery
-		 * reasons.  Just retry the request and see what
-		 * happens.
+		/*
+		 * Third party bus reset or reset for error
+		 * recovery reasons.  Just retry the request
+		 * and see what happens.  
 		 */
 		scsi_requeue_command(q, cmd);
 		return;
@@ -1035,13 +1051,21 @@ void scsi_io_completion(struct scsi_cmnd
 	if (result) {
 		if (!(req->flags & REQ_QUIET)) {
 			scmd_printk(KERN_INFO, cmd,
-				    "SCSI error: return code = 0x%08x\n",
-				    result);
+				   "SCSI error: return code = 0x%x\n", result);
+
 			if (driver_byte(result) & DRIVER_SENSE)
 				scsi_print_sense("", cmd);
 		}
+		/*
+		 * Mark a single buffer as not uptodate.  Queue the remainder.
+		 * We sometimes get this cruft in the event that a medium error
+		 * isn't properly reported.
+		 */
+		block_bytes = req->hard_cur_sectors << 9;
+		if (!block_bytes)
+			block_bytes = req->data_len;
+		scsi_end_request(cmd, 0, block_bytes, 1);
 	}
-	scsi_end_request(cmd, 0, this_count, !result);
 }
 EXPORT_SYMBOL(scsi_io_completion);
 
@@ -1145,7 +1169,7 @@ static void scsi_blk_pc_done(struct scsi
 	 * successfully. Since this is a REQ_BLOCK_PC command the
 	 * caller should check the request's errors value
 	 */
-	scsi_io_completion(cmd, cmd->bufflen);
+	scsi_io_completion(cmd, cmd->bufflen, 0);
 }
 
 static void scsi_setup_blk_pc_cmnd(struct scsi_cmnd *cmd)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index f899ff0..3541990 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -891,10 +891,11 @@ #endif
 static void sd_rw_intr(struct scsi_cmnd * SCpnt)
 {
 	int result = SCpnt->result;
- 	unsigned int xfer_size = SCpnt->request_bufflen;
- 	unsigned int good_bytes = result ? 0 : xfer_size;
- 	u64 start_lba = SCpnt->request->sector;
- 	u64 bad_lba;
+	int this_count = SCpnt->request_bufflen;
+	int good_bytes = (result == 0 ? this_count : 0);
+	sector_t block_sectors = 1;
+	u64 first_err_block;
+	sector_t error_sector;
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	int sense_deferred = 0;
@@ -905,6 +906,7 @@ static void sd_rw_intr(struct scsi_cmnd 
 		if (sense_valid)
 			sense_deferred = scsi_sense_is_deferred(&sshdr);
 	}
+
 #ifdef CONFIG_SCSI_LOGGING
 	SCSI_LOG_HLCOMPLETE(1, printk("sd_rw_intr: %s: res=0x%x\n", 
 				SCpnt->request->rq_disk->disk_name, result));
@@ -914,72 +916,89 @@ #ifdef CONFIG_SCSI_LOGGING
 				sshdr.sense_key, sshdr.asc, sshdr.ascq));
 	}
 #endif
-	if (driver_byte(result) != DRIVER_SENSE &&
-	    (!sense_valid || sense_deferred))
-		goto out;
+	/*
+	   Handle MEDIUM ERRORs that indicate partial success.  Since this is a
+	   relatively rare error condition, no care is taken to avoid
+	   unnecessary additional work such as memcpy's that could be avoided.
+	 */
+	if (driver_byte(result) != 0 &&
+		 sense_valid && !sense_deferred) {
+		switch (sshdr.sense_key) {
+		case MEDIUM_ERROR:
+			if (!blk_fs_request(SCpnt->request))
+				break;
+			info_valid = scsi_get_sense_info_fld(
+				SCpnt->sense_buffer, SCSI_SENSE_BUFFERSIZE,
+				&first_err_block);
+			/*
+			 * May want to warn and skip if following cast results
+			 * in actual truncation (if sector_t < 64 bits)
+			 */
+			error_sector = (sector_t)first_err_block;
+			if (SCpnt->request->bio != NULL)
+				block_sectors = bio_sectors(SCpnt->request->bio);
+			switch (SCpnt->device->sector_size) {
+			case 1024:
+				error_sector <<= 1;
+				if (block_sectors < 2)
+					block_sectors = 2;
+				break;
+			case 2048:
+				error_sector <<= 2;
+				if (block_sectors < 4)
+					block_sectors = 4;
+				break;
+			case 4096:
+				error_sector <<=3;
+				if (block_sectors < 8)
+					block_sectors = 8;
+				break;
+			case 256:
+				error_sector >>= 1;
+				break;
+			default:
+				break;
+			}
 
-	switch (sshdr.sense_key) {
-	case HARDWARE_ERROR:
-	case MEDIUM_ERROR:
-		if (!blk_fs_request(SCpnt->request))
-			goto out;
-		info_valid = scsi_get_sense_info_fld(SCpnt->sense_buffer,
-						     SCSI_SENSE_BUFFERSIZE,
-						     &bad_lba);
-		if (!info_valid)
-			goto out;
-		if (xfer_size <= SCpnt->device->sector_size)
-			goto out;
-		switch (SCpnt->device->sector_size) {
-		case 256:
-			start_lba <<= 1;
-			break;
-		case 512:
+			error_sector &= ~(block_sectors - 1);
+			good_bytes = (error_sector - SCpnt->request->sector) << 9;
+			if (good_bytes < 0 || good_bytes >= this_count)
+				good_bytes = 0;
 			break;
-		case 1024:
-			start_lba >>= 1;
-			break;
-		case 2048:
-			start_lba >>= 2;
+
+		case RECOVERED_ERROR: /* an error occurred, but it recovered */
+		case NO_SENSE: /* LLDD got sense data */
+			/*
+			 * Inform the user, but make sure that it's not treated
+			 * as a hard error.
+			 */
+			scsi_print_sense("sd", SCpnt);
+			SCpnt->result = 0;
+			memset(SCpnt->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+			good_bytes = this_count;
 			break;
-		case 4096:
-			start_lba >>= 3;
+
+		case ILLEGAL_REQUEST:
+			if (SCpnt->device->use_10_for_rw &&
+			    (SCpnt->cmnd[0] == READ_10 ||
+			     SCpnt->cmnd[0] == WRITE_10))
+				SCpnt->device->use_10_for_rw = 0;
+			if (SCpnt->device->use_10_for_ms &&
+			    (SCpnt->cmnd[0] == MODE_SENSE_10 ||
+			     SCpnt->cmnd[0] == MODE_SELECT_10))
+				SCpnt->device->use_10_for_ms = 0;
 			break;
+
 		default:
-			/* Print something here with limiting frequency. */
-			goto out;
 			break;
 		}
-		/* This computation should always be done in terms of
-		 * the resolution of the device's medium.
-		 */
-		good_bytes = (bad_lba - start_lba)*SCpnt->device->sector_size;
-		break;
-	case RECOVERED_ERROR:
-	case NO_SENSE:
-		/* Inform the user, but make sure that it's not treated
-		 * as a hard error.
-		 */
-		scsi_print_sense("sd", SCpnt);
-		SCpnt->result = 0;
-		memset(SCpnt->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
-		good_bytes = xfer_size;
-		break;
-	case ILLEGAL_REQUEST:
-		if (SCpnt->device->use_10_for_rw &&
-		    (SCpnt->cmnd[0] == READ_10 ||
-		     SCpnt->cmnd[0] == WRITE_10))
-			SCpnt->device->use_10_for_rw = 0;
-		if (SCpnt->device->use_10_for_ms &&
-		    (SCpnt->cmnd[0] == MODE_SENSE_10 ||
-		     SCpnt->cmnd[0] == MODE_SELECT_10))
-			SCpnt->device->use_10_for_ms = 0;
-		break;
-	default:
-		break;
 	}
- out:
-	scsi_io_completion(SCpnt, good_bytes);
+	/*
+	 * This calls the generic completion function, now that we know
+	 * how many actual sectors finished, and how many sectors we need
+	 * to say have failed.
+	 */
+	scsi_io_completion(SCpnt, good_bytes, block_sectors << 9);
 }
 
 static int media_not_present(struct scsi_disk *sdkp,
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index fd94408..ebf6579 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -292,7 +292,7 @@ #endif
 	 * how many actual sectors finished, and how many sectors we need
 	 * to say have failed.
 	 */
-	scsi_io_completion(SCpnt, good_bytes);
+	scsi_io_completion(SCpnt, good_bytes, block_sectors << 9);
 }
 
 static int sr_init_command(struct scsi_cmnd * SCpnt)
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 371f70d..e46cd40 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -143,7 +143,7 @@ #define SCSI_STATE_MLQUEUE         0x100
 
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);
 extern void scsi_put_command(struct scsi_cmnd *);
-extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
+extern void scsi_io_completion(struct scsi_cmnd *, unsigned int, unsigned int);
 extern void scsi_finish_command(struct scsi_cmnd *cmd);
 extern void scsi_req_abort_cmd(struct scsi_cmnd *cmd);
 



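The MEDIUM_ERROR arithmetic that the "+" lines of the sd.c hunk restore
can be modelled in user space.  This is a sketch under assumptions, not
the driver code: lba_to_512 and good_bytes_before are hypothetical
names, the caller is expected to have already raised block_sectors to
the per-sector-size minimum as the switch statement does, and
block_sectors must be a power of two for the mask to work.

```c
#include <stdint.h>

/* Scale an LBA given in device-sector units to 512-byte units, using
 * the same shifts as the restored switch statement. */
uint64_t lba_to_512(uint64_t lba, unsigned int sector_size)
{
	switch (sector_size) {
	case 256:  return lba >> 1;
	case 1024: return lba << 1;
	case 2048: return lba << 2;
	case 4096: return lba << 3;
	default:   return lba;	/* 512-byte sectors need no scaling */
	}
}

/* Bytes transferred successfully before the first bad block.  Returns 0
 * when the reported block lies outside the request, mirroring the
 * "good_bytes < 0 || good_bytes >= this_count" check in the patch. */
unsigned int good_bytes_before(uint64_t err_lba, unsigned int sector_size,
			       uint64_t req_start, uint64_t block_sectors,
			       unsigned int xfer_bytes)
{
	uint64_t err = lba_to_512(err_lba, sector_size);
	int64_t good;

	err &= ~(block_sectors - 1);	/* round down to a block boundary */
	good = ((int64_t)err - (int64_t)req_start) * 512;
	if (good < 0 || good >= (int64_t)xfer_bytes)
		return 0;
	return (unsigned int)good;
}
```

As an example, an error at LBA 10 on a 2048-byte-sector device is
512-byte sector 40; for a request starting at sector 32 that leaves 8
good sectors ahead of the error.
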
^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02 13:53         ` James Bottomley
@ 2006-07-02 14:28           ` Grant Wilson
  2006-07-02 15:06             ` James Bottomley
  0 siblings, 1 reply; 35+ messages in thread
From: Grant Wilson @ 2006-07-02 14:28 UTC (permalink / raw)
  To: James Bottomley
  Cc: Reuben Farrelly, Andrew Morton, Helge Hafting, linux-kernel,
	linux-scsi, Neil Brown

James Bottomley wrote:
> On Sun, 2006-07-02 at 17:13 +1200, Reuben Farrelly wrote:
>> Just for kicks, after testing those two trees (see previous email) I
>> took my 
>> 2.6.17-mm5 without git-scsi-misc and then patched git-scsi-misc.patch
>> back in, 
>> rebuilt and rebooted and noted that RAID broke again.  Reverted the
>> patch and it 
>> all worked.
>>
>> So I can conclude that definitely and reproducibly that's the
>> one.........
> 
> OK, I have a theory.  I think 
> 
> [SCSI] sd/scsi_lib simplify sd_rw_intr and scsi_io_completion
> 
> Failed to take into account completion of zero length commands (which is
> what a flush is).  Could you try the whole of -mm with this patch?
> 
> Thanks,
> 
> James
> 
[patch snipped]

With the patch applied to 2.6.17-mm5 my RAID-1 is up and running on both
SATA drives with no problems.

Thanks,
Grant

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02 14:28           ` Grant Wilson
@ 2006-07-02 15:06             ` James Bottomley
  2006-07-02 15:43               ` Grant Wilson
  0 siblings, 1 reply; 35+ messages in thread
From: James Bottomley @ 2006-07-02 15:06 UTC (permalink / raw)
  To: Grant Wilson
  Cc: Reuben Farrelly, Andrew Morton, Helge Hafting, linux-kernel,
	linux-scsi, Neil Brown

On Sun, 2006-07-02 at 15:28 +0100, Grant Wilson wrote:
> With the patch applied to 2.6.17-mm5 my RAID-1 is up and running on both
> SATA drives with no problems.

That's great, thanks.  Now we know what the problem patch is, I'd like
to try an 11th hour correction of the logic fault in the original.  Could
you try this patch against original -mm (by reversing the previous
patch).  I think it should correct the problem?

Thanks,

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index bf5191f..08af9aa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -920,22 +920,20 @@ void scsi_io_completion(struct scsi_cmnd
 	 * Next deal with any sectors which we were able to correctly
 	 * handle.
 	 */
-	if (good_bytes > 0) {
-		SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
-					      "%d bytes done.\n",
-					      req->nr_sectors, good_bytes));
-		SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
-
-		if (clear_errors)
-			req->errors = 0;
-
-		/* A number of bytes were successfully read.  If there
-		 * is leftovers and there is some kind of error
-		 * (result != 0), retry the rest.
-		 */
-		if (scsi_end_request(cmd, 1, good_bytes, !!result) == NULL)
-			return;
-	}
+	SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
+				      "%d bytes done.\n",
+				      req->nr_sectors, good_bytes));
+	SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
+
+	if (clear_errors)
+		req->errors = 0;
+
+	/* A number of bytes were successfully read.  If there
+	 * are leftovers and there is some kind of error
+	 * (result != 0), retry the rest.
+	 */
+	if (scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL)
+		return;
 
 	/* good_bytes = 0, or (inclusive) there were leftovers and
 	 * result = 0, so scsi_end_request couldn't retry.
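
The logic fault this patch targets can be illustrated with a toy model.
It is a sketch under stated assumptions, not the kernel code: the
function and enum names are hypothetical, only commands that returned
result == 0 are modelled (so good_bytes equals the transfer size, as in
sd_rw_intr), and requeueing is ignored.  In the removed lines the
success path runs only when good_bytes > 0, so a successful zero-length
command such as a flush appears to fall through to the final
scsi_end_request(cmd, 0, ...) call and finish as a failure.

```c
enum outcome { COMPLETED_OK, COMPLETED_FAILED };

/* Model of the guarded version: the success path is skipped whenever
 * no bytes were transferred. */
enum outcome complete_guarded(unsigned int xfer_bytes)
{
	unsigned int good_bytes = xfer_bytes;	/* result == 0: all bytes good */

	if (good_bytes > 0)
		return COMPLETED_OK;
	/* fall-through: request is ended with uptodate == 0 */
	return COMPLETED_FAILED;
}

/* Model of the patched version: the success path always runs, so a
 * zero-length command with nothing left over completes normally. */
enum outcome complete_unguarded(unsigned int xfer_bytes)
{
	(void)xfer_bytes;
	return COMPLETED_OK;
}
```

The two models differ only for xfer_bytes == 0, which is the flush case
named earlier in the thread.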



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02 15:06             ` James Bottomley
@ 2006-07-02 15:43               ` Grant Wilson
  2006-07-02 19:07                 ` Helge Hafting
  0 siblings, 1 reply; 35+ messages in thread
From: Grant Wilson @ 2006-07-02 15:43 UTC (permalink / raw)
  To: James Bottomley
  Cc: Reuben Farrelly, Andrew Morton, Helge Hafting, linux-kernel,
	linux-scsi, Neil Brown

James Bottomley wrote:
> On Sun, 2006-07-02 at 15:28 +0100, Grant Wilson wrote:
>> With the patch applied to 2.6.17-mm5 my RAID-1 is up and running on both
>> SATA drives with no problems.
> 
> That's great, thanks.  Now we know what the problem patch is, I'd like
> to try an 11th hour correction of the logic fault in the original.  Could
> you try this patch against original -mm (by reversing the previous
> patch).  I think it should correct the problem?
> 
> Thanks,
> 
> James
> 
[snip]

With the first patch reversed and the second applied to -mm5 my RAID-1
array is still working correctly on both disks.

Thanks again,
Grant

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02 15:43               ` Grant Wilson
@ 2006-07-02 19:07                 ` Helge Hafting
  2006-07-03  6:52                   ` Reuben Farrelly
  0 siblings, 1 reply; 35+ messages in thread
From: Helge Hafting @ 2006-07-02 19:07 UTC (permalink / raw)
  To: Grant Wilson
  Cc: James Bottomley, Reuben Farrelly, Andrew Morton, linux-kernel,
	linux-scsi, Neil Brown

On Sun, Jul 02, 2006 at 04:43:14PM +0100, Grant Wilson wrote:
> James Bottomley wrote:
> > On Sun, 2006-07-02 at 15:28 +0100, Grant Wilson wrote:
> >> With the patch applied to 2.6.17-mm5 my RAID-1 is up and running on both
> >> SATA drives with no problems.
> > 
> > That's great, thanks.  Now we know what the problem patch is, I'd like
> > to try an 11th hour correction of the logic fault in the original.  Could
> > you try this patch against original -mm (by reversing the previous
> > patch).  I think it should correct the problem?
> > 
> > Thanks,
> > 
> > James
> > 
> [snip]
> 
> With the first patch reversed and the second applied to -mm5 my RAID-1
> array is still working correctly on both disks.
> 
The patch makes 2.6.17-mm5 md work on SATA and SCSI for me too.

Helge Hafting

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
                     ` (4 preceding siblings ...)
  2006-07-02 10:03   ` 2.6.17-mm5 Andy Whitcroft
@ 2006-07-03  0:47   ` Theodore Tso
  2006-07-03  7:32   ` 2.6.17-mm5 Heiko Carstens
  6 siblings, 0 replies; 35+ messages in thread
From: Theodore Tso @ 2006-07-03  0:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

The following patch is needed to fix UML compilation in -mm5 given
that alternatives_smp_module_add and alternatives_smp_module_del are
null inline functions if !CONFIG_SMP.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Index: linux-2.6.17-mm5/arch/um/kernel/um_arch.c
===================================================================
--- linux-2.6.17-mm5.orig/arch/um/kernel/um_arch.c	2006-07-02 20:37:17.000000000 -0400
+++ linux-2.6.17-mm5/arch/um/kernel/um_arch.c	2006-07-02 20:38:08.000000000 -0400
@@ -495,6 +495,7 @@
 {
 }
 
+#ifdef CONFIG_SMP
 void alternatives_smp_module_add(struct module *mod, char *name,
 				 void *locks, void *locks_end,
 				 void *text,  void *text_end)
@@ -504,3 +505,4 @@
 void alternatives_smp_module_del(struct module *mod)
 {
 }
+#endif
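
Why the guard is needed can be shown in a single self-contained file.
This is a sketch assuming the header layout described above (an empty
static inline when CONFIG_SMP is unset); um_stub_demo is a hypothetical
caller added only so the example can be exercised.  Without the second
#ifdef, a !CONFIG_SMP build would see two definitions of the same
function and fail to compile.

```c
#include <stddef.h>

struct module;	/* opaque here; only a pointer is passed around */

/* What the shared header is described as providing. */
#ifdef CONFIG_SMP
void alternatives_smp_module_del(struct module *mod);
#else
static inline void alternatives_smp_module_del(struct module *mod)
{
	(void)mod;
}
#endif

/* What the arch code may define: only when the header has not already
 * supplied an inline body. */
#ifdef CONFIG_SMP
void alternatives_smp_module_del(struct module *mod)
{
	(void)mod;
}
#endif

/* Hypothetical caller: builds the same way with or without CONFIG_SMP. */
int um_stub_demo(void)
{
	alternatives_smp_module_del(NULL);
	return 0;
}
```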

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5 dislikes raid-1, just like mm4
  2006-07-02 19:07                 ` Helge Hafting
@ 2006-07-03  6:52                   ` Reuben Farrelly
  0 siblings, 0 replies; 35+ messages in thread
From: Reuben Farrelly @ 2006-07-03  6:52 UTC (permalink / raw)
  To: Helge Hafting
  Cc: Grant Wilson, James Bottomley, Andrew Morton, linux-kernel,
	linux-scsi, Neil Brown



On 3/07/2006 7:07 a.m., Helge Hafting wrote:
> On Sun, Jul 02, 2006 at 04:43:14PM +0100, Grant Wilson wrote:
>> James Bottomley wrote:
>>> On Sun, 2006-07-02 at 15:28 +0100, Grant Wilson wrote:
>>>> With the patch applied to 2.6.17-mm5 my RAID-1 is up and running on both
>>>> SATA drives with no problems.
>>> That's great, thanks.  Now we know what the problem patch is, I'd like
>>> to try an 11th hour correction of the logic fault in the original.  Could
>>> you try this patch against original -mm (by reversing the previous
>>> patch).  I think it should correct the problem?
>>>
>>> Thanks,
>>>
>>> James
>>>
>> [snip]
>>
>> With the first patch reversed and the second applied to -mm5 my RAID-1
>> array is still working correctly on both disks.
>>
> The patch makes 2.6.17-mm5 md work on SATA and SCSI for me too.
> 
> Helge Hafting

+1.  Fixes everything here up too.

So with two patches applied (this one and an unrelated MSI fix) I'm all up and 
running perfectly on -mm5.

Thanks,
Reuben

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
                     ` (5 preceding siblings ...)
  2006-07-03  0:47   ` 2.6.17-mm5 Theodore Tso
@ 2006-07-03  7:32   ` Heiko Carstens
  6 siblings, 0 replies; 35+ messages in thread
From: Heiko Carstens @ 2006-07-03  7:32 UTC (permalink / raw)
  To: Andrew Morton, Martin Peschke; +Cc: linux-kernel

  LD      .tmp_vmlinux1
drivers/s390/built-in.o(.text+0x587f2): In function `zfcp_ccw_set_online':
: undefined reference to `statistic_create'
drivers/s390/built-in.o(.text+0x58838): In function `zfcp_ccw_set_online':
: undefined reference to `statistic_remove'
drivers/s390/built-in.o(.text+0x58954): In function `zfcp_ccw_set_offline':
: undefined reference to `statistic_remove'
drivers/s390/built-in.o(.text+0x603e0): In function `zfcp_erp_thread':
: undefined reference to `statistic_add'
drivers/s390/built-in.o(.text+0x60676): In function `zfcp_erp_thread':
: undefined reference to `statistic_add'
drivers/s390/built-in.o(.text+0x62000): In function `zfcp_qdio_response_handler':
: undefined reference to `statistic_add'
drivers/s390/built-in.o(.text+0x622b2): In function `zfcp_qdio_sbals_from_sg':
: undefined reference to `statistic_add'
drivers/s390/built-in.o(.text+0x6258a): In function `zfcp_qdio_sbals_from_scsicmnd':
: undefined reference to `statistic_add'
drivers/s390/built-in.o(.text+0x6280c): more undefined references to `statistic_add' follow
make: *** [.tmp_vmlinux1] Error 1

Guess there are a couple of do {} while(0) defines missing in
include/linux/statistic.h for the !CONFIG_STATISTICS case. Martin?
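
A minimal sketch of the kind of stubs meant here, under loud
assumptions: the real argument lists of statistic_create,
statistic_remove and statistic_add are not shown in this thread, so the
macros are variadic and simply swallow whatever is passed, and
zfcp_like_caller is a hypothetical stand-in for the driver code.
Statement-style stubs like these would not suit a caller that checks a
return value.

```c
/* Hypothetical !CONFIG_STATISTICS branch of a header. */
#ifdef CONFIG_STATISTICS
/* the real declarations would live here */
#else
#define statistic_create(...)	do { } while (0)
#define statistic_remove(...)	do { } while (0)
#define statistic_add(...)	do { } while (0)
#endif

/* Stand-in for driver code: with the stubs in place each call expands
 * to an empty statement, so no symbol is left for the linker to find. */
int zfcp_like_caller(void)
{
	statistic_create(0, "demo");
	statistic_add(0, 1, 2);
	statistic_remove(0);
	return 0;
}
```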

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 21:30     ` 2.6.17-mm5 Andrew Morton
                         ` (2 preceding siblings ...)
  2006-07-01 22:54       ` 2.6.17-mm5 Jeff Garzik
@ 2006-07-27 21:02       ` Ming Zhang
  3 siblings, 0 replies; 35+ messages in thread
From: Ming Zhang @ 2006-07-27 21:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grant Wilson, linux-kernel, Neil Brown, linux-scsi

On Sat, 2006-07-01 at 14:30 -0700, Andrew Morton wrote:
> On Sat, 1 Jul 2006 15:24:19 +0100
<...>

> 
> > [  155.123022] Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP: 
> > [  155.155867]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.200353] PGD 77954067 PUD 726e5067 PMD 0 
> > [  155.226233] Oops: 0000 [1] PREEMPT SMP 
> > [  155.249516] last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> > [  155.292808] CPU 0 
> > [  155.304968] Modules linked in: dm_mod evdev
> > [  155.330331] Pid: 0, comm: swapper Not tainted 2.6.17-mm5 #1
> > [  155.363697] RIP: 0010:[<ffffffff8047157a>]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.409638] RSP: 0018:ffffffff807a0c50  EFLAGS: 00010046
> > [  155.441445] RAX: 0000000000000000 RBX: ffff81007aa34708 RCX: 000000000000003f
> > [  155.484216] RDX: 00000000fffffffb RSI: ffff81007a821d28 RDI: ffff81007aa34708
> > [  155.526989] RBP: ffffffff807a0c60 R08: 0000000000000000 R09: ffff81007aac43b0
> > [  155.569759] R10: ffffffff804221e5 R11: 0000000000000058 R12: ffff81007aac4ab0
> > [  155.612533] R13: ffff81007aac43b0 R14: ffff81007aac4ab0 R15: 00000000fffffffb
> > [  155.655303] FS:  00002aeb361606d0(0000) GS:ffffffff80a46000(0000) knlGS:0000000000000000
> > [  155.703791] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > [  155.738195] CR2: 0000000000000048 CR3: 0000000070997000 CR4: 00000000000006e0
> > [  155.780969] Process swapper (pid: 0, threadinfo ffffffff80a64000, task ffffffff80696a00)
> > [  155.829404] Stack:  ffff81007a821d28 ffff81007aa34708 ffffffff807a0c80 ffffffff804728d9
> > [  155.877840]  ffff81007a821d28 ffff81007aa34708 ffffffff807a0cc0 ffffffff8047409c
> > [  155.922535]  00001000807a0d00 ffff81007aac4ab0 00000000fffffffb ffff81007aac4ab0
> > [  155.966085] Call Trace:
> > [  155.982416]  [<ffffffff804728d9>] super_written+0x30/0x65
> > [  156.015292]  [<ffffffff8047409c>] super_written_barrier+0xc4/0xd1
> > [  156.052297]  [<ffffffff8023a5a5>] bio_endio+0x56/0x5b
> > [  156.082688]  [<ffffffff8022d21b>] __end_that_request_first+0x1c9/0x4c9
> > [  156.122068]  [<ffffffff8024a0d6>] end_that_request_first+0xc/0xe
> > [  156.158343]  [<ffffffff8036a692>] blk_ordered_complete_seq+0x7c/0x8b
> > [  156.196705]  [<ffffffff8036a6d1>] post_flush_end_io+0x30/0x35
> > [  156.231419]  [<ffffffff8036a5b5>] end_that_request_last+0xd9/0xf6
> > [  156.268215]  [<ffffffff80422204>] scsi_end_request+0xad/0xd7
> > [  156.302573]  [<ffffffff80422637>] scsi_io_completion+0x3e1/0x3f0
> > [  156.339004]  [<ffffffff8042266c>] scsi_blk_pc_done+0x26/0x28
> > [  156.373357]  [<ffffffff8041d11e>] scsi_finish_command+0xa9/0xb2
> > [  156.409264]  [<ffffffff804229f9>] scsi_softirq_done+0xf4/0xfd
> > [  156.444143]  [<ffffffff80237f66>] blk_done_softirq+0x70/0x7f
> > [  156.478323]  [<ffffffff80211366>] __do_softirq+0x67/0xf4
> > [  156.510224]  [<ffffffff8025f95e>] call_softirq+0x1e/0x28
> > [  156.542083] 
> > [  156.542083] Code: 48 8b 40 48 48 85 c0 74 3f ff d0 f0 0f ba ab e0 01 00 00 03 
> 
> The barrier code is in there again.
> 
> mddev->pers is NULL in md_error(), so the test of


Just curious: how did you find out that the cause is "mddev->pers is NULL"?

thanks!
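
One way to read this off the oops quoted above, assuming the Code: bytes start at the faulting RIP: 48 8b 40 48 decodes to mov 0x48(%rax),%rax, RAX is 0 and CR2 is 0x48, and the following 48 85 c0 / 74 3f / ff d0 is a NULL test and an indirect call. So a function pointer is being loaded from offset 0x48 of a NULL pointer, which fits mddev->pers->error_handler with mddev->pers == NULL. The sketch below checks that arithmetic in userspace. struct pers_model is a simplified model written from memory of the 2.6.17-era struct mdk_personality, so its field order is an assumption, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct list_head (two pointers). */
struct list_head_model {
	void *next, *prev;
};

/*
 * Simplified model of the personality struct up to error_handler.
 * The field order is an assumption, not copied from the kernel tree.
 */
struct pers_model {
	char *name;
	int level;
	struct list_head_model list;
	void *owner;
	int (*make_request)(void *q, void *bio);
	int (*run)(void *mddev);
	int (*stop)(void *mddev);
	void (*status)(void *seq, void *mddev);
	void (*error_handler)(void *mddev, void *rdev);
};

/* Address the CPU touches when it loads base->error_handler. */
static uintptr_t fault_address(uintptr_t base)
{
	return base + offsetof(struct pers_model, error_handler);
}

/* Compares the model against the register values printed in the oops. */
static int oops_matches_null_pers(void)
{
	const uintptr_t rax = 0x0;	/* RAX from the oops: the base pointer */
	const uintptr_t cr2 = 0x48;	/* CR2 from the oops: faulting address */

	assert(sizeof(void *) == 8);	/* the layout argument assumes LP64 */
	return fault_address(rax) == cr2;
}
```

On an x86_64 build the offset of error_handler in this model comes out as 0x48, matching CR2, which is consistent with the NULL pointer being mddev->pers rather than mddev itself.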


> !mddev->pers->error_handler oopsed.  Perhaps this is a real MD bug which is
> now being exposed by the new barrier-handling problem.
> 
> 
> This should get you further, but...
> 
> From: Andrew Morton <akpm@osdl.org>
> 
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
> 
>  drivers/md/md.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff -puN drivers/md/md.c~md-oops-workaround drivers/md/md.c
> --- a/drivers/md/md.c~md-oops-workaround
> +++ a/drivers/md/md.c
> @@ -4586,6 +4586,8 @@ void md_error(mddev_t *mddev, mdk_rdev_t
>  		__builtin_return_address(0),__builtin_return_address(1),
>  		__builtin_return_address(2),__builtin_return_address(3));
>  */
> +	if (!mddev->pers)
> +		return;
>  	if (!mddev->pers->error_handler)
>  		return;
>  	mddev->pers->error_handler(mddev,rdev);
> _
> 



end of thread, other threads:[~2006-07-27 21:02 UTC | newest]

Thread overview: 35+ messages
     [not found] <20060701175444.958D6E00608B@knarzkiste.dyndns.org>
2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
2006-07-01 11:08   ` 2.6.17-mm5 Reuben Farrelly
2006-07-01 11:51     ` 2.6.17-mm5 Andrew Morton
2006-07-01 12:31       ` 2.6.17-mm5 Reuben Farrelly
2006-07-01 13:06         ` 2.6.17-mm5 Brice Goglin
2006-07-01 17:00           ` 2.6.17-mm5 Greg KH
2006-07-01 18:03   ` 2.6.17-mm5 Ralf Hildebrandt
2006-07-01 18:14   ` 2.6.17-mm5 dislikes raid-1, just like mm4 Helge Hafting
2006-07-01 22:22     ` Andrew Morton
2006-07-01 22:52       ` Jeff Garzik
2006-07-01 22:58         ` Andrew Morton
2006-07-02  4:43       ` Reuben Farrelly
2006-07-02  6:09         ` Andrew Morton
2006-07-02  5:13       ` Reuben Farrelly
2006-07-02 13:53         ` James Bottomley
2006-07-02 14:28           ` Grant Wilson
2006-07-02 15:06             ` James Bottomley
2006-07-02 15:43               ` Grant Wilson
2006-07-02 19:07                 ` Helge Hafting
2006-07-03  6:52                   ` Reuben Farrelly
2006-07-02  3:51     ` Tejun Heo
     [not found]   ` <20060701142419.GB28750@tlg.swandive.local>
2006-07-01 21:30     ` 2.6.17-mm5 Andrew Morton
2006-07-01 22:26       ` 2.6.17-mm5 James Bottomley
2006-07-01 22:32         ` 2.6.17-mm5 Neil Brown
2006-07-01 22:56           ` 2.6.17-mm5 Jeff Garzik
2006-07-02  0:10             ` 2.6.17-mm5 James Bottomley
2006-07-01 22:29       ` More RAID / SATA / barrier problems [ Re: 2.6.17-mm5 ] Neil Brown
2006-07-01 22:54       ` 2.6.17-mm5 Jeff Garzik
2006-07-27 21:02       ` 2.6.17-mm5 Ming Zhang
2006-07-02 10:03   ` 2.6.17-mm5 Andy Whitcroft
2006-07-02 10:14     ` 2.6.17-mm5 Andrew Morton
2006-07-02 10:40       ` 2.6.17-mm5 Andy Whitcroft
2006-07-02 11:14         ` 2.6.17-mm5 Andrew Morton
2006-07-03  0:47   ` 2.6.17-mm5 Theodore Tso
2006-07-03  7:32   ` 2.6.17-mm5 Heiko Carstens
