* 2.6.26-rc5-mm3
@ 2008-06-12  5:59 ` Andrew Morton
From: Andrew Morton @ 2008-06-12  5:59 UTC
  To: linux-kernel; +Cc: kernel-testers, linux-mm


ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc5/2.6.26-rc5-mm3/

- This is a bugfixed version of 2.6.26-rc5-mm2, which was a bugfixed
  version of 2.6.26-rc5-mm1.  None of the git trees were repulled for -mm3,
  nor were they repulled for -mm2.

  The aim here is to get all the stupid bugs out of the way so that some
  serious MM testing can be performed.

- Please perform some serious MM testing.



Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1
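
  (The dashed git-fetch/git-checkout forms above are the old-style git
  commands; with current git syntax the equivalent would be

  git fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

  either form fetches the tag and creates a local branch pointing at it.)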

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

        echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots: a binary search over the ~1000 patches in this tree needs about
  ten halving steps), so consider reporting the bug first; if we cannot
  immediately identify the faulty patch, then perform the bisection search
  along the lines of the sketch below.
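
  (As a rough illustration of the loop that document describes -- a sketch
  only, assuming the broken-out patch series from the ftp directory above
  is used with quilt; patch counts and paths are illustrative:

        cd linux-2.6.26-rc5
        quilt push 483    # apply the first half of the ~967 patches
        make -j4          # build, install, reboot, re-run the failing test
        # bug present -> the culprit is in patches 1..483; pop back and
        #                bisect within that half
        # bug absent  -> quilt push 242 more, and so on, halving each time

  after about ten such rounds a single suspect patch remains.)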

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug.  Some
  developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.  These probably are at least compilable.

- More-than-daily -mm snapshots may be found at
  http://userweb.kernel.org/~akpm/mmotm/.  These are almost certainly not
  compilable.




Changes since 2.6.26-rc5-mm2:

 origin.patch
 linux-next.patch
 git-jg-misc.patch
 git-leds.patch
 git-libata-all.patch
 git-battery.patch
 git-parisc.patch
 git-regulator.patch
 git-unionfs.patch
 git-logfs.patch
 git-unprivileged-mounts.patch
 git-xtensa.patch
 git-orion.patch
 git-pekka.patch

 git trees

-fsldma-the-mpc8377mds-board-device-tree-node-for-fsldma-driver.patch

 Merged into mainline or a subsystem tree

+capabilities-add-back-dummy-support-for-keepcaps.patch
+cciss-add-new-hardware-support.patch
+cciss-add-new-hardware-support-fix.patch
+cciss-bump-version-to-20-to-reflect-new-hw-support.patch
+kprobes-fix-error-checking-of-batch-registration.patch
+m68knommu-init-coldfire-timer-trr-with-n-1-not-n.patch
+rtc-at32ap700x-fix-bug-in-at32_rtc_readalarm.patch

 2.6.26 queue

-acpi-video-balcklist-fujitsu-lifebook-s6410.patch

 Dropped

-bay-exit-if-notify-handler-cannot-be-installed.patch

 Dropped

-intel-agp-rewrite-gtt-on-resume-fix.patch
-intel-agp-rewrite-gtt-on-resume-fix-fix.patch

 Folded into intel-agp-rewrite-gtt-on-resume.patch

-kbuild-move-non-__kernel__-checking-headers-to-header-y.patch

 Dropped

+8390-split-8390-support-into-a-pausing-and-a-non-pausing-driver-core-fix.patch

 Unfix 8390-split-8390-support-into-a-pausing-and-a-non-pausing-driver-core.patch

+mpc8xxx_wdt-various-renames-mostly-s-mpc83xx-mpc8xxx-g-fix.patch

 Fix mpc8xxx_wdt-various-renames-mostly-s-mpc83xx-mpc8xxx-g.patch

+intel_rng-make-device-not-found-a-warning.patch
+driver-video-cirrusfb-fix-ram-address-printk.patch
+driver-video-cirrusfb-fix-ram-address-printk-fix.patch
+driver-video-cirrusfb-fix-ram-address-printk-fix-fix.patch
+driver-char-generic_nvram-fix-banner.patch
+pagemap-pass-mm-into-pagewalkers.patch
+pagemap-fix-large-pages-in-pagemap.patch
+proc-sysvipc-shm-fix-32-bit-truncation-of-segment-sizes.patch
+console-keyboard-mapping-broken-by-04c71976.patch

 More 2.6.26 queue

+bay-exit-if-notify-handler-cannot-be-installed.patch

 ACPI Bay driver fix

+smc91x-fix-build-error-from-the-smc_get_mac_addr-api-change.patch

 netdev fix

+add-a-helper-function-to-test-if-an-object-is-on-the-stack.patch

 Infrastructure

+hugetlb-introduce-pud_huge-s390-fix.patch

 Fix hugetlb-introduce-pud_huge.patch some more

+kprobes-remove-redundant-config-check.patch
+kprobes-indirectly-call-kprobe_target.patch
+kprobes-add-tests-for-register_kprobes.patch

 kprobes updates

+not-for-merging-pnp-changes-suspend-oops.patch

 Try to debug some pnp problems

+quota-move-function-macros-from-quotah-to-quotaopsh-fix.patch

 Fix quota-move-function-macros-from-quotah-to-quotaopsh.patch some more

+memcg-remove-refcnt-from-page_cgroup-fix-2.patch

 Fix memcg-remove-refcnt-from-page_cgroup-fix.patch

+sgi-xp-eliminate-in-comments.patch
+sgi-xp-use-standard-bitops-macros-and-functions.patch
+sgi-xp-add-jiffies-to-reserved-pages-timestamp-name.patch

 Update SGI XP driver

+dma-mapping-add-the-device-argument-to-dma_mapping_error-b34-fix.patch

 Fix dma-mapping-add-the-device-argument-to-dma_mapping_error.patch some more

+include-linux-aioh-removed-duplicated-include.patch

 AIO cleanup

+kernel-call-constructors-uml-fix-1.patch
+kernel-call-constructors-uml-fix-2.patch

 Fix kernel-call-constructors.patch

+x86-support-1gb-hugepages-with-get_user_pages_lockless.patch

 Wire up x86 large pages

+mm-speculative-page-references-hugh-fix3.patch

 Fix mm-speculative-page-references.patch some more

 vmscan-move-isolate_lru_page-to-vmscanc.patch
+vmscan-move-isolate_lru_page-to-vmscanc-fix.patch
 vmscan-use-an-indexed-array-for-lru-variables.patch
-vmscan-use-an-array-for-the-lru-pagevecs.patch
+swap-use-an-array-for-the-lru-pagevecs.patch
 vmscan-free-swap-space-on-swap-in-activation.patch
-vmscan-define-page_file_cache-function.patch
+define-page_file_cache-function.patch
 vmscan-split-lru-lists-into-anon-file-sets.patch
 vmscan-second-chance-replacement-for-anonymous-pages.patch
-vmscan-add-some-sanity-checks-to-get_scan_ratio.patch
 vmscan-fix-pagecache-reclaim-referenced-bit-check.patch
 vmscan-add-newly-swapped-in-pages-to-the-inactive-list.patch
-vmscan-more-aggressively-use-lumpy-reclaim.patch
-vmscan-pageflag-helpers-for-configed-out-flags.patch
-vmscan-noreclaim-lru-infrastructure.patch
-vmscan-noreclaim-lru-page-statistics.patch
-vmscan-ramfs-and-ram-disk-pages-are-non-reclaimable.patch
-vmscan-shm_locked-pages-are-non-reclaimable.patch
-vmscan-mlocked-pages-are-non-reclaimable.patch
-vmscan-downgrade-mmap-sem-while-populating-mlocked-regions.patch
-vmscan-handle-mlocked-pages-during-map-remap-unmap.patch
-vmscan-mlocked-pages-statistics.patch
-vmscan-cull-non-reclaimable-pages-in-fault-path.patch
-vmscan-noreclaim-and-mlocked-pages-vm-events.patch
-mm-only-vmscan-noreclaim-lru-scan-sysctl.patch
-vmscan-mlocked-pages-count-attempts-to-free-mlocked-page.patch
-vmscan-noreclaim-lru-and-mlocked-pages-documentation.patch
+more-aggressively-use-lumpy-reclaim.patch
+pageflag-helpers-for-configed-out-flags.patch
+unevictable-lru-infrastructure.patch
+unevictable-lru-page-statistics.patch
+ramfs-and-ram-disk-pages-are-unevictable.patch
+shm_locked-pages-are-unevictable.patch
+mlock-mlocked-pages-are-unevictable.patch
+mlock-mlocked-pages-are-unevictable-fix.patch
+mlock-mlocked-pages-are-unevictable-fix-fix.patch
+mlock-mlocked-pages-are-unevictable-fix-2.patch
+mlock-downgrade-mmap-sem-while-populating-mlocked-regions.patch
+mmap-handle-mlocked-pages-during-map-remap-unmap.patch
+mmap-handle-mlocked-pages-during-map-remap-unmap-cleanup.patch
+vmstat-mlocked-pages-statistics.patch
+swap-cull-unevictable-pages-in-fault-path.patch
+vmstat-unevictable-and-mlocked-pages-vm-events.patch
+vmscan-unevictable-lru-scan-sysctl.patch
+vmscan-unevictable-lru-scan-sysctl-nommu-fix.patch
+mlock-count-attempts-to-free-mlocked-page.patch
+doc-unevictable-lru-and-mlocked-pages-documentation.patch

 New iteration of Rik's page reclaim work

1390 commits in 967 patch files



All patches:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc5/2.6.26-rc5-mm3/patch-list




* 2.6.26-rc5-mm3: kernel BUG at mm/vmscan.c:510
  2008-06-12  5:59 ` 2.6.26-rc5-mm3 Andrew Morton
@ 2008-06-12  7:58   ` Alexey Dobriyan
From: Alexey Dobriyan @ 2008-06-12  7:58 UTC
  To: Andrew Morton; +Cc: linux-kernel, kernel-testers, linux-mm, riel, npiggin

[  254.217776] ------------[ cut here ]------------
[  254.217776] kernel BUG at mm/vmscan.c:510!
[  254.217776] invalid opcode: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
[  254.217776] last sysfs file: /sys/kernel/uevent_seqnum
[  254.217776] CPU 1 
[  254.217776] Modules linked in: ext2 nf_conntrack_irc xt_state iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables usblp ehci_hcd uhci_hcd usbcore sr_mod cdrom
[  254.217776] Pid: 12044, comm: madvise02 Not tainted 2.6.26-rc5-mm3 #4
[  254.217776] RIP: 0010:[<ffffffff802729b2>]  [<ffffffff802729b2>] putback_lru_page+0x152/0x160
[  254.217776] RSP: 0018:ffff81012edd1cd8  EFLAGS: 00010202
[  254.217776] RAX: ffffe20003f344b8 RBX: 0000000000000000 RCX: 0000000000000001
[  254.217776] RDX: 0000000000005d5c RSI: 0000000000000000 RDI: ffffe20003f344b8
[  254.217776] RBP: ffff81012edd1cf8 R08: 0000000000000000 R09: 0000000000000000
[  254.217776] R10: ffffffff80275152 R11: 0000000000000001 R12: ffffe20003f344b8
[  254.217776] R13: 00000000ffffffff R14: ffff810124801080 R15: ffffffffffffffff
[  254.217776] FS:  00007fb3ad83c6f0(0000) GS:ffff81017f845320(0000) knlGS:0000000000000000
[  254.217776] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  254.217776] CR2: 00007fffb5846d38 CR3: 0000000117de9000 CR4: 00000000000006e0
[  254.217776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  254.217776] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  254.217776] Process madvise02 (pid: 12044, threadinfo ffff81012edd0000, task ffff81017db6b3c0)
[  254.217776] Stack:  ffffe20003f344b8 ffffe20003f344b8 ffffffff80629300 0000000000000001
[  254.217776]  ffff81012edd1d18 ffffffff8027d268 ffffe20003f344b8 0000000000000000
[  254.217776]  ffff81012edd1d38 ffffffff80271783 0000000000000246 ffffe20003f344b8
[  254.217776] Call Trace:
[  254.217776]  [<ffffffff8027d268>] __clear_page_mlock+0xe8/0x100
[  254.217776]  [<ffffffff80271783>] truncate_complete_page+0x73/0x80
[  254.217776]  [<ffffffff80271871>] truncate_inode_pages_range+0xe1/0x3c0
[  254.217776]  [<ffffffff80271b60>] truncate_inode_pages+0x10/0x20
[  254.217776]  [<ffffffff802e9738>] ext3_delete_inode+0x18/0xf0
[  254.217776]  [<ffffffff802e9720>] ? ext3_delete_inode+0x0/0xf0
[  254.217776]  [<ffffffff802aa27b>] generic_delete_inode+0x7b/0x100
[  254.217776]  [<ffffffff802aa43c>] generic_drop_inode+0x13c/0x180
[  254.217776]  [<ffffffff802a960d>] iput+0x5d/0x70
[  254.217776]  [<ffffffff8029f43e>] do_unlinkat+0x13e/0x1e0
[  254.217776]  [<ffffffff8046de77>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  254.217776]  [<ffffffff80255c69>] ? trace_hardirqs_on_caller+0xc9/0x150
[  254.217776]  [<ffffffff8046de77>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  254.217776]  [<ffffffff8029f4f1>] sys_unlink+0x11/0x20
[  254.217776]  [<ffffffff8020b6bb>] system_call_after_swapgs+0x7b/0x80
[  254.217776] 
[  254.217776] 
[  254.217776] Code: 0f 0b eb fe 0f 1f 44 00 00 f6 47 01 40 48 89 f8 75 1d 83 78 08 01 75 13 4c 89 e7 31 db e8 97 44 ff ff e9 2b ff ff ff 0f 0b eb fe <0f> 0b eb fe 48 8b 47 10 eb dd 0f 1f 40 00 55 48 89 e5 41 57 45 
[  254.217776] RIP  [<ffffffff802729b2>] putback_lru_page+0x152/0x160
[  254.217776]  RSP <ffff81012edd1cd8>
[  254.234540] ---[ end trace a1dd07b571590cc8 ]---



* Re: 2.6.26-rc5-mm3: kernel BUG at mm/vmscan.c:510
  2008-06-12  7:58   ` Alexey Dobriyan
@ 2008-06-12  8:22     ` Andrew Morton
From: Andrew Morton @ 2008-06-12  8:22 UTC
  To: Alexey Dobriyan
  Cc: linux-kernel, kernel-testers, linux-mm, riel, npiggin, Lee Schermerhorn

On Thu, 12 Jun 2008 11:58:58 +0400 Alexey Dobriyan <adobriyan@gmail.com> wrote:

> [  254.217776] ------------[ cut here ]------------
> [  254.217776] kernel BUG at mm/vmscan.c:510!
> [  254.217776] invalid opcode: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
> [  254.217776] last sysfs file: /sys/kernel/uevent_seqnum
> [  254.217776] CPU 1 
> [  254.217776] Modules linked in: ext2 nf_conntrack_irc xt_state iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables usblp ehci_hcd uhci_hcd usbcore sr_mod cdrom
> [  254.217776] Pid: 12044, comm: madvise02 Not tainted 2.6.26-rc5-mm3 #4
> [  254.217776] RIP: 0010:[<ffffffff802729b2>]  [<ffffffff802729b2>] putback_lru_page+0x152/0x160
> [  254.217776] RSP: 0018:ffff81012edd1cd8  EFLAGS: 00010202
> [  254.217776] RAX: ffffe20003f344b8 RBX: 0000000000000000 RCX: 0000000000000001
> [  254.217776] RDX: 0000000000005d5c RSI: 0000000000000000 RDI: ffffe20003f344b8
> [  254.217776] RBP: ffff81012edd1cf8 R08: 0000000000000000 R09: 0000000000000000
> [  254.217776] R10: ffffffff80275152 R11: 0000000000000001 R12: ffffe20003f344b8
> [  254.217776] R13: 00000000ffffffff R14: ffff810124801080 R15: ffffffffffffffff
> [  254.217776] FS:  00007fb3ad83c6f0(0000) GS:ffff81017f845320(0000) knlGS:0000000000000000
> [  254.217776] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  254.217776] CR2: 00007fffb5846d38 CR3: 0000000117de9000 CR4: 00000000000006e0
> [  254.217776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  254.217776] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  254.217776] Process madvise02 (pid: 12044, threadinfo ffff81012edd0000, task ffff81017db6b3c0)
> [  254.217776] Stack:  ffffe20003f344b8 ffffe20003f344b8 ffffffff80629300 0000000000000001
> [  254.217776]  ffff81012edd1d18 ffffffff8027d268 ffffe20003f344b8 0000000000000000
> [  254.217776]  ffff81012edd1d38 ffffffff80271783 0000000000000246 ffffe20003f344b8
> [  254.217776] Call Trace:
> [  254.217776]  [<ffffffff8027d268>] __clear_page_mlock+0xe8/0x100
> [  254.217776]  [<ffffffff80271783>] truncate_complete_page+0x73/0x80
> [  254.217776]  [<ffffffff80271871>] truncate_inode_pages_range+0xe1/0x3c0
> [  254.217776]  [<ffffffff80271b60>] truncate_inode_pages+0x10/0x20
> [  254.217776]  [<ffffffff802e9738>] ext3_delete_inode+0x18/0xf0
> [  254.217776]  [<ffffffff802e9720>] ? ext3_delete_inode+0x0/0xf0
> [  254.217776]  [<ffffffff802aa27b>] generic_delete_inode+0x7b/0x100
> [  254.217776]  [<ffffffff802aa43c>] generic_drop_inode+0x13c/0x180
> [  254.217776]  [<ffffffff802a960d>] iput+0x5d/0x70
> [  254.217776]  [<ffffffff8029f43e>] do_unlinkat+0x13e/0x1e0
> [  254.217776]  [<ffffffff8046de77>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [  254.217776]  [<ffffffff80255c69>] ? trace_hardirqs_on_caller+0xc9/0x150
> [  254.217776]  [<ffffffff8046de77>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [  254.217776]  [<ffffffff8029f4f1>] sys_unlink+0x11/0x20
> [  254.217776]  [<ffffffff8020b6bb>] system_call_after_swapgs+0x7b/0x80
> [  254.217776] 
> [  254.217776] 
> [  254.217776] Code: 0f 0b eb fe 0f 1f 44 00 00 f6 47 01 40 48 89 f8 75 1d 83 78 08 01 75 13 4c 89 e7 31 db e8 97 44 ff ff e9 2b ff ff ff 0f 0b eb fe <0f> 0b eb fe 48 8b 47 10 eb dd 0f 1f 40 00 55 48 89 e5 41 57 45 
> [  254.217776] RIP  [<ffffffff802729b2>] putback_lru_page+0x152/0x160
> [  254.217776]  RSP <ffff81012edd1cd8>
> [  254.234540] ---[ end trace a1dd07b571590cc8 ]---

int putback_lru_page(struct page *page)
{
	int lru;
	int ret = 1;
	int was_unevictable;

	VM_BUG_ON(!PageLocked(page));
	VM_BUG_ON(PageLRU(page));

	lru = !!TestClearPageActive(page);
	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */

	if (unlikely(!page->mapping)) {
		/*
		 * page truncated.  drop lock as put_page() will
		 * free the page.
		 */
		VM_BUG_ON(page_count(page) != 1);


added by unevictable-lru-infrastructure.patch.
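
(To check an attribution like that yourself -- a sketch, assuming the
broken-out series from the release's ftp directory is unpacked in
./broken-out; the path is illustrative:

        grep -l 'putback_lru_page' broken-out/*.patch

lists every patch in the lineup that touches that function.)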

How does one reproduce this?  Looks like LTP madvise02.



* Re: 2.6.26-rc5-mm3: kernel BUG at mm/vmscan.c:510
  2008-06-12  8:22     ` Andrew Morton
@ 2008-06-12  8:23       ` Alexey Dobriyan
From: Alexey Dobriyan @ 2008-06-12  8:23 UTC
  To: Andrew Morton
  Cc: linux-kernel, kernel-testers, linux-mm, riel, npiggin, Lee Schermerhorn

On Thu, Jun 12, 2008 at 01:22:05AM -0700, Andrew Morton wrote:
> On Thu, 12 Jun 2008 11:58:58 +0400 Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > [  254.217776] ------------[ cut here ]------------
> > [  254.217776] kernel BUG at mm/vmscan.c:510!
> > [  254.217776] invalid opcode: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
> > [  254.217776] last sysfs file: /sys/kernel/uevent_seqnum
> > [  254.217776] CPU 1 
> > [  254.217776] Modules linked in: ext2 nf_conntrack_irc xt_state iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables usblp ehci_hcd uhci_hcd usbcore sr_mod cdrom
> > [  254.217776] Pid: 12044, comm: madvise02 Not tainted 2.6.26-rc5-mm3 #4
> > [  254.217776] RIP: 0010:[<ffffffff802729b2>]  [<ffffffff802729b2>] putback_lru_page+0x152/0x160
> > [  254.217776] RSP: 0018:ffff81012edd1cd8  EFLAGS: 00010202
> > [  254.217776] RAX: ffffe20003f344b8 RBX: 0000000000000000 RCX: 0000000000000001
> > [  254.217776] RDX: 0000000000005d5c RSI: 0000000000000000 RDI: ffffe20003f344b8
> > [  254.217776] RBP: ffff81012edd1cf8 R08: 0000000000000000 R09: 0000000000000000
> > [  254.217776] R10: ffffffff80275152 R11: 0000000000000001 R12: ffffe20003f344b8
> > [  254.217776] R13: 00000000ffffffff R14: ffff810124801080 R15: ffffffffffffffff
> > [  254.217776] FS:  00007fb3ad83c6f0(0000) GS:ffff81017f845320(0000) knlGS:0000000000000000
> > [  254.217776] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  254.217776] CR2: 00007fffb5846d38 CR3: 0000000117de9000 CR4: 00000000000006e0
> > [  254.217776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  254.217776] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [  254.217776] Process madvise02 (pid: 12044, threadinfo ffff81012edd0000, task ffff81017db6b3c0)
> > [  254.217776] Stack:  ffffe20003f344b8 ffffe20003f344b8 ffffffff80629300 0000000000000001
> > [  254.217776]  ffff81012edd1d18 ffffffff8027d268 ffffe20003f344b8 0000000000000000
> > [  254.217776]  ffff81012edd1d38 ffffffff80271783 0000000000000246 ffffe20003f344b8
> > [  254.217776] Call Trace:
> > [  254.217776]  [<ffffffff8027d268>] __clear_page_mlock+0xe8/0x100
> > [  254.217776]  [<ffffffff80271783>] truncate_complete_page+0x73/0x80
> > [  254.217776]  [<ffffffff80271871>] truncate_inode_pages_range+0xe1/0x3c0
> > [  254.217776]  [<ffffffff80271b60>] truncate_inode_pages+0x10/0x20
> > [  254.217776]  [<ffffffff802e9738>] ext3_delete_inode+0x18/0xf0
> > [  254.217776]  [<ffffffff802e9720>] ? ext3_delete_inode+0x0/0xf0
> > [  254.217776]  [<ffffffff802aa27b>] generic_delete_inode+0x7b/0x100
> > [  254.217776]  [<ffffffff802aa43c>] generic_drop_inode+0x13c/0x180
> > [  254.217776]  [<ffffffff802a960d>] iput+0x5d/0x70
> > [  254.217776]  [<ffffffff8029f43e>] do_unlinkat+0x13e/0x1e0
> > [  254.217776]  [<ffffffff8046de77>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [  254.217776]  [<ffffffff80255c69>] ? trace_hardirqs_on_caller+0xc9/0x150
> > [  254.217776]  [<ffffffff8046de77>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [  254.217776]  [<ffffffff8029f4f1>] sys_unlink+0x11/0x20
> > [  254.217776]  [<ffffffff8020b6bb>] system_call_after_swapgs+0x7b/0x80
> > [  254.217776] 
> > [  254.217776] 
> > [  254.217776] Code: 0f 0b eb fe 0f 1f 44 00 00 f6 47 01 40 48 89 f8 75 1d 83 78 08 01 75 13 4c 89 e7 31 db e8 97 44 ff ff e9 2b ff ff ff 0f 0b eb fe <0f> 0b eb fe 48 8b 47 10 eb dd 0f 1f 40 00 55 48 89 e5 41 57 45 
> > [  254.217776] RIP  [<ffffffff802729b2>] putback_lru_page+0x152/0x160
> > [  254.217776]  RSP <ffff81012edd1cd8>
> > [  254.234540] ---[ end trace a1dd07b571590cc8 ]---
> 
> int putback_lru_page(struct page *page)
> {
> 	int lru;
> 	int ret = 1;
> 	int was_unevictable;
> 
> 	VM_BUG_ON(!PageLocked(page));
> 	VM_BUG_ON(PageLRU(page));
> 
> 	lru = !!TestClearPageActive(page);
> 	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
> 
> 	if (unlikely(!page->mapping)) {
> 		/*
> 		 * page truncated.  drop lock as put_page() will
> 		 * free the page.
> 		 */
> 		VM_BUG_ON(page_count(page) != 1);
> 
> 
> added by unevictable-lru-infrastructure.patch.
> 
> How does one reproduce this?  Looks like LTP madvise02.

Yep, totally reproducible here:

	sudo ./testcases/bin/madvise02


^ permalink raw reply	[flat|nested] 290+ messages in thread

* [BUG] 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-12  5:59 ` 2.6.26-rc5-mm3 Andrew Morton
@ 2008-06-12  8:44   ` Kamalesh Babulal
  -1 siblings, 0 replies; 290+ messages in thread
From: Kamalesh Babulal @ 2008-06-12  8:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, kernel-testers, linux-mm, Nick Piggin, Andy Whitcroft

Hi Andrew,

2.6.26-rc5-mm3 kernel panics while booting up on the x86_64
machine. Sorry, the console is a bit overwritten for the first few lines.

------------[ cut here ]------------
ot fs
no fstab.kernel BUG at mm/filemap.c:575!
sys, mounting ininvalid opcode: 0000 [1] ternal defaultsSMP 
Switching to ne
w root and runnilast sysfs file: /sys/block/dm-3/removable
ng init.
unmounCPU 3 ting old /dev
u
nmounting old /pModules linked in:roc
unmounting 
old /sys
Pid: 1, comm: init Not tainted 2.6.26-rc5-mm3-autotest #1
RIP: 0010:[<ffffffff80268155>]  [<ffffffff80268155>] unlock_page+0xf/0x26
RSP: 0018:ffff81003f9e1dc8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffe20000f63080 RCX: 0000000000000036
RDX: 0000000000000000 RSI: ffffe20000f63080 RDI: ffffe20000f63080
RBP: 0000000000000000 R08: ffff81003f9a5727 R09: ffffc10000200200
R10: ffffc10000100100 R11: 000000000000000e R12: 0000000000000000
R13: 0000000000000000 R14: ffff81003f47aed8 R15: 0000000000000000
FS:  000000000066d870(0063) GS:ffff81003f99fa80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000065afa0 CR3: 000000003d580000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process init (pid: 1, threadinfo ffff81003f9e0000, task ffff81003f9d8000)
Stack:  ffffe20000f63080 ffffffff80270d9c 0000000000000000 ffffffffffffffff
 000000000000000e 0000000000000000 ffffe20000f63080 ffffe20000f630c0
 ffffe20000f63100 ffffe20000f63140 ffffe20000f63180 ffffe20000f631c0
Call Trace:
 [<ffffffff80270d9c>] truncate_inode_pages_range+0xc5/0x305
 [<ffffffff802a7177>] generic_delete_inode+0xc9/0x133
 [<ffffffff8029e3cd>] do_unlinkat+0xf0/0x160
 [<ffffffff8020bd0b>] system_call_after_swapgs+0x7b/0x80


Code: 00 00 48 85 c0 74 0b 48 8b 40 10 48 85 c0 74 02 ff d0 e8 75 ec 32 00 41 5b 31 c0 c3 53 48 89 fb f0 0f ba 37 00 19 c0 85 c0 75 04 <0f> 0b eb fe e8 56 f5 ff ff 48 89 de 48 89 c7 31 d2 5b e9 47 be 
RIP  [<ffffffff80268155>] unlock_page+0xf/0x26
 RSP <ffff81003f9e1dc8>
---[ end trace 27b1d01b03af7c12 ]---
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Tainted: G      D   2.6.26-rc5-mm3-autotest #1

Call Trace:
 [<ffffffff80232d87>] panic+0x86/0x144
 [<ffffffff80233a09>] printk+0x4e/0x56
 [<ffffffff80235740>] do_exit+0x71/0x67c
 [<ffffffff80598691>] oops_begin+0x0/0x8c
 [<ffffffff8020dbc0>] do_invalid_op+0x87/0x91
 [<ffffffff80268155>] unlock_page+0xf/0x26
 [<ffffffff805982d9>] error_exit+0x0/0x51
 [<ffffffff80268155>] unlock_page+0xf/0x26
 [<ffffffff80270d9c>] truncate_inode_pages_range+0xc5/0x305
 [<ffffffff802a7177>] generic_delete_inode+0xc9/0x133
 [<ffffffff8029e3cd>] do_unlinkat+0xf0/0x160
 [<ffffffff8020bd0b>] system_call_after_swapgs+0x7b/0x80


-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG] 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-12  8:44   ` Kamalesh Babulal
@ 2008-06-12  8:57     ` Andrew Morton
  -1 siblings, 0 replies; 290+ messages in thread
From: Andrew Morton @ 2008-06-12  8:57 UTC (permalink / raw)
  To: Kamalesh Babulal
  Cc: linux-kernel, kernel-testers, linux-mm, Nick Piggin, Andy Whitcroft

On Thu, 12 Jun 2008 14:14:21 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:

> Hi Andrew,
> 
> 2.6.26-rc5-mm3 kernel panics while booting up on the x86_64
> machine. Sorry, the console is a bit overwritten for the first few lines.
> 
> ------------[ cut here ]------------
> ot fs
> no fstab.kernel BUG at mm/filemap.c:575!
> sys, mounting ininvalid opcode: 0000 [1] ternal defaultsSMP 
> Switching to ne
> w root and runnilast sysfs file: /sys/block/dm-3/removable
> ng init.
> unmounCPU 3 ting old /dev
> u
> nmounting old /pModules linked in:roc
> unmounting 
> old /sys
> Pid: 1, comm: init Not tainted 2.6.26-rc5-mm3-autotest #1
> RIP: 0010:[<ffffffff80268155>]  [<ffffffff80268155>] unlock_page+0xf/0x26
> RSP: 0018:ffff81003f9e1dc8  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffe20000f63080 RCX: 0000000000000036
> RDX: 0000000000000000 RSI: ffffe20000f63080 RDI: ffffe20000f63080
> RBP: 0000000000000000 R08: ffff81003f9a5727 R09: ffffc10000200200
> R10: ffffc10000100100 R11: 000000000000000e R12: 0000000000000000
> R13: 0000000000000000 R14: ffff81003f47aed8 R15: 0000000000000000
> FS:  000000000066d870(0063) GS:ffff81003f99fa80(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000065afa0 CR3: 000000003d580000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process init (pid: 1, threadinfo ffff81003f9e0000, task ffff81003f9d8000)
> Stack:  ffffe20000f63080 ffffffff80270d9c 0000000000000000 ffffffffffffffff
>  000000000000000e 0000000000000000 ffffe20000f63080 ffffe20000f630c0
>  ffffe20000f63100 ffffe20000f63140 ffffe20000f63180 ffffe20000f631c0
> Call Trace:
>  [<ffffffff80270d9c>] truncate_inode_pages_range+0xc5/0x305
>  [<ffffffff802a7177>] generic_delete_inode+0xc9/0x133
>  [<ffffffff8029e3cd>] do_unlinkat+0xf0/0x160
>  [<ffffffff8020bd0b>] system_call_after_swapgs+0x7b/0x80
> 
> 
> Code: 00 00 48 85 c0 74 0b 48 8b 40 10 48 85 c0 74 02 ff d0 e8 75 ec 32 00 41 5b 31 c0 c3 53 48 89 fb f0 0f ba 37 00 19 c0 85 c0 75 04 <0f> 0b eb fe e8 56 f5 ff ff 48 89 de 48 89 c7 31 d2 5b e9 47 be 
> RIP  [<ffffffff80268155>] unlock_page+0xf/0x26
>  RSP <ffff81003f9e1dc8>
> ---[ end trace 27b1d01b03af7c12 ]---

Another unlock of an unlocked page.  Presumably when reclaim hadn't
done anything yet. 

Don't know, sorry.  Strange.

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG] 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-12  8:57     ` Andrew Morton
@ 2008-06-12 11:20       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-12 11:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kamalesh Babulal, linux-kernel, kernel-testers, linux-mm,
	Nick Piggin, Andy Whitcroft

On Thu, 12 Jun 2008 01:57:46 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> > Call Trace:
> >  [<ffffffff80270d9c>] truncate_inode_pages_range+0xc5/0x305
> >  [<ffffffff802a7177>] generic_delete_inode+0xc9/0x133
> >  [<ffffffff8029e3cd>] do_unlinkat+0xf0/0x160
> >  [<ffffffff8020bd0b>] system_call_after_swapgs+0x7b/0x80
> > 
> > 
> > Code: 00 00 48 85 c0 74 0b 48 8b 40 10 48 85 c0 74 02 ff d0 e8 75 ec 32 00 41 5b 31 c0 c3 53 48 89 fb f0 0f ba 37 00 19 c0 85 c0 75 04 <0f> 0b eb fe e8 56 f5 ff ff 48 89 de 48 89 c7 31 d2 5b e9 47 be 
> > RIP  [<ffffffff80268155>] unlock_page+0xf/0x26
> >  RSP <ffff81003f9e1dc8>
> > ---[ end trace 27b1d01b03af7c12 ]---
> 
> Another unlock of an unlocked page.  Presumably when reclaim hadn't
> done anything yet. 
> 
> Don't know, sorry.  Strange.
> 
At first look:

==
truncate_inode_pages_range()
	-> TestSetPageLocked()  //
        	-> truncate_complete_page()
			-> remove_from_page_cache() // makes page->mapping to be NULL.
			-> clear_page_mlock()
				-> __clear_page_mlock()
					-> putback_lru_page()
						-> unlock_page() // page->mapping is NULL
	-> unlock_page() //BUG
==

It seems truncate_complete_page() is bad.
==
static void
truncate_complete_page(struct address_space *mapping, struct page *page)
{
        if (page->mapping != mapping)
                return;

        if (PagePrivate(page))
                do_invalidatepage(page, 0);

        cancel_dirty_page(page, PAGE_CACHE_SIZE);

        remove_from_page_cache(page);     -----------------(A)
        clear_page_mlock(page);           -----------------(B)
        ClearPageUptodate(page);
        ClearPageMappedToDisk(page);
        page_cache_release(page);       /* pagecache ref */
}
==

(B) should be called before (A) as invalidate_complete_page() does.
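
Something like this, perhaps (sketch only, untested):

==
         cancel_dirty_page(page, PAGE_CACHE_SIZE);

+        clear_page_mlock(page);        /* while page->mapping is still set */
         remove_from_page_cache(page);
-        clear_page_mlock(page);
         ClearPageUptodate(page);
==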

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG] 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-12  8:57     ` Andrew Morton
@ 2008-06-12 11:38       ` Nick Piggin
  -1 siblings, 0 replies; 290+ messages in thread
From: Nick Piggin @ 2008-06-12 11:38 UTC (permalink / raw)
  To: Andrew Morton, Rik van Riel
  Cc: Kamalesh Babulal, linux-kernel, kernel-testers, linux-mm,
	Nick Piggin, Andy Whitcroft

On Thursday 12 June 2008 18:57, Andrew Morton wrote:
> On Thu, 12 Jun 2008 14:14:21 +0530 Kamalesh Babulal 
<kamalesh@linux.vnet.ibm.com> wrote:
> > Hi Andrew,
> >
> > 2.6.26-rc5-mm3 kernel panics while booting up on the x86_64
> > machine. Sorry, the console is a bit overwritten for the first few lines.
> >
> > ------------[ cut here ]------------
> > ot fs
> > no fstab.kernel BUG at mm/filemap.c:575!
> > sys, mounting ininvalid opcode: 0000 [1] ternal defaultsSMP
> > Switching to ne
> > w root and runnilast sysfs file: /sys/block/dm-3/removable
> > ng init.
> > unmounCPU 3 ting old /dev
> > u
> > nmounting old /pModules linked in:roc
> > unmounting
> > old /sys
> > Pid: 1, comm: init Not tainted 2.6.26-rc5-mm3-autotest #1
> > RIP: 0010:[<ffffffff80268155>]  [<ffffffff80268155>] unlock_page+0xf/0x26
> > RSP: 0018:ffff81003f9e1dc8  EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffffe20000f63080 RCX: 0000000000000036
> > RDX: 0000000000000000 RSI: ffffe20000f63080 RDI: ffffe20000f63080
> > RBP: 0000000000000000 R08: ffff81003f9a5727 R09: ffffc10000200200
> > R10: ffffc10000100100 R11: 000000000000000e R12: 0000000000000000
> > R13: 0000000000000000 R14: ffff81003f47aed8 R15: 0000000000000000
> > FS:  000000000066d870(0063) GS:ffff81003f99fa80(0000)
> > knlGS:0000000000000000 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 000000000065afa0 CR3: 000000003d580000 CR4: 00000000000006e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process init (pid: 1, threadinfo ffff81003f9e0000, task ffff81003f9d8000)
> > Stack:  ffffe20000f63080 ffffffff80270d9c 0000000000000000
> > ffffffffffffffff 000000000000000e 0000000000000000 ffffe20000f63080
> > ffffe20000f630c0 ffffe20000f63100 ffffe20000f63140 ffffe20000f63180
> > ffffe20000f631c0 Call Trace:
> >  [<ffffffff80270d9c>] truncate_inode_pages_range+0xc5/0x305
> >  [<ffffffff802a7177>] generic_delete_inode+0xc9/0x133
> >  [<ffffffff8029e3cd>] do_unlinkat+0xf0/0x160
> >  [<ffffffff8020bd0b>] system_call_after_swapgs+0x7b/0x80
> >
> >
> > Code: 00 00 48 85 c0 74 0b 48 8b 40 10 48 85 c0 74 02 ff d0 e8 75 ec 32
> > 00 41 5b 31 c0 c3 53 48 89 fb f0 0f ba 37 00 19 c0 85 c0 75 04 <0f> 0b eb
> > fe e8 56 f5 ff ff 48 89 de 48 89 c7 31 d2 5b e9 47 be RIP 
> > [<ffffffff80268155>] unlock_page+0xf/0x26
> >  RSP <ffff81003f9e1dc8>
> > ---[ end trace 27b1d01b03af7c12 ]---
>
> Another unlock of an unlocked page.  Presumably when reclaim hadn't
> done anything yet.
>
> Don't know, sorry.  Strange.

Looks like something lockless pagecache *could* be connected with, but
I have never seen such a bug.

Hmm...

@@ -104,6 +105,7 @@ truncate_complete_page(struct address_sp
        cancel_dirty_page(page, PAGE_CACHE_SIZE);

        remove_from_page_cache(page);
+       clear_page_mlock(page);
        ClearPageUptodate(page);
        ClearPageMappedToDisk(page);
        page_cache_release(page);       /* pagecache ref */

...

+static inline void clear_page_mlock(struct page *page)
+{
+       if (unlikely(TestClearPageMlocked(page)))
+               __clear_page_mlock(page);
+}

...

+void __clear_page_mlock(struct page *page)
+{
+       VM_BUG_ON(!PageLocked(page));   /* for LRU isolate/putback */
+
+       dec_zone_page_state(page, NR_MLOCK);
+       count_vm_event(NORECL_PGCLEARED);
+       if (!isolate_lru_page(page)) {
+               putback_lru_page(page);
+       } else {
+               /*
+                * Page not on the LRU yet.  Flush all pagevecs and retry.
+                */
+               lru_add_drain_all();
+               if (!isolate_lru_page(page))
+                       putback_lru_page(page);
+               else if (PageUnevictable(page))
+                       count_vm_event(NORECL_PGSTRANDED);
+       }
+}

...

+int putback_lru_page(struct page *page)
+{
+       int lru;
+       int ret = 1;
+       int was_unevictable;
+
+       VM_BUG_ON(!PageLocked(page));
+       VM_BUG_ON(PageLRU(page));
+
+       lru = !!TestClearPageActive(page);
+       was_unevictable = TestClearPageUnevictable(page); /* for 
page_evictable() */
+
+       if (unlikely(!page->mapping)) {
+               /*
+                * page truncated.  drop lock as put_page() will
+                * free the page.
+                */
+               VM_BUG_ON(page_count(page) != 1);
+               unlock_page(page);
                ^^^^^^^^^^^^^^^^^^


This is a rather wild thing to be doing. It's a really bad idea
to drop a lock that's taken several function calls distant and
across different files...
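
Spelled out from the caller's side, the failing sequence is roughly
(simplified sketch):

	lock_page(page);		/* in truncate_inode_pages_range() */
	truncate_complete_page(mapping, page);
	  -> clear_page_mlock()
	    -> __clear_page_mlock()
	      -> putback_lru_page()
	        -> unlock_page();	/* first unlock, three calls down */
	unlock_page(page);		/* second unlock -> BUG */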

This is most likely where the locking is getting screwed up, but
even if it was cobbled together to work, it just makes the
locking scheme very hard to follow and verify.

I don't have any suggestions yet, as I still haven't been able
to review the patchset properly (and probably won't for the next
week or so). But please rethink the locking.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: 2.6.26-rc5-mm3
  2008-06-12  5:59 ` 2.6.26-rc5-mm3 Andrew Morton
@ 2008-06-12 23:32   ` Byron Bradley
  -1 siblings, 0 replies; 290+ messages in thread
From: Byron Bradley @ 2008-06-12 23:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, kernel-testers, linux-mm, Daniel Walker, Hua Zhong,
	Ingo Molnar

Looks like x86 and ARM both fail to boot if PROFILE_LIKELY, FTRACE and
DYNAMIC_FTRACE are selected. If any one of those three is disabled, it
boots (or fails in some other way, which I'm looking at now). The serial
console output from both machines when they fail to boot is below; let me
know if there is any other information I can provide.
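
For reference, the failing combination corresponds to (assuming the
usual CONFIG_ names for these options):

	CONFIG_PROFILE_LIKELY=y
	CONFIG_FTRACE=y
	CONFIG_DYNAMIC_FTRACE=y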

ARM (Marvell Orion 5x):
<5>Linux version 2.6.26-rc5-mm3-dirty (bb3081@gamma) (gcc version 4.2.0 20070413 (prerelease) (CodeSourcery Sourcery G++ Lite 2007q1-21)) #24 PREEMPT Thu Jun 12 23:39:12 BST 2008
CPU: Feroceon [41069260] revision 0 (ARMv5TEJ), cr=a0053177
Machine: QNAP TS-109/TS-209
<4>Clearing invalid memory bank 0KB@0xffffffff
<4>Clearing invalid memory bank 0KB@0xffffffff
<4>Clearing invalid memory bank 0KB@0xffffffff
<4>Ignoring unrecognised tag 0x00000000
<4>Ignoring unrecognised tag 0x00000000
<4>Ignoring unrecognised tag 0x00000000
<4>Ignoring unrecognised tag 0x41000403
Memory policy: ECC disabled, Data cache writeback
<7>On node 0 totalpages: 32768
<7>Node 0 memmap at 0xc05df000 size 1048576 first pfn 0xc05df000
<7>free_area_init_node: node 0, pgdat c0529680, node_mem_map c05df000
<7>  DMA zone: 32512 pages, LIFO batch:7
CPU0: D VIVT write-back cache
CPU0: I cache: 32768 bytes, associativity 1, 32 byte lines, 1024 sets
CPU0: D cache: 32768 bytes, associativity 1, 32 byte lines, 1024 sets
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
<5>Kernel command line: console=ttyS0,115200n8 root=/dev/nfs nfsroot=192.168.2.53:/stuff/debian ip=dhcp
PID hash table entries: 512 (order: 9, 2048 bytes)
Console: colour dummy device 80x30
<6>Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
<6>Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
<6>Memory: 128MB = 128MB total
<5>Memory: 123776KB available (5016K code, 799K data, 160K init)
<6>SLUB: Genslabs=12, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
<7>Calibrating delay loop... 331.77 BogoMIPS (lpj=1658880)
Mount-cache hash table entries: 512
<6>CPU: Testing write buffer coherency: ok

x86 (AMD Athlon):
Linux version 2.6.26-rc5-mm3-dirty (bb3081@gamma) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #6 Thu Jun 12 23:53:18 BST 2008
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
 BIOS-e820: 000000001fff0000 - 000000001fff3000 (ACPI NVS)
 BIOS-e820: 000000001fff3000 - 0000000020000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
last_pfn = 131056 max_arch_pfn = 1048576
0MB HIGHMEM available.
511MB LOWMEM available.
  mapped low ram: 0 - 01400000
  low ram: 00f7a000 - 1fff0000
  bootmap 00f7a000 - 00f7e000
  early res: 0 [0-fff] BIOS data page
  early res: 1 [100000-f74657] TEXT DATA BSS
  early res: 2 [f75000-f79fff] INIT_PG_TABLE
  early res: 3 [9f800-fffff] BIOS reserved
  early res: 4 [f7a000-f7dfff] BOOTMAP
Zone PFN ranges:
  DMA             0 ->     4096
  Normal       4096 ->   131056
  HighMem    131056 ->   131056
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->      159
    0:      256 ->   131056
DMI 2.2 present.
ACPI: RSDP 000F7950, 0014 (r0 Nvidia)
ACPI: RSDT 1FFF3000, 002C (r1 Nvidia AWRDACPI 42302E31 AWRD        0)
ACPI: FACP 1FFF3040, 0074 (r1 Nvidia AWRDACPI 42302E31 AWRD        0)
ACPI: DSDT 1FFF30C0, 4C22 (r1 NVIDIA AWRDACPI     1000 MSFT  100000E)
ACPI: FACS 1FFF0000, 0040
ACPI: APIC 1FFF7D00, 006E (r1 Nvidia AWRDACPI 42302E31 AWRD        0)
ACPI: PM-Timer IO Port: 0x4008
Allocating PCI resources starting at 30000000 (gap: 20000000:dec00000)
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 129935
Kernel command line: console=ttyS0,115200 root=/dev/nfs nfsroot=192.168.2.53:/stuff/debian-amd ip=dhcp BOOT_IMAGE=linux.amd
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Detected 1102.525 MHz processor.
Console: colour VGA+ 80x25
console [ttyS0] enabled
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 504184k/524224k available (8084k kernel code, 19476k reserved, 2784k data, 436k init, 0k highmem)
virtual kernel memory layout:
    fixmap  : 0xfffed000 - 0xfffff000   (  72 kB)
    pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
    vmalloc : 0xe0800000 - 0xff7fe000   ( 495 MB)
    lowmem  : 0xc0000000 - 0xdfff0000   ( 511 MB)
      .init : 0xc0ba0000 - 0xc0c0d000   ( 436 kB)
      .data : 0xc08e53f1 - 0xc0b9d418   (2784 kB)
      .text : 0xc0100000 - 0xc08e53f1   (8084 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
SLUB: Genslabs=12, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Calibrating delay using timer specific routine.. 2207.88 BogoMIPS (lpj=11039440)
Mount-cache hash table entries: 512
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: AMD Athlon(tm)  stepping 00
Checking 'hlt' instruction... OK.
Freeing SMP alternatives: 0k freed
ACPI: Core revision 20080321
Parsing all Control Methods:
Table [DSDT](id 0001) - 804 Objects with 77 Devices 276 Methods 35 Regions
 tbxface-0598 [00] tb_load_namespace     : ACPI Tables successfully acquired
ACPI: setting ELCR to 0200 (from 1c28)
evxfevnt-0091 [00] enable                : Transition to ACPI mode successful
gcov: version magic: 0x3430322a
net_namespace: 324 bytes


Cheers,

-- 
Byron Bradley

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: 2.6.26-rc5-mm3
  2008-06-12 23:32   ` 2.6.26-rc5-mm3 Byron Bradley
  (?)
@ 2008-06-12 23:55     ` Daniel Walker
  -1 siblings, 0 replies; 290+ messages in thread
From: Daniel Walker @ 2008-06-12 23:55 UTC (permalink / raw)
  To: Byron Bradley
  Cc: Andrew Morton, linux-kernel, kernel-testers, linux-mm, Hua Zhong,
	Ingo Molnar


On Fri, 2008-06-13 at 00:32 +0100, Byron Bradley wrote:
> Looks like x86 and ARM both fail to boot if PROFILE_LIKELY, FTRACE and 
> DYNAMIC_FTRACE are selected. If any one of those three are disabled it 
> boots (or fails in some other way which I'm looking at now). The serial 
> console output from both machines when they fail to boot is below, let me 
> know if there is any other information I can provide.

Did you happen to check PROFILE_LIKELY and FTRACE alone?

Daniel


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: 2.6.26-rc5-mm3
  2008-06-12 23:55     ` 2.6.26-rc5-mm3 Daniel Walker
@ 2008-06-13  0:04       ` Byron Bradley
  -1 siblings, 0 replies; 290+ messages in thread
From: Byron Bradley @ 2008-06-13  0:04 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Byron Bradley, Andrew Morton, linux-kernel, kernel-testers,
	linux-mm, Hua Zhong, Ingo Molnar

On Thu, 12 Jun 2008, Daniel Walker wrote:

> 
> On Fri, 2008-06-13 at 00:32 +0100, Byron Bradley wrote:
> > Looks like x86 and ARM both fail to boot if PROFILE_LIKELY, FTRACE and 
> > DYNAMIC_FTRACE are selected. If any one of those three are disabled it 
> > boots (or fails in some other way which I'm looking at now). The serial 
> > console output from both machines when they fail to boot is below, let me 
> > know if there is any other information I can provide.
> 
> Did you happen to check PROFILE_LIKELY and FTRACE alone?

Yes, without DYNAMIC_FTRACE the ARM box gets all the way to userspace and
the x86 box panics while registering a driver, so that failure is most
likely unrelated to this problem.
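
For reference, the failing combination in .config terms (my rendering,
not a posted config) is:

  CONFIG_PROFILE_LIKELY=y
  CONFIG_FTRACE=y
  CONFIG_DYNAMIC_FTRACE=y

With any one of the three disabled, boot gets past this hang.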

-- 
Byron Bradley

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG] 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-12 11:38       ` Nick Piggin
@ 2008-06-13  0:25         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-13  0:25 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Rik van Riel, Kamalesh Babulal, linux-kernel,
	kernel-testers, linux-mm, Nick Piggin, Andy Whitcroft

On Thu, 12 Jun 2008 21:38:59 +1000
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> +int putback_lru_page(struct page *page)
> +{
> +       int lru;
> +       int ret = 1;
> +       int was_unevictable;
> +
> +       VM_BUG_ON(!PageLocked(page));
> +       VM_BUG_ON(PageLRU(page));
> +
> +       lru = !!TestClearPageActive(page);
> +       was_unevictable = TestClearPageUnevictable(page); /* for 
> page_evictable() */
> +
> +       if (unlikely(!page->mapping)) {
> +               /*
> +                * page truncated.  drop lock as put_page() will
> +                * free the page.
> +                */
> +               VM_BUG_ON(page_count(page) != 1);
> +               unlock_page(page);
>                 ^^^^^^^^^^^^^^^^^^
> 
> 
> This is a rather wild thing to be doing. It's a really bad idea
> to drop a lock that's taken several function calls distant and
> across different files...
> 
I agree, and I strongly hope this unlock is removed.
The caller can do the unlock by itself, I think.
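
To illustrate, the failing sequence is roughly (a sketch of the call
chain, not actual source):

	truncate_complete_page()		/* runs with page locked */
	  remove_from_page_cache(page);		/* page->mapping = NULL */
	  clear_page_mlock(page);
	    putback_lru_page(page);
	      unlock_page(page);	/* takes the !page->mapping branch */

	/* the truncate path then calls unlock_page() on the same,
	 * now-unlocked page and hits the BUG at mm/filemap.c:575 */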

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 290+ messages in thread

* [PATCH] fix double unlock_page() in 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-12 11:20       ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-13  1:44         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-13  1:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Kamalesh Babulal, linux-kernel, kernel-testers,
	linux-mm, Nick Piggin, Andy Whitcroft, riel, Lee.Schermerhorn

This is a reproducer for the panic; a "quick fix" is attached.
But I think putback_lru_page() should be re-designed.

==
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <errno.h>

int main(int argc, char *argv[])
{
        int fd;
        char *filename = argv[1];
        char buffer[4096];
        char *addr;
        int len;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <filename>\n", argv[0]);
                exit(1);
        }
        memset(buffer, 0, sizeof(buffer));

        /* create a fresh file with one page of data */
        fd = open(filename, O_CREAT | O_EXCL | O_RDWR, S_IRWXU);

        if (fd < 0) {
                perror("open");
                exit(1);
        }
        len = write(fd, buffer, sizeof(buffer));

        if (len < 0) {
                perror("write");
                exit(1);
        }

        /* MAP_LOCKED mlocks the page -> unevictable list */
        addr = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED|MAP_LOCKED, fd, 0);
        if (addr == MAP_FAILED) {
                perror("mmap");
                exit(1);
        }
        munmap(addr, 4096);
        close(fd);

        /* unlinking truncates the page cache; the recently-mlocked
         * page hits the double unlock_page() */
        unlink(filename);

        return 0;
}
==
Run it on an affected kernel and you'll see the panic.
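
To build and run it (assuming the source is saved as repro.c; the file
name is my own):

  % gcc -Wall -o repro repro.c
  % ./repro /tmp/testfile

The kernel needs CONFIG_UNEVICTABLE_LRU=y for the mlock/unevictable
path to be compiled in.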

Fix is here
==

quick fix for double unlock_page();

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: linux-2.6.26-rc5-mm3/mm/truncate.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/mm/truncate.c
+++ linux-2.6.26-rc5-mm3/mm/truncate.c
@@ -104,8 +104,8 @@ truncate_complete_page(struct address_sp
 
 	cancel_dirty_page(page, PAGE_CACHE_SIZE);
 
-	remove_from_page_cache(page);
 	clear_page_mlock(page);
+	remove_from_page_cache(page);
 	ClearPageUptodate(page);
 	ClearPageMappedToDisk(page);
 	page_cache_release(page);	/* pagecache ref */


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] fix double unlock_page() in 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-13  1:44         ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-13  2:13           ` Andrew Morton
  -1 siblings, 0 replies; 290+ messages in thread
From: Andrew Morton @ 2008-06-13  2:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Kamalesh Babulal, linux-kernel, kernel-testers, linux-mm,
	Nick Piggin, Andy Whitcroft, riel, Lee.Schermerhorn

On Fri, 13 Jun 2008 10:44:44 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> This is reproducer of panic. "quick fix" is attached.

Thanks - I put that in
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc5/2.6.26-rc5-mm3/hot-fixes/

> But I think putback_lru_page() should be re-designed.

Yes, it sounds that way.

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG] 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
@ 2008-06-13  4:18     ` Valdis.Kletnieks
  0 siblings, 0 replies; 290+ messages in thread
From: Valdis.Kletnieks @ 2008-06-13  4:18 UTC (permalink / raw)
  To: Kamalesh Babulal
  Cc: Andrew Morton, linux-kernel, kernel-testers, linux-mm,
	Nick Piggin, Andy Whitcroft

[-- Attachment #1: Type: text/plain, Size: 1110 bytes --]

On Thu, 12 Jun 2008 14:14:21 +0530, Kamalesh Babulal said:
> Hi Andrew,
> 
> 2.6.26-rc5-mm3 kernel panics while booting up on the x86_64
> machine. Sorry the console is bit overwritten for the first few lines.

> no fstab.kernel BUG at mm/filemap.c:575!

For whatever it's worth, I'm seeing the same thing on my x86_64 laptop.
-rc5-mm2 works OK; I'm going to try to bisect it tonight.

% diff -u /usr/src/linux-2.6.26-rc5-mm[23]/.config
--- /usr/src/linux-2.6.26-rc5-mm2/.config       2008-06-10 22:21:13.000000000 -0400
+++ /usr/src/linux-2.6.26-rc5-mm3/.config       2008-06-12 22:20:25.000000000 -0400
@@ -1,7 +1,7 @@
 #
 # Automatically generated make config: don't edit
-# Linux kernel version: 2.6.26-rc5-mm2
-# Tue Jun 10 22:21:13 2008
+# Linux kernel version: 2.6.26-rc5-mm3
+# Thu Jun 12 22:20:25 2008
 #
 CONFIG_64BIT=y
 # CONFIG_X86_32 is not set
@@ -275,7 +275,7 @@
 CONFIG_ZONE_DMA_FLAG=1
 CONFIG_BOUNCE=y
 CONFIG_VIRT_TO_BUS=y
-# CONFIG_NORECLAIM_LRU is not set
+CONFIG_UNEVICTABLE_LRU=y
 CONFIG_MTRR=y
 CONFIG_MTRR_SANITIZER=y
 CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0

Not much changed there...


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] fix double unlock_page() in 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
@ 2008-06-13  4:34           ` Valdis.Kletnieks
  0 siblings, 0 replies; 290+ messages in thread
From: Valdis.Kletnieks @ 2008-06-13  4:34 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Kamalesh Babulal, linux-kernel, kernel-testers,
	linux-mm, Nick Piggin, Andy Whitcroft, riel, Lee.Schermerhorn

[-- Attachment #1: Type: text/plain, Size: 808 bytes --]

On Fri, 13 Jun 2008 10:44:44 +0900, KAMEZAWA Hiroyuki said:

> quick fix for double unlock_page();
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamewzawa.hiroyu@jp.fujitsu.com>
> Index: linux-2.6.26-rc5-mm3/mm/truncate.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/mm/truncate.c
> +++ linux-2.6.26-rc5-mm3/mm/truncate.c
> @@ -104,8 +104,8 @@ truncate_complete_page(struct address_sp
>  
>  	cancel_dirty_page(page, PAGE_CACHE_SIZE);
>  
> -	remove_from_page_cache(page);
>  	clear_page_mlock(page);
> +	remove_from_page_cache(page);
>  	ClearPageUptodate(page);
>  	ClearPageMappedToDisk(page);
>  	page_cache_release(page);	/* pagecache ref */

Confirming that this quick fix works on my laptop that was hitting this
crash - I am now up and running on -rc5-mm3.

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG] 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-13  4:18     ` Valdis.Kletnieks-PjAqaU27lzQ
  (?)
@ 2008-06-13  7:16       ` Andrew Morton
  -1 siblings, 0 replies; 290+ messages in thread
From: Andrew Morton @ 2008-06-13  7:16 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Kamalesh Babulal, linux-kernel, kernel-testers, linux-mm,
	Nick Piggin, Andy Whitcroft

On Fri, 13 Jun 2008 00:18:43 -0400 Valdis.Kletnieks@vt.edu wrote:

> On Thu, 12 Jun 2008 14:14:21 +0530, Kamalesh Babulal said:
> > Hi Andrew,
> > 
> > 2.6.26-rc5-mm3 kernel panics while booting up on the x86_64
> > machine. Sorry the console is bit overwritten for the first few lines.
> 
> > no fstab.kernel BUG at mm/filemap.c:575!
> 
> For whatever it's worth, I'm seeing the same thing on my x86_64 laptop.
> -rc5-mm2 works OK, I'm going to try to bisect it tonight.

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc5/2.6.26-rc5-mm3/hot-fixes/fix-double-unlock_page-in-2626-rc5-mm3-kernel-bug-at-mm-filemapc-575.patch is said to "fix" it.


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] fix double unlock_page() in 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-13  2:13           ` Andrew Morton
  (?)
@ 2008-06-13 15:30             ` Lee Schermerhorn
  -1 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-13 15:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Kamalesh Babulal, linux-kernel,
	kernel-testers, linux-mm, Nick Piggin, Andy Whitcroft, riel

On Thu, 2008-06-12 at 19:13 -0700, Andrew Morton wrote:
> On Fri, 13 Jun 2008 10:44:44 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > This is reproducer of panic. "quick fix" is attached.
> 
> Thanks - I put that in
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc5/2.6.26-rc5-mm3/hot-fixes/
> 
> > But I think putback_lru_page() should be re-designed.
> 
> Yes, it sounds that way.

Here's a proposed replacement patch that reworks putback_lru_page()
slightly and cleans up the call sites.  I still want to balance the
get_page() in isolate_lru_page() with a put_page() in putback_lru_page()
for the primary users--vmscan and page migration.  So I need to drop
the lock before the put_page() when handed a page with a null mapping and
a single reference count, as the page will be freed by put_page() and a
locked page would bug out in free_pages_check()/bad_page().
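
Caller-side, the return convention is then (an illustrative sketch of
the intended usage, not code from the patch):

	/* page locked; reference taken earlier by isolate_lru_page() */
	if (putback_lru_page(page))
		unlock_page(page);	/* not truncated: still locked */
	/*
	 * else the page was truncated: putback_lru_page() dropped the
	 * lock and its internal put_page() freed the page, so the
	 * caller must not touch it again.
	 */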

Lee

PATCH fix page unlocking protocol for putback_lru_page()

Against:  2.6.26-rc5-mm3

Replaces Kame-san's hotfix:
fix-double-unlock_page-in-2626-rc5-mm3-kernel-bug-at-mm-filemapc-575.patch

Applies at end of vmscan/unevictable/mlock series to avoid patch conflicts.

1)  modified putback_lru_page() to drop the page lock only if both
    page_mapping() is NULL and page_count() == 1 [rather than
    VM_BUG_ON(page_count(page) != 1)].  I want the put_page() that
    balances the get_page() in isolate_lru_page() to happen here for
    vmscan and, e.g., page migration, rather than requiring explicit
    checks of the page_mapping() and an explicit put_page() in those
    areas.  However, the page could be truncated while one of these
    subsystems holds it isolated from the LRU, so we need to handle
    this case.  Callers of putback_lru_page() need to be aware of this
    and only call it with a page with a NULL page_mapping() when they
    will no longer reference the page afterwards.  This is the case
    for vmscan and page migration.

2)  m[un]lock_vma_page() already will not be called for a page with a
    NULL mapping.  Added VM_BUG_ON() to assert this.

3)  modified clear_page_mlock() to skip the isolate/putback shuffle for
    pages with a NULL mapping, as they are being truncated/freed.  Thus,
    any future callers of clear_page_mlock() need not be concerned about
    the putback_lru_page() semantics for truncated pages.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/mlock.c  |   29 +++++++++++++++++++----------
 mm/vmscan.c |   12 +++++++-----
 2 files changed, 26 insertions(+), 15 deletions(-)

Index: linux-2.6.26-rc5-mm3/mm/mlock.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/mm/mlock.c	2008-06-12 11:42:59.000000000 -0400
+++ linux-2.6.26-rc5-mm3/mm/mlock.c	2008-06-13 09:47:14.000000000 -0400
@@ -59,27 +59,33 @@ void __clear_page_mlock(struct page *pag
 
 	dec_zone_page_state(page, NR_MLOCK);
 	count_vm_event(NORECL_PGCLEARED);
-	if (!isolate_lru_page(page)) {
-		putback_lru_page(page);
-	} else {
-		/*
-		 * Page not on the LRU yet.  Flush all pagevecs and retry.
-		 */
-		lru_add_drain_all();
-		if (!isolate_lru_page(page))
+	if (page->mapping) {	/* truncated ? */
+		if (!isolate_lru_page(page)) {
 			putback_lru_page(page);
-		else if (PageUnevictable(page))
-			count_vm_event(NORECL_PGSTRANDED);
+		} else {
+			/*
+			 * Page not on the LRU yet.
+			 * Flush all pagevecs and retry.
+			 */
+			lru_add_drain_all();
+			if (!isolate_lru_page(page))
+				putback_lru_page(page);
+			else if (PageUnevictable(page))
+				count_vm_event(NORECL_PGSTRANDED);
+		}
 	}
 }
 
 /*
  * Mark page as mlocked if not already.
  * If page on LRU, isolate and putback to move to unevictable list.
+ *
+ * Called with page locked and page_mapping() != NULL.
  */
 void mlock_vma_page(struct page *page)
 {
 	BUG_ON(!PageLocked(page));
+	VM_BUG_ON(!page_mapping(page));
 
 	if (!TestSetPageMlocked(page)) {
 		inc_zone_page_state(page, NR_MLOCK);
@@ -92,6 +98,8 @@ void mlock_vma_page(struct page *page)
 /*
  * called from munlock()/munmap() path with page supposedly on the LRU.
  *
+ * Called with page locked and page_mapping() != NULL.
+ *
  * Note:  unlike mlock_vma_page(), we can't just clear the PageMlocked
  * [in try_to_munlock()] and then attempt to isolate the page.  We must
  * isolate the page to keep others from messing with its unevictable
@@ -110,6 +118,7 @@ void mlock_vma_page(struct page *page)
 static void munlock_vma_page(struct page *page)
 {
 	BUG_ON(!PageLocked(page));
+	VM_BUG_ON(!page_mapping(page));
 
 	if (TestClearPageMlocked(page)) {
 		dec_zone_page_state(page, NR_MLOCK);
Index: linux-2.6.26-rc5-mm3/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/mm/vmscan.c	2008-06-12 11:39:09.000000000 -0400
+++ linux-2.6.26-rc5-mm3/mm/vmscan.c	2008-06-13 09:44:44.000000000 -0400
@@ -1,4 +1,4 @@
-/*
+ /*
  *  linux/mm/vmscan.c
  *
  *  Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
@@ -488,6 +488,9 @@ int remove_mapping(struct address_space 
  * lru_lock must not be held, interrupts must be enabled.
  * Must be called with page locked.
  *
+ * If page truncated [page_mapping() == NULL] and we hold the last reference,
+ * the page will be freed here.  For vmscan and page migration.
+ *
  * return 1 if page still locked [not truncated], else 0
  */
 int putback_lru_page(struct page *page)
@@ -502,12 +505,11 @@ int putback_lru_page(struct page *page)
 	lru = !!TestClearPageActive(page);
 	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
 
-	if (unlikely(!page->mapping)) {
+	if (unlikely(!page->mapping && page_count(page) == 1)) {
 		/*
-		 * page truncated.  drop lock as put_page() will
-		 * free the page.
+		 * page truncated and we hold last reference.
+		 * drop lock as put_page() will free the page.
 		 */
-		VM_BUG_ON(page_count(page) != 1);
 		unlock_page(page);
 		ret = 0;
 	} else if (page_evictable(page, NULL)) {



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] fix double unlock_page() in 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
@ 2008-06-13 15:30             ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-13 15:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Kamalesh Babulal, linux-kernel,
	kernel-testers, linux-mm, Nick Piggin, Andy Whitcroft, riel

On Thu, 2008-06-12 at 19:13 -0700, Andrew Morton wrote:
> On Fri, 13 Jun 2008 10:44:44 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > This is reproducer of panic. "quick fix" is attached.
> 
> Thanks - I put that in
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc5/2.6.26-rc5-mm3/hot-fixes/
> 
> > But I think putback_lru_page() should be re-designed.
> 
> Yes, it sounds that way.

Here's a proposed replacement patch that reworks putback_lru_page()
slightly and cleans up the call sites.  I still want to balance the
get_page() in isolate_lru_page() with a put_page() in putback_lru_page()
for the primary users--vmscan and page migration.  So, I need to drop
the lock before the put_page() when handed a page with null mapping and
a single reference count as the page will be freed on put_page() and a
locked page would bug out in free_pages_check()/bad_page().  

Lee

PATCH fix page unlocking protocol for putback_lru_page()

Against:  2.6.26-rc5-mm3

Replaces Kame-san's hotfix:
fix-double-unlock_page-in-2626-rc5-mm3-kernel-bug-at-mm-filemapc-575.patch

Applies at end of vmscan/unevictable/mlock series to avoid patch conflicts.

1)  modified putback_lru_page() to drop page lock only if both page_mapping()
    NULL and page_count() == 1 [rather than VM_BUG_ON(page_count(page) != 1].
    I want to balance the put_page() from isolate_lru_page() here for vmscan
    and, e.g., page migration rather than requiring explicit checks of the
    page_mapping() and explicit put_page() in these areas.  However, the page
    could be truncated while one of these subsystems holds it isolated from
    the LRU.  So, need to handle this case.  Callers of putback_lru_page()
    need to be aware of this and only call it with a page with NULL
    page_mapping() when they will no longer reference the page afterwards.
    This is the case for vmscan and page migration.

2)  m[un]lock_vma_page() already will not be called for page with NULL
    mapping.  Added VM_BUG_ON() to assert this.

3)  modified clear_page_lock() to skip the isolate/putback shuffle for
    pages with NULL mapping, as they are being truncated/freed.  Thus,
    any future callers of clear_page_lock() need not be concerned about
    the putback_lru_page() semantics for truncated pages.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/mlock.c  |   29 +++++++++++++++++++----------
 mm/vmscan.c |   12 +++++++-----
 2 files changed, 26 insertions(+), 15 deletions(-)

Index: linux-2.6.26-rc5-mm3/mm/mlock.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/mm/mlock.c	2008-06-12 11:42:59.000000000 -0400
+++ linux-2.6.26-rc5-mm3/mm/mlock.c	2008-06-13 09:47:14.000000000 -0400
@@ -59,27 +59,33 @@ void __clear_page_mlock(struct page *pag
 
 	dec_zone_page_state(page, NR_MLOCK);
 	count_vm_event(NORECL_PGCLEARED);
-	if (!isolate_lru_page(page)) {
-		putback_lru_page(page);
-	} else {
-		/*
-		 * Page not on the LRU yet.  Flush all pagevecs and retry.
-		 */
-		lru_add_drain_all();
-		if (!isolate_lru_page(page))
+	if (page->mapping) {	/* truncated ? */
+		if (!isolate_lru_page(page)) {
 			putback_lru_page(page);
-		else if (PageUnevictable(page))
-			count_vm_event(NORECL_PGSTRANDED);
+		} else {
+			/*
+			 * Page not on the LRU yet.
+			 * Flush all pagevecs and retry.
+			 */
+			lru_add_drain_all();
+			if (!isolate_lru_page(page))
+				putback_lru_page(page);
+			else if (PageUnevictable(page))
+				count_vm_event(NORECL_PGSTRANDED);
+		}
 	}
 }
 
 /*
  * Mark page as mlocked if not already.
  * If page on LRU, isolate and putback to move to unevictable list.
+ *
+ * Called with page locked and page_mapping() != NULL.
  */
 void mlock_vma_page(struct page *page)
 {
 	BUG_ON(!PageLocked(page));
+	VM_BUG_ON(!page_mapping(page));
 
 	if (!TestSetPageMlocked(page)) {
 		inc_zone_page_state(page, NR_MLOCK);
@@ -92,6 +98,8 @@ void mlock_vma_page(struct page *page)
 /*
  * called from munlock()/munmap() path with page supposedly on the LRU.
  *
+ * Called with page locked and page_mapping() != NULL.
+ *
  * Note:  unlike mlock_vma_page(), we can't just clear the PageMlocked
  * [in try_to_munlock()] and then attempt to isolate the page.  We must
  * isolate the page to keep others from messing with its unevictable
@@ -110,6 +118,7 @@ void mlock_vma_page(struct page *page)
 static void munlock_vma_page(struct page *page)
 {
 	BUG_ON(!PageLocked(page));
+	VM_BUG_ON(!page_mapping(page));
 
 	if (TestClearPageMlocked(page)) {
 		dec_zone_page_state(page, NR_MLOCK);
Index: linux-2.6.26-rc5-mm3/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/mm/vmscan.c	2008-06-12 11:39:09.000000000 -0400
+++ linux-2.6.26-rc5-mm3/mm/vmscan.c	2008-06-13 09:44:44.000000000 -0400
@@ -1,4 +1,4 @@
-/*
+ /*
  *  linux/mm/vmscan.c
  *
  *  Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
@@ -488,6 +488,9 @@ int remove_mapping(struct address_space 
  * lru_lock must not be held, interrupts must be enabled.
  * Must be called with page locked.
  *
+ * If page truncated [page_mapping() == NULL] and we hold the last reference,
+ * the page will be freed here.  For vmscan and page migration.
+ *
  * return 1 if page still locked [not truncated], else 0
  */
 int putback_lru_page(struct page *page)
@@ -502,12 +505,11 @@ int putback_lru_page(struct page *page)
 	lru = !!TestClearPageActive(page);
 	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
 
-	if (unlikely(!page->mapping)) {
+	if (unlikely(!page->mapping && page_count(page) == 1)) {
 		/*
-		 * page truncated.  drop lock as put_page() will
-		 * free the page.
+		 * page truncated and we hold last reference.
+		 * drop lock as put_page() will free the page.
 		 */
-		VM_BUG_ON(page_count(page) != 1);
 		unlock_page(page);
 		ret = 0;
 	} else if (page_evictable(page, NULL)) {


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] fix double unlock_page() in 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-13  1:44         ` KAMEZAWA Hiroyuki
@ 2008-06-14 13:32           ` Kamalesh Babulal
  -1 siblings, 0 replies; 290+ messages in thread
From: Kamalesh Babulal @ 2008-06-14 13:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-kernel, kernel-testers, linux-mm,
	Nick Piggin, Andy Whitcroft, riel, Lee.Schermerhorn

KAMEZAWA Hiroyuki wrote:
> This is reproducer of panic. "quick fix" is attached.
> But I think putback_lru_page() should be re-designed.
> 
> ==
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/mman.h>
> #include <unistd.h>
> #include <errno.h>
> #include <stdlib.h>	/* for exit(); missing from the original posting */
> 
> int main(int argc, char *argv[])
> {
>         int fd;
>         char *filename = argv[1];
>         char buffer[4096];
>         char *addr;
>         int len;
> 
>         fd = open(filename, O_CREAT | O_EXCL | O_RDWR, S_IRWXU);
> 
>         if (fd < 0) {
>                 perror("open");
>                 exit(1);
>         }
>         len = write(fd, buffer, sizeof(buffer));
> 
>         if (len < 0) {
>                 perror("write");
>                 exit(1);
>         }
> 
>         addr = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED|MAP_LOCKED, fd, 0);
>         if (addr == MAP_FAILED) {
>                 perror("mmap");
>                 exit(1);
>         }
>         munmap(addr, 4096);
>         close(fd);
> 
>         unlink(filename);
> }
> ==
> you'll see panic.
> 
> Fix is here
> ==
Hi Kame,

Thanks. The patch fixes the kernel panic.

Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> 
> quick fix for double unlock_page();
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Index: linux-2.6.26-rc5-mm3/mm/truncate.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/mm/truncate.c
> +++ linux-2.6.26-rc5-mm3/mm/truncate.c
> @@ -104,8 +104,8 @@ truncate_complete_page(struct address_sp
> 
>  	cancel_dirty_page(page, PAGE_CACHE_SIZE);
> 
> -	remove_from_page_cache(page);
>  	clear_page_mlock(page);
> +	remove_from_page_cache(page);
>  	ClearPageUptodate(page);
>  	ClearPageMappedToDisk(page);
>  	page_cache_release(page);	/* pagecache ref */
> 
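
If I read the thread correctly, the double unlock that the reproducer
triggers is roughly the following sequence (reconstructed from this
discussion, not a verbatim trace):

/*
 * Approximate sequence before the fix:
 *
 * truncate path locks the page, then truncate_complete_page():
 *     remove_from_page_cache(page);   page->mapping becomes NULL
 *     clear_page_mlock(page);         page was mlocked via MAP_LOCKED
 *         -> putback_lru_page(page) sees !page->mapping and calls
 *            unlock_page(page)                       [first unlock]
 * truncate path then calls unlock_page(page) again  [second unlock]
 *     -> kernel BUG at mm/filemap.c:575
 *
 * Reordering clear_page_mlock() before remove_from_page_cache() keeps
 * page->mapping non-NULL during the putback, so the early unlock is
 * skipped.
 */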


-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] fix double unlock_page() in 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-13 15:30             ` Lee Schermerhorn
@ 2008-06-15  3:59               ` Kamalesh Babulal
  -1 siblings, 0 replies; 290+ messages in thread
From: Kamalesh Babulal @ 2008-06-15  3:59 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, linux-kernel, kernel-testers,
	linux-mm, Nick Piggin, Andy Whitcroft, riel

Lee Schermerhorn wrote:
> On Thu, 2008-06-12 at 19:13 -0700, Andrew Morton wrote:
>> On Fri, 13 Jun 2008 10:44:44 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>>
>>> This is reproducer of panic. "quick fix" is attached.
>> Thanks - I put that in
>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc5/2.6.26-rc5-mm3/hot-fixes/
>>
>>> But I think putback_lru_page() should be re-designed.
>> Yes, it sounds that way.
> 
> Here's a proposed replacement patch that reworks putback_lru_page()
> slightly and cleans up the call sites.  I still want to balance the
> get_page() in isolate_lru_page() with a put_page() in putback_lru_page()
> for the primary users--vmscan and page migration.  So, I need to drop
> the lock before the put_page() when handed a page with null mapping and
> a single reference count as the page will be freed on put_page() and a
> locked page would bug out in free_pages_check()/bad_page().  
> 
> Lee
> 
> PATCH fix page unlocking protocol for putback_lru_page()
> 
> Against:  2.6.26-rc5-mm3
> 
> Replaces Kame-san's hotfix:
> fix-double-unlock_page-in-2626-rc5-mm3-kernel-bug-at-mm-filemapc-575.patch
> 
> Applies at end of vmscan/unevictable/mlock series to avoid patch conflicts.
> 
> 1)  modified putback_lru_page() to drop page lock only if both page_mapping()
>     NULL and page_count() == 1 [rather than VM_BUG_ON(page_count(page) != 1].
>     I want to balance the put_page() from isolate_lru_page() here for vmscan
>     and, e.g., page migration rather than requiring explicit checks of the
>     page_mapping() and explicit put_page() in these areas.  However, the page
>     could be truncated while one of these subsystems holds it isolated from
>     the LRU.  So, need to handle this case.  Callers of putback_lru_page()
>     need to be aware of this and only call it with a page with NULL
>     page_mapping() when they will no longer reference the page afterwards.
>     This is the case for vmscan and page migration.
> 
> 2)  m[un]lock_vma_page() already will not be called for page with NULL
>     mapping.  Added VM_BUG_ON() to assert this.
> 
> 3)  modified clear_page_lock() to skip the isolate/putback shuffle for
>     pages with NULL mapping, as they are being truncated/freed.  Thus,
>     any future callers of clear_page_lock() need not be concerned about
>     the putback_lru_page() semantics for truncated pages.
> 
Hi Lee,

Thanks. After applying the patch, the kernel no longer panics during
bootup.

Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>

> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
>  mm/mlock.c  |   29 +++++++++++++++++++----------
>  mm/vmscan.c |   12 +++++++-----
>  2 files changed, 26 insertions(+), 15 deletions(-)
> 
> Index: linux-2.6.26-rc5-mm3/mm/mlock.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/mm/mlock.c	2008-06-12 11:42:59.000000000 -0400
> +++ linux-2.6.26-rc5-mm3/mm/mlock.c	2008-06-13 09:47:14.000000000 -0400
> @@ -59,27 +59,33 @@ void __clear_page_mlock(struct page *pag
> 
>  	dec_zone_page_state(page, NR_MLOCK);
>  	count_vm_event(NORECL_PGCLEARED);
> -	if (!isolate_lru_page(page)) {
> -		putback_lru_page(page);
> -	} else {
> -		/*
> -		 * Page not on the LRU yet.  Flush all pagevecs and retry.
> -		 */
> -		lru_add_drain_all();
> -		if (!isolate_lru_page(page))
> +	if (page->mapping) {	/* truncated ? */
> +		if (!isolate_lru_page(page)) {
>  			putback_lru_page(page);
> -		else if (PageUnevictable(page))
> -			count_vm_event(NORECL_PGSTRANDED);
> +		} else {
> +			/*
> +			 * Page not on the LRU yet.
> +			 * Flush all pagevecs and retry.
> +			 */
> +			lru_add_drain_all();
> +			if (!isolate_lru_page(page))
> +				putback_lru_page(page);
> +			else if (PageUnevictable(page))
> +				count_vm_event(NORECL_PGSTRANDED);
> +		}
>  	}
>  }
> 
>  /*
>   * Mark page as mlocked if not already.
>   * If page on LRU, isolate and putback to move to unevictable list.
> + *
> + * Called with page locked and page_mapping() != NULL.
>   */
>  void mlock_vma_page(struct page *page)
>  {
>  	BUG_ON(!PageLocked(page));
> +	VM_BUG_ON(!page_mapping(page));
> 
>  	if (!TestSetPageMlocked(page)) {
>  		inc_zone_page_state(page, NR_MLOCK);
> @@ -92,6 +98,8 @@ void mlock_vma_page(struct page *page)
>  /*
>   * called from munlock()/munmap() path with page supposedly on the LRU.
>   *
> + * Called with page locked and page_mapping() != NULL.
> + *
>   * Note:  unlike mlock_vma_page(), we can't just clear the PageMlocked
>   * [in try_to_munlock()] and then attempt to isolate the page.  We must
>   * isolate the page to keep others from messing with its unevictable
> @@ -110,6 +118,7 @@ void mlock_vma_page(struct page *page)
>  static void munlock_vma_page(struct page *page)
>  {
>  	BUG_ON(!PageLocked(page));
> +	VM_BUG_ON(!page_mapping(page));
> 
>  	if (TestClearPageMlocked(page)) {
>  		dec_zone_page_state(page, NR_MLOCK);
> Index: linux-2.6.26-rc5-mm3/mm/vmscan.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/mm/vmscan.c	2008-06-12 11:39:09.000000000 -0400
> +++ linux-2.6.26-rc5-mm3/mm/vmscan.c	2008-06-13 09:44:44.000000000 -0400
> @@ -1,4 +1,4 @@
> -/*
> + /*
>   *  linux/mm/vmscan.c
>   *
>   *  Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
> @@ -488,6 +488,9 @@ int remove_mapping(struct address_space 
>   * lru_lock must not be held, interrupts must be enabled.
>   * Must be called with page locked.
>   *
> + * If page truncated [page_mapping() == NULL] and we hold the last reference,
> + * the page will be freed here.  For vmscan and page migration.
> + *
>   * return 1 if page still locked [not truncated], else 0
>   */
>  int putback_lru_page(struct page *page)
> @@ -502,12 +505,11 @@ int putback_lru_page(struct page *page)
>  	lru = !!TestClearPageActive(page);
>  	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
> 
> -	if (unlikely(!page->mapping)) {
> +	if (unlikely(!page->mapping && page_count(page) == 1)) {
>  		/*
> -		 * page truncated.  drop lock as put_page() will
> -		 * free the page.
> +		 * page truncated and we hold last reference.
> +		 * drop lock as put_page() will free the page.
>  		 */
> -		VM_BUG_ON(page_count(page) != 1);
>  		unlock_page(page);
>  		ret = 0;
>  	} else if (page_evictable(page, NULL)) {
> 
> 


-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] fix double unlock_page() in 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-13 15:30             ` Lee Schermerhorn
@ 2008-06-16 14:49               ` Lee Schermerhorn
  -1 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-16 14:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Kamalesh Babulal, linux-kernel,
	kernel-testers, linux-mm, Nick Piggin, Andy Whitcroft, riel

On Fri, 2008-06-13 at 11:30 -0400, Lee Schermerhorn wrote:
> On Thu, 2008-06-12 at 19:13 -0700, Andrew Morton wrote:
> > On Fri, 13 Jun 2008 10:44:44 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > 
> > > This is reproducer of panic. "quick fix" is attached.
> > 
> > Thanks - I put that in
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.26-rc5/2.6.26-rc5-mm3/hot-fixes/
> > 
> > > But I think putback_lru_page() should be re-designed.
> > 
> > Yes, it sounds that way.
> 
> Here's a proposed replacement patch that reworks putback_lru_page()
> slightly and cleans up the call sites.  I still want to balance the
> get_page() in isolate_lru_page() with a put_page() in putback_lru_page()
> for the primary users--vmscan and page migration.  So, I need to drop
> the lock before the put_page() when handed a page with null mapping and
> a single reference count as the page will be freed on put_page() and a
> locked page would bug out in free_pages_check()/bad_page().  
> 

Below is a fix to the "proposed replacement patch" posted on Friday.
The new VM_BUG_ON()s used an incorrect test: page_mapping() returns
NULL for anonymous pages, so the assertions would fire on mlocked anon
pages that were never truncated.  It is the raw page->mapping field
that goes NULL at truncation.
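
For concreteness, the 2.6.26-era logic is roughly the sketch below
(paraphrased, not the verbatim kernel source; page_mapping_sketch() is
an invented name):

/*
 * The raw page->mapping field only becomes NULL at truncation, while
 * page_mapping() also returns NULL for anonymous pages, whose
 * ->mapping holds an anon_vma pointer tagged with PAGE_MAPPING_ANON
 * in the low bit.
 */
static inline struct address_space *page_mapping_sketch(struct page *page)
{
	struct address_space *mapping = page->mapping;

	if (unlikely(PageSwapCache(page)))
		mapping = &swapper_space;
	else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON))
		mapping = NULL;		/* anonymous page, not truncated */
	return mapping;
}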

Lee

Against:  2.6.26-rc5-mm3 

Incremental fix to my proposed patch to "fix double unlock_page() in
2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575".

"page_mapping(page)" should be "page->mapping" in VM_BUG_ON()s
introduced to m[un]lock_vma_page().

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/mlock.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6.26-rc5-mm3/mm/mlock.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/mm/mlock.c	2008-06-16 09:47:28.000000000 -0400
+++ linux-2.6.26-rc5-mm3/mm/mlock.c	2008-06-16 09:48:27.000000000 -0400
@@ -80,12 +80,12 @@ void __clear_page_mlock(struct page *pag
  * Mark page as mlocked if not already.
  * If page on LRU, isolate and putback to move to unevictable list.
  *
- * Called with page locked and page_mapping() != NULL.
+ * Called with page locked and page->mapping != NULL.
  */
 void mlock_vma_page(struct page *page)
 {
 	BUG_ON(!PageLocked(page));
-	VM_BUG_ON(!page_mapping(page));
+	VM_BUG_ON(!page->mapping);
 
 	if (!TestSetPageMlocked(page)) {
 		inc_zone_page_state(page, NR_MLOCK);
@@ -98,7 +98,7 @@ void mlock_vma_page(struct page *page)
 /*
  * called from munlock()/munmap() path with page supposedly on the LRU.
  *
- * Called with page locked and page_mapping() != NULL.
+ * Called with page locked and page->mapping != NULL.
  *
  * Note:  unlike mlock_vma_page(), we can't just clear the PageMlocked
  * [in try_to_munlock()] and then attempt to isolate the page.  We must
@@ -118,7 +118,7 @@ void mlock_vma_page(struct page *page)
 static void munlock_vma_page(struct page *page)
 {
 	BUG_ON(!PageLocked(page));
-	VM_BUG_ON(!page_mapping(page));
+	VM_BUG_ON(!page->mapping);
 
 	if (TestClearPageMlocked(page)) {
 		dec_zone_page_state(page, NR_MLOCK);



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] fix double unlock_page() in 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-13 15:30             ` Lee Schermerhorn
  (?)
@ 2008-06-17  2:32               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  2:32 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Andrew Morton, Kamalesh Babulal, linux-kernel, kernel-testers,
	linux-mm, Nick Piggin, Andy Whitcroft, riel

On Fri, 13 Jun 2008 11:30:46 -0400
Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:

> 1)  modified putback_lru_page() to drop page lock only if both page_mapping()
>     NULL and page_count() == 1 [rather than VM_BUG_ON(page_count(page) != 1].

I'm sorry that I cannot catch the whole changes..

I cannot convice that this implicit behavior won't cause lock-up in future, again.
Even if there are enough comments...

Why the page should be locked when it is put back to LRU ?
I think this restriction is added by RvR patch set, right ?
I'm sorry that I cannot catch the whole changes..

Anyway, IMHO, lock <-> unlock should be visible as a pair as much as possible.

Thanks,
-Kame

>     I want to balance the put_page() from isolate_lru_page() here for vmscan
>     and, e.g., page migration rather than requiring explicit checks of the
>     page_mapping() and explicit put_page() in these areas.  However, the page
>     could be truncated while one of these subsystems holds it isolated from
>     the LRU.  So, need to handle this case.  Callers of putback_lru_page()
>     need to be aware of this and only call it with a page with NULL
>     page_mapping() when they will no longer reference the page afterwards.
>     This is the case for vmscan and page migration.
> 
> 2)  m[un]lock_vma_page() already will not be called for page with NULL
>     mapping.  Added VM_BUG_ON() to assert this.
> 
> 3)  modified clear_page_lock() to skip the isolate/putback shuffle for
>     pages with NULL mapping, as they are being truncated/freed.  Thus,
>     any future callers of clear_page_lock() need not be concerned about
>     the putback_lru_page() semantics for truncated pages.
> 
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
>  mm/mlock.c  |   29 +++++++++++++++++++----------
>  mm/vmscan.c |   12 +++++++-----
>  2 files changed, 26 insertions(+), 15 deletions(-)
> 
> Index: linux-2.6.26-rc5-mm3/mm/mlock.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/mm/mlock.c	2008-06-12 11:42:59.000000000 -0400
> +++ linux-2.6.26-rc5-mm3/mm/mlock.c	2008-06-13 09:47:14.000000000 -0400
> @@ -59,27 +59,33 @@ void __clear_page_mlock(struct page *pag
>  
>  	dec_zone_page_state(page, NR_MLOCK);
>  	count_vm_event(NORECL_PGCLEARED);
> -	if (!isolate_lru_page(page)) {
> -		putback_lru_page(page);
> -	} else {
> -		/*
> -		 * Page not on the LRU yet.  Flush all pagevecs and retry.
> -		 */
> -		lru_add_drain_all();
> -		if (!isolate_lru_page(page))
> +	if (page->mapping) {	/* truncated ? */
> +		if (!isolate_lru_page(page)) {
>  			putback_lru_page(page);
> -		else if (PageUnevictable(page))
> -			count_vm_event(NORECL_PGSTRANDED);
> +		} else {
> +			/*
> +			 * Page not on the LRU yet.
> +			 * Flush all pagevecs and retry.
> +			 */
> +			lru_add_drain_all();
> +			if (!isolate_lru_page(page))
> +				putback_lru_page(page);
> +			else if (PageUnevictable(page))
> +				count_vm_event(NORECL_PGSTRANDED);
> +		}
>  	}
>  }
>  
>  /*
>   * Mark page as mlocked if not already.
>   * If page on LRU, isolate and putback to move to unevictable list.
> + *
> + * Called with page locked and page_mapping() != NULL.
>   */
>  void mlock_vma_page(struct page *page)
>  {
>  	BUG_ON(!PageLocked(page));
> +	VM_BUG_ON(!page_mapping(page));
>  
>  	if (!TestSetPageMlocked(page)) {
>  		inc_zone_page_state(page, NR_MLOCK);
> @@ -92,6 +98,8 @@ void mlock_vma_page(struct page *page)
>  /*
>   * called from munlock()/munmap() path with page supposedly on the LRU.
>   *
> + * Called with page locked and page_mapping() != NULL.
> + *
>   * Note:  unlike mlock_vma_page(), we can't just clear the PageMlocked
>   * [in try_to_munlock()] and then attempt to isolate the page.  We must
>   * isolate the page to keep others from messing with its unevictable
> @@ -110,6 +118,7 @@ void mlock_vma_page(struct page *page)
>  static void munlock_vma_page(struct page *page)
>  {
>  	BUG_ON(!PageLocked(page));
> +	VM_BUG_ON(!page_mapping(page));
>  
>  	if (TestClearPageMlocked(page)) {
>  		dec_zone_page_state(page, NR_MLOCK);
> Index: linux-2.6.26-rc5-mm3/mm/vmscan.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/mm/vmscan.c	2008-06-12 11:39:09.000000000 -0400
> +++ linux-2.6.26-rc5-mm3/mm/vmscan.c	2008-06-13 09:44:44.000000000 -0400
> @@ -1,4 +1,4 @@
> -/*
> + /*
>   *  linux/mm/vmscan.c
>   *
>   *  Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
> @@ -488,6 +488,9 @@ int remove_mapping(struct address_space 
>   * lru_lock must not be held, interrupts must be enabled.
>   * Must be called with page locked.
>   *
> + * If page truncated [page_mapping() == NULL] and we hold the last reference,
> + * the page will be freed here.  For vmscan and page migration.
> + *
>   * return 1 if page still locked [not truncated], else 0
>   */
>  int putback_lru_page(struct page *page)
> @@ -502,12 +505,11 @@ int putback_lru_page(struct page *page)
>  	lru = !!TestClearPageActive(page);
>  	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
>  
> -	if (unlikely(!page->mapping)) {
> +	if (unlikely(!page->mapping && page_count(page) == 1)) {
>  		/*
> -		 * page truncated.  drop lock as put_page() will
> -		 * free the page.
> +		 * page truncated and we hold last reference.
> +		 * drop lock as put_page() will free the page.
>  		 */
> -		VM_BUG_ON(page_count(page) != 1);
>  		unlock_page(page);
>  		ret = 0;
>  	} else if (page_evictable(page, NULL)) {
> 
> 
> 


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] fix double unlock_page() in 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
@ 2008-06-17  2:32               ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  2:32 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Andrew Morton, Kamalesh Babulal, linux-kernel, kernel-testers,
	linux-mm, Nick Piggin, Andy Whitcroft, riel

On Fri, 13 Jun 2008 11:30:46 -0400
Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:

> 1)  modified putback_lru_page() to drop page lock only if both page_mapping()
>     NULL and page_count() == 1 [rather than VM_BUG_ON(page_count(page) != 1].

I'm sorry that I cannot catch the whole changes..

I cannot convice that this implicit behavior won't cause lock-up in future, again.
Even if there are enough comments...

Why the page should be locked when it is put back to LRU ?
I think this restriction is added by RvR patch set, right ?
I'm sorry that I cannot catch the whole changes..

Anyway, IMHO, lock <-> unlock should be visible as a pair as much as possible.

Thanks,
-Kame

>     I want to balance the put_page() from isolate_lru_page() here for vmscan
>     and, e.g., page migration rather than requiring explicit checks of the
>     page_mapping() and explicit put_page() in these areas.  However, the page
>     could be truncated while one of these subsystems holds it isolated from
>     the LRU.  So, need to handle this case.  Callers of putback_lru_page()
>     need to be aware of this and only call it with a page with NULL
>     page_mapping() when they will no longer reference the page afterwards.
>     This is the case for vmscan and page migration.
> 
> 2)  m[un]lock_vma_page() already will not be called for page with NULL
>     mapping.  Added VM_BUG_ON() to assert this.
> 
> 3)  modified clear_page_lock() to skip the isolate/putback shuffle for
>     pages with NULL mapping, as they are being truncated/freed.  Thus,
>     any future callers of clear_page_lock() need not be concerned about
>     the putback_lru_page() semantics for truncated pages.
> 
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
>  mm/mlock.c  |   29 +++++++++++++++++++----------
>  mm/vmscan.c |   12 +++++++-----
>  2 files changed, 26 insertions(+), 15 deletions(-)
> 
> Index: linux-2.6.26-rc5-mm3/mm/mlock.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/mm/mlock.c	2008-06-12 11:42:59.000000000 -0400
> +++ linux-2.6.26-rc5-mm3/mm/mlock.c	2008-06-13 09:47:14.000000000 -0400
> @@ -59,27 +59,33 @@ void __clear_page_mlock(struct page *pag
>  
>  	dec_zone_page_state(page, NR_MLOCK);
>  	count_vm_event(NORECL_PGCLEARED);
> -	if (!isolate_lru_page(page)) {
> -		putback_lru_page(page);
> -	} else {
> -		/*
> -		 * Page not on the LRU yet.  Flush all pagevecs and retry.
> -		 */
> -		lru_add_drain_all();
> -		if (!isolate_lru_page(page))
> +	if (page->mapping) {	/* truncated ? */
> +		if (!isolate_lru_page(page)) {
>  			putback_lru_page(page);
> -		else if (PageUnevictable(page))
> -			count_vm_event(NORECL_PGSTRANDED);
> +		} else {
> +			/*
> +			 * Page not on the LRU yet.
> +			 * Flush all pagevecs and retry.
> +			 */
> +			lru_add_drain_all();
> +			if (!isolate_lru_page(page))
> +				putback_lru_page(page);
> +			else if (PageUnevictable(page))
> +				count_vm_event(NORECL_PGSTRANDED);
> +		}
>  	}
>  }
>  
>  /*
>   * Mark page as mlocked if not already.
>   * If page on LRU, isolate and putback to move to unevictable list.
> + *
> + * Called with page locked and page_mapping() != NULL.
>   */
>  void mlock_vma_page(struct page *page)
>  {
>  	BUG_ON(!PageLocked(page));
> +	VM_BUG_ON(!page_mapping(page));
>  
>  	if (!TestSetPageMlocked(page)) {
>  		inc_zone_page_state(page, NR_MLOCK);
> @@ -92,6 +98,8 @@ void mlock_vma_page(struct page *page)
>  /*
>   * called from munlock()/munmap() path with page supposedly on the LRU.
>   *
> + * Called with page locked and page_mapping() != NULL.
> + *
>   * Note:  unlike mlock_vma_page(), we can't just clear the PageMlocked
>   * [in try_to_munlock()] and then attempt to isolate the page.  We must
>   * isolate the page to keep others from messing with its unevictable
> @@ -110,6 +118,7 @@ void mlock_vma_page(struct page *page)
>  static void munlock_vma_page(struct page *page)
>  {
>  	BUG_ON(!PageLocked(page));
> +	VM_BUG_ON(!page_mapping(page));
>  
>  	if (TestClearPageMlocked(page)) {
>  		dec_zone_page_state(page, NR_MLOCK);
> Index: linux-2.6.26-rc5-mm3/mm/vmscan.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/mm/vmscan.c	2008-06-12 11:39:09.000000000 -0400
> +++ linux-2.6.26-rc5-mm3/mm/vmscan.c	2008-06-13 09:44:44.000000000 -0400
> @@ -1,4 +1,4 @@
> -/*
> + /*
>   *  linux/mm/vmscan.c
>   *
>   *  Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
> @@ -488,6 +488,9 @@ int remove_mapping(struct address_space 
>   * lru_lock must not be held, interrupts must be enabled.
>   * Must be called with page locked.
>   *
> + * If page truncated [page_mapping() == NULL] and we hold the last reference,
> + * the page will be freed here.  For vmscan and page migration.
> + *
>   * return 1 if page still locked [not truncated], else 0
>   */
>  int putback_lru_page(struct page *page)
> @@ -502,12 +505,11 @@ int putback_lru_page(struct page *page)
>  	lru = !!TestClearPageActive(page);
>  	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
>  
> -	if (unlikely(!page->mapping)) {
> +	if (unlikely(!page->mapping && page_count(page) == 1)) {
>  		/*
> -		 * page truncated.  drop lock as put_page() will
> -		 * free the page.
> +		 * page truncated and we hold last reference.
> +		 * drop lock as put_page() will free the page.
>  		 */
> -		VM_BUG_ON(page_count(page) != 1);
>  		unlock_page(page);
>  		ret = 0;
>  	} else if (page_evictable(page, NULL)) {
> 
> 
> 


* [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-12  5:59 ` 2.6.26-rc5-mm3 Andrew Morton
@ 2008-06-17  7:35   ` Daisuke Nishimura
  0 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-17  7:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Lee Schermerhorn, Kosaki Motohiro, Nick Piggin,
	linux-mm, linux-kernel, kernel-testers

Hi.

I hit this bug after migrating pages only a few times
via the memory_migrate flag of cpuset.
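
(For reference, the reproducer is a cpuset move along these lines.
This is an illustrative sketch, not my actual switch.sh; the mount
point and file names are examples, and some kernels prefix the
control files with "cpuset.".)

  mount -t cgroup -o cpuset cpuset /dev/cpuset
  mkdir /dev/cpuset/a /dev/cpuset/b
  echo 0 > /dev/cpuset/a/cpus; echo 0 > /dev/cpuset/a/mems
  echo 0 > /dev/cpuset/b/cpus; echo 1 > /dev/cpuset/b/mems
  echo 1 > /dev/cpuset/a/memory_migrate
  echo 1 > /dev/cpuset/b/memory_migrate
  echo $$ > /dev/cpuset/a/tasks   # task allocates memory on node 0
  echo $$ > /dev/cpuset/b/tasks   # re-attach: pages migrate to node 1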

Unfortunately, even with this patch applied, I still hit
a bad_page problem after hundreds of page migrations
(I'll report it in another mail).
But I believe something like this patch is needed anyway.

------------[ cut here ]------------
kernel BUG at mm/migrate.c:719!
invalid opcode: 0000 [1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
CPU 0
Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_log dm_multipath dm_mod sbs sbshc button battery acpi_memhotplug ac parport_pc lp parport floppy serio_raw rtc_cmos rtc_core rtc_lib 8139too pcspkr 8139cp mii ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
Pid: 3096, comm: switch.sh Not tainted 2.6.26-rc5-mm3 #1
RIP: 0010:[<ffffffff8029bb85>]  [<ffffffff8029bb85>] migrate_pages+0x33e/0x49f
RSP: 0018:ffff81002f463bb8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffe20000c17500 RCX: 0000000000000034
RDX: ffffe20000c17500 RSI: ffffe200010003c0 RDI: ffffe20000c17528
RBP: ffffe200010003c0 R08: 8000000000000000 R09: 304605894800282f
R10: 282f87058b480028 R11: 0028304005894800 R12: ffff81003f90a5d8
R13: 0000000000000000 R14: ffffe20000bf4cc0 R15: ffff81002f463c88
FS:  00007ff9386576f0(0000) GS:ffffffff8061d800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ff938669000 CR3: 000000002f458000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process switch.sh (pid: 3096, threadinfo ffff81002f462000, task ffff81003e99cf10)
Stack:  0000000000000001 ffffffff80290777 0000000000000000 0000000000000000
 ffff81002f463c88 ffff81000000ea18 ffff81002f463c88 000000000000000c
 ffff81002f463ca8 00007ffffffff000 00007fff649f6000 0000000000000004
Call Trace:
 [<ffffffff80290777>] ? new_node_page+0x0/0x2f
 [<ffffffff80291611>] ? do_migrate_pages+0x19b/0x1e7
 [<ffffffff802315c7>] ? set_cpus_allowed_ptr+0xe6/0xf3
 [<ffffffff8025c827>] ? cpuset_migrate_mm+0x58/0x8f
 [<ffffffff8025d0fd>] ? cpuset_attach+0x8b/0x9e
 [<ffffffff8025a3e1>] ? cgroup_attach_task+0x3a3/0x3f5
 [<ffffffff80276cb5>] ? __alloc_pages_internal+0xe2/0x3d1
 [<ffffffff8025af06>] ? cgroup_common_file_write+0x150/0x1dd
 [<ffffffff8025aaf4>] ? cgroup_file_write+0x54/0x150
 [<ffffffff8029f839>] ? vfs_write+0xad/0x136
 [<ffffffff8029fd76>] ? sys_write+0x45/0x6e
 [<ffffffff8020bef2>] ? tracesys+0xd5/0xda


Code: 4c 48 8d 7b 28 e8 cc 87 09 00 48 83 7b 18 00 75 30 48 8b 03 48 89 da 25 00 40 00 00 48 85 c0 74 04 48 8b 53 10 83 7a 08 01 74 04 <0f> 0b eb fe 48 89 df e8 5e 50 fd ff 48 89 df e8 7d d6 fd ff eb
RIP  [<ffffffff8029bb85>] migrate_pages+0x33e/0x49f
 RSP <ffff81002f463bb8>
Clocksource tsc unstable (delta = 438246251 ns)
---[ end trace ce4e6053f7b9bba1 ]---


This bug is triggered by the VM_BUG_ON() in unmap_and_move():

unmap_and_move()
    710         if (rc != -EAGAIN) {
    711                 /*
    712                  * A page that has been migrated has all references
    713                  * removed and will be freed. A page that has not been
    714                  * migrated will have kept its references and be
    715                  * restored.
    716                  */
    717                 list_del(&page->lru);
    718                 if (!page->mapping) {
    719                         VM_BUG_ON(page_count(page) != 1);
    720                         unlock_page(page);
    721                         put_page(page);         /* just free the old page */
    722                         goto end_migration;
    723                 } else
    724                         unlock = putback_lru_page(page);
    725         }

I think the page count is not necessarily 1 here, because
migration_entry_wait() takes a reference on the page and waits for
it to be unlocked.
So, if the old page is accessed between migrate_page_move_mapping(),
which checks the page count, and remove_migration_ptes(), the page
count will not be 1 here.

Actually, just commenting out the get/put_page() calls in
migration_entry_wait() works well in my environment (it survived
hundreds of page migrations), but modifying migration_entry_wait()
that way is not good, I think.


This patch depends on Lee Schermerhorn's fix for the double unlock_page().

This patch also fixes a race between migration_entry_wait() and
page_freeze_refs() in migrate_page_move_mapping().
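
To illustrate why the patch below uses get_page_unless_zero():
a plain get_page() is an unconditional increment, so it can
resurrect a page whose count page_freeze_refs() has already frozen
to zero, while the unless-zero variant fails and lets the fault
path retry the lookup.  A small userspace sketch of the
inc-not-zero pattern (made-up names and C11 atomics, not kernel
code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct obj { atomic_int count; };	/* stand-in for page->_count */

/* Take a reference only while the count is visibly non-zero; a count
 * frozen to zero by migration is never incremented, so the caller
 * has to go back and re-check the PTE instead. */
static bool get_unless_zero(struct obj *o)
{
	int c = atomic_load(&o->count);

	while (c != 0)
		if (atomic_compare_exchange_weak(&o->count, &c, c + 1))
			return true;
	return false;
}

int main(void)
{
	struct obj page = { .count = 1 };

	printf("%d\n", get_unless_zero(&page));	/* 1: count went 1 -> 2 */
	atomic_store(&page.count, 0);		/* as page_freeze_refs() would */
	printf("%d\n", get_unless_zero(&page));	/* 0: caller must retry */
	return 0;
}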


Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

---
diff -uprN linux-2.6.26-rc5-mm3/mm/migrate.c linux-2.6.26-rc5-mm3-test/mm/migrate.c
--- linux-2.6.26-rc5-mm3/mm/migrate.c	2008-06-17 15:31:23.000000000 +0900
+++ linux-2.6.26-rc5-mm3-test/mm/migrate.c	2008-06-17 13:59:15.000000000 +0900
@@ -232,6 +232,7 @@ void migration_entry_wait(struct mm_stru
 	swp_entry_t entry;
 	struct page *page;
 
+retry:
 	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
 	pte = *ptep;
 	if (!is_swap_pte(pte))
@@ -243,11 +244,20 @@ void migration_entry_wait(struct mm_stru
 
 	page = migration_entry_to_page(entry);
 
-	get_page(page);
-	pte_unmap_unlock(ptep, ptl);
-	wait_on_page_locked(page);
-	put_page(page);
-	return;
+	/*
+	 * page count might be set to zero by page_freeze_refs()
+	 * in migrate_page_move_mapping().
+	 */
+	if (get_page_unless_zero(page)) {
+		pte_unmap_unlock(ptep, ptl);
+		wait_on_page_locked(page);
+		put_page(page);
+		return;
+	} else {
+		pte_unmap_unlock(ptep, ptl);
+		goto retry;
+	}
+
 out:
 	pte_unmap_unlock(ptep, ptl);
 }
@@ -715,13 +725,7 @@ unlock:
  		 * restored.
  		 */
  		list_del(&page->lru);
-		if (!page->mapping) {
-			VM_BUG_ON(page_count(page) != 1);
-			unlock_page(page);
-			put_page(page);		/* just free the old page */
-			goto end_migration;
-		} else
-			unlock = putback_lru_page(page);
+		unlock = putback_lru_page(page);
 	}
 
 	if (unlock)


* [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
  2008-06-17  7:35   ` Daisuke Nishimura
@ 2008-06-17  7:47     ` Daisuke Nishimura
  0 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-17  7:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Lee Schermerhorn, Kosaki Motohiro, Nick Piggin,
	linux-mm, linux-kernel, kernel-testers

On Tue, 17 Jun 2008 16:35:01 +0900, Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> Hi.
> 
> I hit this bug after migrating pages only a few times
> via the memory_migrate flag of cpuset.
> 
> Unfortunately, even with this patch applied, I still hit
> a bad_page problem after hundreds of page migrations
> (I'll report it in another mail).
> But I believe something like this patch is needed anyway.
> 

I got a bad_page warning after hundreds of page migrations.
It seems that a locked page is being freed.
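
The flags word in the report below is consistent with that reading.
A quick decode, assuming the usual low-bit page->flags layout of
this era (PG_locked = bit 0, PG_uptodate = bit 3, PG_dirty = bit 4;
the top bits encode node/zone, and this -mm tree defines extra
flags higher up, which I ignore here):

#include <stdio.h>

int main(void)
{
	unsigned long flags = 0x0500000000080019UL;	/* from the report */

	printf("PG_locked   = %lu\n", (flags >> 0) & 1);  /* 1: freed while locked */
	printf("PG_uptodate = %lu\n", (flags >> 3) & 1);  /* 1 */
	printf("PG_dirty    = %lu\n", (flags >> 4) & 1);  /* 1 */
	return 0;
}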


Bad page state in process 'switch.sh'
page:ffffe20001ee8f40 flags:0x0500000000080019 mapping:0000000000000000 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 23283, comm: switch.sh Not tainted 2.6.26-rc5-mm3-test6-lee #1

Call Trace:
 [<ffffffff802747b0>] bad_page+0x97/0x131
 [<ffffffff80275ae6>] free_hot_cold_page+0xd4/0x19c
 [<ffffffff8027a5c3>] putback_lru_page+0xf4/0xfb
 [<ffffffff8029b210>] putback_lru_pages+0x46/0x74
 [<ffffffff8029bc5b>] migrate_pages+0x3f4/0x468
 [<ffffffff80290797>] new_node_page+0x0/0x2f
 [<ffffffff80291631>] do_migrate_pages+0x19b/0x1e7
 [<ffffffff8025c827>] cpuset_migrate_mm+0x58/0x8f
 [<ffffffff8025d0fd>] cpuset_attach+0x8b/0x9e
 [<ffffffff8032ffdc>] sscanf+0x49/0x51
 [<ffffffff8025a3e1>] cgroup_attach_task+0x3a3/0x3f5
 [<ffffffff80489a90>] __mutex_lock_slowpath+0x64/0x93
 [<ffffffff8025af06>] cgroup_common_file_write+0x150/0x1dd
 [<ffffffff8025aaf4>] cgroup_file_write+0x54/0x150
 [<ffffffff8029f855>] vfs_write+0xad/0x136
 [<ffffffff8029fd92>] sys_write+0x45/0x6e
 [<ffffffff8020bef2>] tracesys+0xd5/0xda

Hexdump:
000: 28 00 08 00 00 00 00 05 02 00 00 00 01 00 00 00
010: 00 00 00 00 00 00 00 00 41 3b 41 2f 00 81 ff ff
020: 46 01 00 00 00 00 00 00 e8 17 e6 01 00 e2 ff ff
030: e8 4b e6 01 00 e2 ff ff 00 00 00 00 00 00 00 00
040: 19 00 08 00 00 00 00 05 00 00 00 00 ff ff ff ff
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: ba 06 00 00 00 00 00 00 00 01 10 00 00 c1 ff ff
070: 00 02 20 00 00 c1 ff ff 00 00 00 00 00 00 00 00
080: 28 00 08 00 00 00 00 05 01 00 00 00 00 00 00 00
090: 00 00 00 00 00 00 00 00 01 3d 41 2f 00 81 ff ff
0a0: bb c3 55 f7 07 00 00 00 68 c4 f0 01 00 e2 ff ff
0b0: e8 8f ee 01 00 e2 ff ff 00 00 00 00 00 00 00 00
------------[ cut here ]------------
kernel BUG at mm/filemap.c:575!
invalid opcode: 0000 [1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
CPU 1
Modules linked in: nfs lockd nfs_acl ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_log dm_multipath dm_mod sbs sbshc button battery acpi_memhotplug ac parport_pc lp parport floppy serio_raw rtc_cmos 8139too rtc_core rtc_lib 8139cp mii pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
Pid: 23283, comm: switch.sh Tainted: G    B     2.6.26-rc5-mm3-test6-lee #1
RIP: 0010:[<ffffffff80270bfe>]  [<ffffffff80270bfe>] unlock_page+0xf/0x26
RSP: 0018:ffff8100396e7b78  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffe20001ee8f40 RCX: 000000000000005a
RDX: 0000000000000006 RSI: 0000000000000003 RDI: ffffe20001ee8f40
RBP: ffffe20001f3e9c0 R08: 0000000000000008 R09: ffff810001101780
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000004
R13: ffff8100396e7c88 R14: ffffe20001e8d080 R15: ffff8100396e7c88
FS:  00007fd4597fb6f0(0000) GS:ffff81007f98d280(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000418498 CR3: 000000003e9ac000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process switch.sh (pid: 23283, threadinfo ffff8100396e6000, task ffff8100318a64a0)
Stack:  ffffe20001ee8f40 ffffffff8029b21c ffffe20001e98e40 ffff8100396e7c60
 ffffe20000665140 ffff8100314fd581 0000000000000000 ffffffff8029bc5b
 0000000000000000 ffffffff80290797 0000000000000000 0000000000000001
Call Trace:
 [<ffffffff8029b21c>] ? putback_lru_pages+0x52/0x74
 [<ffffffff8029bc5b>] ? migrate_pages+0x3f4/0x468
 [<ffffffff80290797>] ? new_node_page+0x0/0x2f
 [<ffffffff80291631>] ? do_migrate_pages+0x19b/0x1e7
 [<ffffffff8025c827>] ? cpuset_migrate_mm+0x58/0x8f
 [<ffffffff8025d0fd>] ? cpuset_attach+0x8b/0x9e
 [<ffffffff8032ffdc>] ? sscanf+0x49/0x51
 [<ffffffff8025a3e1>] ? cgroup_attach_task+0x3a3/0x3f5
 [<ffffffff80489a90>] ? __mutex_lock_slowpath+0x64/0x93
 [<ffffffff8025af06>] ? cgroup_common_file_write+0x150/0x1dd
 [<ffffffff8025aaf4>] ? cgroup_file_write+0x54/0x150
 [<ffffffff8029f855>] ? vfs_write+0xad/0x136
 [<ffffffff8029fd92>] ? sys_write+0x45/0x6e
 [<ffffffff8020bef2>] ? tracesys+0xd5/0xda


Code: 40 58 48 85 c0 74 0b 48 8b 40 10 48 85 c0 74 02 ff d0 e8 7b 89 21 00 41 5b 31 c0 c3 53 48 89 fb f0 0f ba 37 00 19 c0 85 c0 75 04 <0f> 0b eb fe e8 01 f5 ff ff 48 89 de 48 89 c7 31 d2 5b e9 ea 5e
RIP  [<ffffffff80270bfe>] unlock_page+0xf/0x26
 RSP <ffff8100396e7b78>
---[ end trace 4ab171fcf075cf2e ]---


* Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
@ 2008-06-17  9:03       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  9:03 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Tue, 17 Jun 2008 16:47:09 +0900
Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org> wrote:

> On Tue, 17 Jun 2008 16:35:01 +0900, Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org> wrote:
> > Hi.
> > 
> > I got this bug while migrating pages only a few times
> > via memory_migrate of cpuset.
> > 
> > Unfortunately, even if this patch is applied,
> > I got bad_page problem after hundreds times of page migration
> > (I'll report it in another mail).
> > But I believe something like this patch is needed anyway.
> > 
> 
> I got bad_page after hundreds times of page migration.
> It seems that a locked page is being freed.
> 
Good catch, and I think your investigation in the last e-mail was correct.
I'd like to dig this...but it seems some kind of big fix is necessary.
Did this happen under page-migraion by cpuset-task-move test ?

Thanks,
-Kame



> 
> Bad page state in process 'switch.sh'
> page:ffffe20001ee8f40 flags:0x0500000000080019 mapping:0000000000000000 mapcount:0 count:0
> Trying to fix it up, but a reboot is needed
> Backtrace:
> Pid: 23283, comm: switch.sh Not tainted 2.6.26-rc5-mm3-test6-lee #1
> 
> Call Trace:
>  [<ffffffff802747b0>] bad_page+0x97/0x131
>  [<ffffffff80275ae6>] free_hot_cold_page+0xd4/0x19c
>  [<ffffffff8027a5c3>] putback_lru_page+0xf4/0xfb
>  [<ffffffff8029b210>] putback_lru_pages+0x46/0x74
>  [<ffffffff8029bc5b>] migrate_pages+0x3f4/0x468
>  [<ffffffff80290797>] new_node_page+0x0/0x2f
>  [<ffffffff80291631>] do_migrate_pages+0x19b/0x1e7
>  [<ffffffff8025c827>] cpuset_migrate_mm+0x58/0x8f
>  [<ffffffff8025d0fd>] cpuset_attach+0x8b/0x9e
>  [<ffffffff8032ffdc>] sscanf+0x49/0x51
>  [<ffffffff8025a3e1>] cgroup_attach_task+0x3a3/0x3f5
>  [<ffffffff80489a90>] __mutex_lock_slowpath+0x64/0x93
>  [<ffffffff8025af06>] cgroup_common_file_write+0x150/0x1dd
>  [<ffffffff8025aaf4>] cgroup_file_write+0x54/0x150
>  [<ffffffff8029f855>] vfs_write+0xad/0x136
>  [<ffffffff8029fd92>] sys_write+0x45/0x6e
>  [<ffffffff8020bef2>] tracesys+0xd5/0xda
> 
> Hexdump:
> 000: 28 00 08 00 00 00 00 05 02 00 00 00 01 00 00 00
> 010: 00 00 00 00 00 00 00 00 41 3b 41 2f 00 81 ff ff
> 020: 46 01 00 00 00 00 00 00 e8 17 e6 01 00 e2 ff ff
> 030: e8 4b e6 01 00 e2 ff ff 00 00 00 00 00 00 00 00
> 040: 19 00 08 00 00 00 00 05 00 00 00 00 ff ff ff ff
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 060: ba 06 00 00 00 00 00 00 00 01 10 00 00 c1 ff ff
> 070: 00 02 20 00 00 c1 ff ff 00 00 00 00 00 00 00 00
> 080: 28 00 08 00 00 00 00 05 01 00 00 00 00 00 00 00
> 090: 00 00 00 00 00 00 00 00 01 3d 41 2f 00 81 ff ff
> 0a0: bb c3 55 f7 07 00 00 00 68 c4 f0 01 00 e2 ff ff
> 0b0: e8 8f ee 01 00 e2 ff ff 00 00 00 00 00 00 00 00
> ------------[ cut here ]------------
> kernel BUG at mm/filemap.c:575!
> invalid opcode: 0000 [1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
> CPU 1
> Modules linked in: nfs lockd nfs_acl ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_log dm_multipath dm_mod sbs sbshc button battery acpi_memhotplug ac parport_pc lp parport floppy serio_raw rtc_cmos 8139too rtc_core rtc_lib 8139cp mii pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
> Pid: 23283, comm: switch.sh Tainted: G    B     2.6.26-rc5-mm3-test6-lee #1
> RIP: 0010:[<ffffffff80270bfe>]  [<ffffffff80270bfe>] unlock_page+0xf/0x26
> RSP: 0018:ffff8100396e7b78  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffe20001ee8f40 RCX: 000000000000005a
> RDX: 0000000000000006 RSI: 0000000000000003 RDI: ffffe20001ee8f40
> RBP: ffffe20001f3e9c0 R08: 0000000000000008 R09: ffff810001101780
> R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000004
> R13: ffff8100396e7c88 R14: ffffe20001e8d080 R15: ffff8100396e7c88
> FS:  00007fd4597fb6f0(0000) GS:ffff81007f98d280(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000418498 CR3: 000000003e9ac000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process switch.sh (pid: 23283, threadinfo ffff8100396e6000, task ffff8100318a64a0)
> Stack:  ffffe20001ee8f40 ffffffff8029b21c ffffe20001e98e40 ffff8100396e7c60
>  ffffe20000665140 ffff8100314fd581 0000000000000000 ffffffff8029bc5b
>  0000000000000000 ffffffff80290797 0000000000000000 0000000000000001
> Call Trace:
>  [<ffffffff8029b21c>] ? putback_lru_pages+0x52/0x74
>  [<ffffffff8029bc5b>] ? migrate_pages+0x3f4/0x468
>  [<ffffffff80290797>] ? new_node_page+0x0/0x2f
>  [<ffffffff80291631>] ? do_migrate_pages+0x19b/0x1e7
>  [<ffffffff8025c827>] ? cpuset_migrate_mm+0x58/0x8f
>  [<ffffffff8025d0fd>] ? cpuset_attach+0x8b/0x9e
>  [<ffffffff8032ffdc>] ? sscanf+0x49/0x51
>  [<ffffffff8025a3e1>] ? cgroup_attach_task+0x3a3/0x3f5
>  [<ffffffff80489a90>] ? __mutex_lock_slowpath+0x64/0x93
>  [<ffffffff8025af06>] ? cgroup_common_file_write+0x150/0x1dd
>  [<ffffffff8025aaf4>] ? cgroup_file_write+0x54/0x150
>  [<ffffffff8029f855>] ? vfs_write+0xad/0x136
>  [<ffffffff8029fd92>] ? sys_write+0x45/0x6e
>  [<ffffffff8020bef2>] ? tracesys+0xd5/0xda
> 
> 
> Code: 40 58 48 85 c0 74 0b 48 8b 40 10 48 85 c0 74 02 ff d0 e8 7b 89 21 00 41 5b 31 c0 c3 53 48 89 fb f0 0f ba 37 00 19 c0 85 c0 75 04 <0f> 0b eb fe e8 01 f5 ff ff 48 89 de 48 89 c7 31 d2 5b e9 ea 5e
> RIP  [<ffffffff80270bfe>] unlock_page+0xf/0x26
>  RSP <ffff8100396e7b78>
> ---[ end trace 4ab171fcf075cf2e ]---
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>
> 

--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
  2008-06-17  9:03       ` KAMEZAWA Hiroyuki
@ 2008-06-17  9:14         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-17  9:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, Daisuke Nishimura, Andrew Morton, Rik van Riel,
	Lee Schermerhorn, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers

> > > I got this bug while migrating pages only a few times
> > > via memory_migrate of cpuset.
> > > 
> > > Unfortunately, even if this patch is applied,
> > > I got bad_page problem after hundreds times of page migration
> > > (I'll report it in another mail).
> > > But I believe something like this patch is needed anyway.
> > > 
> > 
> > I got bad_page after hundreds times of page migration.
> > It seems that a locked page is being freed.
> > 
> Good catch, and I think your investigation in the last e-mail was correct.
> I'd like to dig this...but it seems some kind of big fix is necessary.
> Did this happen under page-migration by the cpuset-task-move test?

Indeed!

I guess Lee's unevictable infrastructure and Nick's speculative pagecache
conflict with each other.
I'm investigating this deeply now.




^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
  2008-06-17  9:03       ` KAMEZAWA Hiroyuki
@ 2008-06-17  9:15         ` Daisuke Nishimura
  -1 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-17  9:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Tue, 17 Jun 2008 18:03:14 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 17 Jun 2008 16:47:09 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > On Tue, 17 Jun 2008 16:35:01 +0900, Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > Hi.
> > > 
> > > I got this bug while migrating pages only a few times
> > > via memory_migrate of cpuset.
> > > 
> > > Unfortunately, even if this patch is applied,
> > > I got bad_page problem after hundreds times of page migration
> > > (I'll report it in another mail).
> > > But I believe something like this patch is needed anyway.
> > > 
> > 
> > I got bad_page after hundreds times of page migration.
> > It seems that a locked page is being freed.
> > 
> Good catch, and I think your investigation in the last e-mail was correct.
> I'd like to dig this...but it seems some kind of big fix is necessary.
> Did this happen under page-migration by the cpuset-task-move test?
> 
Yes.

I made 2 cpuset directories, ran some processes in each cpuset,
and ran a script like the one below repeatedly to move tasks and migrate pages.

---
#!/bin/bash
# Reproducer: swap all tasks between two cpusets ($1 and $2) so that,
# with memory_migrate enabled, their pages are migrated back and forth.

G1=$1
G2=$2

move_task()
{
        # Write each pid into the target cpuset's tasks file;
        # errors for tasks that exited in the meantime are ignored.
        for pid in $1
        do
                echo $pid >$2/tasks 2>/dev/null
        done
}

G1_TASK=`cat ${G1}/tasks`
G2_TASK=`cat ${G2}/tasks`

# Swap the two task lists concurrently.
move_task "${G1_TASK}" ${G2} &
move_task "${G2_TASK}" ${G1} &

wait
---

I got this bad_page after running this script about 600 times.
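
For reference, an invocation looks something like this (the cpuset mount
point and group paths below are illustrative examples, not the exact
setup from this report):

	# assumed: cpusets mounted at /dev/cpuset, groups g1 and g2
	# created with cpus/mems assigned and memory_migrate set to 1,
	# and some test processes already running in each group
	while :; do ./switch.sh /dev/cpuset/g1 /dev/cpuset/g2; done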


Thanks,
Daisuke Nishimura.



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] fix double unlock_page() in 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575!
  2008-06-17  2:32               ` KAMEZAWA Hiroyuki
@ 2008-06-17 15:26                 ` Lee Schermerhorn
  -1 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 15:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, KOSAKI Motohiro, Kamalesh Babulal, linux-kernel,
	kernel-testers, linux-mm, Nick Piggin, Andy Whitcroft, riel

On Tue, 2008-06-17 at 11:32 +0900, KAMEZAWA Hiroyuki wrote:
> On Fri, 13 Jun 2008 11:30:46 -0400
> Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> 
> > 1)  modified putback_lru_page() to drop page lock only if both page_mapping()
> >     NULL and page_count() == 1 [rather than VM_BUG_ON(page_count(page) != 1].
> 
> I'm sorry that I cannot catch the whole changes..
> 
> I cannot convince myself that this implicit behavior won't cause lock-up in future, again.
> Even if there are enough comments...
> 
> Why the page should be locked when it is put back to LRU ?
> I think this restriction is added by RvR patch set, right ?
> I'm sorry that I cannot catch the whole changes..

Kame-san:  The restriction to put the page back to the LRU via
putback_lru_page() with the page locked does come from the unevictable
page infrastructure.  Both page migration and vmscan can hold the page
isolated from the LRU, but unlocked, for quite some time.  During this
time, a page can become nonreclaimable [or unevictable] or a
nonreclaimable page can become reclaimable.  It's OK if an unevictable
page gets onto the regular LRU lists, because we'll detect it and
"cull" it if/when vmscan attempts to reclaim it.  However, if a
reclaimable page gets onto the unevictable LRU list, we may never get it
off, except via a manual scan.  Rik doesn't think we need the manual scan,
so we've been very careful to avoid conditions where we could "leak" a
reclaimable page permanently onto the unevictable list.  Kosaki-san
found several scenarios where this could happen unless we check, under
page lock, the unevictable conditions when putting these pages back on
the LRU.
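
To make the locking requirement concrete, here is a rough sketch of the
putback path (abbreviated, with paraphrased helpers; this is not the
exact -mm code, see the real patch quoted below):

	/*
	 * Sketch only:  the evictable/unevictable decision is made
	 * while the page is still locked, so it cannot race with
	 * mlock/munlock or truncation.
	 */
	int putback_lru_page(struct page *page)
	{
		VM_BUG_ON(!PageLocked(page));

		if (unlikely(!page->mapping && page_count(page) == 1)) {
			/* truncated and we hold the last reference */
			unlock_page(page);
			put_page(page);		/* frees the page */
			return 0;		/* page no longer locked */
		}

		if (page_evictable(page, NULL))
			lru_cache_add(page);	/* normal LRU list */
		else
			add_page_to_unevictable_list(page);

		put_page(page);	/* drop ref from isolate_lru_page() */
		return 1;	/* page still locked; caller unlocks */
	}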

> 
> Anyway, IMHO, lock <-> unlock should be visible as a pair as much as possible.

I've considered modifying putback_lru_page() not to unlock/put the page
when mapping == NULL and count == 1.  Then all of the callers would have
to remember this state, drop the lock, and call put_page() themselves.  I
think this would duplicate code and look ugly, but if we need to do
that, I guess we'll do it.
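
For comparison, each caller under that alternative would need something
like the following (a hypothetical pattern, just to show the
duplication; not proposed code):

	/* hypothetical caller if putback_lru_page() left lock/ref
	 * handling entirely to its callers */
	if (page->mapping || page_count(page) != 1)
		putback_lru_page(page);	/* re-adds page to an LRU */
	unlock_page(page);
	put_page(page);		/* last ref frees a truncated page */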

Regards,
Lee
> 
> Thanks,
> -Kame
> 
> >     I want to balance the put_page() from isolate_lru_page() here for vmscan
> >     and, e.g., page migration rather than requiring explicit checks of the
> >     page_mapping() and explicit put_page() in these areas.  However, the page
> >     could be truncated while one of these subsystems holds it isolated from
> >     the LRU.  So, need to handle this case.  Callers of putback_lru_page()
> >     need to be aware of this and only call it with a page with NULL
> >     page_mapping() when they will no longer reference the page afterwards.
> >     This is the case for vmscan and page migration.
> > 
> > 2)  m[un]lock_vma_page() already will not be called for page with NULL
> >     mapping.  Added VM_BUG_ON() to assert this.
> > 
> > 3)  modified clear_page_mlock() to skip the isolate/putback shuffle for
> >     pages with NULL mapping, as they are being truncated/freed.  Thus,
> >     any future callers of clear_page_mlock() need not be concerned about
> >     the putback_lru_page() semantics for truncated pages.
> > 
> > Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> > 
> >  mm/mlock.c  |   29 +++++++++++++++++++----------
> >  mm/vmscan.c |   12 +++++++-----
> >  2 files changed, 26 insertions(+), 15 deletions(-)
> > 
> > Index: linux-2.6.26-rc5-mm3/mm/mlock.c
> > ===================================================================
> > --- linux-2.6.26-rc5-mm3.orig/mm/mlock.c	2008-06-12 11:42:59.000000000 -0400
> > +++ linux-2.6.26-rc5-mm3/mm/mlock.c	2008-06-13 09:47:14.000000000 -0400
> > @@ -59,27 +59,33 @@ void __clear_page_mlock(struct page *pag
> >  
> >  	dec_zone_page_state(page, NR_MLOCK);
> >  	count_vm_event(NORECL_PGCLEARED);
> > -	if (!isolate_lru_page(page)) {
> > -		putback_lru_page(page);
> > -	} else {
> > -		/*
> > -		 * Page not on the LRU yet.  Flush all pagevecs and retry.
> > -		 */
> > -		lru_add_drain_all();
> > -		if (!isolate_lru_page(page))
> > +	if (page->mapping) {	/* truncated ? */
> > +		if (!isolate_lru_page(page)) {
> >  			putback_lru_page(page);
> > -		else if (PageUnevictable(page))
> > -			count_vm_event(NORECL_PGSTRANDED);
> > +		} else {
> > +			/*
> > +			 * Page not on the LRU yet.
> > +			 * Flush all pagevecs and retry.
> > +			 */
> > +			lru_add_drain_all();
> > +			if (!isolate_lru_page(page))
> > +				putback_lru_page(page);
> > +			else if (PageUnevictable(page))
> > +				count_vm_event(NORECL_PGSTRANDED);
> > +		}
> >  	}
> >  }
> >  
> >  /*
> >   * Mark page as mlocked if not already.
> >   * If page on LRU, isolate and putback to move to unevictable list.
> > + *
> > + * Called with page locked and page_mapping() != NULL.
> >   */
> >  void mlock_vma_page(struct page *page)
> >  {
> >  	BUG_ON(!PageLocked(page));
> > +	VM_BUG_ON(!page_mapping(page));
> >  
> >  	if (!TestSetPageMlocked(page)) {
> >  		inc_zone_page_state(page, NR_MLOCK);
> > @@ -92,6 +98,8 @@ void mlock_vma_page(struct page *page)
> >  /*
> >   * called from munlock()/munmap() path with page supposedly on the LRU.
> >   *
> > + * Called with page locked and page_mapping() != NULL.
> > + *
> >   * Note:  unlike mlock_vma_page(), we can't just clear the PageMlocked
> >   * [in try_to_munlock()] and then attempt to isolate the page.  We must
> >   * isolate the page to keep others from messing with its unevictable
> > @@ -110,6 +118,7 @@ void mlock_vma_page(struct page *page)
> >  static void munlock_vma_page(struct page *page)
> >  {
> >  	BUG_ON(!PageLocked(page));
> > +	VM_BUG_ON(!page_mapping(page));
> >  
> >  	if (TestClearPageMlocked(page)) {
> >  		dec_zone_page_state(page, NR_MLOCK);
> > Index: linux-2.6.26-rc5-mm3/mm/vmscan.c
> > ===================================================================
> > --- linux-2.6.26-rc5-mm3.orig/mm/vmscan.c	2008-06-12 11:39:09.000000000 -0400
> > +++ linux-2.6.26-rc5-mm3/mm/vmscan.c	2008-06-13 09:44:44.000000000 -0400
> > @@ -1,4 +1,4 @@
> > -/*
> > + /*
> >   *  linux/mm/vmscan.c
> >   *
> >   *  Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
> > @@ -488,6 +488,9 @@ int remove_mapping(struct address_space 
> >   * lru_lock must not be held, interrupts must be enabled.
> >   * Must be called with page locked.
> >   *
> > + * If page truncated [page_mapping() == NULL] and we hold the last reference,
> > + * the page will be freed here.  For vmscan and page migration.
> > + *
> >   * return 1 if page still locked [not truncated], else 0
> >   */
> >  int putback_lru_page(struct page *page)
> > @@ -502,12 +505,11 @@ int putback_lru_page(struct page *page)
> >  	lru = !!TestClearPageActive(page);
> >  	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
> >  
> > -	if (unlikely(!page->mapping)) {
> > +	if (unlikely(!page->mapping && page_count(page) == 1)) {
> >  		/*
> > -		 * page truncated.  drop lock as put_page() will
> > -		 * free the page.
> > +		 * page truncated and we hold last reference.
> > +		 * drop lock as put_page() will free the page.
> >  		 */
> > -		VM_BUG_ON(page_count(page) != 1);
> >  		unlock_page(page);
> >  		ret = 0;
> >  	} else if (page_evictable(page, NULL)) {
> > 
> > 
> > 
> 


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-17  7:35   ` Daisuke Nishimura
@ 2008-06-17 15:33     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-17 15:33 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: kosaki.motohiro, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

> @@ -715,13 +725,7 @@ unlock:
>   		 * restored.
>   		 */
>   		list_del(&page->lru);
> -		if (!page->mapping) {
> -			VM_BUG_ON(page_count(page) != 1);
> -			unlock_page(page);
> -			put_page(page);		/* just free the old page */
> -			goto end_migration;
> -		} else
> -			unlock = putback_lru_page(page);
> +		unlock = putback_lru_page(page);
>  	}
>  
>  	if (unlock)

Is this part really necessary?
I tried removing it, and no problem happened.

Of course, the other part is definitely necessary for speculative pagecache :)





^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
  2008-06-17  7:47     ` Daisuke Nishimura
@ 2008-06-17 15:34       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-17 15:34 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: kosaki.motohiro, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

> > I got this bug while migrating pages only a few times
> > via memory_migrate of cpuset.
> > 
> > Unfortunately, even if this patch is applied,
> > I got bad_page problem after hundreds times of page migration
> > (I'll report it in another mail).
> > But I believe something like this patch is needed anyway.
> > 
> 
> I got bad_page after hundreds times of page migration.
> It seems that a locked page is being freed.

I can't reproduce this bad page.
I'll try again tomorrow ;)




^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-17  7:35   ` Daisuke Nishimura
@ 2008-06-17 17:46     ` Lee Schermerhorn
  -1 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 17:46 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Rik van Riel, Kosaki Motohiro, Nick Piggin,
	linux-mm, linux-kernel, kernel-testers

On Tue, 2008-06-17 at 16:35 +0900, Daisuke Nishimura wrote:
> Hi.
> 
> I got this bug while migrating pages only a few times
> via memory_migrate of cpuset.

Ah, I did test migration fairly heavily, but not by moving cpusets.  

> 
> Unfortunately, even if this patch is applied,
> I got bad_page problem after hundreds times of page migration
> (I'll report it in another mail).
> But I believe something like this patch is needed anyway.

Agreed.  See comments below.
> 
> ------------[ cut here ]------------
> kernel BUG at mm/migrate.c:719!
> invalid opcode: 0000 [1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
> CPU 0
> Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_log dm_multipath dm_mod sbs sbshc button battery acpi_memhotplug ac parport_pc lp parport floppy serio_raw rtc_cmos rtc_core rtc_lib 8139too pcspkr 8139cp mii ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
> Pid: 3096, comm: switch.sh Not tainted 2.6.26-rc5-mm3 #1
> RIP: 0010:[<ffffffff8029bb85>]  [<ffffffff8029bb85>] migrate_pages+0x33e/0x49f
> RSP: 0018:ffff81002f463bb8  EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffffe20000c17500 RCX: 0000000000000034
> RDX: ffffe20000c17500 RSI: ffffe200010003c0 RDI: ffffe20000c17528
> RBP: ffffe200010003c0 R08: 8000000000000000 R09: 304605894800282f
> R10: 282f87058b480028 R11: 0028304005894800 R12: ffff81003f90a5d8
> R13: 0000000000000000 R14: ffffe20000bf4cc0 R15: ffff81002f463c88
> FS:  00007ff9386576f0(0000) GS:ffffffff8061d800(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007ff938669000 CR3: 000000002f458000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process switch.sh (pid: 3096, threadinfo ffff81002f462000, task ffff81003e99cf10)
> Stack:  0000000000000001 ffffffff80290777 0000000000000000 0000000000000000
>  ffff81002f463c88 ffff81000000ea18 ffff81002f463c88 000000000000000c
>  ffff81002f463ca8 00007ffffffff000 00007fff649f6000 0000000000000004
> Call Trace:
>  [<ffffffff80290777>] ? new_node_page+0x0/0x2f
>  [<ffffffff80291611>] ? do_migrate_pages+0x19b/0x1e7
>  [<ffffffff802315c7>] ? set_cpus_allowed_ptr+0xe6/0xf3
>  [<ffffffff8025c827>] ? cpuset_migrate_mm+0x58/0x8f
>  [<ffffffff8025d0fd>] ? cpuset_attach+0x8b/0x9e
>  [<ffffffff8025a3e1>] ? cgroup_attach_task+0x3a3/0x3f5
>  [<ffffffff80276cb5>] ? __alloc_pages_internal+0xe2/0x3d1
>  [<ffffffff8025af06>] ? cgroup_common_file_write+0x150/0x1dd
>  [<ffffffff8025aaf4>] ? cgroup_file_write+0x54/0x150
>  [<ffffffff8029f839>] ? vfs_write+0xad/0x136
>  [<ffffffff8029fd76>] ? sys_write+0x45/0x6e
>  [<ffffffff8020bef2>] ? tracesys+0xd5/0xda
> 
> 
> Code: 4c 48 8d 7b 28 e8 cc 87 09 00 48 83 7b 18 00 75 30 48 8b 03 48 89 da 25 00 40 00 00 48 85 c0 74 04 48 8b 53 10 83 7a 08 01 74 04 <0f> 0b eb fe 48 89 df e8 5e 50 fd ff 48 89 df e8 7d d6 fd ff eb
> RIP  [<ffffffff8029bb85>] migrate_pages+0x33e/0x49f
>  RSP <ffff81002f463bb8>
> Clocksource tsc unstable (delta = 438246251 ns)
> ---[ end trace ce4e6053f7b9bba1 ]---
> 
> 
> This bug is caused by VM_BUG_ON() in unmap_and_move().
> 
> unmap_and_move()
>     710         if (rc != -EAGAIN) {
>     711                 /*
>     712                  * A page that has been migrated has all references
>     713                  * removed and will be freed. A page that has not been
>     714                  * migrated will have kepts its references and be
>     715                  * restored.
>     716                  */
>     717                 list_del(&page->lru);
>     718                 if (!page->mapping) {
>     719                         VM_BUG_ON(page_count(page) != 1);
>     720                         unlock_page(page);
>     721                         put_page(page);         /* just free the old page */
>     722                         goto end_migration;
>     723                 } else
>     724                         unlock = putback_lru_page(page);
>     725         }

I think that at least part of your patch, below, should fix this
problem.  See comments there.

Now I wonder if the assertion that newpage count == 1 could be violated?
I don't see how.  We've just allocated and filled it and haven't
unlocked it yet, so we should hold the only reference.  Do you agree?
> 
> I think the page count is not necessarily 1 here, because
> migration_entry_wait increases page count and waits for the
> page to be unlocked.
> So, if the old page is accessed between migrate_page_move_mapping,
> which checks the page count, and remove_migration_ptes, page count
> would not be 1 here.
> 
> Actually, just commenting out get/put_page from migration_entry_wait
> works well in my environment (it succeeded in hundreds of page migrations),
> but modifying migration_entry_wait this way is not good, I think.
> 
> 
> This patch depends on Lee Schermerhorn's fix for double unlock_page.
> 
> This patch also fixes a race between migration_entry_wait and
> page_freeze_refs in migrate_page_move_mapping.
> 
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> ---
> diff -uprN linux-2.6.26-rc5-mm3/mm/migrate.c linux-2.6.26-rc5-mm3-test/mm/migrate.c
> --- linux-2.6.26-rc5-mm3/mm/migrate.c	2008-06-17 15:31:23.000000000 +0900
> +++ linux-2.6.26-rc5-mm3-test/mm/migrate.c	2008-06-17 13:59:15.000000000 +0900
> @@ -232,6 +232,7 @@ void migration_entry_wait(struct mm_stru
>  	swp_entry_t entry;
>  	struct page *page;
>  
> +retry:
>  	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
>  	pte = *ptep;
>  	if (!is_swap_pte(pte))
> @@ -243,11 +244,20 @@ void migration_entry_wait(struct mm_stru
>  
>  	page = migration_entry_to_page(entry);
>  
> -	get_page(page);
> -	pte_unmap_unlock(ptep, ptl);
> -	wait_on_page_locked(page);
> -	put_page(page);
> -	return;
> +	/*
> +	 * page count might be set to zero by page_freeze_refs()
> +	 * in migrate_page_move_mapping().
> +	 */
> +	if (get_page_unless_zero(page)) {
> +		pte_unmap_unlock(ptep, ptl);
> +		wait_on_page_locked(page);
> +		put_page(page);
> +		return;
> +	} else {
> +		pte_unmap_unlock(ptep, ptl);
> +		goto retry;
> +	}
> +

I'm not sure about this part.  If it IS needed, I think it would be
needed independently of the unevictable/putback_lru_page() changes, as
this race must have already existed.

However, unmap_and_move() replaces the migration entries with bona fide
pte's referencing the new page before freeing the old page, so I think
we're OK without this change.
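
For clarity, the interleaving I understand Nishimura-san to be
describing is roughly this (my reconstruction, not an observed trace):

	faulting task                   migrating task (unmap_and_move)
	-------------                   -------------------------------
	                                try_to_unmap() installs
	                                  migration ptes
	fault on a migration pte,
	enter migration_entry_wait()
	                                migrate_page_move_mapping()
	                                  checks/freezes the old page's
	                                  refcount
	  get_page(old page)
	    -> count elevated again
	  wait_on_page_locked()
	                                remove_migration_ptes()
	                                VM_BUG_ON(page_count(page) != 1)
	                                  can fire: the waiter still
	                                  holds its reference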

>  out:
>  	pte_unmap_unlock(ptep, ptl);
>  }
> @@ -715,13 +725,7 @@ unlock:
>   		 * restored.
>   		 */
>   		list_del(&page->lru);
> -		if (!page->mapping) {
> -			VM_BUG_ON(page_count(page) != 1);
> -			unlock_page(page);
> -			put_page(page);		/* just free the old page */
> -			goto end_migration;
> -		} else
> -			unlock = putback_lru_page(page);
> +		unlock = putback_lru_page(page);
>  	}
>  
>  	if (unlock)

I agree with this part.  I came to the same conclusion looking at the
code.  If we just changed the if() and VM_BUG_ON() to:

if (!page->mapping && page_count(page) == 1) { ...

we'd be doing exactly what putback_lru_page() is doing.  So, this code
was always unnecessary, duplicate code [that I was trying to avoid :(].
So, just let putback_lru_page() handle this condition and conditionally
unlock_page().

I'm testing my stress load with the 2nd part of the patch above and
it's holding up OK.  Of course, I didn't hit the problem before.  I'll
try your duplicator script and see what happens.

Regards,
Lee


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
@ 2008-06-17 17:46     ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 17:46 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Rik van Riel, Kosaki Motohiro, Nick Piggin,
	linux-mm, linux-kernel, kernel-testers

On Tue, 2008-06-17 at 16:35 +0900, Daisuke Nishimura wrote:
> Hi.
> 
> I got this bug while migrating pages only a few times
> via memory_migrate of cpuset.

Ah, I did test migration fairly heavily, but not by moving cpusets.  

> 
> Unfortunately, even if this patch is applied,
> I got bad_page problem after hundreds times of page migration
> (I'll report it in another mail).
> But I believe something like this patch is needed anyway.

Agreed.  See comments below.
> 
> ------------[ cut here ]------------
> kernel BUG at mm/migrate.c:719!
> invalid opcode: 0000 [1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
> CPU 0
> Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_log dm_multipath dm_mod sbs sbshc button battery acpi_memhotplug ac parport_pc lp parport floppy serio_raw rtc_cmos rtc_core rtc_lib 8139too pcspkr 8139cp mii ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
> Pid: 3096, comm: switch.sh Not tainted 2.6.26-rc5-mm3 #1
> RIP: 0010:[<ffffffff8029bb85>]  [<ffffffff8029bb85>] migrate_pages+0x33e/0x49f
> RSP: 0018:ffff81002f463bb8  EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffffe20000c17500 RCX: 0000000000000034
> RDX: ffffe20000c17500 RSI: ffffe200010003c0 RDI: ffffe20000c17528
> RBP: ffffe200010003c0 R08: 8000000000000000 R09: 304605894800282f
> R10: 282f87058b480028 R11: 0028304005894800 R12: ffff81003f90a5d8
> R13: 0000000000000000 R14: ffffe20000bf4cc0 R15: ffff81002f463c88
> FS:  00007ff9386576f0(0000) GS:ffffffff8061d800(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007ff938669000 CR3: 000000002f458000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process switch.sh (pid: 3096, threadinfo ffff81002f462000, task ffff81003e99cf10)
> Stack:  0000000000000001 ffffffff80290777 0000000000000000 0000000000000000
>  ffff81002f463c88 ffff81000000ea18 ffff81002f463c88 000000000000000c
>  ffff81002f463ca8 00007ffffffff000 00007fff649f6000 0000000000000004
> Call Trace:
>  [<ffffffff80290777>] ? new_node_page+0x0/0x2f
>  [<ffffffff80291611>] ? do_migrate_pages+0x19b/0x1e7
>  [<ffffffff802315c7>] ? set_cpus_allowed_ptr+0xe6/0xf3
>  [<ffffffff8025c827>] ? cpuset_migrate_mm+0x58/0x8f
>  [<ffffffff8025d0fd>] ? cpuset_attach+0x8b/0x9e
>  [<ffffffff8025a3e1>] ? cgroup_attach_task+0x3a3/0x3f5
>  [<ffffffff80276cb5>] ? __alloc_pages_internal+0xe2/0x3d1
>  [<ffffffff8025af06>] ? cgroup_common_file_write+0x150/0x1dd
>  [<ffffffff8025aaf4>] ? cgroup_file_write+0x54/0x150
>  [<ffffffff8029f839>] ? vfs_write+0xad/0x136
>  [<ffffffff8029fd76>] ? sys_write+0x45/0x6e
>  [<ffffffff8020bef2>] ? tracesys+0xd5/0xda
> 
> 
> Code: 4c 48 8d 7b 28 e8 cc 87 09 00 48 83 7b 18 00 75 30 48 8b 03 48 89 da 25 00 40 00 00 48 85 c0 74 04 48 8b 53 10 83 7a 08 01 74 04 <0f> 0b eb fe 48 89 df e8 5e 50 fd ff 48 89 df e8 7d d6 fd ff eb
> RIP  [<ffffffff8029bb85>] migrate_pages+0x33e/0x49f
>  RSP <ffff81002f463bb8>
> Clocksource tsc unstable (delta = 438246251 ns)
> ---[ end trace ce4e6053f7b9bba1 ]---
> 
> 
> This bug is caused by VM_BUG_ON() in unmap_and_move().
> 
> unmap_and_move()
>     710         if (rc != -EAGAIN) {
>     711                 /*
>     712                  * A page that has been migrated has all references
>     713                  * removed and will be freed. A page that has not been
> migrated will have kept its references and be
>     715                  * restored.
>     716                  */
>     717                 list_del(&page->lru);
>     718                 if (!page->mapping) {
>     719                         VM_BUG_ON(page_count(page) != 1);
>     720                         unlock_page(page);
>     721                         put_page(page);         /* just free the old page */
>     722                         goto end_migration;
>     723                 } else
>     724                         unlock = putback_lru_page(page);
>     725         }

I think that at least part of your patch, below, should fix this
problem.  See comments there.

Now I wonder if the assertion that newpage count == 1 could be violated?
I don't see how.  We've just allocated and filled it and haven't
unlocked it yet, so we should hold the only reference.  Do you agree?
> 
> I think the page count is not necessarily 1 here, because
> migration_entry_wait increases page count and waits for the
> page to be unlocked.
> So, if the old page is accessed between migrate_page_move_mapping,
> which checks the page count, and remove_migration_ptes, the page
> count would not be 1 here.
> 
> Actually, just commenting out get/put_page from migration_entry_wait
> works well in my environment (succeeded in hundreds of page migrations),
> but modifying migration_entry_wait this way is not good, I think.
> 
> 
> This patch depends on Lee Schermerhorn's fix for double unlock_page.
> 
> This patch also fixes a race between migration_entry_wait and
> page_freeze_refs in migrate_page_move_mapping.
> 
> 
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> ---
> diff -uprN linux-2.6.26-rc5-mm3/mm/migrate.c linux-2.6.26-rc5-mm3-test/mm/migrate.c
> --- linux-2.6.26-rc5-mm3/mm/migrate.c	2008-06-17 15:31:23.000000000 +0900
> +++ linux-2.6.26-rc5-mm3-test/mm/migrate.c	2008-06-17 13:59:15.000000000 +0900
> @@ -232,6 +232,7 @@ void migration_entry_wait(struct mm_stru
>  	swp_entry_t entry;
>  	struct page *page;
>  
> +retry:
>  	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
>  	pte = *ptep;
>  	if (!is_swap_pte(pte))
> @@ -243,11 +244,20 @@ void migration_entry_wait(struct mm_stru
>  
>  	page = migration_entry_to_page(entry);
>  
> -	get_page(page);
> -	pte_unmap_unlock(ptep, ptl);
> -	wait_on_page_locked(page);
> -	put_page(page);
> -	return;
> +	/*
> +	 * page count might be set to zero by page_freeze_refs()
> +	 * in migrate_page_move_mapping().
> +	 */
> +	if (get_page_unless_zero(page)) {
> +		pte_unmap_unlock(ptep, ptl);
> +		wait_on_page_locked(page);
> +		put_page(page);
> +		return;
> +	} else {
> +		pte_unmap_unlock(ptep, ptl);
> +		goto retry;
> +	}
> +

I'm not sure about this part.  If it IS needed, I think it would be
needed independently of the unevictable/putback_lru_page() changes, as
this race must have already existed.

However, unmap_and_move() replaces the migration entries with bona fide
ptes referencing the new page before freeing the old page, so I think
we're OK without this change.
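
For illustration, the race Daisuke describes would interleave roughly
like this (a sketch reconstructed from his description; the function
names are from this thread, the exact ordering is illustrative):

	/*
	 * migrating task                   faulting task
	 * --------------                   -------------
	 * try_to_unmap() installs
	 *   migration ptes
	 * migrate_page_move_mapping()
	 *   page count check passes
	 *                                  fault on a migration pte:
	 *                                  migration_entry_wait()
	 *                                    get_page(old)   [count++]
	 *                                    wait_on_page_locked(old)
	 * remove_migration_ptes()
	 * unmap_and_move() tail:
	 *   page->mapping is NULL, but the
	 *   waiter still holds a reference,
	 *   so VM_BUG_ON(page_count(page) != 1)
	 *   fires
	 */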

>  out:
>  	pte_unmap_unlock(ptep, ptl);
>  }
> @@ -715,13 +725,7 @@ unlock:
>   		 * restored.
>   		 */
>   		list_del(&page->lru);
> -		if (!page->mapping) {
> -			VM_BUG_ON(page_count(page) != 1);
> -			unlock_page(page);
> -			put_page(page);		/* just free the old page */
> -			goto end_migration;
> -		} else
> -			unlock = putback_lru_page(page);
> +		unlock = putback_lru_page(page);
>  	}
>  
>  	if (unlock)

I agree with this part.  I came to the same conclusion looking at the
code.  If we just changed the if() and VM_BUG_ON() to:

if (!page->mapping && page_count(page) == 1) { ...

we'd be doing exactly what putback_lru_page() is doing.  So, this code
was always unnecessary, duplicate code [that I was trying to avoid :(].
So, just let putback_lru_page() handle this condition and conditionally
unlock_page().
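
To make the equivalence concrete, here is a rough sketch of the
behaviour we're relying on (not the actual mm/vmscan.c source; the
free path is inferred from the discussion, and the return value is
whether the caller still needs to unlock the page):

	/* sketch only: putback_lru_page() as described in this thread */
	static int putback_lru_page_sketch(struct page *page)
	{
		if (!page->mapping && page_count(page) == 1) {
			/* the open-coded branch removed above */
			unlock_page(page);
			put_page(page);	/* just free the old page */
			return 0;	/* page is gone, no unlock needed */
		}
		putback_lru(page);	/* hypothetical helper: back onto
					 * the appropriate LRU list */
		return 1;		/* caller must unlock_page() */
	}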

I'm testing my stress load with the 2nd part of the patch above and
it's holding up OK.  Of course, I didn't hit the problem before.  I'll
try your duplicator script and see what happens.

Regards,
Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
@ 2008-06-17 17:46     ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 17:46 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Rik van Riel, Kosaki Motohiro, Nick Piggin,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Tue, 2008-06-17 at 16:35 +0900, Daisuke Nishimura wrote:
> Hi.
> 
> I got this bug while migrating pages only a few times
> via memory_migrate of cpuset.

Ah, I did test migration fairly heavily, but not by moving tasks between cpusets.

> 
> Unfortunately, even if this patch is applied,
> I got a bad_page problem after hundreds of page migrations
> (I'll report it in another mail).
> But I believe something like this patch is needed anyway.

Agreed.  See comments below.
> 
> ------------[ cut here ]------------
> kernel BUG at mm/migrate.c:719!
> invalid opcode: 0000 [1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
> CPU 0
> Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_log dm_multipath dm_mod sbs sbshc button battery acpi_memhotplug ac parport_pc lp parport floppy serio_raw rtc_cmos rtc_core rtc_lib 8139too pcspkr 8139cp mii ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
> Pid: 3096, comm: switch.sh Not tainted 2.6.26-rc5-mm3 #1
> RIP: 0010:[<ffffffff8029bb85>]  [<ffffffff8029bb85>] migrate_pages+0x33e/0x49f
> RSP: 0018:ffff81002f463bb8  EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffffe20000c17500 RCX: 0000000000000034
> RDX: ffffe20000c17500 RSI: ffffe200010003c0 RDI: ffffe20000c17528
> RBP: ffffe200010003c0 R08: 8000000000000000 R09: 304605894800282f
> R10: 282f87058b480028 R11: 0028304005894800 R12: ffff81003f90a5d8
> R13: 0000000000000000 R14: ffffe20000bf4cc0 R15: ffff81002f463c88
> FS:  00007ff9386576f0(0000) GS:ffffffff8061d800(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007ff938669000 CR3: 000000002f458000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process switch.sh (pid: 3096, threadinfo ffff81002f462000, task ffff81003e99cf10)
> Stack:  0000000000000001 ffffffff80290777 0000000000000000 0000000000000000
>  ffff81002f463c88 ffff81000000ea18 ffff81002f463c88 000000000000000c
>  ffff81002f463ca8 00007ffffffff000 00007fff649f6000 0000000000000004
> Call Trace:
>  [<ffffffff80290777>] ? new_node_page+0x0/0x2f
>  [<ffffffff80291611>] ? do_migrate_pages+0x19b/0x1e7
>  [<ffffffff802315c7>] ? set_cpus_allowed_ptr+0xe6/0xf3
>  [<ffffffff8025c827>] ? cpuset_migrate_mm+0x58/0x8f
>  [<ffffffff8025d0fd>] ? cpuset_attach+0x8b/0x9e
>  [<ffffffff8025a3e1>] ? cgroup_attach_task+0x3a3/0x3f5
>  [<ffffffff80276cb5>] ? __alloc_pages_internal+0xe2/0x3d1
>  [<ffffffff8025af06>] ? cgroup_common_file_write+0x150/0x1dd
>  [<ffffffff8025aaf4>] ? cgroup_file_write+0x54/0x150
>  [<ffffffff8029f839>] ? vfs_write+0xad/0x136
>  [<ffffffff8029fd76>] ? sys_write+0x45/0x6e
>  [<ffffffff8020bef2>] ? tracesys+0xd5/0xda
> 
> 
> Code: 4c 48 8d 7b 28 e8 cc 87 09 00 48 83 7b 18 00 75 30 48 8b 03 48 89 da 25 00 40 00 00 48 85 c0 74 04 48 8b 53 10 83 7a 08 01 74 04 <0f> 0b eb fe 48 89 df e8 5e 50 fd ff 48 89 df e8 7d d6 fd ff eb
> RIP  [<ffffffff8029bb85>] migrate_pages+0x33e/0x49f
>  RSP <ffff81002f463bb8>
> Clocksource tsc unstable (delta = 438246251 ns)
> ---[ end trace ce4e6053f7b9bba1 ]---
> 
> 
> This bug is caused by VM_BUG_ON() in unmap_and_move().
> 
> unmap_and_move()
>     710         if (rc != -EAGAIN) {
>     711                 /*
>     712                  * A page that has been migrated has all references
>     713                  * removed and will be freed. A page that has not been
> migrated will have kept its references and be
>     715                  * restored.
>     716                  */
>     717                 list_del(&page->lru);
>     718                 if (!page->mapping) {
>     719                         VM_BUG_ON(page_count(page) != 1);
>     720                         unlock_page(page);
>     721                         put_page(page);         /* just free the old page */
>     722                         goto end_migration;
>     723                 } else
>     724                         unlock = putback_lru_page(page);
>     725         }

I think that at least part of your patch, below, should fix this
problem.  See comments there.

Now I wonder if the assertion that newpage count == 1 could be violated?
I don't see how.  We've just allocated and filled it and haven't
unlocked it yet, so we should hold the only reference.  Do you agree?
> 
> I think the page count is not necessarily 1 here, because
> migration_entry_wait increases page count and waits for the
> page to be unlocked.
> So, if the old page is accessed between migrate_page_move_mapping,
> which checks the page count, and remove_migration_ptes, the page
> count would not be 1 here.
> 
> Actually, just commenting out get/put_page from migration_entry_wait
> works well in my environment (succeeded in hundreds of page migrations),
> but modifying migration_entry_wait this way is not good, I think.
> 
> 
> This patch depends on Lee Schermerhorn's fix for double unlock_page.
> 
> This patch also fixes a race between migration_entry_wait and
> page_freeze_refs in migrate_page_move_mapping.
> 
> 
> Signed-off-by: Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
> 
> ---
> diff -uprN linux-2.6.26-rc5-mm3/mm/migrate.c linux-2.6.26-rc5-mm3-test/mm/migrate.c
> --- linux-2.6.26-rc5-mm3/mm/migrate.c	2008-06-17 15:31:23.000000000 +0900
> +++ linux-2.6.26-rc5-mm3-test/mm/migrate.c	2008-06-17 13:59:15.000000000 +0900
> @@ -232,6 +232,7 @@ void migration_entry_wait(struct mm_stru
>  	swp_entry_t entry;
>  	struct page *page;
>  
> +retry:
>  	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
>  	pte = *ptep;
>  	if (!is_swap_pte(pte))
> @@ -243,11 +244,20 @@ void migration_entry_wait(struct mm_stru
>  
>  	page = migration_entry_to_page(entry);
>  
> -	get_page(page);
> -	pte_unmap_unlock(ptep, ptl);
> -	wait_on_page_locked(page);
> -	put_page(page);
> -	return;
> +	/*
> +	 * page count might be set to zero by page_freeze_refs()
> +	 * in migrate_page_move_mapping().
> +	 */
> +	if (get_page_unless_zero(page)) {
> +		pte_unmap_unlock(ptep, ptl);
> +		wait_on_page_locked(page);
> +		put_page(page);
> +		return;
> +	} else {
> +		pte_unmap_unlock(ptep, ptl);
> +		goto retry;
> +	}
> +

I'm not sure about this part.  If it IS needed, I think it would be
needed independently of the unevictable/putback_lru_page() changes, as
this race must have already existed.

However, unmap_and_move() replaces the migration entries with bona fide
ptes referencing the new page before freeing the old page, so I think
we're OK without this change.

>  out:
>  	pte_unmap_unlock(ptep, ptl);
>  }
> @@ -715,13 +725,7 @@ unlock:
>   		 * restored.
>   		 */
>   		list_del(&page->lru);
> -		if (!page->mapping) {
> -			VM_BUG_ON(page_count(page) != 1);
> -			unlock_page(page);
> -			put_page(page);		/* just free the old page */
> -			goto end_migration;
> -		} else
> -			unlock = putback_lru_page(page);
> +		unlock = putback_lru_page(page);
>  	}
>  
>  	if (unlock)

I agree with this part.  I came to the same conclusion looking at the
code.  If we just changed the if() and VM_BUG_ON() to:

if (!page->mapping && page_count(page) == 1) { ...

we'd be doing exactly what putback_lru_page() is doing.  So, this code
was always unnecessary, duplicate code [that I was trying to avoid :(].
So, just let putback_lru_page() handle this condition and conditionally
unlock_page().

I'm testing my stress load with the 2nd part of the patch above and
it's holding up OK.  Of course, I didn't hit the problem before.  I'll
try your duplicator script and see what happens.

Regards,
Lee

--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
  2008-06-17  9:15         ` Daisuke Nishimura
  (?)
@ 2008-06-17 18:29           ` Lee Schermerhorn
  -1 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 18:29 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Tue, 2008-06-17 at 18:15 +0900, Daisuke Nishimura wrote:
> On Tue, 17 Jun 2008 18:03:14 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Tue, 17 Jun 2008 16:47:09 +0900
> > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > 
> > > On Tue, 17 Jun 2008 16:35:01 +0900, Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > > Hi.
> > > > 
> > > > I got this bug while migrating pages only a few times
> > > > via memory_migrate of cpuset.
> > > > 
> > > > Unfortunately, even if this patch is applied,
> > > > I got a bad_page problem after hundreds of page migrations
> > > > (I'll report it in another mail).
> > > > But I believe something like this patch is needed anyway.
> > > > 
> > > 
> > > I got a bad_page after hundreds of page migrations.
> > > It seems that a locked page is being freed.

I'm seeing *mlocked* pages [PG_mlocked] being freed now with my stress
load, with just the "if (!page->mapping) { ... }" clause removed, as proposed
in your rfc patch in the previous mail.  Need to investigate this...

I'm not seeing *locked* pages [PG_locked], tho'.  From your stack trace,
it appears that migrate_pages() left locked pages on the list of pages to
be put back.  The pages get locked and unlocked in unmap_and_move().  I
haven't found a path [yet] where the page can be returned still locked.
I think I need to duplicate the problem.

> > > 
> > Good catch, and I think your investigation in the last e-mail was correct.
> > I'd like to dig into this...but it seems some kind of big fix is necessary.
> > Did this happen under page migration by the cpuset-task-move test?
> > 
> Yes.
> 
> I made 2 cpuset directories, ran some processes in each cpuset,
> and ran a script like the one below in a loop to move tasks and migrate pages.

What processes/tests do you run in each cpuset?

> 
> ---
> #!/bin/bash
> 
> G1=$1
> G2=$2
> 
> move_task()
> {
>         for pid in $1
>         do
>                 echo $pid >$2/tasks 2>/dev/null
>         done
> }
> 
> G1_TASK=`cat ${G1}/tasks`
> G2_TASK=`cat ${G2}/tasks`
> 
> move_task "${G1_TASK}" ${G2} &
> move_task "${G2_TASK}" ${G1} &
> 
> wait
> ---
> 
> I got this bad_page after running this script for about 600 times.
> 
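
For reference, a driver along these lines reproduces the setup
described above (the mount point, CPU/node numbers, and workloads are
illustrative assumptions; only switch.sh and the ~600 iterations are
from the report):

	#!/bin/bash
	# sketch: two cpusets on a 2-node box, pages migrate as tasks move
	mkdir -p /dev/cpuset
	mount -t cpuset cpuset /dev/cpuset 2>/dev/null
	for g in G1 G2; do
		mkdir -p /dev/cpuset/$g
		echo 0-3 >/dev/cpuset/$g/cpus	# assumed CPU topology
		echo 1 >/dev/cpuset/$g/memory_migrate
	done
	echo 0 >/dev/cpuset/G1/mems	# distinct nodes, so moving a task
	echo 1 >/dev/cpuset/G2/mems	# really migrates its pages
	# ... start some memory-busy processes in each group's tasks file ...
	for i in $(seq 1 600); do
		./switch.sh /dev/cpuset/G1 /dev/cpuset/G2
	done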




^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
@ 2008-06-17 18:29           ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 18:29 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Tue, 2008-06-17 at 18:15 +0900, Daisuke Nishimura wrote:
> On Tue, 17 Jun 2008 18:03:14 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Tue, 17 Jun 2008 16:47:09 +0900
> > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > 
> > > On Tue, 17 Jun 2008 16:35:01 +0900, Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > > Hi.
> > > > 
> > > > I got this bug while migrating pages only a few times
> > > > via memory_migrate of cpuset.
> > > > 
> > > > Unfortunately, even if this patch is applied,
> > > > I got a bad_page problem after hundreds of page migrations
> > > > (I'll report it in another mail).
> > > > But I believe something like this patch is needed anyway.
> > > > 
> > > 
> > > I got a bad_page after hundreds of page migrations.
> > > It seems that a locked page is being freed.

I'm seeing *mlocked* pages [PG_mlocked] being freed now with my stress
load, with just the "if (!page->mapping) { ... }" clause removed, as proposed
in your rfc patch in the previous mail.  Need to investigate this...

I'm not seeing *locked* pages [PG_locked], tho'.  From your stack trace,
it appears that migrate_pages() left locked pages on the list of pages to
be put back.  The pages get locked and unlocked in unmap_and_move().  I
haven't found a path [yet] where the page can be returned still locked.
I think I need to duplicate the problem.

> > > 
> > Good catch, and I think your investigation in the last e-mail was correct.
> > I'd like to dig into this...but it seems some kind of big fix is necessary.
> > Did this happen under page migration by the cpuset-task-move test?
> > 
> Yes.
> 
> I made 2 cpuset directories, ran some processes in each cpuset,
> and ran a script like the one below in a loop to move tasks and migrate pages.

What processes/tests do you run in each cpuset?

> 
> ---
> #!/bin/bash
> 
> G1=$1
> G2=$2
> 
> move_task()
> {
>         for pid in $1
>         do
>                 echo $pid >$2/tasks 2>/dev/null
>         done
> }
> 
> G1_TASK=`cat ${G1}/tasks`
> G2_TASK=`cat ${G2}/tasks`
> 
> move_task "${G1_TASK}" ${G2} &
> move_task "${G2_TASK}" ${G1} &
> 
> wait
> ---
> 
> I got this bad_page after running this script for about 600 times.
> 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
@ 2008-06-17 18:29           ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 18:29 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Tue, 2008-06-17 at 18:15 +0900, Daisuke Nishimura wrote:
> On Tue, 17 Jun 2008 18:03:14 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
> > On Tue, 17 Jun 2008 16:47:09 +0900
> > Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org> wrote:
> > 
> > > On Tue, 17 Jun 2008 16:35:01 +0900, Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org> wrote:
> > > > Hi.
> > > > 
> > > > I got this bug while migrating pages only a few times
> > > > via memory_migrate of cpuset.
> > > > 
> > > > Unfortunately, even if this patch is applied,
> > > > I got a bad_page problem after hundreds of page migrations
> > > > (I'll report it in another mail).
> > > > But I believe something like this patch is needed anyway.
> > > > 
> > > 
> > > I got a bad_page after hundreds of page migrations.
> > > It seems that a locked page is being freed.

I'm seeing *mlocked* pages [PG_mlocked] being freed now with my stress
load, with just the "if (!page->mapping) { ... }" clause removed, as proposed
in your rfc patch in the previous mail.  Need to investigate this...

I'm not seeing *locked* pages [PG_locked], tho'.  From your stack trace,
it appears that migrate_pages() left locked pages on the list of pages to
be put back.  The pages get locked and unlocked in unmap_and_move().  I
haven't found a path [yet] where the page can be returned still locked.
I think I need to duplicate the problem.

> > > 
> > Good catch, and I think your investigation in the last e-mail was correct.
> > I'd like to dig into this...but it seems some kind of big fix is necessary.
> > Did this happen under page migration by the cpuset-task-move test?
> > 
> Yes.
> 
> I made 2 cpuset directories, ran some processes in each cpuset,
> and ran a script like the one below in a loop to move tasks and migrate pages.

What processes/tests do you run in each cpuset?

> 
> ---
> #!/bin/bash
> 
> G1=$1
> G2=$2
> 
> move_task()
> {
>         for pid in $1
>         do
>                 echo $pid >$2/tasks 2>/dev/null
>         done
> }
> 
> G1_TASK=`cat ${G1}/tasks`
> G2_TASK=`cat ${G2}/tasks`
> 
> move_task "${G1_TASK}" ${G2} &
> move_task "${G2_TASK}" ${G1} &
> 
> wait
> ---
> 
> I got this bad_page after running this script for about 600 times.
> 



--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-17 17:46     ` Lee Schermerhorn
  (?)
@ 2008-06-17 18:33       ` Hugh Dickins
  -1 siblings, 0 replies; 290+ messages in thread
From: Hugh Dickins @ 2008-06-17 18:33 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Tue, 17 Jun 2008, Lee Schermerhorn wrote:
> 
> Now I wonder if the assertion that newpage count == 1 could be violated?
> I don't see how.  We've just allocated and filled it and haven't
> unlocked it yet, so we should hold the only reference.  Do you agree?

Disagree: IIRC, excellent example of the kind of assumption
that becomes invalid with Nick's speculative page references.

Someone interested in the previous use of the page may have
incremented the refcount, and in due course will find that
it's got reused for something else, and will then back off.
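
The lookup side of that protocol goes roughly like this (a simplified
sketch of the pattern, not the exact code from Nick's patches):

	struct page *page;

	rcu_read_lock();
	page = radix_tree_lookup(&mapping->page_tree, index);
	if (page && !page_cache_get_speculative(page))
		page = NULL;		/* being freed; treat as not found */
	if (page && unlikely(page->mapping != mapping ||
				page->index != index)) {
		/* got the reference, but the page was reused: back off */
		put_page(page);
		page = NULL;
	}
	rcu_read_unlock();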

Hugh

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
@ 2008-06-17 18:33       ` Hugh Dickins
  0 siblings, 0 replies; 290+ messages in thread
From: Hugh Dickins @ 2008-06-17 18:33 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Tue, 17 Jun 2008, Lee Schermerhorn wrote:
> 
> Now I wonder if the assertion that newpage count == 1 could be violated?
> I don't see how.  We've just allocated and filled it and haven't
> unlocked it yet, so we should hold the only reference.  Do you agree?

Disagree: IIRC, excellent example of the kind of assumption
that becomes invalid with Nick's speculative page references.

Someone interested in the previous use of the page may have
incremented the refcount, and in due course will find that
it's got reused for something else, and will then back off.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
@ 2008-06-17 18:33       ` Hugh Dickins
  0 siblings, 0 replies; 290+ messages in thread
From: Hugh Dickins @ 2008-06-17 18:33 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Tue, 17 Jun 2008, Lee Schermerhorn wrote:
> 
> Now I wonder if the assertion that newpage count == 1 could be violated?
> I don't see how.  We've just allocated and filled it and haven't
> unlocked it yet, so we should hold the only reference.  Do you agree?

Disagree: IIRC, excellent example of the kind of assumption
that becomes invalid with Nick's speculative page references.

Someone interested in the previous use of the page may have
incremented the refcount, and in due course will find that
it's got reused for something else, and will then back off.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-17 18:33       ` Hugh Dickins
  (?)
@ 2008-06-17 19:28         ` Lee Schermerhorn
  -1 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 19:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Tue, 2008-06-17 at 19:33 +0100, Hugh Dickins wrote:
> On Tue, 17 Jun 2008, Lee Schermerhorn wrote:
> > 
> > Now I wonder if the assertion that newpage count == 1 could be violated?
> > I don't see how.  We've just allocated and filled it and haven't
> > unlocked it yet, so we should hold the only reference.  Do you agree?
> 
> Disagree: IIRC, excellent example of the kind of assumption
> that becomes invalid with Nick's speculative page references.
> 
> Someone interested in the previous use of the page may have
> incremented the refcount, and in due course will find that
> it's got reused for something else, and will then back off.
> 

Yeah.  Kosaki-san mentioned that we'd need some rework for the
speculative page cache work.  Looks like we'll need to drop the
VM_BUG_ON().  

I need to go read up on the new invariants we can trust with the
speculative page cache.  

Thanks,
Lee


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
@ 2008-06-17 19:28         ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 19:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Tue, 2008-06-17 at 19:33 +0100, Hugh Dickins wrote:
> On Tue, 17 Jun 2008, Lee Schermerhorn wrote:
> > 
> > Now I wonder if the assertion that newpage count == 1 could be violated?
> > I don't see how.  We've just allocated and filled it and haven't
> > unlocked it yet, so we should hold the only reference.  Do you agree?
> 
> Disagree: IIRC, excellent example of the kind of assumption
> that becomes invalid with Nick's speculative page references.
> 
> Someone interested in the previous use of the page may have
> incremented the refcount, and in due course will find that
> it's got reused for something else, and will then back off.
> 

Yeah.  Kosaki-san mentioned that we'd need some rework for the
speculative page cache work.  Looks like we'll need to drop the
VM_BUG_ON().  

I need to go read up on the new invariants we can trust with the
speculative page cache.  

Thanks,
Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
@ 2008-06-17 19:28         ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 19:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Tue, 2008-06-17 at 19:33 +0100, Hugh Dickins wrote:
> On Tue, 17 Jun 2008, Lee Schermerhorn wrote:
> > 
> > Now I wonder if the assertion that newpage count == 1 could be violated?
> > I don't see how.  We've just allocated and filled it and haven't
> > unlocked it yet, so we should hold the only reference.  Do you agree?
> 
> Disagree: IIRC, excellent example of the kind of assumption
> that becomes invalid with Nick's speculative page references.
> 
> Someone interested in the previous use of the page may have
> incremented the refcount, and in due course will find that
> it's got reused for something else, and will then back off.
> 

Yeah.  Kosaki-san mentioned that we'd need some rework for the
speculative page cache work.  Looks like we'll need to drop the
VM_BUG_ON().  

I need to go read up on the new invariants we can trust with the
speculative page cache.  

Thanks,
Lee

--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 290+ messages in thread

* [PATCH] unevictable mlocked pages:  initialize mm member of munlock mm_walk structure
  2008-06-17 18:29           ` Lee Schermerhorn
  (?)
@ 2008-06-17 20:00             ` Lee Schermerhorn
  -1 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 20:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Rik van Riel,
	Kosaki Motohiro, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers

was: Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix
kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)

On Tue, 2008-06-17 at 14:29 -0400, Lee Schermerhorn wrote:
> On Tue, 2008-06-17 at 18:15 +0900, Daisuke Nishimura wrote:
> > On Tue, 17 Jun 2008 18:03:14 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > On Tue, 17 Jun 2008 16:47:09 +0900
> > > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > 
> > > > On Tue, 17 Jun 2008 16:35:01 +0900, Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > > > Hi.
> > > > > 
> > > > > I got this bug while migrating pages only a few times
> > > > > via memory_migrate of cpuset.
> > > > > 
> > > > > Unfortunately, even if this patch is applied,
> > > > > I got a bad_page problem after hundreds of page migrations
> > > > > (I'll report it in another mail).
> > > > > But I believe something like this patch is needed anyway.
> > > > > 
> > > > 
> > > > I got a bad_page after hundreds of page migrations.
> > > > It seems that a locked page is being freed.
> 
> I'm seeing *mlocked* pages [PG_mlocked] being freed now with my stress
> load, with just the "if(!page->mapping) { } clause removed, as proposed
> in your rfc patch in previous mail.  Need to investigate this...
> 
<snip>

This [freeing of mlocked pages] also occurs in unpatched 2.6.26-rc5-mm3.

Fixed by the following:

PATCH:  fix munlock page table walk - now requires 'mm'

Against 2.6.26-rc5-mm3.

Incremental fix for: mlock-mlocked-pages-are-unevictable-fix.patch 

Initialize the 'mm' member of the mm_walk structure, else the
page table walk doesn't occur, and mlocked pages will not be
munlocked.  This is visible in the vmstats:  

	noreclaim_pgs_munlocked - should equal noreclaim_pgs_mlocked
	  less (nr_mlock + noreclaim_pgs_cleared), but is always zero 
	  [munlock_vma_page() never called]

	noreclaim_pgs_mlockfreed - should be zero [for debug only],
	  but == noreclaim_pgs_mlocked - (nr_mlock + noreclaim_pgs_cleared)


Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/mlock.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6.26-rc5-mm3/mm/mlock.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/mm/mlock.c	2008-06-17 15:20:57.000000000 -0400
+++ linux-2.6.26-rc5-mm3/mm/mlock.c	2008-06-17 15:23:17.000000000 -0400
@@ -318,6 +318,8 @@ static void __munlock_vma_pages_range(st
 	VM_BUG_ON(start < vma->vm_start);
 	VM_BUG_ON(end > vma->vm_end);
 
+	munlock_page_walk.mm = mm;
+
 	lru_add_drain_all();	/* push cached pages to LRU */
 	walk_page_range(start, end, &munlock_page_walk);
 	lru_add_drain_all();	/* to update stats */
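
For context, the pattern being fixed looks roughly like this (the
handler name is a hypothetical stand-in; the .mm assignment and the
surrounding calls are from the patch above):

	static struct mm_walk munlock_page_walk = {
		.pmd_entry = __munlock_pmd_handler,	/* hypothetical */
		/* .mm cannot be set statically: it is only known per call */
	};

	static void __munlock_vma_pages_range(struct vm_area_struct *vma,
			unsigned long start, unsigned long end)
	{
		struct mm_struct *mm = vma->vm_mm;

		munlock_page_walk.mm = mm;	/* the fix: without this the
						 * walk has no mm to walk */
		lru_add_drain_all();		/* push cached pages to LRU */
		walk_page_range(start, end, &munlock_page_walk);
		lru_add_drain_all();		/* to update stats */
	}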




^ permalink raw reply	[flat|nested] 290+ messages in thread

* [PATCH] unevictable mlocked pages:  initialize mm member of munlock mm_walk structure
@ 2008-06-17 20:00             ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 20:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Rik van Riel,
	Kosaki Motohiro, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers

was: Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix
kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)

On Tue, 2008-06-17 at 14:29 -0400, Lee Schermerhorn wrote:
> On Tue, 2008-06-17 at 18:15 +0900, Daisuke Nishimura wrote:
> > On Tue, 17 Jun 2008 18:03:14 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > On Tue, 17 Jun 2008 16:47:09 +0900
> > > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > 
> > > > On Tue, 17 Jun 2008 16:35:01 +0900, Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > > > Hi.
> > > > > 
> > > > > I got this bug while migrating pages only a few times
> > > > > via memory_migrate of cpuset.
> > > > > 
> > > > > Unfortunately, even if this patch is applied,
> > > > > I got a bad_page problem after hundreds of page migrations
> > > > > (I'll report it in another mail).
> > > > > But I believe something like this patch is needed anyway.
> > > > > 
> > > > 
> > > > I got a bad_page after hundreds of page migrations.
> > > > It seems that a locked page is being freed.
> 
> I'm seeing *mlocked* pages [PG_mlocked] being freed now with my stress
> load, with just the "if(!page->mapping) { } clause removed, as proposed
> in your rfc patch in previous mail.  Need to investigate this...
> 
<snip>

This [freeing of mlocked pages] also occurs in unpatched 2.6.26-rc5-mm3.

Fixed by the following:

PATCH:  fix munlock page table walk - now requires 'mm'

Against 2.6.26-rc5-mm3.

Incremental fix for: mlock-mlocked-pages-are-unevictable-fix.patch 

Initialize the 'mm' member of the mm_walk structure, else the
page table walk doesn't occur, and mlocked pages will not be
munlocked.  This is visible in the vmstats:  

	noreclaim_pgs_munlocked - should equal noreclaim_pgs_mlocked
	  less (nr_mlock + noreclaim_pgs_cleared), but is always zero 
	  [munlock_vma_page() never called]

	noreclaim_pgs_mlockfreed - should be zero [for debug only],
	  but == noreclaim_pgs_mlocked - (nr_mlock + noreclaim_pgs_cleared)


Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/mlock.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6.26-rc5-mm3/mm/mlock.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/mm/mlock.c	2008-06-17 15:20:57.000000000 -0400
+++ linux-2.6.26-rc5-mm3/mm/mlock.c	2008-06-17 15:23:17.000000000 -0400
@@ -318,6 +318,8 @@ static void __munlock_vma_pages_range(st
 	VM_BUG_ON(start < vma->vm_start);
 	VM_BUG_ON(end > vma->vm_end);
 
+	munlock_page_walk.mm = mm;
+
 	lru_add_drain_all();	/* push cached pages to LRU */
 	walk_page_range(start, end, &munlock_page_walk);
 	lru_add_drain_all();	/* to update stats */



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* [PATCH] unevictable mlocked pages:  initialize mm member of munlock mm_walk structure
@ 2008-06-17 20:00             ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-17 20:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Rik van Riel,
	Kosaki Motohiro, Nick Piggin, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

was: Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix
kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)

On Tue, 2008-06-17 at 14:29 -0400, Lee Schermerhorn wrote:
> On Tue, 2008-06-17 at 18:15 +0900, Daisuke Nishimura wrote:
> > On Tue, 17 Jun 2008 18:03:14 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
> > > On Tue, 17 Jun 2008 16:47:09 +0900
> > > Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org> wrote:
> > > 
> > > > On Tue, 17 Jun 2008 16:35:01 +0900, Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org> wrote:
> > > > > Hi.
> > > > > 
> > > > > I got this bug while migrating pages only a few times
> > > > > via memory_migrate of cpuset.
> > > > > 
> > > > > Unfortunately, even if this patch is applied,
> > > > > I got a bad_page problem after hundreds of page migrations
> > > > > (I'll report it in another mail).
> > > > > But I believe something like this patch is needed anyway.
> > > > > 
> > > > 
> > > > I got a bad_page after hundreds of page migrations.
> > > > It seems that a locked page is being freed.
> 
> I'm seeing *mlocked* pages [PG_mlocked] being freed now with my stress
> load, with just the "if(!page->mapping) { } clause removed, as proposed
> in your rfc patch in previous mail.  Need to investigate this...
> 
<snip>

This [freeing of mlocked pages] also occurs in unpatched 2.6.26-rc5-mm3.

Fixed by the following:

PATCH:  fix munlock page table walk - now requires 'mm'

Against 2.6.26-rc5-mm3.

Incremental fix for: mlock-mlocked-pages-are-unevictable-fix.patch 

Initialize the 'mm' member of the mm_walk structure, else the
page table walk doesn't occur, and mlocked pages will not be
munlocked.  This is visible in the vmstats:  

	noreclaim_pgs_munlocked - should equal noreclaim_pgs_mlocked
	  less (nr_mlock + noreclaim_pgs_cleared), but is always zero 
	  [munlock_vma_page() never called]

	noreclaim_pgs_mlockfreed - should be zero [for debug only],
	  but == noreclaim_pgs_mlocked - (nr_mlock + noreclaim_pgs_cleared)


Signed-off-by: Lee Schermerhorn <lee.schermerhorn-VXdhtT5mjnY@public.gmane.org>

 mm/mlock.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6.26-rc5-mm3/mm/mlock.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/mm/mlock.c	2008-06-17 15:20:57.000000000 -0400
+++ linux-2.6.26-rc5-mm3/mm/mlock.c	2008-06-17 15:23:17.000000000 -0400
@@ -318,6 +318,8 @@ static void __munlock_vma_pages_range(st
 	VM_BUG_ON(start < vma->vm_start);
 	VM_BUG_ON(end > vma->vm_end);
 
+	munlock_page_walk.mm = mm;
+
 	lru_add_drain_all();	/* push cached pages to LRU */
 	walk_page_range(start, end, &munlock_page_walk);
 	lru_add_drain_all();	/* to update stats */



--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-17  7:35   ` Daisuke Nishimura
  (?)
@ 2008-06-18  1:13     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18  1:13 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers, hugh

On Tue, 17 Jun 2008 16:35:01 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> This patch also fixes a race between migration_entry_wait and
> page_freeze_refs in migrate_page_move_mapping.
> 
Ok, let's fix them one by one. Please add your Signed-off-by if ok.

This is a fix for page migration under the speculative page lookup protocol.
-Kame
==
In the speculative page cache lookup protocol, page_count(page) is set to 0
while radix-tree modification (truncation, migration, etc.) is going on.

During page migration, a page fault to a page under migration should wait
for unlock_page(), so migration_entry_wait() looks up the page from its
pte entry. It does get_page() -> wait_on_page_locked() -> put_page() now.

In page migration, page_freeze_refs() -> page_unfreeze_refs() is called.

Here, page_unfreeze_refs() expects page_count(page) == 0 and panics
if page_count(page) != 0. To avoid this, we shouldn't touch page_count()
if it is zero. This patch uses page_cache_get_speculative() to avoid
the panic.

From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/migrate.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: test-2.6.26-rc5-mm3/mm/migrate.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/migrate.c
+++ test-2.6.26-rc5-mm3/mm/migrate.c
@@ -243,7 +243,8 @@ void migration_entry_wait(struct mm_stru
 
 	page = migration_entry_to_page(entry);
 
-	get_page(page);
+	if (!page_cache_get_speculative())
+		goto out;
 	pte_unmap_unlock(ptep, ptl);
 	wait_on_page_locked(page);
 	put_page(page);
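
As a side note, the intended form of that hunk -- with the page
argument that a follow-up later in this thread points out is missing --
would read, as a sketch:

	page = migration_entry_to_page(entry);

	/*
	 * page_cache_get_speculative() takes a reference only while the
	 * count is non-zero, so it fails while page_freeze_refs() holds
	 * the count at 0 during the radix-tree replacement.
	 */
	if (!page_cache_get_speculative(page))
		goto out;	/* lost the race; the fault will retry */
	pte_unmap_unlock(ptep, ptl);
	wait_on_page_locked(page);
	put_page(page);
	return;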



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
@ 2008-06-18  1:13     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18  1:13 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers, hugh

On Tue, 17 Jun 2008 16:35:01 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> This patch also fixes a race between migration_entry_wait and
> page_freeze_refs in migrate_page_move_mapping.
> 
Ok, let's fix them one by one. Please add your Signed-off-by if ok.

This is a fix for page migration under the speculative page lookup protocol.
-Kame
==
In the speculative page cache lookup protocol, page_count(page) is set to 0
while radix-tree modification (truncation, migration, etc.) is going on.

During page migration, a page fault to a page under migration should wait
for unlock_page(), so migration_entry_wait() looks up the page from its
pte entry. It does get_page() -> wait_on_page_locked() -> put_page() now.

In page migration, page_freeze_refs() -> page_unfreeze_refs() is called.

Here, page_unfreeze_refs() expects page_count(page) == 0 and panics
if page_count(page) != 0. To avoid this, we shouldn't touch page_count()
if it is zero. This patch uses page_cache_get_speculative() to avoid
the panic.

From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/migrate.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: test-2.6.26-rc5-mm3/mm/migrate.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/migrate.c
+++ test-2.6.26-rc5-mm3/mm/migrate.c
@@ -243,7 +243,8 @@ void migration_entry_wait(struct mm_stru
 
 	page = migration_entry_to_page(entry);
 
-	get_page(page);
+	if (!page_cache_get_speculative())
+		goto out;
 	pte_unmap_unlock(ptep, ptl);
 	wait_on_page_locked(page);
 	put_page(page);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
@ 2008-06-18  1:13     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18  1:13 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Kosaki Motohiro,
	Nick Piggin, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	hugh-DTz5qymZ9yRBDgjK7y7TUQ

On Tue, 17 Jun 2008 16:35:01 +0900
Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org> wrote:

> This patch also fixes a race between migration_entry_wait and
> page_freeze_refs in migrate_page_move_mapping.
> 
Ok, let's fix them one by one. Please add your Signed-off-by if ok.

This is a fix for page migration under the speculative page lookup protocol.
-Kame
==
In the speculative page cache lookup protocol, page_count(page) is set to 0
while radix-tree modification (truncation, migration, etc.) is going on.

During page migration, a page fault to a page under migration should wait
for unlock_page(), so migration_entry_wait() looks up the page from its
pte entry. It does get_page() -> wait_on_page_locked() -> put_page() now.

In page migration, page_freeze_refs() -> page_unfreeze_refs() is called.

Here, page_unfreeze_refs() expects page_count(page) == 0 and panics
if page_count(page) != 0. To avoid this, we shouldn't touch page_count()
if it is zero. This patch uses page_cache_get_speculative() to avoid
the panic.

From: Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
---
 mm/migrate.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: test-2.6.26-rc5-mm3/mm/migrate.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/migrate.c
+++ test-2.6.26-rc5-mm3/mm/migrate.c
@@ -243,7 +243,8 @@ void migration_entry_wait(struct mm_stru
 
 	page = migration_entry_to_page(entry);
 
-	get_page(page);
+	if (!page_cache_get_speculative())
+		goto out;
 	pte_unmap_unlock(ptep, ptl);
 	wait_on_page_locked(page);
 	put_page(page);


--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-18  1:13     ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-18  1:26       ` Daisuke Nishimura
  -1 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-18  1:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers, hugh

On Wed, 18 Jun 2008 10:13:49 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 17 Jun 2008 16:35:01 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > This patch also fixes a race between migration_entry_wait and
> > page_freeze_refs in migrate_page_move_mapping.
> > 
> Ok, let's fix them one by one. Please add your Signed-off-by if ok.
> 
Agree. It should be fixed independently.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

> This is a fix for page migration under the speculative page lookup protocol.
> -Kame
> ==
> In the speculative page cache lookup protocol, page_count(page) is set to 0
> while radix-tree modification (truncation, migration, etc.) is going on.
> 
> During page migration, a page fault to a page under migration should wait
> for unlock_page(), so migration_entry_wait() looks up the page from its
> pte entry. It does get_page() -> wait_on_page_locked() -> put_page() now.
> 
> In page migration, page_freeze_refs() -> page_unfreeze_refs() is called.
> 
> Here, page_unfreeze_refs() expects page_count(page) == 0 and panics
> if page_count(page) != 0. To avoid this, we shouldn't touch page_count()
> if it is zero. This patch uses page_cache_get_speculative() to avoid
> the panic.
> 
> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/migrate.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: test-2.6.26-rc5-mm3/mm/migrate.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/migrate.c
> +++ test-2.6.26-rc5-mm3/mm/migrate.c
> @@ -243,7 +243,8 @@ void migration_entry_wait(struct mm_stru
>  
>  	page = migration_entry_to_page(entry);
>  
> -	get_page(page);
> +	if (!page_cache_get_speculative())
> +		goto out;
>  	pte_unmap_unlock(ptep, ptl);
>  	wait_on_page_locked(page);
>  	put_page(page);
> 


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
@ 2008-06-18  1:26       ` Daisuke Nishimura
  0 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-18  1:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers, hugh

On Wed, 18 Jun 2008 10:13:49 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 17 Jun 2008 16:35:01 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > This patch also fixes a race between migration_entry_wait and
> > page_freeze_refs in migrate_page_move_mapping.
> > 
> Ok, let's fix them one by one. Please add your Signed-off-by if ok.
> 
Agree. It should be fixed independently.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

> This is a fix for page migration under the speculative page lookup protocol.
> -Kame
> ==
> In the speculative page cache lookup protocol, page_count(page) is set to 0
> while radix-tree modification (truncation, migration, etc.) is going on.
> 
> During page migration, a page fault to a page under migration should wait
> for unlock_page(), so migration_entry_wait() looks up the page from its
> pte entry. It does get_page() -> wait_on_page_locked() -> put_page() now.
> 
> In page migration, page_freeze_refs() -> page_unfreeze_refs() is called.
> 
> Here, page_unfreeze_refs() expects page_count(page) == 0 and panics
> if page_count(page) != 0. To avoid this, we shouldn't touch page_count()
> if it is zero. This patch uses page_cache_get_speculative() to avoid
> the panic.
> 
> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/migrate.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: test-2.6.26-rc5-mm3/mm/migrate.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/migrate.c
> +++ test-2.6.26-rc5-mm3/mm/migrate.c
> @@ -243,7 +243,8 @@ void migration_entry_wait(struct mm_stru
>  
>  	page = migration_entry_to_page(entry);
>  
> -	get_page(page);
> +	if (!page_cache_get_speculative())
> +		goto out;
>  	pte_unmap_unlock(ptep, ptl);
>  	wait_on_page_locked(page);
>  	put_page(page);
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
@ 2008-06-18  1:26       ` Daisuke Nishimura
  0 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-18  1:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Kosaki Motohiro,
	Nick Piggin, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	hugh-DTz5qymZ9yRBDgjK7y7TUQ

On Wed, 18 Jun 2008 10:13:49 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
> On Tue, 17 Jun 2008 16:35:01 +0900
> Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org> wrote:
> 
> > This patch also fixes a race between migration_entry_wait and
> > page_freeze_refs in migrate_page_move_mapping.
> > 
> Ok, let's fix them one by one. Please add your Signed-off-by if ok.
> 
Agree. It should be fixed independently.

Signed-off-by: Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>

> This is a fix for page migration under the speculative page lookup protocol.
> -Kame
> ==
> In the speculative page cache lookup protocol, page_count(page) is set to 0
> while radix-tree modification (truncation, migration, etc.) is going on.
> 
> During page migration, a page fault to a page under migration should wait
> for unlock_page(), so migration_entry_wait() looks up the page from its
> pte entry. It does get_page() -> wait_on_page_locked() -> put_page() now.
> 
> In page migration, page_freeze_refs() -> page_unfreeze_refs() is called.
> 
> Here, page_unfreeze_refs() expects page_count(page) == 0 and panics
> if page_count(page) != 0. To avoid this, we shouldn't touch page_count()
> if it is zero. This patch uses page_cache_get_speculative() to avoid
> the panic.
> 
> From: Daisuke Nishimura <nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> ---
>  mm/migrate.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: test-2.6.26-rc5-mm3/mm/migrate.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/migrate.c
> +++ test-2.6.26-rc5-mm3/mm/migrate.c
> @@ -243,7 +243,8 @@ void migration_entry_wait(struct mm_stru
>  
>  	page = migration_entry_to_page(entry);
>  
> -	get_page(page);
> +	if (!page_cache_get_speculative())
> +		goto out;
>  	pte_unmap_unlock(ptep, ptl);
>  	wait_on_page_locked(page);
>  	put_page(page);
> 

--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-17 15:33     ` KOSAKI Motohiro
  (?)
@ 2008-06-18  1:54       ` Daisuke Nishimura
  -1 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-18  1:54 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Nick Piggin,
	linux-mm, linux-kernel, kernel-testers

On Wed, 18 Jun 2008 00:33:18 +0900, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > @@ -715,13 +725,7 @@ unlock:
> >   		 * restored.
> >   		 */
> >   		list_del(&page->lru);
> > -		if (!page->mapping) {
> > -			VM_BUG_ON(page_count(page) != 1);
> > -			unlock_page(page);
> > -			put_page(page);		/* just free the old page */
> > -			goto end_migration;
> > -		} else
> > -			unlock = putback_lru_page(page);
> > +		unlock = putback_lru_page(page);
> >  	}
> >  
> >  	if (unlock)
> 
> is this part really necessary?
> I tried to remove it, but no problem happened.
> 
I made this part first, and added a fix for migration_entry_wait later.

So, I haven't tested without this part, and I think it will cause
the VM_BUG_ON() here without it.

Anyway, I will test it.


> Of course, the other part is definitely necessary for speculative pagecache :)
> 

^ permalink raw reply	[flat|nested] 290+ messages in thread

* [PATCH] migration_entry_wait fix.
  2008-06-18  1:13     ` KAMEZAWA Hiroyuki
@ 2008-06-18  1:54       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18  1:54 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Kosaki Motohiro, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers, hugh

On Wed, 18 Jun 2008 10:13:49 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> +	if (!page_cache_get_speculative())
> +		goto out;
This is obviously buggy... sorry, a quilt refresh miss.

==
In the speculative page cache lookup protocol, page_count(page) is set to 0
while radix-tree modification is going on: truncation, migration, etc...

During page migration, a page fault on a page under migration should wait
for unlock_page(), and migration_entry_wait() waits for that page via its
pte entry. It currently does get_page() -> wait_on_page_locked() -> put_page().

In page migration, page_freeze_refs() -> page_unfreeze_refs() is called.

Here, page_unfreeze_refs() expects page_count(page) == 0 and panics
if page_count(page) != 0. To avoid this, we shouldn't touch page_count()
if it is zero. This patch uses page_cache_get_speculative() to avoid
the panic.

From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/migrate.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: test-2.6.26-rc5-mm3/mm/migrate.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/migrate.c
+++ test-2.6.26-rc5-mm3/mm/migrate.c
@@ -243,7 +243,8 @@ void migration_entry_wait(struct mm_stru
 
 	page = migration_entry_to_page(entry);
 
-	get_page(page);
+	if (!page_cache_get_speculative(page))
+		goto out;
 	pte_unmap_unlock(ptep, ptl);
 	wait_on_page_locked(page);
 	put_page(page);
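
For context, with this hunk applied migration_entry_wait() looks roughly
as below (reconstructed from the diff and the 2.6.26-era mm/migrate.c, so
details may differ slightly).  The existing "out:" label just drops the
pte lock, so when the speculative get fails the fault returns and is
retried; by then migration has either completed or restored the pte:

	void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
				  unsigned long address)
	{
		pte_t *ptep, pte;
		spinlock_t *ptl;
		swp_entry_t entry;
		struct page *page;

		ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
		pte = *ptep;
		if (!is_swap_pte(pte))
			goto out;

		entry = pte_to_swp_entry(pte);
		if (!is_migration_entry(entry))
			goto out;

		page = migration_entry_to_page(entry);
		/*
		 * The refcount may have been frozen to 0 by
		 * page_freeze_refs(); back off and let the fault be
		 * retried rather than resurrecting a frozen reference
		 * with a bare get_page().
		 */
		if (!page_cache_get_speculative(page))
			goto out;
		pte_unmap_unlock(ptep, ptl);
		wait_on_page_locked(page);
		put_page(page);
		return;
	out:
		pte_unmap_unlock(ptep, ptl);
	}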


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
@ 2008-06-18  2:32         ` Daisuke Nishimura
  0 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-18  2:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Nick Piggin,
	linux-mm, linux-kernel, kernel-testers

[-- Attachment #1: Type: text/plain, Size: 1978 bytes --]

On Wed, 18 Jun 2008 00:34:16 +0900, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > > I got this bug while migrating pages only a few times
> > > via memory_migrate of cpuset.
> > > 
> > > Unfortunately, even with this patch applied,
> > > I got a bad_page problem after hundreds of rounds of page migration
> > > (I'll report it in another mail).
> > > But I believe something like this patch is needed anyway.
> > > 
> > 
> > I got a bad_page after hundreds of rounds of page migration.
> > It seems that a locked page is being freed.
> 
> I can't reproduce this bad page.
> I'll try again tomorrow ;)
> 

OK, I'll describe my test more precisely.

- Environment
  HW: 4CPU(x86_64), 2node NUMA
  kernel: 2.6.26-rc5-mm3 + Lee's two fixes for the double unlock_page
          + my patch. The config is attached.

- mount cpuset and make settings
  # mount -t cgroup -o cpuset cpuset /cgroup/cpuset

  # mkdir /cgroup/cpuset/01
  # echo 0-1 >/cgroup/cpuset/01/cpuset.cpus
  # echo 0 >/cgroup/cpuset/01/cpuset.mems
  # echo 1 >/cgroup/cpuset/01/cpuset.memory_migrate

  # mkdir /cgroup/cpuset/02
  # echo 2-3 >/cgroup/cpuset/02/cpuset.cpus
  # echo 1 >/cgroup/cpuset/02/cpuset.mems
  # echo 1 >/cgroup/cpuset/02/cpuset.memory_migrate

- register processes in cpusets
  # echo $$ >/cgroup/cpuset/01/tasks

  I'm using LTP's page01 test, running two instances in an endless loop.
  # while true; do (somewhere)/page01 4194304 1; done &
  # while true; do (somewhere)/page01 4194304 1; done &

  The same thing should be done for the 02 directory.

- echo pids into the other directory
  Run a simple script like the one below.

---
#!/bin/bash

G1=$1
G2=$2

move_task()
{
        for pid in $1
        do
                echo $pid >$2/tasks 2>/dev/null
        done
}

G1_TASK=`cat ${G1}/tasks`
G2_TASK=`cat ${G2}/tasks`

move_task "${G1_TASK}" ${G2} &
move_task "${G2_TASK}" ${G1} &

wait
---
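
To reproduce, assuming the script above is saved as move_tasks.sh (a
hypothetical name), the invocation matching the cpuset setup above
would be:

  # ./move_tasks.sh /cgroup/cpuset/01 /cgroup/cpuset/02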

Please let me know if you need any other information.
I'm also digging into this problem.


Thanks,
Daisuke Nishimura.


[-- Attachment #2: config-2.6.26-rc5-mm3 --]
[-- Type: text/plain, Size: 76269 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.26-rc5-mm3
# Thu Jun 12 15:26:49 2008
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_AOUT=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_SMP=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
# CONFIG_TASK_XACCT is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_NS is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CPUSETS=y
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_USER_SCHED is not set
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_MM_OWNER=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEMRLIMIT_CTLR=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_COMPAT_BRK=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
# CONFIG_MARKERS is not set
CONFIG_OPROFILE=m
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
# CONFIG_HAVE_DMA_ATTRS is not set
# CONFIG_HAVE_CLK is not set

#
# GCOV profiling
#
# CONFIG_GCOV_PROFILE is not set
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_CLASSIC_RCU=y

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_X86_RDC321X is not set
# CONFIG_X86_VSMP is not set
# CONFIG_PARAVIRT_GUEST is not set
CONFIG_MEMTEST=y
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_INTERNODE_CACHE_BYTES=128
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
# CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=255
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
# CONFIG_I8K is not set
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NODES_SPAN_OTHER_NODES=y
CONFIG_NUMA_EMU=y
CONFIG_NODES_SHIFT=6
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_ILLEGAL_POINTER_VALUE=0xffffc10000000000
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_GET_USER_PAGES_FAST=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
# CONFIG_X86_PAT is not set
# CONFIG_EFI is not set
# CONFIG_SECCOMP is not set
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
# CONFIG_SCHED_HRTICK is not set
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x200000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_HOTPLUG_CPU=y
CONFIG_COMPAT_VDSO=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y

#
# Power management options
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_SLEEP=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_HIBERNATION is not set
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
# CONFIG_ACPI_PROCFS is not set
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_SYSFS_POWER=y
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=m
CONFIG_ACPI_BATTERY=m
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_BAY=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_WMI is not set
CONFIG_ACPI_ASUS=m
CONFIG_ACPI_TOSHIBA=m
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=m
CONFIG_ACPI_SBS=m

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=m
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m

#
# CPUFreq processor drivers
#
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_X86_POWERNOW_K8=y
CONFIG_X86_POWERNOW_K8_ACPI=y
CONFIG_X86_SPEEDSTEP_CENTRINO=y
# CONFIG_X86_P4_CLOCKMOD is not set

#
# shared options
#
# CONFIG_X86_ACPI_CPUFREQ_PROC_INTF is not set
# CONFIG_X86_SPEEDSTEP_LIB is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_DMAR is not set
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=m
CONFIG_PCIEAER=y
# CONFIG_PCIEASPM is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
CONFIG_HT_IRQ=y
CONFIG_ISA_DMA_API=y
CONFIG_K8_NB=y
CONFIG_PCCARD=y
# CONFIG_PCMCIA_DEBUG is not set
CONFIG_PCMCIA=y
CONFIG_PCMCIA_LOAD_CIS=y
CONFIG_PCMCIA_IOCTL=y
CONFIG_CARDBUS=y

#
# PC-card bridges
#
CONFIG_YENTA=y
CONFIG_YENTA_O2=y
CONFIG_YENTA_RICOH=y
CONFIG_YENTA_TI=y
CONFIG_YENTA_ENE_TUNE=y
CONFIG_YENTA_TOSHIBA=y
CONFIG_PD6729=m
# CONFIG_I82092 is not set
CONFIG_PCCARD_NONSTATIC=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_FAKE=m
CONFIG_HOTPLUG_PCI_ACPI=m
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=m

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y

#
# Networking
#
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_INET_LRO=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=y
CONFIG_TCP_CONG_CUBIC=m
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
# CONFIG_TCP_CONG_YEAH is not set
# CONFIG_TCP_CONG_ILLINOIS is not set
CONFIG_DEFAULT_BIC=y
# CONFIG_DEFAULT_CUBIC is not set
# CONFIG_DEFAULT_HTCP is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="bic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IP_VS=m
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IPV6=m
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
# CONFIG_IPV6_MIP6 is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
CONFIG_IPV6_SIT=m
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_MULTIPLE_TABLES=y
# CONFIG_IPV6_SUBTREES is not set
# CONFIG_IPV6_MROUTE is not set
CONFIG_NETLABEL=y
CONFIG_NETWORK_SECMARK=y
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
# CONFIG_NF_CONNTRACK is not set
CONFIG_NETFILTER_XTABLES=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
# CONFIG_NETFILTER_XT_TARGET_DSCP is not set
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
# CONFIG_NETFILTER_XT_TARGET_NFLOG is not set
# CONFIG_NETFILTER_XT_TARGET_RATEEST is not set
# CONFIG_NETFILTER_XT_TARGET_TRACE is not set
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
# CONFIG_NETFILTER_XT_TARGET_TCPMSS is not set
# CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP is not set
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
# CONFIG_NETFILTER_XT_MATCH_DSCP is not set
CONFIG_NETFILTER_XT_MATCH_ESP=m
# CONFIG_NETFILTER_XT_MATCH_IPRANGE is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
# CONFIG_NETFILTER_XT_MATCH_OWNER is not set
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
# CONFIG_NETFILTER_XT_MATCH_RATEEST is not set
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
# CONFIG_NETFILTER_XT_MATCH_HASHLIMIT is not set

#
# IP: Netfilter Configuration
#
CONFIG_IP_NF_QUEUE=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_RECENT=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_MATCH_ADDRTYPE=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_ULOG=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m

#
# IPv6: Netfilter Configuration
#
CONFIG_IP6_NF_QUEUE=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_RT=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_AH=m
# CONFIG_IP6_NF_MATCH_MH is not set
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_LOG=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_TARGET_HL=m
CONFIG_IP6_NF_RAW=m

#
# Bridge: Netfilter Configuration
#
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_EBT_T_NAT=m
CONFIG_BRIDGE_EBT_802_3=m
CONFIG_BRIDGE_EBT_AMONG=m
CONFIG_BRIDGE_EBT_ARP=m
CONFIG_BRIDGE_EBT_IP=m
CONFIG_BRIDGE_EBT_LIMIT=m
CONFIG_BRIDGE_EBT_MARK=m
CONFIG_BRIDGE_EBT_PKTTYPE=m
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_SNAT=m
CONFIG_BRIDGE_EBT_LOG=m
CONFIG_BRIDGE_EBT_ULOG=m
# CONFIG_BRIDGE_EBT_NFLOG is not set
CONFIG_IP_DCCP=m
CONFIG_INET_DCCP_DIAG=m
CONFIG_IP_DCCP_ACKVEC=y

#
# DCCP CCIDs Configuration (EXPERIMENTAL)
#
CONFIG_IP_DCCP_CCID2=m
# CONFIG_IP_DCCP_CCID2_DEBUG is not set
CONFIG_IP_DCCP_CCID3=m
# CONFIG_IP_DCCP_CCID3_DEBUG is not set
CONFIG_IP_DCCP_CCID3_RTO=100
CONFIG_IP_DCCP_TFRC_LIB=m

#
# DCCP Kernel Hacking
#
# CONFIG_IP_DCCP_DEBUG is not set
# CONFIG_NET_DCCPPROBE is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_MSG is not set
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_HMAC_NONE is not set
# CONFIG_SCTP_HMAC_SHA1 is not set
CONFIG_SCTP_HMAC_MD5=y
CONFIG_TIPC=m
# CONFIG_TIPC_ADVANCED is not set
# CONFIG_TIPC_DEBUG is not set
CONFIG_ATM=m
CONFIG_ATM_CLIP=m
# CONFIG_ATM_CLIP_NO_ICMP is not set
CONFIG_ATM_LANE=m
# CONFIG_ATM_MPOA is not set
CONFIG_ATM_BR2684=m
# CONFIG_ATM_BR2684_IPFILTER is not set
CONFIG_BRIDGE=m
CONFIG_VLAN_8021Q=m
# CONFIG_DECNET is not set
CONFIG_LLC=y
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
# CONFIG_NET_SCH_RR is not set
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_INGRESS=m

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
# CONFIG_NET_CLS_FLOW is not set
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_IPT=m
# CONFIG_NET_ACT_NAT is not set
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_CLS_IND=y
CONFIG_NET_SCH_FIFO=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
# CONFIG_NET_TCPPROBE is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
CONFIG_BT=m
CONFIG_BT_L2CAP=m
CONFIG_BT_SCO=m
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_BNEP=m
CONFIG_BT_BNEP_MC_FILTER=y
CONFIG_BT_BNEP_PROTO_FILTER=y
CONFIG_BT_CMTP=m
CONFIG_BT_HIDP=m

#
# Bluetooth device drivers
#
CONFIG_BT_HCIUSB=m
CONFIG_BT_HCIUSB_SCO=y
# CONFIG_BT_HCIBTSDIO is not set
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_H4=y
CONFIG_BT_HCIUART_BCSP=y
# CONFIG_BT_HCIUART_LL is not set
CONFIG_BT_HCIBCM203X=m
CONFIG_BT_HCIBPA10X=m
CONFIG_BT_HCIBFUSB=m
CONFIG_BT_HCIDTL1=m
CONFIG_BT_HCIBT3C=m
CONFIG_BT_HCIBLUECARD=m
CONFIG_BT_HCIBTUART=m
CONFIG_BT_HCIVHCI=m
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y

#
# Wireless
#
CONFIG_CFG80211=m
CONFIG_NL80211=y
CONFIG_WIRELESS_EXT=y
CONFIG_MAC80211=m

#
# QoS/HT support disabled
#

#
# QoS/HT support needs CONFIG_NETDEVICES_MULTIQUEUE
#

#
# Rate control algorithm selection
#
CONFIG_MAC80211_RC_DEFAULT_PID=y
# CONFIG_MAC80211_RC_DEFAULT_NONE is not set

#
# Selecting 'y' for an algorithm will
#

#
# build the algorithm into mac80211.
#
CONFIG_MAC80211_RC_DEFAULT="pid"
CONFIG_MAC80211_RC_PID=y
# CONFIG_MAC80211_MESH is not set
CONFIG_MAC80211_LEDS=y
# CONFIG_MAC80211_DEBUGFS is not set
# CONFIG_MAC80211_DEBUG_PACKET_ALIGNMENT is not set
CONFIG_MAC80211_DEBUG=y
# CONFIG_MAC80211_HT_DEBUG is not set
# CONFIG_MAC80211_VERBOSE_DEBUG is not set
# CONFIG_MAC80211_LOWTX_FRAME_DUMP is not set
# CONFIG_TKIP_DEBUG is not set
# CONFIG_MAC80211_DEBUG_COUNTERS is not set
# CONFIG_MAC80211_IBSS_DEBUG is not set
# CONFIG_MAC80211_VERBOSE_PS_DEBUG is not set
CONFIG_IEEE80211=m
# CONFIG_IEEE80211_DEBUG is not set
CONFIG_IEEE80211_CRYPT_WEP=m
CONFIG_IEEE80211_CRYPT_CCMP=m
CONFIG_IEEE80211_CRYPT_TKIP=m
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_BUILTIN_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
CONFIG_MTD=m
# CONFIG_MTD_DEBUG is not set
CONFIG_MTD_CONCAT=m
CONFIG_MTD_PARTITIONS=y
CONFIG_MTD_REDBOOT_PARTS=m
CONFIG_MTD_REDBOOT_DIRECTORY_BLOCK=-1
# CONFIG_MTD_REDBOOT_PARTS_UNALLOCATED is not set
# CONFIG_MTD_REDBOOT_PARTS_READONLY is not set
# CONFIG_MTD_AR7_PARTS is not set

#
# User Modules And Translation Layers
#
CONFIG_MTD_CHAR=m
CONFIG_MTD_BLKDEVS=m
CONFIG_MTD_BLOCK=m
CONFIG_MTD_BLOCK_RO=m
CONFIG_FTL=m
CONFIG_NFTL=m
CONFIG_NFTL_RW=y
# CONFIG_INFTL is not set
CONFIG_RFD_FTL=m
# CONFIG_SSFDC is not set
# CONFIG_MTD_OOPS is not set

#
# RAM/ROM/Flash chip drivers
#
CONFIG_MTD_CFI=m
CONFIG_MTD_JEDECPROBE=m
CONFIG_MTD_GEN_PROBE=m
# CONFIG_MTD_CFI_ADV_OPTIONS is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
# CONFIG_MTD_MAP_BANK_WIDTH_8 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_16 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_32 is not set
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_CFI_I4 is not set
# CONFIG_MTD_CFI_I8 is not set
CONFIG_MTD_CFI_INTELEXT=m
CONFIG_MTD_CFI_AMDSTD=m
CONFIG_MTD_CFI_STAA=m
CONFIG_MTD_CFI_UTIL=m
CONFIG_MTD_RAM=m
CONFIG_MTD_ROM=m
CONFIG_MTD_ABSENT=m

#
# Mapping drivers for chip access
#
# CONFIG_MTD_COMPLEX_MAPPINGS is not set
# CONFIG_MTD_PHYSMAP is not set
CONFIG_MTD_SC520CDP=m
CONFIG_MTD_NETSC520=m
CONFIG_MTD_TS5500=m
# CONFIG_MTD_AMD76XROM is not set
# CONFIG_MTD_ICHXROM is not set
# CONFIG_MTD_ESB2ROM is not set
# CONFIG_MTD_CK804XROM is not set
CONFIG_MTD_SCB2_FLASH=m
# CONFIG_MTD_NETtel is not set
# CONFIG_MTD_DILNETPC is not set
# CONFIG_MTD_L440GX is not set
# CONFIG_MTD_INTEL_VR_NOR is not set
# CONFIG_MTD_PLATRAM is not set

#
# Self-contained MTD device drivers
#
# CONFIG_MTD_PMC551 is not set
# CONFIG_MTD_SLRAM is not set
# CONFIG_MTD_PHRAM is not set
CONFIG_MTD_MTDRAM=m
CONFIG_MTDRAM_TOTAL_SIZE=4096
CONFIG_MTDRAM_ERASE_SIZE=128
CONFIG_MTD_BLOCK2MTD=m

#
# Disk-On-Chip Device Drivers
#
# CONFIG_MTD_DOC2000 is not set
# CONFIG_MTD_DOC2001 is not set
# CONFIG_MTD_DOC2001PLUS is not set
CONFIG_MTD_NAND=m
# CONFIG_MTD_NAND_VERIFY_WRITE is not set
CONFIG_MTD_NAND_ECC_SMC=y
# CONFIG_MTD_NAND_MUSEUM_IDS is not set
CONFIG_MTD_NAND_IDS=m
CONFIG_MTD_NAND_DISKONCHIP=m
# CONFIG_MTD_NAND_DISKONCHIP_PROBE_ADVANCED is not set
CONFIG_MTD_NAND_DISKONCHIP_PROBE_ADDRESS=0
# CONFIG_MTD_NAND_DISKONCHIP_BBTWRITE is not set
# CONFIG_MTD_NAND_CAFE is not set
CONFIG_MTD_NAND_NANDSIM=m
# CONFIG_MTD_NAND_PLATFORM is not set
# CONFIG_MTD_ALAUDA is not set
# CONFIG_MTD_ONENAND is not set

#
# UBI - Unsorted block images
#
# CONFIG_MTD_UBI is not set
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
CONFIG_PARPORT_PC_PCMCIA=m
# CONFIG_PARPORT_GSC is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PARPORT_NOT_PC=y
CONFIG_PNP=y
# CONFIG_PNP_DEBUG is not set

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_FD=m
CONFIG_PARIDE=m

#
# Parallel IDE high-level drivers
#
CONFIG_PARIDE_PD=m
CONFIG_PARIDE_PCD=m
CONFIG_PARIDE_PF=m
CONFIG_PARIDE_PT=m
CONFIG_PARIDE_PG=m

#
# Parallel IDE protocol modules
#
CONFIG_PARIDE_ATEN=m
CONFIG_PARIDE_BPCK=m
CONFIG_PARIDE_COMM=m
CONFIG_PARIDE_DSTR=m
CONFIG_PARIDE_FIT2=m
CONFIG_PARIDE_FIT3=m
CONFIG_PARIDE_EPAT=m
CONFIG_PARIDE_EPATC8=y
CONFIG_PARIDE_EPIA=m
CONFIG_PARIDE_FRIQ=m
CONFIG_PARIDE_FRPW=m
CONFIG_PARIDE_KBIC=m
CONFIG_PARIDE_KTTI=m
CONFIG_PARIDE_ON20=m
CONFIG_PARIDE_ON26=m
CONFIG_BLK_CPQ_DA=m
CONFIG_BLK_CPQ_CISS_DA=m
CONFIG_CISS_SCSI_TAPE=y
CONFIG_BLK_DEV_DAC960=m
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_SX8=m
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
# CONFIG_BLK_DEV_XIP is not set
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
CONFIG_ATA_OVER_ETH=m
CONFIG_MISC_DEVICES=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_EEPROM_93CX6 is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ACER_WMI is not set
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_MSI_LAPTOP is not set
# CONFIG_COMPAL_LAPTOP is not set
# CONFIG_SONY_LAPTOP is not set
# CONFIG_THINKPAD_ACPI is not set
# CONFIG_INTEL_MENLOW is not set
# CONFIG_EEEPC_LAPTOP is not set
# CONFIG_ENCLOSURE_SERVICES is not set
CONFIG_HAVE_IDE=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide/ide.txt for help/info on IDE drives
#
CONFIG_IDE_TIMINGS=y
CONFIG_IDE_ATAPI=y
# CONFIG_BLK_DEV_IDE_SATA is not set
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECS=m
# CONFIG_BLK_DEV_DELKIN is not set
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDECD_VERBOSE_ERRORS=y
# CONFIG_BLK_DEV_IDETAPE is not set
CONFIG_BLK_DEV_IDEFLOPPY=y
CONFIG_BLK_DEV_IDESCSI=m
# CONFIG_BLK_DEV_IDEACPI is not set
CONFIG_IDE_TASK_IOCTL=y
CONFIG_IDE_PROC_FS=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_PLATFORM is not set
# CONFIG_BLK_DEV_CMD640 is not set
CONFIG_BLK_DEV_IDEPNP=y
CONFIG_BLK_DEV_IDEDMA_SFF=y

#
# PCI IDE chipsets support
#
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_PCIBUS_ORDER=y
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_BLK_DEV_GENERIC=y
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_BLK_DEV_AEC62XX=y
CONFIG_BLK_DEV_ALI15X3=y
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_BLK_DEV_ATIIXP=y
CONFIG_BLK_DEV_CMD64X=y
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
CONFIG_BLK_DEV_HPT34X=y
# CONFIG_HPT34X_AUTODMA is not set
CONFIG_BLK_DEV_HPT366=y
# CONFIG_BLK_DEV_JMICRON is not set
# CONFIG_BLK_DEV_SC1200 is not set
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT8213 is not set
CONFIG_BLK_DEV_IT821X=y
# CONFIG_BLK_DEV_NS87415 is not set
CONFIG_BLK_DEV_PDC202XX_OLD=y
CONFIG_BLK_DEV_PDC202XX_NEW=y
CONFIG_BLK_DEV_SVWKS=y
CONFIG_BLK_DEV_SIIMAGE=y
CONFIG_BLK_DEV_SIS5513=y
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
CONFIG_BLK_DEV_VIA82CXXX=y
# CONFIG_BLK_DEV_TC86C001 is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_BLK_DEV_HD_ONLY is not set
# CONFIG_BLK_DEV_HD is not set

#
# SCSI device support
#
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=m
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_ST=m
CONFIG_CHR_DEV_OSST=m
CONFIG_BLK_DEV_SR=m
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
# CONFIG_SCSI_SAS_ATA is not set
CONFIG_SCSI_SAS_HOST_SMP=y
# CONFIG_SCSI_SAS_LIBSAS_DEBUG is not set
CONFIG_SCSI_SRP_ATTRS=m
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
CONFIG_BLK_DEV_3W_XXXX_RAID=m
CONFIG_SCSI_3W_9XXX=m
CONFIG_SCSI_ACARD=m
CONFIG_SCSI_AACRAID=m
CONFIG_SCSI_AIC7XXX=m
CONFIG_AIC7XXX_CMDS_PER_DEVICE=4
CONFIG_AIC7XXX_RESET_DELAY_MS=15000
# CONFIG_AIC7XXX_DEBUG_ENABLE is not set
CONFIG_AIC7XXX_DEBUG_MASK=0
# CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC7XXX_OLD=m
CONFIG_SCSI_AIC79XX=m
CONFIG_AIC79XX_CMDS_PER_DEVICE=4
CONFIG_AIC79XX_RESET_DELAY_MS=15000
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC94XX=m
# CONFIG_AIC94XX_DEBUG is not set
# CONFIG_SCSI_BROADSAS is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
CONFIG_SCSI_ARCMSR=m
# CONFIG_SCSI_ARCMSR_AER is not set
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=m
CONFIG_MEGARAID_MAILBOX=m
CONFIG_MEGARAID_LEGACY=m
CONFIG_MEGARAID_SAS=m
CONFIG_SCSI_HPTIOP=m
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
CONFIG_SCSI_GDTH=m
CONFIG_SCSI_IPS=m
CONFIG_SCSI_INITIO=m
# CONFIG_SCSI_INIA100 is not set
CONFIG_SCSI_PPA=m
CONFIG_SCSI_IMM=m
# CONFIG_SCSI_IZIP_EPP16 is not set
# CONFIG_SCSI_IZIP_SLOW_CTR is not set
# CONFIG_SCSI_MVSAS is not set
CONFIG_SCSI_STEX=m
CONFIG_SCSI_SYM53C8XX_2=m
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
CONFIG_SCSI_SYM53C8XX_MMIO=y
# CONFIG_SCSI_IPR is not set
CONFIG_SCSI_QLOGIC_1280=m
CONFIG_SCSI_QLA_FC=m
CONFIG_SCSI_QLA_ISCSI=m
CONFIG_SCSI_LPFC=m
CONFIG_SCSI_DC395x=m
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_LOWLEVEL_PCMCIA is not set
# CONFIG_SCSI_DH is not set
CONFIG_ATA=m
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y
CONFIG_SATA_AHCI=m
CONFIG_SATA_SIL24=m
CONFIG_ATA_SFF=y
CONFIG_SATA_SVW=m
CONFIG_ATA_PIIX=m
CONFIG_SATA_MV=m
CONFIG_SATA_NV=m
CONFIG_PDC_ADMA=m
CONFIG_SATA_QSTOR=m
CONFIG_SATA_PROMISE=m
CONFIG_SATA_SX4=m
CONFIG_SATA_SIL=m
CONFIG_SATA_SIS=m
CONFIG_SATA_ULI=m
CONFIG_SATA_VIA=m
CONFIG_SATA_VITESSE=m
CONFIG_SATA_INIC162X=m
# CONFIG_PATA_ACPI is not set
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_ATA_GENERIC is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_TRIFLEX is not set
CONFIG_PATA_MARVELL=m
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PCMCIA is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RZ1000 is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SERVERWORKS is not set
CONFIG_PATA_PDC2027X=m
# CONFIG_PATA_SIL680 is not set
CONFIG_PATA_SIS=m
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set
# CONFIG_PATA_SCH is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_RAID5_RESHAPE=y
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_BLK_DEV_DM=m
# CONFIG_DM_DEBUG is not set
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_MIRROR=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
# CONFIG_DM_DELAY is not set
# CONFIG_DM_UEVENT is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=m
CONFIG_FUSION_SAS=m
CONFIG_FUSION_MAX_SGE=40
CONFIG_FUSION_CTL=m
CONFIG_FUSION_LAN=m
# CONFIG_FUSION_LOGGING is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_OHCI_DEBUG=y
CONFIG_FIREWIRE_SBP2=m
# CONFIG_IEEE1394 is not set
CONFIG_I2O=m
# CONFIG_I2O_LCT_NOTIFY_ON_CHANGES is not set
CONFIG_I2O_EXT_ADAPTEC=y
CONFIG_I2O_EXT_ADAPTEC_DMA64=y
CONFIG_I2O_CONFIG=m
CONFIG_I2O_CONFIG_OLD_IOCTL=y
CONFIG_I2O_BUS=m
CONFIG_I2O_BLOCK=m
CONFIG_I2O_SCSI=m
CONFIG_I2O_PROC=m
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
# CONFIG_NETDEVICES_MULTIQUEUE is not set
CONFIG_IFB=m
CONFIG_DUMMY=m
CONFIG_BONDING=m
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=m
# CONFIG_VETH is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
CONFIG_PHYLIB=m

#
# MII PHY device drivers
#
CONFIG_MARVELL_PHY=m
CONFIG_DAVICOM_PHY=m
CONFIG_QSEMI_PHY=m
CONFIG_LXT_PHY=m
CONFIG_CICADA_PHY=m
CONFIG_VITESSE_PHY=m
CONFIG_SMSC_PHY=m
# CONFIG_BROADCOM_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_MDIO_BITBANG is not set
CONFIG_NET_ETHERNET=y
CONFIG_MII=m
CONFIG_HAPPYMEAL=m
CONFIG_SUNGEM=m
CONFIG_CASSINI=m
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=m
CONFIG_TYPHOON=m
CONFIG_NET_TULIP=y
CONFIG_DE2104X=m
CONFIG_TULIP=m
# CONFIG_TULIP_MWI is not set
CONFIG_TULIP_MMIO=y
# CONFIG_TULIP_NAPI is not set
CONFIG_DE4X5=m
CONFIG_WINBOND_840=m
CONFIG_DM9102=m
CONFIG_ULI526X=m
CONFIG_PCMCIA_XIRCOM=m
# CONFIG_HP100 is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
CONFIG_NET_PCI=y
CONFIG_PCNET32=m
CONFIG_AMD8111_ETH=m
CONFIG_AMD8111E_NAPI=y
CONFIG_ADAPTEC_STARFIRE=m
CONFIG_ADAPTEC_STARFIRE_NAPI=y
CONFIG_B44=m
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_FORCEDETH=m
# CONFIG_FORCEDETH_NAPI is not set
# CONFIG_EEPRO100 is not set
CONFIG_E100=m
CONFIG_FEALNX=m
CONFIG_NATSEMI=m
CONFIG_NE2K_PCI=m
CONFIG_8139CP=m
CONFIG_8139TOO=m
# CONFIG_8139TOO_PIO is not set
# CONFIG_8139TOO_TUNE_TWISTER is not set
CONFIG_8139TOO_8129=y
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R6040 is not set
CONFIG_SIS900=m
CONFIG_EPIC100=m
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
CONFIG_VIA_RHINE=m
CONFIG_VIA_RHINE_MMIO=y
CONFIG_VIA_RHINE_NAPI=y
# CONFIG_SC92031 is not set
CONFIG_NET_POCKET=y
# CONFIG_ATP is not set
# CONFIG_DE600 is not set
# CONFIG_DE620 is not set
CONFIG_NETDEV_1000=y
CONFIG_ACENIC=m
# CONFIG_ACENIC_OMIT_TIGON_I is not set
CONFIG_DL2K=m
CONFIG_E1000=m
CONFIG_E1000_NAPI=y
# CONFIG_E1000_DISABLE_PACKET_SPLIT is not set
CONFIG_E1000E=m
CONFIG_E1000E_ENABLED=y
# CONFIG_IP1000 is not set
CONFIG_IGB=m
CONFIG_NS83820=m
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_R8169=m
CONFIG_R8169_NAPI=y
CONFIG_R8169_VLAN=y
CONFIG_SIS190=m
CONFIG_SKGE=m
# CONFIG_SKGE_DEBUG is not set
CONFIG_SKY2=m
# CONFIG_SKY2_DEBUG is not set
CONFIG_VIA_VELOCITY=m
CONFIG_TIGON3=m
CONFIG_BNX2=m
CONFIG_QLA3XXX=m
# CONFIG_ATL1 is not set
CONFIG_NETDEV_10000=y
CONFIG_CHELSIO_T1=m
# CONFIG_CHELSIO_T1_1G is not set
CONFIG_CHELSIO_T1_NAPI=y
CONFIG_CHELSIO_T3=m
# CONFIG_IXGBE is not set
CONFIG_IXGB=m
CONFIG_IXGB_NAPI=y
CONFIG_S2IO=m
CONFIG_S2IO_NAPI=y
CONFIG_MYRI10GE=m
CONFIG_NETXEN_NIC=m
# CONFIG_NIU is not set
# CONFIG_MLX4_CORE is not set
# CONFIG_TEHUTI is not set
# CONFIG_BNX2X is not set
# CONFIG_SFC is not set
CONFIG_TR=y
CONFIG_IBMOL=m
CONFIG_3C359=m
# CONFIG_TMS380TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set
# CONFIG_IWLWIFI_LEDS is not set

#
# USB Network Adapters
#
CONFIG_USB_CATC=m
CONFIG_USB_KAWETH=m
# CONFIG_USB_KAWETH_FIRMWARE is not set
CONFIG_USB_PEGASUS=m
CONFIG_USB_RTL8150=m
CONFIG_USB_USBNET=m
CONFIG_USB_NET_AX8817X=m
CONFIG_USB_NET_CDCETHER=m
# CONFIG_USB_NET_DM9601 is not set
CONFIG_USB_NET_GL620A=m
CONFIG_USB_NET_NET1080=m
CONFIG_USB_NET_PLUSB=m
# CONFIG_USB_NET_MCS7830 is not set
CONFIG_USB_NET_RNDIS_HOST=m
CONFIG_USB_NET_CDC_SUBSET=m
CONFIG_USB_ALI_M5632=y
CONFIG_USB_AN2720=y
CONFIG_USB_BELKIN=y
CONFIG_USB_ARMLINUX=y
CONFIG_USB_EPSON2888=y
# CONFIG_USB_KC2190 is not set
CONFIG_USB_NET_ZAURUS=m
CONFIG_NET_PCMCIA=y
CONFIG_PCMCIA_3C589=m
CONFIG_PCMCIA_3C574=m
CONFIG_PCMCIA_FMVJ18X=m
CONFIG_PCMCIA_PCNET=m
CONFIG_PCMCIA_NMCLAN=m
CONFIG_PCMCIA_SMC91C92=m
CONFIG_PCMCIA_XIRC2PS=m
CONFIG_PCMCIA_AXNET=m
# CONFIG_WAN is not set
CONFIG_ATM_DRIVERS=y
# CONFIG_ATM_DUMMY is not set
CONFIG_ATM_TCP=m
CONFIG_ATM_LANAI=m
CONFIG_ATM_ENI=m
# CONFIG_ATM_ENI_DEBUG is not set
# CONFIG_ATM_ENI_TUNE_BURST is not set
CONFIG_ATM_FIRESTREAM=m
# CONFIG_ATM_ZATM is not set
CONFIG_ATM_IDT77252=m
# CONFIG_ATM_IDT77252_DEBUG is not set
# CONFIG_ATM_IDT77252_RCV_ALL is not set
CONFIG_ATM_IDT77252_USE_SUNI=y
CONFIG_ATM_AMBASSADOR=m
# CONFIG_ATM_AMBASSADOR_DEBUG is not set
CONFIG_ATM_HORIZON=m
# CONFIG_ATM_HORIZON_DEBUG is not set
CONFIG_ATM_FORE200E_MAYBE=m
# CONFIG_ATM_FORE200E_PCA is not set
CONFIG_ATM_HE=m
# CONFIG_ATM_HE_USE_SUNI is not set
CONFIG_FDDI=y
# CONFIG_DEFXX is not set
# CONFIG_SKFP is not set
# CONFIG_HIPPI is not set
# CONFIG_PLIP is not set
CONFIG_PPP=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
CONFIG_PPP_DEFLATE=m
# CONFIG_PPP_BSDCOMP is not set
CONFIG_PPP_MPPE=m
CONFIG_PPPOE=m
CONFIG_PPPOATM=m
# CONFIG_PPPOL2TP is not set
CONFIG_SLIP=m
CONFIG_SLIP_COMPRESSED=y
CONFIG_SLHC=m
CONFIG_SLIP_SMART=y
# CONFIG_SLIP_MODE_SLIP6 is not set
CONFIG_NET_FC=y
CONFIG_NETCONSOLE=m
# CONFIG_NETCONSOLE_DYNAMIC is not set
CONFIG_NETPOLL=y
CONFIG_NETPOLL_TRAP=y
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_ISDN=m
CONFIG_ISDN_I4L=m
CONFIG_ISDN_PPP=y
CONFIG_ISDN_PPP_VJ=y
CONFIG_ISDN_MPP=y
CONFIG_IPPP_FILTER=y
# CONFIG_ISDN_PPP_BSDCOMP is not set
CONFIG_ISDN_AUDIO=y
CONFIG_ISDN_TTY_FAX=y

#
# ISDN feature submodules
#
CONFIG_ISDN_DIVERSION=m

#
# ISDN4Linux hardware drivers
#

#
# Passive cards
#
CONFIG_ISDN_DRV_HISAX=m

#
# D-channel protocol features
#
CONFIG_HISAX_EURO=y
CONFIG_DE_AOC=y
CONFIG_HISAX_NO_SENDCOMPLETE=y
CONFIG_HISAX_NO_LLC=y
CONFIG_HISAX_NO_KEYPAD=y
CONFIG_HISAX_1TR6=y
CONFIG_HISAX_NI1=y
CONFIG_HISAX_MAX_CARDS=8

#
# HiSax supported cards
#
CONFIG_HISAX_16_3=y
CONFIG_HISAX_TELESPCI=y
CONFIG_HISAX_S0BOX=y
CONFIG_HISAX_FRITZPCI=y
CONFIG_HISAX_AVM_A1_PCMCIA=y
CONFIG_HISAX_ELSA=y
CONFIG_HISAX_DIEHLDIVA=y
CONFIG_HISAX_SEDLBAUER=y
CONFIG_HISAX_NETJET=y
CONFIG_HISAX_NETJET_U=y
CONFIG_HISAX_NICCY=y
CONFIG_HISAX_BKM_A4T=y
CONFIG_HISAX_SCT_QUADRO=y
CONFIG_HISAX_GAZEL=y
CONFIG_HISAX_HFC_PCI=y
CONFIG_HISAX_W6692=y
CONFIG_HISAX_HFC_SX=y
CONFIG_HISAX_ENTERNOW_PCI=y
# CONFIG_HISAX_DEBUG is not set

#
# HiSax PCMCIA card service modules
#
CONFIG_HISAX_SEDLBAUER_CS=m
CONFIG_HISAX_ELSA_CS=m
CONFIG_HISAX_AVM_A1_CS=m
CONFIG_HISAX_TELES_CS=m

#
# HiSax sub driver modules
#
CONFIG_HISAX_ST5481=m
# CONFIG_HISAX_HFCUSB is not set
CONFIG_HISAX_HFC4S8S=m
CONFIG_HISAX_FRITZ_PCIPNP=m
CONFIG_HISAX_HDLC=y

#
# Active cards
#
# CONFIG_HYSDN is not set
CONFIG_ISDN_DRV_GIGASET=m
CONFIG_GIGASET_BASE=m
CONFIG_GIGASET_M105=m
# CONFIG_GIGASET_M101 is not set
# CONFIG_GIGASET_DEBUG is not set
# CONFIG_GIGASET_UNDOCREQ is not set
CONFIG_ISDN_CAPI=m
CONFIG_ISDN_DRV_AVMB1_VERBOSE_REASON=y
CONFIG_CAPI_TRACE=y
CONFIG_ISDN_CAPI_MIDDLEWARE=y
CONFIG_ISDN_CAPI_CAPI20=m
CONFIG_ISDN_CAPI_CAPIFS_BOOL=y
CONFIG_ISDN_CAPI_CAPIFS=m
CONFIG_ISDN_CAPI_CAPIDRV=m

#
# CAPI hardware drivers
#
CONFIG_CAPI_AVM=y
CONFIG_ISDN_DRV_AVMB1_B1PCI=m
CONFIG_ISDN_DRV_AVMB1_B1PCIV4=y
CONFIG_ISDN_DRV_AVMB1_B1PCMCIA=m
CONFIG_ISDN_DRV_AVMB1_AVM_CS=m
CONFIG_ISDN_DRV_AVMB1_T1PCI=m
CONFIG_ISDN_DRV_AVMB1_C4=m
# CONFIG_CAPI_EICON is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=m

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=m
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
# CONFIG_MOUSE_PS2_ELANTECH is not set
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_APPLETOUCH is not set
CONFIG_MOUSE_VSXXXAA=m
CONFIG_INPUT_JOYSTICK=y
# CONFIG_JOYSTICK_ANALOG is not set
# CONFIG_JOYSTICK_A3D is not set
# CONFIG_JOYSTICK_ADI is not set
# CONFIG_JOYSTICK_COBRA is not set
# CONFIG_JOYSTICK_GF2K is not set
# CONFIG_JOYSTICK_GRIP is not set
# CONFIG_JOYSTICK_GRIP_MP is not set
# CONFIG_JOYSTICK_GUILLEMOT is not set
# CONFIG_JOYSTICK_INTERACT is not set
# CONFIG_JOYSTICK_SIDEWINDER is not set
# CONFIG_JOYSTICK_TMDC is not set
# CONFIG_JOYSTICK_IFORCE is not set
# CONFIG_JOYSTICK_WARRIOR is not set
# CONFIG_JOYSTICK_MAGELLAN is not set
# CONFIG_JOYSTICK_SPACEORB is not set
# CONFIG_JOYSTICK_SPACEBALL is not set
# CONFIG_JOYSTICK_STINGER is not set
CONFIG_JOYSTICK_TWIDJOY=m
# CONFIG_JOYSTICK_ZHENHUA is not set
# CONFIG_JOYSTICK_DB9 is not set
# CONFIG_JOYSTICK_GAMECON is not set
# CONFIG_JOYSTICK_TURBOGRAFX is not set
CONFIG_JOYSTICK_JOYDUMP=m
# CONFIG_JOYSTICK_XPAD is not set
# CONFIG_INPUT_TABLET is not set
CONFIG_INPUT_TOUCHSCREEN=y
# CONFIG_TOUCHSCREEN_FUJITSU is not set
CONFIG_TOUCHSCREEN_GUNZE=m
CONFIG_TOUCHSCREEN_ELO=m
CONFIG_TOUCHSCREEN_MTOUCH=m
CONFIG_TOUCHSCREEN_MK712=m
# CONFIG_TOUCHSCREEN_PENMOUNT is not set
# CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set
# CONFIG_TOUCHSCREEN_TOUCHWIN is not set
# CONFIG_TOUCHSCREEN_UCB1400 is not set
# CONFIG_TOUCHSCREEN_WM97XX is not set
# CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=m
# CONFIG_INPUT_APANEL is not set
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_APPLEIR is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
CONFIG_INPUT_UINPUT=m

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
CONFIG_GAMEPORT=m
CONFIG_GAMEPORT_NS558=m
CONFIG_GAMEPORT_L4=m
CONFIG_GAMEPORT_EMU10K1=m
CONFIG_GAMEPORT_FM801=m

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_DEVKMEM=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_COMPUTONE is not set
# CONFIG_ROCKETPORT is not set
CONFIG_CYCLADES=m
# CONFIG_CYZ_INTR is not set
# CONFIG_DIGIEPCA is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
# CONFIG_ISI is not set
CONFIG_SYNCLINK=m
CONFIG_SYNCLINKMP=m
CONFIG_SYNCLINK_GT=m
CONFIG_N_HDLC=m
# CONFIG_RISCOM8 is not set
# CONFIG_SPECIALIX is not set
# CONFIG_SX is not set
# CONFIG_RIO is not set
# CONFIG_STALDRV is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_CS=m
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_PRINTER=m
CONFIG_LP_CONSOLE=y
CONFIG_PPDEV=m
CONFIG_IPMI_HANDLER=m
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
CONFIG_NVRAM=y
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
CONFIG_CARDMAN_4000=m
CONFIG_CARDMAN_4040=m
# CONFIG_IPWIRELESS is not set
# CONFIG_MWAVE is not set
CONFIG_PC8736x_GPIO=m
CONFIG_NSC_GPIO=m
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=8192
CONFIG_HPET=y
# CONFIG_HPET_MMAP is not set
CONFIG_HANGCHECK_TIMER=m
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_ALGOBIT=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
CONFIG_I2C_AMD8111=m
CONFIG_I2C_I801=m
# CONFIG_I2C_PIIX4 is not set
CONFIG_I2C_NFORCE2=m
# CONFIG_I2C_NFORCE2_S4985 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# Graphics adapter I2C/DDC channel drivers
#
CONFIG_I2C_VOODOO3=m

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=m
CONFIG_I2C_PARPORT_LIGHT=m
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_ISCH is not set
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_SIMTEC is not set
CONFIG_I2C_STUB=m
# CONFIG_I2C_PCA_PLATFORM is not set

#
# Miscellaneous I2C Chip support
#
# CONFIG_DS1682 is not set
CONFIG_SENSORS_EEPROM=m
CONFIG_SENSORS_PCF8574=m
# CONFIG_PCF8575 is not set
CONFIG_SENSORS_PCF8591=m
CONFIG_SENSORS_MAX6875=m
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set
# CONFIG_SPI is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_BATTERY_DS2760 is not set
CONFIG_HWMON=m
CONFIG_HWMON_VID=m
CONFIG_SENSORS_ABITUGURU=m
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7418 is not set
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
# CONFIG_SENSORS_ADM1029 is not set
CONFIG_SENSORS_ADM1031=m
CONFIG_SENSORS_ADM9240=m
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7473 is not set
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_ASB100=m
CONFIG_SENSORS_ATXP1=m
CONFIG_SENSORS_DS1621=m
# CONFIG_SENSORS_I5K_AMB is not set
CONFIG_SENSORS_F71805F=m
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
CONFIG_SENSORS_FSCHER=m
CONFIG_SENSORS_FSCPOS=m
# CONFIG_SENSORS_FSCHMD is not set
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
# CONFIG_SENSORS_CORETEMP is not set
# CONFIG_SENSORS_IBMAEM is not set
# CONFIG_SENSORS_IBMPEX is not set
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_LM63=m
CONFIG_SENSORS_LM75=m
CONFIG_SENSORS_LM77=m
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=m
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
# CONFIG_SENSORS_LM93 is not set
CONFIG_SENSORS_MAX1619=m
# CONFIG_SENSORS_MAX6650 is not set
CONFIG_SENSORS_PC87360=m
# CONFIG_SENSORS_PC87427 is not set
CONFIG_SENSORS_SIS5595=m
# CONFIG_SENSORS_DME1737 is not set
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=m
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_THMC50 is not set
CONFIG_SENSORS_VIA686A=m
# CONFIG_SENSORS_VT1211 is not set
CONFIG_SENSORS_VT8231=m
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=m
CONFIG_SENSORS_W83792D=m
# CONFIG_SENSORS_W83793 is not set
CONFIG_SENSORS_W83L785TS=m
# CONFIG_SENSORS_W83L786NG is not set
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
CONFIG_SENSORS_HDAPS=m
# CONFIG_SENSORS_APPLESMC is not set
# CONFIG_HWMON_DEBUG_CHIP is not set
CONFIG_THERMAL=y
CONFIG_WATCHDOG=y
# CONFIG_WATCHDOG_NOWAYOUT is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
CONFIG_ALIM1535_WDT=m
CONFIG_ALIM7101_WDT=m
# CONFIG_SC520_WDT is not set
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
CONFIG_IBMASR=m
# CONFIG_WAFER_WDT is not set
CONFIG_I6300ESB_WDT=m
# CONFIG_ITCO_WDT is not set
# CONFIG_IT8712F_WDT is not set
# CONFIG_HP_WATCHDOG is not set
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
# CONFIG_60XX_WDT is not set
# CONFIG_SBC8360_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_SMSC37B787_WDT is not set
CONFIG_W83627HF_WDT=m
# CONFIG_W83697HF_WDT is not set
CONFIG_W83877F_WDT=m
CONFIG_W83977F_WDT=m
CONFIG_MACHZ_WDT=m
# CONFIG_SBC_EPX_C3_WATCHDOG is not set

#
# PCI-based Watchdog Cards
#
CONFIG_PCIPCWATCHDOG=m
CONFIG_WDTPCI=m
CONFIG_WDT_501_PCI=y

#
# USB-based Watchdog Cards
#
CONFIG_USBPCWATCHDOG=m

#
# Sonics Silicon Backplane
#
CONFIG_SSB_POSSIBLE=y
CONFIG_SSB=m
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
CONFIG_SSB_PCMCIAHOST_POSSIBLE=y
# CONFIG_SSB_PCMCIAHOST is not set
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y

#
# Multifunction device drivers
#
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set

#
# Multimedia devices
#

#
# Multimedia core support
#
CONFIG_VIDEO_DEV=m
CONFIG_VIDEO_V4L2_COMMON=m
CONFIG_VIDEO_ALLOW_V4L1=y
CONFIG_VIDEO_V4L1_COMPAT=y
# CONFIG_DVB_CORE is not set
CONFIG_VIDEO_MEDIA=m

#
# Multimedia drivers
#
# CONFIG_MEDIA_ATTACH is not set
CONFIG_MEDIA_TUNER=m
# CONFIG_MEDIA_TUNER_CUSTOMIZE is not set
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_V4L1=m
CONFIG_VIDEOBUF_GEN=m
CONFIG_VIDEOBUF_DMA_SG=m
CONFIG_VIDEOBUF_VMALLOC=m
CONFIG_VIDEO_BTCX=m
CONFIG_VIDEO_IR_I2C=m
CONFIG_VIDEO_IR=m
CONFIG_VIDEO_TVEEPROM=m
CONFIG_VIDEO_TUNER=m
CONFIG_VIDEO_CAPTURE_DRIVERS=y
# CONFIG_VIDEO_ADV_DEBUG is not set
CONFIG_VIDEO_HELPER_CHIPS_AUTO=y
CONFIG_VIDEO_TVAUDIO=m
CONFIG_VIDEO_TDA7432=m
CONFIG_VIDEO_TDA9875=m
CONFIG_VIDEO_MSP3400=m
CONFIG_VIDEO_CS53L32A=m
CONFIG_VIDEO_WM8775=m
CONFIG_VIDEO_SAA711X=m
CONFIG_VIDEO_TVP5150=m
CONFIG_VIDEO_CX25840=m
CONFIG_VIDEO_CX2341X=m
# CONFIG_VIDEO_VIVI is not set
CONFIG_VIDEO_BT848=m
CONFIG_VIDEO_SAA6588=m
# CONFIG_VIDEO_BWQCAM is not set
# CONFIG_VIDEO_CQCAM is not set
# CONFIG_VIDEO_W9966 is not set
# CONFIG_VIDEO_CPIA is not set
CONFIG_VIDEO_CPIA2=m
# CONFIG_VIDEO_SAA5246A is not set
# CONFIG_VIDEO_SAA5249 is not set
# CONFIG_TUNER_3036 is not set
# CONFIG_VIDEO_STRADIS is not set
# CONFIG_VIDEO_ZORAN is not set
# CONFIG_VIDEO_SAA7134 is not set
# CONFIG_VIDEO_MXB is not set
# CONFIG_VIDEO_DPC is not set
# CONFIG_VIDEO_HEXIUM_ORION is not set
# CONFIG_VIDEO_HEXIUM_GEMINI is not set
# CONFIG_VIDEO_CX88 is not set
# CONFIG_VIDEO_IVTV is not set
# CONFIG_VIDEO_CAFE_CCIC is not set
CONFIG_V4L_USB_DRIVERS=y
CONFIG_VIDEO_PVRUSB2=m
CONFIG_VIDEO_PVRUSB2_SYSFS=y
# CONFIG_VIDEO_PVRUSB2_DEBUGIFC is not set
CONFIG_VIDEO_EM28XX=m
# CONFIG_VIDEO_EM28XX_ALSA is not set
# CONFIG_VIDEO_USBVISION is not set
CONFIG_VIDEO_USBVIDEO=m
CONFIG_USB_VICAM=m
CONFIG_USB_IBMCAM=m
CONFIG_USB_KONICAWC=m
CONFIG_USB_QUICKCAM_MESSENGER=m
CONFIG_USB_ET61X251=m
CONFIG_VIDEO_OVCAMCHIP=m
CONFIG_USB_W9968CF=m
CONFIG_USB_OV511=m
CONFIG_USB_SE401=m
CONFIG_USB_SN9C102=m
CONFIG_USB_STV680=m
CONFIG_USB_ZC0301=m
CONFIG_USB_PWC=m
# CONFIG_USB_PWC_DEBUG is not set
# CONFIG_USB_ZR364XX is not set
# CONFIG_USB_STKWEBCAM is not set
# CONFIG_USB_S2255 is not set
# CONFIG_SOC_CAMERA is not set
CONFIG_RADIO_ADAPTERS=y
# CONFIG_RADIO_GEMTEK_PCI is not set
# CONFIG_RADIO_MAXIRADIO is not set
# CONFIG_RADIO_MAESTRO is not set
CONFIG_USB_DSBR=m
# CONFIG_USB_SI470X is not set
# CONFIG_DAB is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
CONFIG_AGP_SIS=y
CONFIG_AGP_VIA=y
CONFIG_DRM=m
# CONFIG_DRM_TDFX is not set
CONFIG_DRM_R128=m
CONFIG_DRM_RADEON=m
CONFIG_DRM_I810=m
CONFIG_DRM_I830=m
CONFIG_DRM_I915=m
CONFIG_DRM_MGA=m
# CONFIG_DRM_SIS is not set
CONFIG_DRM_VIA=m
CONFIG_DRM_SAVAGE=m
CONFIG_VGASTATE=m
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_DDC=m
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
CONFIG_FB_CIRRUS=m
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
CONFIG_FB_VGA16=m
# CONFIG_FB_UVESA is not set
CONFIG_FB_VESA=y
# CONFIG_FB_EFI is not set
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
CONFIG_FB_NVIDIA=m
CONFIG_FB_NVIDIA_I2C=y
# CONFIG_FB_NVIDIA_DEBUG is not set
CONFIG_FB_NVIDIA_BACKLIGHT=y
CONFIG_FB_RIVA=m
# CONFIG_FB_RIVA_I2C is not set
# CONFIG_FB_RIVA_DEBUG is not set
CONFIG_FB_RIVA_BACKLIGHT=y
# CONFIG_FB_LE80578 is not set
CONFIG_FB_INTEL=m
# CONFIG_FB_INTEL_DEBUG is not set
CONFIG_FB_INTEL_I2C=y
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
CONFIG_FB_SAVAGE=m
CONFIG_FB_SAVAGE_I2C=y
CONFIG_FB_SAVAGE_ACCEL=y
# CONFIG_FB_SIS is not set
# CONFIG_FB_NEOMAGIC is not set
CONFIG_FB_KYRO=m
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_VIRTUAL is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=m
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_CORGI is not set
# CONFIG_BACKLIGHT_PROGEAR is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
CONFIG_VIDEO_SELECT=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY is not set
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
CONFIG_SOUND=m
CONFIG_SND=m
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_HWDEP=m
CONFIG_SND_RAWMIDI=m
CONFIG_SND_SEQUENCER=m
CONFIG_SND_SEQ_DUMMY=m
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
CONFIG_SND_PCM_OSS_PLUGINS=y
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_DYNAMIC_MINORS=y
# CONFIG_SND_SUPPORT_OLD_API is not set
CONFIG_SND_VERBOSE_PROCFS=y
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set
CONFIG_SND_VMASTER=y
CONFIG_SND_MPU401_UART=m
CONFIG_SND_OPL3_LIB=m
CONFIG_SND_VX_LIB=m
CONFIG_SND_AC97_CODEC=m
CONFIG_SND_DRIVERS=y
CONFIG_SND_DUMMY=m
CONFIG_SND_VIRMIDI=m
CONFIG_SND_MTPAV=m
# CONFIG_SND_MTS64 is not set
# CONFIG_SND_SERIAL_U16550 is not set
CONFIG_SND_MPU401=m
# CONFIG_SND_PORTMAN2X4 is not set
# CONFIG_SND_AC97_POWER_SAVE is not set
CONFIG_SND_SB_COMMON=m
CONFIG_SND_PCI=y
CONFIG_SND_AD1889=m
CONFIG_SND_ALS300=m
CONFIG_SND_ALS4000=m
CONFIG_SND_ALI5451=m
CONFIG_SND_ATIIXP=m
CONFIG_SND_ATIIXP_MODEM=m
CONFIG_SND_AU8810=m
CONFIG_SND_AU8820=m
CONFIG_SND_AU8830=m
# CONFIG_SND_AW2 is not set
CONFIG_SND_AZT3328=m
CONFIG_SND_BT87X=m
# CONFIG_SND_BT87X_OVERCLOCK is not set
CONFIG_SND_CA0106=m
CONFIG_SND_CMIPCI=m
# CONFIG_SND_OXYGEN is not set
CONFIG_SND_CS4281=m
CONFIG_SND_CS46XX=m
CONFIG_SND_CS46XX_NEW_DSP=y
# CONFIG_SND_CS5530 is not set
CONFIG_SND_DARLA20=m
CONFIG_SND_GINA20=m
CONFIG_SND_LAYLA20=m
CONFIG_SND_DARLA24=m
CONFIG_SND_GINA24=m
CONFIG_SND_LAYLA24=m
CONFIG_SND_MONA=m
CONFIG_SND_MIA=m
CONFIG_SND_ECHO3G=m
CONFIG_SND_INDIGO=m
CONFIG_SND_INDIGOIO=m
CONFIG_SND_INDIGODJ=m
CONFIG_SND_EMU10K1=m
CONFIG_SND_EMU10K1X=m
CONFIG_SND_ENS1370=m
CONFIG_SND_ENS1371=m
CONFIG_SND_ES1938=m
CONFIG_SND_ES1968=m
CONFIG_SND_FM801=m
CONFIG_SND_FM801_TEA575X_BOOL=y
CONFIG_SND_FM801_TEA575X=m
CONFIG_SND_HDA_INTEL=m
# CONFIG_SND_HDA_HWDEP is not set
CONFIG_SND_HDA_CODEC_REALTEK=y
CONFIG_SND_HDA_CODEC_ANALOG=y
CONFIG_SND_HDA_CODEC_SIGMATEL=y
CONFIG_SND_HDA_CODEC_VIA=y
CONFIG_SND_HDA_CODEC_ATIHDMI=y
CONFIG_SND_HDA_CODEC_CONEXANT=y
CONFIG_SND_HDA_CODEC_CMEDIA=y
CONFIG_SND_HDA_CODEC_SI3054=y
CONFIG_SND_HDA_GENERIC=y
# CONFIG_SND_HDA_POWER_SAVE is not set
CONFIG_SND_HDSP=m
CONFIG_SND_HDSPM=m
# CONFIG_SND_HIFIER is not set
CONFIG_SND_ICE1712=m
CONFIG_SND_ICE1724=m
CONFIG_SND_INTEL8X0=m
CONFIG_SND_INTEL8X0M=m
CONFIG_SND_KORG1212=m
# CONFIG_SND_KORG1212_FIRMWARE_IN_KERNEL is not set
CONFIG_SND_MAESTRO3=m
# CONFIG_SND_MAESTRO3_FIRMWARE_IN_KERNEL is not set
CONFIG_SND_MIXART=m
CONFIG_SND_NM256=m
CONFIG_SND_PCXHR=m
CONFIG_SND_RIPTIDE=m
CONFIG_SND_RME32=m
CONFIG_SND_RME96=m
CONFIG_SND_RME9652=m
CONFIG_SND_SONICVIBES=m
CONFIG_SND_TRIDENT=m
CONFIG_SND_VIA82XX=m
CONFIG_SND_VIA82XX_MODEM=m
# CONFIG_SND_VIRTUOSO is not set
CONFIG_SND_VX222=m
CONFIG_SND_YMFPCI=m
# CONFIG_SND_YMFPCI_FIRMWARE_IN_KERNEL is not set
CONFIG_SND_USB=y
CONFIG_SND_USB_AUDIO=m
CONFIG_SND_USB_USX2Y=m
# CONFIG_SND_USB_CAIAQ is not set
CONFIG_SND_PCMCIA=y
# CONFIG_SND_VXPOCKET is not set
# CONFIG_SND_PDAUDIOCF is not set
# CONFIG_SND_SOC is not set
# CONFIG_SOUND_PRIME is not set
CONFIG_AC97_BUS=m
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
CONFIG_HID_DEBUG=y
# CONFIG_HIDRAW is not set

#
# USB Input Devices
#
CONFIG_USB_HID=y
# CONFIG_USB_HIDINPUT_POWERBOOK is not set
CONFIG_HID_FF=y
CONFIG_HID_PID=y
CONFIG_LOGITECH_FF=y
# CONFIG_LOGIRUMBLEPAD2_FF is not set
# CONFIG_PANTHERLORD_FF is not set
CONFIG_THRUSTMASTER_FF=y
# CONFIG_ZEROPLUS_FF is not set
CONFIG_USB_HIDDEV=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
CONFIG_USB_DEVICE_CLASS=y
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set
# CONFIG_USB_WUSB is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_EHCI_HCD=m
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_ISP116X_HCD=m
# CONFIG_USB_ISP1760_HCD is not set
CONFIG_USB_OHCI_HCD=m
# CONFIG_USB_OHCI_HCD_SSB is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=m
CONFIG_USB_SL811_HCD=m
CONFIG_USB_SL811_CS=m
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_WHCI_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# USB Device Class drivers
#
CONFIG_USB_ACM=m
CONFIG_USB_PRINTER=m
# CONFIG_USB_WDM is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support'
# may also be needed; see USB_STORAGE Help for more information
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_DPCM=y
CONFIG_USB_STORAGE_USBAT=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_USB_STORAGE_ALAUDA=y
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_LIBUSUAL is not set

#
# USB Imaging devices
#
CONFIG_USB_MDC800=m
CONFIG_USB_MICROTEK=m
CONFIG_USB_MON=y

#
# USB port drivers
#
CONFIG_USB_USS720=m
CONFIG_USB_SERIAL=m
CONFIG_USB_EZUSB=y
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_AIRCABLE is not set
CONFIG_USB_SERIAL_AIRPRIME=m
CONFIG_USB_SERIAL_ARK3116=m
CONFIG_USB_SERIAL_BELKIN=m
# CONFIG_USB_SERIAL_CH341 is not set
CONFIG_USB_SERIAL_WHITEHEAT=m
# CONFIG_USB_SERIAL_WHITEHEAT_FIRMWARE is not set
CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m
CONFIG_USB_SERIAL_CP2101=m
CONFIG_USB_SERIAL_CYPRESS_M8=m
CONFIG_USB_SERIAL_EMPEG=m
CONFIG_USB_SERIAL_FTDI_SIO=m
CONFIG_USB_SERIAL_FUNSOFT=m
CONFIG_USB_SERIAL_VISOR=m
CONFIG_USB_SERIAL_IPAQ=m
CONFIG_USB_SERIAL_IR=m
CONFIG_USB_SERIAL_EDGEPORT=m
CONFIG_USB_SERIAL_EDGEPORT_TI=m
CONFIG_USB_SERIAL_GARMIN=m
CONFIG_USB_SERIAL_IPW=m
# CONFIG_USB_SERIAL_IUU is not set
CONFIG_USB_SERIAL_KEYSPAN_PDA=m
# CONFIG_USB_SERIAL_KEYSPAN_PDA_FIRMWARE is not set
CONFIG_USB_SERIAL_KEYSPAN=m
CONFIG_USB_SERIAL_KEYSPAN_MPR=y
CONFIG_USB_SERIAL_KEYSPAN_USA28=y
CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
CONFIG_USB_SERIAL_KEYSPAN_USA19=y
CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
CONFIG_USB_SERIAL_KLSI=m
CONFIG_USB_SERIAL_KOBIL_SCT=m
CONFIG_USB_SERIAL_MCT_U232=m
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MOTOROLA is not set
CONFIG_USB_SERIAL_NAVMAN=m
CONFIG_USB_SERIAL_PL2303=m
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
CONFIG_USB_SERIAL_HP4X=m
CONFIG_USB_SERIAL_SAFE=m
CONFIG_USB_SERIAL_SAFE_PADDED=y
CONFIG_USB_SERIAL_SIERRAWIRELESS=m
CONFIG_USB_SERIAL_TI=m
# CONFIG_USB_SERIAL_TI_3410_FIRMWARE is not set
# CONFIG_USB_SERIAL_TI_5052_FIRMWARE is not set
CONFIG_USB_SERIAL_CYBERJACK=m
CONFIG_USB_SERIAL_XIRCOM=m
# CONFIG_USB_SERIAL_XIRCOM_FIRMWARE is not set
CONFIG_USB_SERIAL_OPTION=m
CONFIG_USB_SERIAL_OMNINET=m
# CONFIG_USB_SERIAL_DEBUG is not set

#
# USB Miscellaneous drivers
#
CONFIG_USB_EMI62=m
# CONFIG_USB_EMI62_FIRMWARE is not set
CONFIG_USB_EMI26=m
# CONFIG_USB_EMI26_FIRMWARE is not set
# CONFIG_USB_ADUTUX is not set
CONFIG_USB_AUERSWALD=m
CONFIG_USB_RIO500=m
CONFIG_USB_LEGOTOWER=m
CONFIG_USB_LCD=m
# CONFIG_USB_BERRY_CHARGE is not set
CONFIG_USB_LED=m
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_PHIDGET is not set
CONFIG_USB_IDMOUSE=m
# CONFIG_USB_FTDI_ELAN is not set
CONFIG_USB_APPLEDISPLAY=m
CONFIG_USB_SISUSBVGA=m
CONFIG_USB_SISUSBVGA_CON=y
CONFIG_USB_LD=m
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
CONFIG_USB_TEST=m
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_GOTEMP is not set
CONFIG_USB_ATM=m
CONFIG_USB_SPEEDTOUCH=m
CONFIG_USB_CXACRU=m
CONFIG_USB_UEAGLEATM=m
CONFIG_USB_XUSBATM=m
# CONFIG_USB_GADGET is not set
# CONFIG_UWB is not set
CONFIG_MMC=m
# CONFIG_MMC_DEBUG is not set
# CONFIG_MMC_UNSAFE_RESUME is not set
# CONFIG_MMC_PASSWORDS is not set

#
# MMC/SD Card Drivers
#
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_BOUNCE=y
# CONFIG_SDIO_UART is not set
# CONFIG_MMC_TEST is not set

#
# MMC/SD Host Controller Drivers
#
CONFIG_MMC_SDHCI=m
# CONFIG_MMC_SDHCI_PCI is not set
CONFIG_MMC_WBSD=m
# CONFIG_MMC_TIFM_SD is not set
# CONFIG_MMC_SDRICOH_CS is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_CLEVO_MAIL is not set
# CONFIG_LEDS_PCA955X is not set

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_IDE_DISK=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_IPATH=m
CONFIG_INFINIBAND_AMSO1100=m
# CONFIG_INFINIBAND_AMSO1100_DEBUG is not set
CONFIG_INFINIBAND_CXGB3=m
# CONFIG_INFINIBAND_CXGB3_DEBUG is not set
# CONFIG_MLX4_INFINIBAND is not set
# CONFIG_INFINIBAND_NES is not set
CONFIG_INFINIBAND_IPOIB=m
# CONFIG_INFINIBAND_IPOIB_CM is not set
CONFIG_INFINIBAND_IPOIB_DEBUG=y
# CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
CONFIG_INFINIBAND_SRP=m
CONFIG_INFINIBAND_ISER=m
CONFIG_EDAC=y

#
# Reporting subsystems
#
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_MM_EDAC=m
CONFIG_EDAC_E752X=m
# CONFIG_EDAC_I82975X is not set
# CONFIG_EDAC_I3000 is not set
# CONFIG_EDAC_I5000 is not set
CONFIG_RTC_LIB=m
CONFIG_RTC_CLASS=m

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
CONFIG_RTC_DRV_DS1307=m
# CONFIG_RTC_DRV_DS1374 is not set
CONFIG_RTC_DRV_DS1672=m
# CONFIG_RTC_DRV_MAX6900 is not set
CONFIG_RTC_DRV_RS5C372=m
CONFIG_RTC_DRV_ISL1208=m
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8563=m
CONFIG_RTC_DRV_PCF8583=m
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set

#
# SPI RTC drivers
#

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=m
# CONFIG_RTC_DRV_DS1511 is not set
CONFIG_RTC_DRV_DS1553=m
CONFIG_RTC_DRV_DS1742=m
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T59 is not set
CONFIG_RTC_DRV_V3020=m

#
# on-CPU RTC drivers
#
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_DELL_RBU=m
CONFIG_DCDBAS=m
CONFIG_DMIID=y
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT2_FS_XIP=y
CONFIG_FS_XIP=y
CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
# CONFIG_EXT4DEV_FS is not set
CONFIG_JBD=m
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISER4_FS is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
# CONFIG_XFS_FS is not set
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=m
# CONFIG_OCFS2_FS is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_QUOTA=y
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
# CONFIG_AUTOFS_FS is not set
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m

#
# Layered filesystems
#
# CONFIG_ECRYPT_FS is not set
# CONFIG_UNION_FS is not set

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_JFFS2_FS=m
CONFIG_JFFS2_FS_DEBUG=0
CONFIG_JFFS2_FS_WRITEBUFFER=y
# CONFIG_JFFS2_FS_WBUF_VERIFY is not set
CONFIG_JFFS2_SUMMARY=y
# CONFIG_JFFS2_FS_XATTR is not set
# CONFIG_JFFS2_COMPRESSION_OPTIONS is not set
CONFIG_JFFS2_ZLIB=y
# CONFIG_JFFS2_LZO is not set
CONFIG_JFFS2_RTIME=y
# CONFIG_JFFS2_RUBIN is not set
# CONFIG_LOGFS is not set
CONFIG_CRAMFS=m
CONFIG_VXFS_FS=m
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=m
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_XPRT_RDMA=m
# CONFIG_SUNRPC_BIND34 is not set
CONFIG_RPCSEC_GSS_KRB5=m
CONFIG_RPCSEC_GSS_SPKM3=m
# CONFIG_SMB_FS is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS is not set
CONFIG_CIFS_WEAK_PW_HASH=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_EXPERIMENTAL is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_857=m
CONFIG_NLS_CODEPAGE_860=m
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=m
CONFIG_NLS_CODEPAGE_864=m
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
CONFIG_DLM_DEBUG=y

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
# CONFIG_PAGE_OWNER is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
# CONFIG_TIMER_STATS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_SG is not set
# CONFIG_FRAME_POINTER is not set
# CONFIG_DEBUG_SYNCHRO_TEST is not set
# CONFIG_PROFILE_LIKELY is not set
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_HAVE_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_TRACING=y
# CONFIG_FTRACE is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SYSPROF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_CONTEXT_SWITCH_TRACER is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_FIREWIRE_OHCI_REMOTE_DMA is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
# CONFIG_KERNEL_TESTS is not set
# CONFIG_NONPROMISC_DEVMEM is not set
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
CONFIG_DEBUG_RODATA=y
# CONFIG_DIRECT_GBPAGES is not set
CONFIG_DEBUG_RODATA_TEST=y
# CONFIG_DEBUG_NX_TEST is not set
CONFIG_X86_MPPARSE=y
# CONFIG_IOMMU_DEBUG is not set
CONFIG_MMIOTRACE_HOOKS=y
CONFIG_MMIOTRACE=y
# CONFIG_MMIOTRACE_TEST is not set
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set

#
# Security options
#
CONFIG_KEYS=y
CONFIG_KEYS_DEBUG_PROC_KEYS=y
CONFIG_SECURITY=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_SECURITY_CAPABILITIES=y
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_SECURITY_ROOTPLUG is not set
CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR=0
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
CONFIG_SECURITY_SELINUX_ENABLE_SECMARK_DEFAULT=y
# CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX is not set
# CONFIG_SECURITY_SMACK is not set
CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_MANAGER=y
# CONFIG_CRYPTO_GF128MUL is not set
CONFIG_CRYPTO_NULL=m
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=m
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
CONFIG_CRYPTO_ECB=m
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=m
CONFIG_CRYPTO_MICHAEL_MIC=m
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=m
CONFIG_CRYPTO_AES_X86_64=m
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
# CONFIG_CRYPTO_CAMELLIA is not set
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
# CONFIG_CRYPTO_FCRYPT is not set
CONFIG_CRYPTO_KHAZAD=m
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
# CONFIG_CRYPTO_SEED is not set
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
# CONFIG_CRYPTO_TWOFISH_X86_64 is not set

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
# CONFIG_CRYPTO_LZO is not set
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_HIFN_795X is not set
CONFIG_HAVE_KVM=y
CONFIG_VIRTUALIZATION=y
# CONFIG_KVM is not set
# CONFIG_VIRTIO_PCI is not set
# CONFIG_VIRTIO_BALLOON is not set

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
CONFIG_LIBCRC32C=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=m
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_DEC16=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_PLIST=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
# CONFIG_TRACE is not set


* Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
@ 2008-06-18  2:32         ` Daisuke Nishimura
  0 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-18  2:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Nick Piggin,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2003 bytes --]

On Wed, 18 Jun 2008 00:34:16 +0900, KOSAKI Motohiro <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
> > > I got this bug while migrating pages only a few times
> > > via memory_migrate of cpuset.
> > > 
> > > Unfortunately, even with this patch applied,
> > > I got a bad_page problem after hundreds of page migrations
> > > (I'll report it in another mail).
> > > But I believe something like this patch is needed anyway.
> > > 
> > 
> > I got bad_page after hundreds of page migrations.
> > It seems that a locked page is being freed.
> 
> I can't reproduce this bad page.
> I'll try again tomorrow ;)
> 

OK. I'll describe my test more precisely.

- Environment
  HW: 4 CPUs (x86_64), 2-node NUMA
  kernel: 2.6.26-rc5-mm3 + Lee's two fixes for the double unlock_page
          + my patch. The config is attached.

- mount cpuset and configure the groups
  # mount -t cgroup -o cpuset cpuset /cgroup/cpuset

  # mkdir /cgroup/cpuset/01
  # echo 0-1 >/cgroup/cpuset/01/cpuset.cpus
  # echo 0 >/cgroup/cpuset/01/cpuset.mems
  # echo 1 >/cgroup/cpuset/01/cpuset.memory_migrate

  # mkdir /cgroup/cpuset/02
  # echo 2-3 >/cgroup/cpuset/02/cpuset.cpus
  # echo 1 >/cgroup/cpuset/02/cpuset.mems
  # echo 1 >/cgroup/cpuset/02/cpuset.memory_migrate

- register processes in cpusets
  # echo $$ >/cgroup/cpuset/01/tasks

  I'm using LTP's page01 test, running two instances in an infinite loop.
  # while true; do (somewhere)/page01 4194304 1; done &
  # while true; do (somewhere)/page01 4194304 1; done &

  The same should be done for the 02 directory (see the sketch below).
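
  A minimal sketch of the analogous steps for 02, assuming they are
  run from a second shell:
  # echo $$ >/cgroup/cpuset/02/tasks
  # while true; do (somewhere)/page01 4194304 1; done &
  # while true; do (somewhere)/page01 4194304 1; done &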

- echo pids to the other directory
  Run a simple script like the one below.

---
#!/bin/bash
# Swap the tasks of two cpuset directories ($1 and $2); with
# memory_migrate enabled, each move triggers page migration.

G1=$1
G2=$2

# move_task PIDS DIR: echo each pid in PIDS into DIR/tasks,
# ignoring pids that have exited in the meantime
move_task()
{
        for pid in $1
        do
                echo $pid >$2/tasks 2>/dev/null
        done
}

G1_TASK=`cat ${G1}/tasks`
G2_TASK=`cat ${G2}/tasks`

# migrate in both directions concurrently
move_task "${G1_TASK}" ${G2} &
move_task "${G2_TASK}" ${G1} &

wait
---
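
A hypothetical invocation, assuming the script above is saved as
move.sh and the two cpusets are set up as described:

  # ./move.sh /cgroup/cpuset/01 /cgroup/cpuset/02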

Please let me know if you need any other information.
I'm also digging into this problem.


Thanks,
Daisuke Nishimura.


[-- Attachment #2: config-2.6.26-rc5-mm3 --]
[-- Type: text/plain, Size: 76269 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.26-rc5-mm3
# Thu Jun 12 15:26:49 2008
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_AOUT=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_SMP=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
# CONFIG_TASK_XACCT is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_NS is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CPUSETS=y
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_USER_SCHED is not set
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_MM_OWNER=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEMRLIMIT_CTLR=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_COMPAT_BRK=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
# CONFIG_MARKERS is not set
CONFIG_OPROFILE=m
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
# CONFIG_HAVE_DMA_ATTRS is not set
# CONFIG_HAVE_CLK is not set

#
# GCOV profiling
#
# CONFIG_GCOV_PROFILE is not set
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_CLASSIC_RCU=y

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_X86_RDC321X is not set
# CONFIG_X86_VSMP is not set
# CONFIG_PARAVIRT_GUEST is not set
CONFIG_MEMTEST=y
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_INTERNODE_CACHE_BYTES=128
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
# CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=255
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
# CONFIG_I8K is not set
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NODES_SPAN_OTHER_NODES=y
CONFIG_NUMA_EMU=y
CONFIG_NODES_SHIFT=6
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_ILLEGAL_POINTER_VALUE=0xffffc10000000000
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_GET_USER_PAGES_FAST=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
# CONFIG_X86_PAT is not set
# CONFIG_EFI is not set
# CONFIG_SECCOMP is not set
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
# CONFIG_SCHED_HRTICK is not set
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x200000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_HOTPLUG_CPU=y
CONFIG_COMPAT_VDSO=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y

#
# Power management options
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_SLEEP=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_HIBERNATION is not set
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
# CONFIG_ACPI_PROCFS is not set
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_SYSFS_POWER=y
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=m
CONFIG_ACPI_BATTERY=m
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_BAY=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_WMI is not set
CONFIG_ACPI_ASUS=m
CONFIG_ACPI_TOSHIBA=m
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=m
CONFIG_ACPI_SBS=m

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=m
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m

#
# CPUFreq processor drivers
#
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_X86_POWERNOW_K8=y
CONFIG_X86_POWERNOW_K8_ACPI=y
CONFIG_X86_SPEEDSTEP_CENTRINO=y
# CONFIG_X86_P4_CLOCKMOD is not set

#
# shared options
#
# CONFIG_X86_ACPI_CPUFREQ_PROC_INTF is not set
# CONFIG_X86_SPEEDSTEP_LIB is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_DMAR is not set
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=m
CONFIG_PCIEAER=y
# CONFIG_PCIEASPM is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
CONFIG_HT_IRQ=y
CONFIG_ISA_DMA_API=y
CONFIG_K8_NB=y
CONFIG_PCCARD=y
# CONFIG_PCMCIA_DEBUG is not set
CONFIG_PCMCIA=y
CONFIG_PCMCIA_LOAD_CIS=y
CONFIG_PCMCIA_IOCTL=y
CONFIG_CARDBUS=y

#
# PC-card bridges
#
CONFIG_YENTA=y
CONFIG_YENTA_O2=y
CONFIG_YENTA_RICOH=y
CONFIG_YENTA_TI=y
CONFIG_YENTA_ENE_TUNE=y
CONFIG_YENTA_TOSHIBA=y
CONFIG_PD6729=m
# CONFIG_I82092 is not set
CONFIG_PCCARD_NONSTATIC=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_FAKE=m
CONFIG_HOTPLUG_PCI_ACPI=m
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=m

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y

#
# Networking
#
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_INET_LRO=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=y
CONFIG_TCP_CONG_CUBIC=m
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
# CONFIG_TCP_CONG_YEAH is not set
# CONFIG_TCP_CONG_ILLINOIS is not set
CONFIG_DEFAULT_BIC=y
# CONFIG_DEFAULT_CUBIC is not set
# CONFIG_DEFAULT_HTCP is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="bic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IP_VS=m
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IPV6=m
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
# CONFIG_IPV6_MIP6 is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
CONFIG_IPV6_SIT=m
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_MULTIPLE_TABLES=y
# CONFIG_IPV6_SUBTREES is not set
# CONFIG_IPV6_MROUTE is not set
CONFIG_NETLABEL=y
CONFIG_NETWORK_SECMARK=y
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
# CONFIG_NF_CONNTRACK is not set
CONFIG_NETFILTER_XTABLES=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
# CONFIG_NETFILTER_XT_TARGET_DSCP is not set
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
# CONFIG_NETFILTER_XT_TARGET_NFLOG is not set
# CONFIG_NETFILTER_XT_TARGET_RATEEST is not set
# CONFIG_NETFILTER_XT_TARGET_TRACE is not set
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
# CONFIG_NETFILTER_XT_TARGET_TCPMSS is not set
# CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP is not set
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
# CONFIG_NETFILTER_XT_MATCH_DSCP is not set
CONFIG_NETFILTER_XT_MATCH_ESP=m
# CONFIG_NETFILTER_XT_MATCH_IPRANGE is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
# CONFIG_NETFILTER_XT_MATCH_OWNER is not set
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
# CONFIG_NETFILTER_XT_MATCH_RATEEST is not set
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
# CONFIG_NETFILTER_XT_MATCH_HASHLIMIT is not set

#
# IP: Netfilter Configuration
#
CONFIG_IP_NF_QUEUE=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_RECENT=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_MATCH_ADDRTYPE=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_ULOG=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m

#
# IPv6: Netfilter Configuration
#
CONFIG_IP6_NF_QUEUE=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_RT=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_AH=m
# CONFIG_IP6_NF_MATCH_MH is not set
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_LOG=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_TARGET_HL=m
CONFIG_IP6_NF_RAW=m

#
# Bridge: Netfilter Configuration
#
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_EBT_T_NAT=m
CONFIG_BRIDGE_EBT_802_3=m
CONFIG_BRIDGE_EBT_AMONG=m
CONFIG_BRIDGE_EBT_ARP=m
CONFIG_BRIDGE_EBT_IP=m
CONFIG_BRIDGE_EBT_LIMIT=m
CONFIG_BRIDGE_EBT_MARK=m
CONFIG_BRIDGE_EBT_PKTTYPE=m
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_SNAT=m
CONFIG_BRIDGE_EBT_LOG=m
CONFIG_BRIDGE_EBT_ULOG=m
# CONFIG_BRIDGE_EBT_NFLOG is not set
CONFIG_IP_DCCP=m
CONFIG_INET_DCCP_DIAG=m
CONFIG_IP_DCCP_ACKVEC=y

#
# DCCP CCIDs Configuration (EXPERIMENTAL)
#
CONFIG_IP_DCCP_CCID2=m
# CONFIG_IP_DCCP_CCID2_DEBUG is not set
CONFIG_IP_DCCP_CCID3=m
# CONFIG_IP_DCCP_CCID3_DEBUG is not set
CONFIG_IP_DCCP_CCID3_RTO=100
CONFIG_IP_DCCP_TFRC_LIB=m

#
# DCCP Kernel Hacking
#
# CONFIG_IP_DCCP_DEBUG is not set
# CONFIG_NET_DCCPPROBE is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_MSG is not set
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_HMAC_NONE is not set
# CONFIG_SCTP_HMAC_SHA1 is not set
CONFIG_SCTP_HMAC_MD5=y
CONFIG_TIPC=m
# CONFIG_TIPC_ADVANCED is not set
# CONFIG_TIPC_DEBUG is not set
CONFIG_ATM=m
CONFIG_ATM_CLIP=m
# CONFIG_ATM_CLIP_NO_ICMP is not set
CONFIG_ATM_LANE=m
# CONFIG_ATM_MPOA is not set
CONFIG_ATM_BR2684=m
# CONFIG_ATM_BR2684_IPFILTER is not set
CONFIG_BRIDGE=m
CONFIG_VLAN_8021Q=m
# CONFIG_DECNET is not set
CONFIG_LLC=y
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
# CONFIG_NET_SCH_RR is not set
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_INGRESS=m

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
# CONFIG_NET_CLS_FLOW is not set
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_IPT=m
# CONFIG_NET_ACT_NAT is not set
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_CLS_IND=y
CONFIG_NET_SCH_FIFO=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
# CONFIG_NET_TCPPROBE is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
CONFIG_BT=m
CONFIG_BT_L2CAP=m
CONFIG_BT_SCO=m
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_BNEP=m
CONFIG_BT_BNEP_MC_FILTER=y
CONFIG_BT_BNEP_PROTO_FILTER=y
CONFIG_BT_CMTP=m
CONFIG_BT_HIDP=m

#
# Bluetooth device drivers
#
CONFIG_BT_HCIUSB=m
CONFIG_BT_HCIUSB_SCO=y
# CONFIG_BT_HCIBTSDIO is not set
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_H4=y
CONFIG_BT_HCIUART_BCSP=y
# CONFIG_BT_HCIUART_LL is not set
CONFIG_BT_HCIBCM203X=m
CONFIG_BT_HCIBPA10X=m
CONFIG_BT_HCIBFUSB=m
CONFIG_BT_HCIDTL1=m
CONFIG_BT_HCIBT3C=m
CONFIG_BT_HCIBLUECARD=m
CONFIG_BT_HCIBTUART=m
CONFIG_BT_HCIVHCI=m
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y

#
# Wireless
#
CONFIG_CFG80211=m
CONFIG_NL80211=y
CONFIG_WIRELESS_EXT=y
CONFIG_MAC80211=m

#
# QoS/HT support disabled
#

#
# QoS/HT support needs CONFIG_NETDEVICES_MULTIQUEUE
#

#
# Rate control algorithm selection
#
CONFIG_MAC80211_RC_DEFAULT_PID=y
# CONFIG_MAC80211_RC_DEFAULT_NONE is not set

#
# Selecting 'y' for an algorithm will
# build the algorithm into mac80211.
#
CONFIG_MAC80211_RC_DEFAULT="pid"
CONFIG_MAC80211_RC_PID=y
# CONFIG_MAC80211_MESH is not set
CONFIG_MAC80211_LEDS=y
# CONFIG_MAC80211_DEBUGFS is not set
# CONFIG_MAC80211_DEBUG_PACKET_ALIGNMENT is not set
CONFIG_MAC80211_DEBUG=y
# CONFIG_MAC80211_HT_DEBUG is not set
# CONFIG_MAC80211_VERBOSE_DEBUG is not set
# CONFIG_MAC80211_LOWTX_FRAME_DUMP is not set
# CONFIG_TKIP_DEBUG is not set
# CONFIG_MAC80211_DEBUG_COUNTERS is not set
# CONFIG_MAC80211_IBSS_DEBUG is not set
# CONFIG_MAC80211_VERBOSE_PS_DEBUG is not set
CONFIG_IEEE80211=m
# CONFIG_IEEE80211_DEBUG is not set
CONFIG_IEEE80211_CRYPT_WEP=m
CONFIG_IEEE80211_CRYPT_CCMP=m
CONFIG_IEEE80211_CRYPT_TKIP=m
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_BUILTIN_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
CONFIG_MTD=m
# CONFIG_MTD_DEBUG is not set
CONFIG_MTD_CONCAT=m
CONFIG_MTD_PARTITIONS=y
CONFIG_MTD_REDBOOT_PARTS=m
CONFIG_MTD_REDBOOT_DIRECTORY_BLOCK=-1
# CONFIG_MTD_REDBOOT_PARTS_UNALLOCATED is not set
# CONFIG_MTD_REDBOOT_PARTS_READONLY is not set
# CONFIG_MTD_AR7_PARTS is not set

#
# User Modules And Translation Layers
#
CONFIG_MTD_CHAR=m
CONFIG_MTD_BLKDEVS=m
CONFIG_MTD_BLOCK=m
CONFIG_MTD_BLOCK_RO=m
CONFIG_FTL=m
CONFIG_NFTL=m
CONFIG_NFTL_RW=y
# CONFIG_INFTL is not set
CONFIG_RFD_FTL=m
# CONFIG_SSFDC is not set
# CONFIG_MTD_OOPS is not set

#
# RAM/ROM/Flash chip drivers
#
CONFIG_MTD_CFI=m
CONFIG_MTD_JEDECPROBE=m
CONFIG_MTD_GEN_PROBE=m
# CONFIG_MTD_CFI_ADV_OPTIONS is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
# CONFIG_MTD_MAP_BANK_WIDTH_8 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_16 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_32 is not set
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_CFI_I4 is not set
# CONFIG_MTD_CFI_I8 is not set
CONFIG_MTD_CFI_INTELEXT=m
CONFIG_MTD_CFI_AMDSTD=m
CONFIG_MTD_CFI_STAA=m
CONFIG_MTD_CFI_UTIL=m
CONFIG_MTD_RAM=m
CONFIG_MTD_ROM=m
CONFIG_MTD_ABSENT=m

#
# Mapping drivers for chip access
#
# CONFIG_MTD_COMPLEX_MAPPINGS is not set
# CONFIG_MTD_PHYSMAP is not set
CONFIG_MTD_SC520CDP=m
CONFIG_MTD_NETSC520=m
CONFIG_MTD_TS5500=m
# CONFIG_MTD_AMD76XROM is not set
# CONFIG_MTD_ICHXROM is not set
# CONFIG_MTD_ESB2ROM is not set
# CONFIG_MTD_CK804XROM is not set
CONFIG_MTD_SCB2_FLASH=m
# CONFIG_MTD_NETtel is not set
# CONFIG_MTD_DILNETPC is not set
# CONFIG_MTD_L440GX is not set
# CONFIG_MTD_INTEL_VR_NOR is not set
# CONFIG_MTD_PLATRAM is not set

#
# Self-contained MTD device drivers
#
# CONFIG_MTD_PMC551 is not set
# CONFIG_MTD_SLRAM is not set
# CONFIG_MTD_PHRAM is not set
CONFIG_MTD_MTDRAM=m
CONFIG_MTDRAM_TOTAL_SIZE=4096
CONFIG_MTDRAM_ERASE_SIZE=128
CONFIG_MTD_BLOCK2MTD=m

#
# Disk-On-Chip Device Drivers
#
# CONFIG_MTD_DOC2000 is not set
# CONFIG_MTD_DOC2001 is not set
# CONFIG_MTD_DOC2001PLUS is not set
CONFIG_MTD_NAND=m
# CONFIG_MTD_NAND_VERIFY_WRITE is not set
CONFIG_MTD_NAND_ECC_SMC=y
# CONFIG_MTD_NAND_MUSEUM_IDS is not set
CONFIG_MTD_NAND_IDS=m
CONFIG_MTD_NAND_DISKONCHIP=m
# CONFIG_MTD_NAND_DISKONCHIP_PROBE_ADVANCED is not set
CONFIG_MTD_NAND_DISKONCHIP_PROBE_ADDRESS=0
# CONFIG_MTD_NAND_DISKONCHIP_BBTWRITE is not set
# CONFIG_MTD_NAND_CAFE is not set
CONFIG_MTD_NAND_NANDSIM=m
# CONFIG_MTD_NAND_PLATFORM is not set
# CONFIG_MTD_ALAUDA is not set
# CONFIG_MTD_ONENAND is not set

#
# UBI - Unsorted block images
#
# CONFIG_MTD_UBI is not set
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
CONFIG_PARPORT_PC_PCMCIA=m
# CONFIG_PARPORT_GSC is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PARPORT_NOT_PC=y
CONFIG_PNP=y
# CONFIG_PNP_DEBUG is not set

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_FD=m
CONFIG_PARIDE=m

#
# Parallel IDE high-level drivers
#
CONFIG_PARIDE_PD=m
CONFIG_PARIDE_PCD=m
CONFIG_PARIDE_PF=m
CONFIG_PARIDE_PT=m
CONFIG_PARIDE_PG=m

#
# Parallel IDE protocol modules
#
CONFIG_PARIDE_ATEN=m
CONFIG_PARIDE_BPCK=m
CONFIG_PARIDE_COMM=m
CONFIG_PARIDE_DSTR=m
CONFIG_PARIDE_FIT2=m
CONFIG_PARIDE_FIT3=m
CONFIG_PARIDE_EPAT=m
CONFIG_PARIDE_EPATC8=y
CONFIG_PARIDE_EPIA=m
CONFIG_PARIDE_FRIQ=m
CONFIG_PARIDE_FRPW=m
CONFIG_PARIDE_KBIC=m
CONFIG_PARIDE_KTTI=m
CONFIG_PARIDE_ON20=m
CONFIG_PARIDE_ON26=m
CONFIG_BLK_CPQ_DA=m
CONFIG_BLK_CPQ_CISS_DA=m
CONFIG_CISS_SCSI_TAPE=y
CONFIG_BLK_DEV_DAC960=m
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_SX8=m
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
# CONFIG_BLK_DEV_XIP is not set
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
CONFIG_ATA_OVER_ETH=m
CONFIG_MISC_DEVICES=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_EEPROM_93CX6 is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ACER_WMI is not set
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_MSI_LAPTOP is not set
# CONFIG_COMPAL_LAPTOP is not set
# CONFIG_SONY_LAPTOP is not set
# CONFIG_THINKPAD_ACPI is not set
# CONFIG_INTEL_MENLOW is not set
# CONFIG_EEEPC_LAPTOP is not set
# CONFIG_ENCLOSURE_SERVICES is not set
CONFIG_HAVE_IDE=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide/ide.txt for help/info on IDE drives
#
CONFIG_IDE_TIMINGS=y
CONFIG_IDE_ATAPI=y
# CONFIG_BLK_DEV_IDE_SATA is not set
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECS=m
# CONFIG_BLK_DEV_DELKIN is not set
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDECD_VERBOSE_ERRORS=y
# CONFIG_BLK_DEV_IDETAPE is not set
CONFIG_BLK_DEV_IDEFLOPPY=y
CONFIG_BLK_DEV_IDESCSI=m
# CONFIG_BLK_DEV_IDEACPI is not set
CONFIG_IDE_TASK_IOCTL=y
CONFIG_IDE_PROC_FS=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_PLATFORM is not set
# CONFIG_BLK_DEV_CMD640 is not set
CONFIG_BLK_DEV_IDEPNP=y
CONFIG_BLK_DEV_IDEDMA_SFF=y

#
# PCI IDE chipsets support
#
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_PCIBUS_ORDER=y
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_BLK_DEV_GENERIC=y
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_BLK_DEV_AEC62XX=y
CONFIG_BLK_DEV_ALI15X3=y
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_BLK_DEV_ATIIXP=y
CONFIG_BLK_DEV_CMD64X=y
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
CONFIG_BLK_DEV_HPT34X=y
# CONFIG_HPT34X_AUTODMA is not set
CONFIG_BLK_DEV_HPT366=y
# CONFIG_BLK_DEV_JMICRON is not set
# CONFIG_BLK_DEV_SC1200 is not set
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT8213 is not set
CONFIG_BLK_DEV_IT821X=y
# CONFIG_BLK_DEV_NS87415 is not set
CONFIG_BLK_DEV_PDC202XX_OLD=y
CONFIG_BLK_DEV_PDC202XX_NEW=y
CONFIG_BLK_DEV_SVWKS=y
CONFIG_BLK_DEV_SIIMAGE=y
CONFIG_BLK_DEV_SIS5513=y
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
CONFIG_BLK_DEV_VIA82CXXX=y
# CONFIG_BLK_DEV_TC86C001 is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_BLK_DEV_HD_ONLY is not set
# CONFIG_BLK_DEV_HD is not set

#
# SCSI device support
#
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=m
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_ST=m
CONFIG_CHR_DEV_OSST=m
CONFIG_BLK_DEV_SR=m
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
# CONFIG_SCSI_SAS_ATA is not set
CONFIG_SCSI_SAS_HOST_SMP=y
# CONFIG_SCSI_SAS_LIBSAS_DEBUG is not set
CONFIG_SCSI_SRP_ATTRS=m
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
CONFIG_BLK_DEV_3W_XXXX_RAID=m
CONFIG_SCSI_3W_9XXX=m
CONFIG_SCSI_ACARD=m
CONFIG_SCSI_AACRAID=m
CONFIG_SCSI_AIC7XXX=m
CONFIG_AIC7XXX_CMDS_PER_DEVICE=4
CONFIG_AIC7XXX_RESET_DELAY_MS=15000
# CONFIG_AIC7XXX_DEBUG_ENABLE is not set
CONFIG_AIC7XXX_DEBUG_MASK=0
# CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC7XXX_OLD=m
CONFIG_SCSI_AIC79XX=m
CONFIG_AIC79XX_CMDS_PER_DEVICE=4
CONFIG_AIC79XX_RESET_DELAY_MS=15000
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC94XX=m
# CONFIG_AIC94XX_DEBUG is not set
# CONFIG_SCSI_BROADSAS is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
CONFIG_SCSI_ARCMSR=m
# CONFIG_SCSI_ARCMSR_AER is not set
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=m
CONFIG_MEGARAID_MAILBOX=m
CONFIG_MEGARAID_LEGACY=m
CONFIG_MEGARAID_SAS=m
CONFIG_SCSI_HPTIOP=m
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
CONFIG_SCSI_GDTH=m
CONFIG_SCSI_IPS=m
CONFIG_SCSI_INITIO=m
# CONFIG_SCSI_INIA100 is not set
CONFIG_SCSI_PPA=m
CONFIG_SCSI_IMM=m
# CONFIG_SCSI_IZIP_EPP16 is not set
# CONFIG_SCSI_IZIP_SLOW_CTR is not set
# CONFIG_SCSI_MVSAS is not set
CONFIG_SCSI_STEX=m
CONFIG_SCSI_SYM53C8XX_2=m
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
CONFIG_SCSI_SYM53C8XX_MMIO=y
# CONFIG_SCSI_IPR is not set
CONFIG_SCSI_QLOGIC_1280=m
CONFIG_SCSI_QLA_FC=m
CONFIG_SCSI_QLA_ISCSI=m
CONFIG_SCSI_LPFC=m
CONFIG_SCSI_DC395x=m
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_LOWLEVEL_PCMCIA is not set
# CONFIG_SCSI_DH is not set
CONFIG_ATA=m
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y
CONFIG_SATA_AHCI=m
CONFIG_SATA_SIL24=m
CONFIG_ATA_SFF=y
CONFIG_SATA_SVW=m
CONFIG_ATA_PIIX=m
CONFIG_SATA_MV=m
CONFIG_SATA_NV=m
CONFIG_PDC_ADMA=m
CONFIG_SATA_QSTOR=m
CONFIG_SATA_PROMISE=m
CONFIG_SATA_SX4=m
CONFIG_SATA_SIL=m
CONFIG_SATA_SIS=m
CONFIG_SATA_ULI=m
CONFIG_SATA_VIA=m
CONFIG_SATA_VITESSE=m
CONFIG_SATA_INIC162X=m
# CONFIG_PATA_ACPI is not set
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_ATA_GENERIC is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_TRIFLEX is not set
CONFIG_PATA_MARVELL=m
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PCMCIA is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RZ1000 is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SERVERWORKS is not set
CONFIG_PATA_PDC2027X=m
# CONFIG_PATA_SIL680 is not set
CONFIG_PATA_SIS=m
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set
# CONFIG_PATA_SCH is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_RAID5_RESHAPE=y
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_BLK_DEV_DM=m
# CONFIG_DM_DEBUG is not set
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_MIRROR=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
# CONFIG_DM_DELAY is not set
# CONFIG_DM_UEVENT is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=m
CONFIG_FUSION_SAS=m
CONFIG_FUSION_MAX_SGE=40
CONFIG_FUSION_CTL=m
CONFIG_FUSION_LAN=m
# CONFIG_FUSION_LOGGING is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_OHCI_DEBUG=y
CONFIG_FIREWIRE_SBP2=m
# CONFIG_IEEE1394 is not set
CONFIG_I2O=m
# CONFIG_I2O_LCT_NOTIFY_ON_CHANGES is not set
CONFIG_I2O_EXT_ADAPTEC=y
CONFIG_I2O_EXT_ADAPTEC_DMA64=y
CONFIG_I2O_CONFIG=m
CONFIG_I2O_CONFIG_OLD_IOCTL=y
CONFIG_I2O_BUS=m
CONFIG_I2O_BLOCK=m
CONFIG_I2O_SCSI=m
CONFIG_I2O_PROC=m
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
# CONFIG_NETDEVICES_MULTIQUEUE is not set
CONFIG_IFB=m
CONFIG_DUMMY=m
CONFIG_BONDING=m
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=m
# CONFIG_VETH is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
CONFIG_PHYLIB=m

#
# MII PHY device drivers
#
CONFIG_MARVELL_PHY=m
CONFIG_DAVICOM_PHY=m
CONFIG_QSEMI_PHY=m
CONFIG_LXT_PHY=m
CONFIG_CICADA_PHY=m
CONFIG_VITESSE_PHY=m
CONFIG_SMSC_PHY=m
# CONFIG_BROADCOM_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_MDIO_BITBANG is not set
CONFIG_NET_ETHERNET=y
CONFIG_MII=m
CONFIG_HAPPYMEAL=m
CONFIG_SUNGEM=m
CONFIG_CASSINI=m
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=m
CONFIG_TYPHOON=m
CONFIG_NET_TULIP=y
CONFIG_DE2104X=m
CONFIG_TULIP=m
# CONFIG_TULIP_MWI is not set
CONFIG_TULIP_MMIO=y
# CONFIG_TULIP_NAPI is not set
CONFIG_DE4X5=m
CONFIG_WINBOND_840=m
CONFIG_DM9102=m
CONFIG_ULI526X=m
CONFIG_PCMCIA_XIRCOM=m
# CONFIG_HP100 is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
CONFIG_NET_PCI=y
CONFIG_PCNET32=m
CONFIG_AMD8111_ETH=m
CONFIG_AMD8111E_NAPI=y
CONFIG_ADAPTEC_STARFIRE=m
CONFIG_ADAPTEC_STARFIRE_NAPI=y
CONFIG_B44=m
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_FORCEDETH=m
# CONFIG_FORCEDETH_NAPI is not set
# CONFIG_EEPRO100 is not set
CONFIG_E100=m
CONFIG_FEALNX=m
CONFIG_NATSEMI=m
CONFIG_NE2K_PCI=m
CONFIG_8139CP=m
CONFIG_8139TOO=m
# CONFIG_8139TOO_PIO is not set
# CONFIG_8139TOO_TUNE_TWISTER is not set
CONFIG_8139TOO_8129=y
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R6040 is not set
CONFIG_SIS900=m
CONFIG_EPIC100=m
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
CONFIG_VIA_RHINE=m
CONFIG_VIA_RHINE_MMIO=y
CONFIG_VIA_RHINE_NAPI=y
# CONFIG_SC92031 is not set
CONFIG_NET_POCKET=y
# CONFIG_ATP is not set
# CONFIG_DE600 is not set
# CONFIG_DE620 is not set
CONFIG_NETDEV_1000=y
CONFIG_ACENIC=m
# CONFIG_ACENIC_OMIT_TIGON_I is not set
CONFIG_DL2K=m
CONFIG_E1000=m
CONFIG_E1000_NAPI=y
# CONFIG_E1000_DISABLE_PACKET_SPLIT is not set
CONFIG_E1000E=m
CONFIG_E1000E_ENABLED=y
# CONFIG_IP1000 is not set
CONFIG_IGB=m
CONFIG_NS83820=m
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_R8169=m
CONFIG_R8169_NAPI=y
CONFIG_R8169_VLAN=y
CONFIG_SIS190=m
CONFIG_SKGE=m
# CONFIG_SKGE_DEBUG is not set
CONFIG_SKY2=m
# CONFIG_SKY2_DEBUG is not set
CONFIG_VIA_VELOCITY=m
CONFIG_TIGON3=m
CONFIG_BNX2=m
CONFIG_QLA3XXX=m
# CONFIG_ATL1 is not set
CONFIG_NETDEV_10000=y
CONFIG_CHELSIO_T1=m
# CONFIG_CHELSIO_T1_1G is not set
CONFIG_CHELSIO_T1_NAPI=y
CONFIG_CHELSIO_T3=m
# CONFIG_IXGBE is not set
CONFIG_IXGB=m
CONFIG_IXGB_NAPI=y
CONFIG_S2IO=m
CONFIG_S2IO_NAPI=y
CONFIG_MYRI10GE=m
CONFIG_NETXEN_NIC=m
# CONFIG_NIU is not set
# CONFIG_MLX4_CORE is not set
# CONFIG_TEHUTI is not set
# CONFIG_BNX2X is not set
# CONFIG_SFC is not set
CONFIG_TR=y
CONFIG_IBMOL=m
CONFIG_3C359=m
# CONFIG_TMS380TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set
# CONFIG_IWLWIFI_LEDS is not set

#
# USB Network Adapters
#
CONFIG_USB_CATC=m
CONFIG_USB_KAWETH=m
# CONFIG_USB_KAWETH_FIRMWARE is not set
CONFIG_USB_PEGASUS=m
CONFIG_USB_RTL8150=m
CONFIG_USB_USBNET=m
CONFIG_USB_NET_AX8817X=m
CONFIG_USB_NET_CDCETHER=m
# CONFIG_USB_NET_DM9601 is not set
CONFIG_USB_NET_GL620A=m
CONFIG_USB_NET_NET1080=m
CONFIG_USB_NET_PLUSB=m
# CONFIG_USB_NET_MCS7830 is not set
CONFIG_USB_NET_RNDIS_HOST=m
CONFIG_USB_NET_CDC_SUBSET=m
CONFIG_USB_ALI_M5632=y
CONFIG_USB_AN2720=y
CONFIG_USB_BELKIN=y
CONFIG_USB_ARMLINUX=y
CONFIG_USB_EPSON2888=y
# CONFIG_USB_KC2190 is not set
CONFIG_USB_NET_ZAURUS=m
CONFIG_NET_PCMCIA=y
CONFIG_PCMCIA_3C589=m
CONFIG_PCMCIA_3C574=m
CONFIG_PCMCIA_FMVJ18X=m
CONFIG_PCMCIA_PCNET=m
CONFIG_PCMCIA_NMCLAN=m
CONFIG_PCMCIA_SMC91C92=m
CONFIG_PCMCIA_XIRC2PS=m
CONFIG_PCMCIA_AXNET=m
# CONFIG_WAN is not set
CONFIG_ATM_DRIVERS=y
# CONFIG_ATM_DUMMY is not set
CONFIG_ATM_TCP=m
CONFIG_ATM_LANAI=m
CONFIG_ATM_ENI=m
# CONFIG_ATM_ENI_DEBUG is not set
# CONFIG_ATM_ENI_TUNE_BURST is not set
CONFIG_ATM_FIRESTREAM=m
# CONFIG_ATM_ZATM is not set
CONFIG_ATM_IDT77252=m
# CONFIG_ATM_IDT77252_DEBUG is not set
# CONFIG_ATM_IDT77252_RCV_ALL is not set
CONFIG_ATM_IDT77252_USE_SUNI=y
CONFIG_ATM_AMBASSADOR=m
# CONFIG_ATM_AMBASSADOR_DEBUG is not set
CONFIG_ATM_HORIZON=m
# CONFIG_ATM_HORIZON_DEBUG is not set
CONFIG_ATM_FORE200E_MAYBE=m
# CONFIG_ATM_FORE200E_PCA is not set
CONFIG_ATM_HE=m
# CONFIG_ATM_HE_USE_SUNI is not set
CONFIG_FDDI=y
# CONFIG_DEFXX is not set
# CONFIG_SKFP is not set
# CONFIG_HIPPI is not set
# CONFIG_PLIP is not set
CONFIG_PPP=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
CONFIG_PPP_DEFLATE=m
# CONFIG_PPP_BSDCOMP is not set
CONFIG_PPP_MPPE=m
CONFIG_PPPOE=m
CONFIG_PPPOATM=m
# CONFIG_PPPOL2TP is not set
CONFIG_SLIP=m
CONFIG_SLIP_COMPRESSED=y
CONFIG_SLHC=m
CONFIG_SLIP_SMART=y
# CONFIG_SLIP_MODE_SLIP6 is not set
CONFIG_NET_FC=y
CONFIG_NETCONSOLE=m
# CONFIG_NETCONSOLE_DYNAMIC is not set
CONFIG_NETPOLL=y
CONFIG_NETPOLL_TRAP=y
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_ISDN=m
CONFIG_ISDN_I4L=m
CONFIG_ISDN_PPP=y
CONFIG_ISDN_PPP_VJ=y
CONFIG_ISDN_MPP=y
CONFIG_IPPP_FILTER=y
# CONFIG_ISDN_PPP_BSDCOMP is not set
CONFIG_ISDN_AUDIO=y
CONFIG_ISDN_TTY_FAX=y

#
# ISDN feature submodules
#
CONFIG_ISDN_DIVERSION=m

#
# ISDN4Linux hardware drivers
#

#
# Passive cards
#
CONFIG_ISDN_DRV_HISAX=m

#
# D-channel protocol features
#
CONFIG_HISAX_EURO=y
CONFIG_DE_AOC=y
CONFIG_HISAX_NO_SENDCOMPLETE=y
CONFIG_HISAX_NO_LLC=y
CONFIG_HISAX_NO_KEYPAD=y
CONFIG_HISAX_1TR6=y
CONFIG_HISAX_NI1=y
CONFIG_HISAX_MAX_CARDS=8

#
# HiSax supported cards
#
CONFIG_HISAX_16_3=y
CONFIG_HISAX_TELESPCI=y
CONFIG_HISAX_S0BOX=y
CONFIG_HISAX_FRITZPCI=y
CONFIG_HISAX_AVM_A1_PCMCIA=y
CONFIG_HISAX_ELSA=y
CONFIG_HISAX_DIEHLDIVA=y
CONFIG_HISAX_SEDLBAUER=y
CONFIG_HISAX_NETJET=y
CONFIG_HISAX_NETJET_U=y
CONFIG_HISAX_NICCY=y
CONFIG_HISAX_BKM_A4T=y
CONFIG_HISAX_SCT_QUADRO=y
CONFIG_HISAX_GAZEL=y
CONFIG_HISAX_HFC_PCI=y
CONFIG_HISAX_W6692=y
CONFIG_HISAX_HFC_SX=y
CONFIG_HISAX_ENTERNOW_PCI=y
# CONFIG_HISAX_DEBUG is not set

#
# HiSax PCMCIA card service modules
#
CONFIG_HISAX_SEDLBAUER_CS=m
CONFIG_HISAX_ELSA_CS=m
CONFIG_HISAX_AVM_A1_CS=m
CONFIG_HISAX_TELES_CS=m

#
# HiSax sub driver modules
#
CONFIG_HISAX_ST5481=m
# CONFIG_HISAX_HFCUSB is not set
CONFIG_HISAX_HFC4S8S=m
CONFIG_HISAX_FRITZ_PCIPNP=m
CONFIG_HISAX_HDLC=y

#
# Active cards
#
# CONFIG_HYSDN is not set
CONFIG_ISDN_DRV_GIGASET=m
CONFIG_GIGASET_BASE=m
CONFIG_GIGASET_M105=m
# CONFIG_GIGASET_M101 is not set
# CONFIG_GIGASET_DEBUG is not set
# CONFIG_GIGASET_UNDOCREQ is not set
CONFIG_ISDN_CAPI=m
CONFIG_ISDN_DRV_AVMB1_VERBOSE_REASON=y
CONFIG_CAPI_TRACE=y
CONFIG_ISDN_CAPI_MIDDLEWARE=y
CONFIG_ISDN_CAPI_CAPI20=m
CONFIG_ISDN_CAPI_CAPIFS_BOOL=y
CONFIG_ISDN_CAPI_CAPIFS=m
CONFIG_ISDN_CAPI_CAPIDRV=m

#
# CAPI hardware drivers
#
CONFIG_CAPI_AVM=y
CONFIG_ISDN_DRV_AVMB1_B1PCI=m
CONFIG_ISDN_DRV_AVMB1_B1PCIV4=y
CONFIG_ISDN_DRV_AVMB1_B1PCMCIA=m
CONFIG_ISDN_DRV_AVMB1_AVM_CS=m
CONFIG_ISDN_DRV_AVMB1_T1PCI=m
CONFIG_ISDN_DRV_AVMB1_C4=m
# CONFIG_CAPI_EICON is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=m

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=m
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
# CONFIG_MOUSE_PS2_ELANTECH is not set
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_APPLETOUCH is not set
CONFIG_MOUSE_VSXXXAA=m
CONFIG_INPUT_JOYSTICK=y
# CONFIG_JOYSTICK_ANALOG is not set
# CONFIG_JOYSTICK_A3D is not set
# CONFIG_JOYSTICK_ADI is not set
# CONFIG_JOYSTICK_COBRA is not set
# CONFIG_JOYSTICK_GF2K is not set
# CONFIG_JOYSTICK_GRIP is not set
# CONFIG_JOYSTICK_GRIP_MP is not set
# CONFIG_JOYSTICK_GUILLEMOT is not set
# CONFIG_JOYSTICK_INTERACT is not set
# CONFIG_JOYSTICK_SIDEWINDER is not set
# CONFIG_JOYSTICK_TMDC is not set
# CONFIG_JOYSTICK_IFORCE is not set
# CONFIG_JOYSTICK_WARRIOR is not set
# CONFIG_JOYSTICK_MAGELLAN is not set
# CONFIG_JOYSTICK_SPACEORB is not set
# CONFIG_JOYSTICK_SPACEBALL is not set
# CONFIG_JOYSTICK_STINGER is not set
CONFIG_JOYSTICK_TWIDJOY=m
# CONFIG_JOYSTICK_ZHENHUA is not set
# CONFIG_JOYSTICK_DB9 is not set
# CONFIG_JOYSTICK_GAMECON is not set
# CONFIG_JOYSTICK_TURBOGRAFX is not set
CONFIG_JOYSTICK_JOYDUMP=m
# CONFIG_JOYSTICK_XPAD is not set
# CONFIG_INPUT_TABLET is not set
CONFIG_INPUT_TOUCHSCREEN=y
# CONFIG_TOUCHSCREEN_FUJITSU is not set
CONFIG_TOUCHSCREEN_GUNZE=m
CONFIG_TOUCHSCREEN_ELO=m
CONFIG_TOUCHSCREEN_MTOUCH=m
CONFIG_TOUCHSCREEN_MK712=m
# CONFIG_TOUCHSCREEN_PENMOUNT is not set
# CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set
# CONFIG_TOUCHSCREEN_TOUCHWIN is not set
# CONFIG_TOUCHSCREEN_UCB1400 is not set
# CONFIG_TOUCHSCREEN_WM97XX is not set
# CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=m
# CONFIG_INPUT_APANEL is not set
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_APPLEIR is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
CONFIG_INPUT_UINPUT=m

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
CONFIG_GAMEPORT=m
CONFIG_GAMEPORT_NS558=m
CONFIG_GAMEPORT_L4=m
CONFIG_GAMEPORT_EMU10K1=m
CONFIG_GAMEPORT_FM801=m

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_DEVKMEM=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_COMPUTONE is not set
# CONFIG_ROCKETPORT is not set
CONFIG_CYCLADES=m
# CONFIG_CYZ_INTR is not set
# CONFIG_DIGIEPCA is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
# CONFIG_ISI is not set
CONFIG_SYNCLINK=m
CONFIG_SYNCLINKMP=m
CONFIG_SYNCLINK_GT=m
CONFIG_N_HDLC=m
# CONFIG_RISCOM8 is not set
# CONFIG_SPECIALIX is not set
# CONFIG_SX is not set
# CONFIG_RIO is not set
# CONFIG_STALDRV is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_CS=m
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_PRINTER=m
CONFIG_LP_CONSOLE=y
CONFIG_PPDEV=m
CONFIG_IPMI_HANDLER=m
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
CONFIG_NVRAM=y
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
CONFIG_CARDMAN_4000=m
CONFIG_CARDMAN_4040=m
# CONFIG_IPWIRELESS is not set
# CONFIG_MWAVE is not set
CONFIG_PC8736x_GPIO=m
CONFIG_NSC_GPIO=m
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=8192
CONFIG_HPET=y
# CONFIG_HPET_MMAP is not set
CONFIG_HANGCHECK_TIMER=m
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_ALGOBIT=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
CONFIG_I2C_AMD8111=m
CONFIG_I2C_I801=m
# CONFIG_I2C_PIIX4 is not set
CONFIG_I2C_NFORCE2=m
# CONFIG_I2C_NFORCE2_S4985 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# Graphics adapter I2C/DDC channel drivers
#
CONFIG_I2C_VOODOO3=m

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=m
CONFIG_I2C_PARPORT_LIGHT=m
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_ISCH is not set
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_SIMTEC is not set
CONFIG_I2C_STUB=m
# CONFIG_I2C_PCA_PLATFORM is not set

#
# Miscellaneous I2C Chip support
#
# CONFIG_DS1682 is not set
CONFIG_SENSORS_EEPROM=m
CONFIG_SENSORS_PCF8574=m
# CONFIG_PCF8575 is not set
CONFIG_SENSORS_PCF8591=m
CONFIG_SENSORS_MAX6875=m
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set
# CONFIG_SPI is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_BATTERY_DS2760 is not set
CONFIG_HWMON=m
CONFIG_HWMON_VID=m
CONFIG_SENSORS_ABITUGURU=m
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7418 is not set
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
# CONFIG_SENSORS_ADM1029 is not set
CONFIG_SENSORS_ADM1031=m
CONFIG_SENSORS_ADM9240=m
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7473 is not set
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_ASB100=m
CONFIG_SENSORS_ATXP1=m
CONFIG_SENSORS_DS1621=m
# CONFIG_SENSORS_I5K_AMB is not set
CONFIG_SENSORS_F71805F=m
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
CONFIG_SENSORS_FSCHER=m
CONFIG_SENSORS_FSCPOS=m
# CONFIG_SENSORS_FSCHMD is not set
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
# CONFIG_SENSORS_CORETEMP is not set
# CONFIG_SENSORS_IBMAEM is not set
# CONFIG_SENSORS_IBMPEX is not set
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_LM63=m
CONFIG_SENSORS_LM75=m
CONFIG_SENSORS_LM77=m
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=m
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
# CONFIG_SENSORS_LM93 is not set
CONFIG_SENSORS_MAX1619=m
# CONFIG_SENSORS_MAX6650 is not set
CONFIG_SENSORS_PC87360=m
# CONFIG_SENSORS_PC87427 is not set
CONFIG_SENSORS_SIS5595=m
# CONFIG_SENSORS_DME1737 is not set
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=m
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_THMC50 is not set
CONFIG_SENSORS_VIA686A=m
# CONFIG_SENSORS_VT1211 is not set
CONFIG_SENSORS_VT8231=m
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=m
CONFIG_SENSORS_W83792D=m
# CONFIG_SENSORS_W83793 is not set
CONFIG_SENSORS_W83L785TS=m
# CONFIG_SENSORS_W83L786NG is not set
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
CONFIG_SENSORS_HDAPS=m
# CONFIG_SENSORS_APPLESMC is not set
# CONFIG_HWMON_DEBUG_CHIP is not set
CONFIG_THERMAL=y
CONFIG_WATCHDOG=y
# CONFIG_WATCHDOG_NOWAYOUT is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
CONFIG_ALIM1535_WDT=m
CONFIG_ALIM7101_WDT=m
# CONFIG_SC520_WDT is not set
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
CONFIG_IBMASR=m
# CONFIG_WAFER_WDT is not set
CONFIG_I6300ESB_WDT=m
# CONFIG_ITCO_WDT is not set
# CONFIG_IT8712F_WDT is not set
# CONFIG_HP_WATCHDOG is not set
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
# CONFIG_60XX_WDT is not set
# CONFIG_SBC8360_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_SMSC37B787_WDT is not set
CONFIG_W83627HF_WDT=m
# CONFIG_W83697HF_WDT is not set
CONFIG_W83877F_WDT=m
CONFIG_W83977F_WDT=m
CONFIG_MACHZ_WDT=m
# CONFIG_SBC_EPX_C3_WATCHDOG is not set

#
# PCI-based Watchdog Cards
#
CONFIG_PCIPCWATCHDOG=m
CONFIG_WDTPCI=m
CONFIG_WDT_501_PCI=y

#
# USB-based Watchdog Cards
#
CONFIG_USBPCWATCHDOG=m

#
# Sonics Silicon Backplane
#
CONFIG_SSB_POSSIBLE=y
CONFIG_SSB=m
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
CONFIG_SSB_PCMCIAHOST_POSSIBLE=y
# CONFIG_SSB_PCMCIAHOST is not set
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y

#
# Multifunction device drivers
#
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set

#
# Multimedia devices
#

#
# Multimedia core support
#
CONFIG_VIDEO_DEV=m
CONFIG_VIDEO_V4L2_COMMON=m
CONFIG_VIDEO_ALLOW_V4L1=y
CONFIG_VIDEO_V4L1_COMPAT=y
# CONFIG_DVB_CORE is not set
CONFIG_VIDEO_MEDIA=m

#
# Multimedia drivers
#
# CONFIG_MEDIA_ATTACH is not set
CONFIG_MEDIA_TUNER=m
# CONFIG_MEDIA_TUNER_CUSTOMIZE is not set
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_V4L1=m
CONFIG_VIDEOBUF_GEN=m
CONFIG_VIDEOBUF_DMA_SG=m
CONFIG_VIDEOBUF_VMALLOC=m
CONFIG_VIDEO_BTCX=m
CONFIG_VIDEO_IR_I2C=m
CONFIG_VIDEO_IR=m
CONFIG_VIDEO_TVEEPROM=m
CONFIG_VIDEO_TUNER=m
CONFIG_VIDEO_CAPTURE_DRIVERS=y
# CONFIG_VIDEO_ADV_DEBUG is not set
CONFIG_VIDEO_HELPER_CHIPS_AUTO=y
CONFIG_VIDEO_TVAUDIO=m
CONFIG_VIDEO_TDA7432=m
CONFIG_VIDEO_TDA9875=m
CONFIG_VIDEO_MSP3400=m
CONFIG_VIDEO_CS53L32A=m
CONFIG_VIDEO_WM8775=m
CONFIG_VIDEO_SAA711X=m
CONFIG_VIDEO_TVP5150=m
CONFIG_VIDEO_CX25840=m
CONFIG_VIDEO_CX2341X=m
# CONFIG_VIDEO_VIVI is not set
CONFIG_VIDEO_BT848=m
CONFIG_VIDEO_SAA6588=m
# CONFIG_VIDEO_BWQCAM is not set
# CONFIG_VIDEO_CQCAM is not set
# CONFIG_VIDEO_W9966 is not set
# CONFIG_VIDEO_CPIA is not set
CONFIG_VIDEO_CPIA2=m
# CONFIG_VIDEO_SAA5246A is not set
# CONFIG_VIDEO_SAA5249 is not set
# CONFIG_TUNER_3036 is not set
# CONFIG_VIDEO_STRADIS is not set
# CONFIG_VIDEO_ZORAN is not set
# CONFIG_VIDEO_SAA7134 is not set
# CONFIG_VIDEO_MXB is not set
# CONFIG_VIDEO_DPC is not set
# CONFIG_VIDEO_HEXIUM_ORION is not set
# CONFIG_VIDEO_HEXIUM_GEMINI is not set
# CONFIG_VIDEO_CX88 is not set
# CONFIG_VIDEO_IVTV is not set
# CONFIG_VIDEO_CAFE_CCIC is not set
CONFIG_V4L_USB_DRIVERS=y
CONFIG_VIDEO_PVRUSB2=m
CONFIG_VIDEO_PVRUSB2_SYSFS=y
# CONFIG_VIDEO_PVRUSB2_DEBUGIFC is not set
CONFIG_VIDEO_EM28XX=m
# CONFIG_VIDEO_EM28XX_ALSA is not set
# CONFIG_VIDEO_USBVISION is not set
CONFIG_VIDEO_USBVIDEO=m
CONFIG_USB_VICAM=m
CONFIG_USB_IBMCAM=m
CONFIG_USB_KONICAWC=m
CONFIG_USB_QUICKCAM_MESSENGER=m
CONFIG_USB_ET61X251=m
CONFIG_VIDEO_OVCAMCHIP=m
CONFIG_USB_W9968CF=m
CONFIG_USB_OV511=m
CONFIG_USB_SE401=m
CONFIG_USB_SN9C102=m
CONFIG_USB_STV680=m
CONFIG_USB_ZC0301=m
CONFIG_USB_PWC=m
# CONFIG_USB_PWC_DEBUG is not set
# CONFIG_USB_ZR364XX is not set
# CONFIG_USB_STKWEBCAM is not set
# CONFIG_USB_S2255 is not set
# CONFIG_SOC_CAMERA is not set
CONFIG_RADIO_ADAPTERS=y
# CONFIG_RADIO_GEMTEK_PCI is not set
# CONFIG_RADIO_MAXIRADIO is not set
# CONFIG_RADIO_MAESTRO is not set
CONFIG_USB_DSBR=m
# CONFIG_USB_SI470X is not set
# CONFIG_DAB is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
CONFIG_AGP_SIS=y
CONFIG_AGP_VIA=y
CONFIG_DRM=m
# CONFIG_DRM_TDFX is not set
CONFIG_DRM_R128=m
CONFIG_DRM_RADEON=m
CONFIG_DRM_I810=m
CONFIG_DRM_I830=m
CONFIG_DRM_I915=m
CONFIG_DRM_MGA=m
# CONFIG_DRM_SIS is not set
CONFIG_DRM_VIA=m
CONFIG_DRM_SAVAGE=m
CONFIG_VGASTATE=m
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_DDC=m
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
CONFIG_FB_CIRRUS=m
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
CONFIG_FB_VGA16=m
# CONFIG_FB_UVESA is not set
CONFIG_FB_VESA=y
# CONFIG_FB_EFI is not set
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
CONFIG_FB_NVIDIA=m
CONFIG_FB_NVIDIA_I2C=y
# CONFIG_FB_NVIDIA_DEBUG is not set
CONFIG_FB_NVIDIA_BACKLIGHT=y
CONFIG_FB_RIVA=m
# CONFIG_FB_RIVA_I2C is not set
# CONFIG_FB_RIVA_DEBUG is not set
CONFIG_FB_RIVA_BACKLIGHT=y
# CONFIG_FB_LE80578 is not set
CONFIG_FB_INTEL=m
# CONFIG_FB_INTEL_DEBUG is not set
CONFIG_FB_INTEL_I2C=y
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
CONFIG_FB_SAVAGE=m
CONFIG_FB_SAVAGE_I2C=y
CONFIG_FB_SAVAGE_ACCEL=y
# CONFIG_FB_SIS is not set
# CONFIG_FB_NEOMAGIC is not set
CONFIG_FB_KYRO=m
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_VIRTUAL is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=m
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_CORGI is not set
# CONFIG_BACKLIGHT_PROGEAR is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
CONFIG_VIDEO_SELECT=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY is not set
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
CONFIG_SOUND=m
CONFIG_SND=m
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_HWDEP=m
CONFIG_SND_RAWMIDI=m
CONFIG_SND_SEQUENCER=m
CONFIG_SND_SEQ_DUMMY=m
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
CONFIG_SND_PCM_OSS_PLUGINS=y
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_DYNAMIC_MINORS=y
# CONFIG_SND_SUPPORT_OLD_API is not set
CONFIG_SND_VERBOSE_PROCFS=y
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set
CONFIG_SND_VMASTER=y
CONFIG_SND_MPU401_UART=m
CONFIG_SND_OPL3_LIB=m
CONFIG_SND_VX_LIB=m
CONFIG_SND_AC97_CODEC=m
CONFIG_SND_DRIVERS=y
CONFIG_SND_DUMMY=m
CONFIG_SND_VIRMIDI=m
CONFIG_SND_MTPAV=m
# CONFIG_SND_MTS64 is not set
# CONFIG_SND_SERIAL_U16550 is not set
CONFIG_SND_MPU401=m
# CONFIG_SND_PORTMAN2X4 is not set
# CONFIG_SND_AC97_POWER_SAVE is not set
CONFIG_SND_SB_COMMON=m
CONFIG_SND_PCI=y
CONFIG_SND_AD1889=m
CONFIG_SND_ALS300=m
CONFIG_SND_ALS4000=m
CONFIG_SND_ALI5451=m
CONFIG_SND_ATIIXP=m
CONFIG_SND_ATIIXP_MODEM=m
CONFIG_SND_AU8810=m
CONFIG_SND_AU8820=m
CONFIG_SND_AU8830=m
# CONFIG_SND_AW2 is not set
CONFIG_SND_AZT3328=m
CONFIG_SND_BT87X=m
# CONFIG_SND_BT87X_OVERCLOCK is not set
CONFIG_SND_CA0106=m
CONFIG_SND_CMIPCI=m
# CONFIG_SND_OXYGEN is not set
CONFIG_SND_CS4281=m
CONFIG_SND_CS46XX=m
CONFIG_SND_CS46XX_NEW_DSP=y
# CONFIG_SND_CS5530 is not set
CONFIG_SND_DARLA20=m
CONFIG_SND_GINA20=m
CONFIG_SND_LAYLA20=m
CONFIG_SND_DARLA24=m
CONFIG_SND_GINA24=m
CONFIG_SND_LAYLA24=m
CONFIG_SND_MONA=m
CONFIG_SND_MIA=m
CONFIG_SND_ECHO3G=m
CONFIG_SND_INDIGO=m
CONFIG_SND_INDIGOIO=m
CONFIG_SND_INDIGODJ=m
CONFIG_SND_EMU10K1=m
CONFIG_SND_EMU10K1X=m
CONFIG_SND_ENS1370=m
CONFIG_SND_ENS1371=m
CONFIG_SND_ES1938=m
CONFIG_SND_ES1968=m
CONFIG_SND_FM801=m
CONFIG_SND_FM801_TEA575X_BOOL=y
CONFIG_SND_FM801_TEA575X=m
CONFIG_SND_HDA_INTEL=m
# CONFIG_SND_HDA_HWDEP is not set
CONFIG_SND_HDA_CODEC_REALTEK=y
CONFIG_SND_HDA_CODEC_ANALOG=y
CONFIG_SND_HDA_CODEC_SIGMATEL=y
CONFIG_SND_HDA_CODEC_VIA=y
CONFIG_SND_HDA_CODEC_ATIHDMI=y
CONFIG_SND_HDA_CODEC_CONEXANT=y
CONFIG_SND_HDA_CODEC_CMEDIA=y
CONFIG_SND_HDA_CODEC_SI3054=y
CONFIG_SND_HDA_GENERIC=y
# CONFIG_SND_HDA_POWER_SAVE is not set
CONFIG_SND_HDSP=m
CONFIG_SND_HDSPM=m
# CONFIG_SND_HIFIER is not set
CONFIG_SND_ICE1712=m
CONFIG_SND_ICE1724=m
CONFIG_SND_INTEL8X0=m
CONFIG_SND_INTEL8X0M=m
CONFIG_SND_KORG1212=m
# CONFIG_SND_KORG1212_FIRMWARE_IN_KERNEL is not set
CONFIG_SND_MAESTRO3=m
# CONFIG_SND_MAESTRO3_FIRMWARE_IN_KERNEL is not set
CONFIG_SND_MIXART=m
CONFIG_SND_NM256=m
CONFIG_SND_PCXHR=m
CONFIG_SND_RIPTIDE=m
CONFIG_SND_RME32=m
CONFIG_SND_RME96=m
CONFIG_SND_RME9652=m
CONFIG_SND_SONICVIBES=m
CONFIG_SND_TRIDENT=m
CONFIG_SND_VIA82XX=m
CONFIG_SND_VIA82XX_MODEM=m
# CONFIG_SND_VIRTUOSO is not set
CONFIG_SND_VX222=m
CONFIG_SND_YMFPCI=m
# CONFIG_SND_YMFPCI_FIRMWARE_IN_KERNEL is not set
CONFIG_SND_USB=y
CONFIG_SND_USB_AUDIO=m
CONFIG_SND_USB_USX2Y=m
# CONFIG_SND_USB_CAIAQ is not set
CONFIG_SND_PCMCIA=y
# CONFIG_SND_VXPOCKET is not set
# CONFIG_SND_PDAUDIOCF is not set
# CONFIG_SND_SOC is not set
# CONFIG_SOUND_PRIME is not set
CONFIG_AC97_BUS=m
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
CONFIG_HID_DEBUG=y
# CONFIG_HIDRAW is not set

#
# USB Input Devices
#
CONFIG_USB_HID=y
# CONFIG_USB_HIDINPUT_POWERBOOK is not set
CONFIG_HID_FF=y
CONFIG_HID_PID=y
CONFIG_LOGITECH_FF=y
# CONFIG_LOGIRUMBLEPAD2_FF is not set
# CONFIG_PANTHERLORD_FF is not set
CONFIG_THRUSTMASTER_FF=y
# CONFIG_ZEROPLUS_FF is not set
CONFIG_USB_HIDDEV=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
CONFIG_USB_DEVICE_CLASS=y
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set
# CONFIG_USB_WUSB is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_EHCI_HCD=m
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_ISP116X_HCD=m
# CONFIG_USB_ISP1760_HCD is not set
CONFIG_USB_OHCI_HCD=m
# CONFIG_USB_OHCI_HCD_SSB is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=m
CONFIG_USB_SL811_HCD=m
CONFIG_USB_SL811_CS=m
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_WHCI_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# USB Device Class drivers
#
CONFIG_USB_ACM=m
CONFIG_USB_PRINTER=m
# CONFIG_USB_WDM is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support'
# may also be needed; see USB_STORAGE Help for more information
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_DPCM=y
CONFIG_USB_STORAGE_USBAT=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_USB_STORAGE_ALAUDA=y
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_LIBUSUAL is not set

#
# USB Imaging devices
#
CONFIG_USB_MDC800=m
CONFIG_USB_MICROTEK=m
CONFIG_USB_MON=y

#
# USB port drivers
#
CONFIG_USB_USS720=m
CONFIG_USB_SERIAL=m
CONFIG_USB_EZUSB=y
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_AIRCABLE is not set
CONFIG_USB_SERIAL_AIRPRIME=m
CONFIG_USB_SERIAL_ARK3116=m
CONFIG_USB_SERIAL_BELKIN=m
# CONFIG_USB_SERIAL_CH341 is not set
CONFIG_USB_SERIAL_WHITEHEAT=m
# CONFIG_USB_SERIAL_WHITEHEAT_FIRMWARE is not set
CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m
CONFIG_USB_SERIAL_CP2101=m
CONFIG_USB_SERIAL_CYPRESS_M8=m
CONFIG_USB_SERIAL_EMPEG=m
CONFIG_USB_SERIAL_FTDI_SIO=m
CONFIG_USB_SERIAL_FUNSOFT=m
CONFIG_USB_SERIAL_VISOR=m
CONFIG_USB_SERIAL_IPAQ=m
CONFIG_USB_SERIAL_IR=m
CONFIG_USB_SERIAL_EDGEPORT=m
CONFIG_USB_SERIAL_EDGEPORT_TI=m
CONFIG_USB_SERIAL_GARMIN=m
CONFIG_USB_SERIAL_IPW=m
# CONFIG_USB_SERIAL_IUU is not set
CONFIG_USB_SERIAL_KEYSPAN_PDA=m
# CONFIG_USB_SERIAL_KEYSPAN_PDA_FIRMWARE is not set
CONFIG_USB_SERIAL_KEYSPAN=m
CONFIG_USB_SERIAL_KEYSPAN_MPR=y
CONFIG_USB_SERIAL_KEYSPAN_USA28=y
CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
CONFIG_USB_SERIAL_KEYSPAN_USA19=y
CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
CONFIG_USB_SERIAL_KLSI=m
CONFIG_USB_SERIAL_KOBIL_SCT=m
CONFIG_USB_SERIAL_MCT_U232=m
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MOTOROLA is not set
CONFIG_USB_SERIAL_NAVMAN=m
CONFIG_USB_SERIAL_PL2303=m
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
CONFIG_USB_SERIAL_HP4X=m
CONFIG_USB_SERIAL_SAFE=m
CONFIG_USB_SERIAL_SAFE_PADDED=y
CONFIG_USB_SERIAL_SIERRAWIRELESS=m
CONFIG_USB_SERIAL_TI=m
# CONFIG_USB_SERIAL_TI_3410_FIRMWARE is not set
# CONFIG_USB_SERIAL_TI_5052_FIRMWARE is not set
CONFIG_USB_SERIAL_CYBERJACK=m
CONFIG_USB_SERIAL_XIRCOM=m
# CONFIG_USB_SERIAL_XIRCOM_FIRMWARE is not set
CONFIG_USB_SERIAL_OPTION=m
CONFIG_USB_SERIAL_OMNINET=m
# CONFIG_USB_SERIAL_DEBUG is not set

#
# USB Miscellaneous drivers
#
CONFIG_USB_EMI62=m
# CONFIG_USB_EMI62_FIRMWARE is not set
CONFIG_USB_EMI26=m
# CONFIG_USB_EMI26_FIRMWARE is not set
# CONFIG_USB_ADUTUX is not set
CONFIG_USB_AUERSWALD=m
CONFIG_USB_RIO500=m
CONFIG_USB_LEGOTOWER=m
CONFIG_USB_LCD=m
# CONFIG_USB_BERRY_CHARGE is not set
CONFIG_USB_LED=m
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_PHIDGET is not set
CONFIG_USB_IDMOUSE=m
# CONFIG_USB_FTDI_ELAN is not set
CONFIG_USB_APPLEDISPLAY=m
CONFIG_USB_SISUSBVGA=m
CONFIG_USB_SISUSBVGA_CON=y
CONFIG_USB_LD=m
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
CONFIG_USB_TEST=m
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_GOTEMP is not set
CONFIG_USB_ATM=m
CONFIG_USB_SPEEDTOUCH=m
CONFIG_USB_CXACRU=m
CONFIG_USB_UEAGLEATM=m
CONFIG_USB_XUSBATM=m
# CONFIG_USB_GADGET is not set
# CONFIG_UWB is not set
CONFIG_MMC=m
# CONFIG_MMC_DEBUG is not set
# CONFIG_MMC_UNSAFE_RESUME is not set
# CONFIG_MMC_PASSWORDS is not set

#
# MMC/SD Card Drivers
#
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_BOUNCE=y
# CONFIG_SDIO_UART is not set
# CONFIG_MMC_TEST is not set

#
# MMC/SD Host Controller Drivers
#
CONFIG_MMC_SDHCI=m
# CONFIG_MMC_SDHCI_PCI is not set
CONFIG_MMC_WBSD=m
# CONFIG_MMC_TIFM_SD is not set
# CONFIG_MMC_SDRICOH_CS is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_CLEVO_MAIL is not set
# CONFIG_LEDS_PCA955X is not set

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_IDE_DISK=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_IPATH=m
CONFIG_INFINIBAND_AMSO1100=m
# CONFIG_INFINIBAND_AMSO1100_DEBUG is not set
CONFIG_INFINIBAND_CXGB3=m
# CONFIG_INFINIBAND_CXGB3_DEBUG is not set
# CONFIG_MLX4_INFINIBAND is not set
# CONFIG_INFINIBAND_NES is not set
CONFIG_INFINIBAND_IPOIB=m
# CONFIG_INFINIBAND_IPOIB_CM is not set
CONFIG_INFINIBAND_IPOIB_DEBUG=y
# CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
CONFIG_INFINIBAND_SRP=m
CONFIG_INFINIBAND_ISER=m
CONFIG_EDAC=y

#
# Reporting subsystems
#
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_MM_EDAC=m
CONFIG_EDAC_E752X=m
# CONFIG_EDAC_I82975X is not set
# CONFIG_EDAC_I3000 is not set
# CONFIG_EDAC_I5000 is not set
CONFIG_RTC_LIB=m
CONFIG_RTC_CLASS=m

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
CONFIG_RTC_DRV_DS1307=m
# CONFIG_RTC_DRV_DS1374 is not set
CONFIG_RTC_DRV_DS1672=m
# CONFIG_RTC_DRV_MAX6900 is not set
CONFIG_RTC_DRV_RS5C372=m
CONFIG_RTC_DRV_ISL1208=m
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8563=m
CONFIG_RTC_DRV_PCF8583=m
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set

#
# SPI RTC drivers
#

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=m
# CONFIG_RTC_DRV_DS1511 is not set
CONFIG_RTC_DRV_DS1553=m
CONFIG_RTC_DRV_DS1742=m
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T59 is not set
CONFIG_RTC_DRV_V3020=m

#
# on-CPU RTC drivers
#
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_DELL_RBU=m
CONFIG_DCDBAS=m
CONFIG_DMIID=y
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT2_FS_XIP=y
CONFIG_FS_XIP=y
CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
# CONFIG_EXT4DEV_FS is not set
CONFIG_JBD=m
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISER4_FS is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
# CONFIG_XFS_FS is not set
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=m
# CONFIG_OCFS2_FS is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_QUOTA=y
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
# CONFIG_AUTOFS_FS is not set
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m

#
# Layered filesystems
#
# CONFIG_ECRYPT_FS is not set
# CONFIG_UNION_FS is not set

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_JFFS2_FS=m
CONFIG_JFFS2_FS_DEBUG=0
CONFIG_JFFS2_FS_WRITEBUFFER=y
# CONFIG_JFFS2_FS_WBUF_VERIFY is not set
CONFIG_JFFS2_SUMMARY=y
# CONFIG_JFFS2_FS_XATTR is not set
# CONFIG_JFFS2_COMPRESSION_OPTIONS is not set
CONFIG_JFFS2_ZLIB=y
# CONFIG_JFFS2_LZO is not set
CONFIG_JFFS2_RTIME=y
# CONFIG_JFFS2_RUBIN is not set
# CONFIG_LOGFS is not set
CONFIG_CRAMFS=m
CONFIG_VXFS_FS=m
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=m
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_XPRT_RDMA=m
# CONFIG_SUNRPC_BIND34 is not set
CONFIG_RPCSEC_GSS_KRB5=m
CONFIG_RPCSEC_GSS_SPKM3=m
# CONFIG_SMB_FS is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS is not set
CONFIG_CIFS_WEAK_PW_HASH=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_EXPERIMENTAL is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_857=m
CONFIG_NLS_CODEPAGE_860=m
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=m
CONFIG_NLS_CODEPAGE_864=m
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
CONFIG_DLM_DEBUG=y

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
# CONFIG_PAGE_OWNER is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
# CONFIG_TIMER_STATS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_SG is not set
# CONFIG_FRAME_POINTER is not set
# CONFIG_DEBUG_SYNCHRO_TEST is not set
# CONFIG_PROFILE_LIKELY is not set
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_HAVE_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_TRACING=y
# CONFIG_FTRACE is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SYSPROF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_CONTEXT_SWITCH_TRACER is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_FIREWIRE_OHCI_REMOTE_DMA is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
# CONFIG_KERNEL_TESTS is not set
# CONFIG_NONPROMISC_DEVMEM is not set
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
CONFIG_DEBUG_RODATA=y
# CONFIG_DIRECT_GBPAGES is not set
CONFIG_DEBUG_RODATA_TEST=y
# CONFIG_DEBUG_NX_TEST is not set
CONFIG_X86_MPPARSE=y
# CONFIG_IOMMU_DEBUG is not set
CONFIG_MMIOTRACE_HOOKS=y
CONFIG_MMIOTRACE=y
# CONFIG_MMIOTRACE_TEST is not set
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set

#
# Security options
#
CONFIG_KEYS=y
CONFIG_KEYS_DEBUG_PROC_KEYS=y
CONFIG_SECURITY=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_SECURITY_CAPABILITIES=y
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_SECURITY_ROOTPLUG is not set
CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR=0
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
CONFIG_SECURITY_SELINUX_ENABLE_SECMARK_DEFAULT=y
# CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX is not set
# CONFIG_SECURITY_SMACK is not set
CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_MANAGER=y
# CONFIG_CRYPTO_GF128MUL is not set
CONFIG_CRYPTO_NULL=m
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=m
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
CONFIG_CRYPTO_ECB=m
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=m
CONFIG_CRYPTO_MICHAEL_MIC=m
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=m
CONFIG_CRYPTO_AES_X86_64=m
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
# CONFIG_CRYPTO_CAMELLIA is not set
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
# CONFIG_CRYPTO_FCRYPT is not set
CONFIG_CRYPTO_KHAZAD=m
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
# CONFIG_CRYPTO_SEED is not set
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
# CONFIG_CRYPTO_TWOFISH_X86_64 is not set

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
# CONFIG_CRYPTO_LZO is not set
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_HIFN_795X is not set
CONFIG_HAVE_KVM=y
CONFIG_VIRTUALIZATION=y
# CONFIG_KVM is not set
# CONFIG_VIRTIO_PCI is not set
# CONFIG_VIRTIO_BALLOON is not set

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
CONFIG_LIBCRC32C=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=m
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_DEC16=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_PLIST=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
# CONFIG_TRACE is not set

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
  2008-06-17 18:29           ` Lee Schermerhorn
  (?)
@ 2008-06-18  2:40             ` Daisuke Nishimura
  -1 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-18  2:40 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

> > > > 
> > > Good catch, and I think your investigation in the last e-mail was correct.
> > > I'd like to dig into this...but it seems some kind of big fix is necessary.
> > > Did this happen under page migration by the cpuset-task-move test?
> > > 
> > Yes.
> > 
> > I made 2 cpuset directories, ran some processes in each cpuset,
> > and ran a script like the one below in a loop to move tasks and
> > migrate pages (see the sketch after this message).
> 
> What processes/tests do you run in each cpuset?
> 

Please see the mail I've just sent to Kosaki-san :)


Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 290+ messages in thread
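The test script referenced above is not quoted in this excerpt. Purely as
an illustration, a minimal user-space loop in that spirit, assuming two
cpusets at /dev/cpuset/cs0 and /dev/cpuset/cs1 with memory_migrate already
enabled (the paths, names, and setup are all assumptions, not taken from
the original mail), might look like:

---
/*
 * Hypothetical sketch only: bounce a PID between two cpuset directories
 * so that, with memory_migrate enabled, its pages are migrated back and
 * forth on each move.  All paths and names are assumptions.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static int attach(const char *cpuset_dir, pid_t pid)
{
	char path[256];
	FILE *f;

	/* Writing a PID into <cpuset>/tasks moves that task into the cpuset. */
	snprintf(path, sizeof(path), "%s/tasks", cpuset_dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", (int)pid);
	return fclose(f);
}

int main(int argc, char **argv)
{
	pid_t pid;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	pid = (pid_t)atoi(argv[1]);

	for (;;) {		/* move the task back and forth forever */
		attach("/dev/cpuset/cs0", pid);
		sleep(1);
		attach("/dev/cpuset/cs1", pid);
		sleep(1);
	}
}
---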


* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-17 17:46     ` Lee Schermerhorn
  (?)
@ 2008-06-18  2:59       ` Daisuke Nishimura
  -1 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-18  2:59 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Andrew Morton, Rik van Riel, Kosaki Motohiro, Nick Piggin,
	linux-mm, linux-kernel, kernel-testers

> > @@ -232,6 +232,7 @@ void migration_entry_wait(struct mm_stru
> >  	swp_entry_t entry;
> >  	struct page *page;
> >  
> > +retry:
> >  	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
> >  	pte = *ptep;
> >  	if (!is_swap_pte(pte))
> > @@ -243,11 +244,20 @@ void migration_entry_wait(struct mm_stru
> >  
> >  	page = migration_entry_to_page(entry);
> >  
> > -	get_page(page);
> > -	pte_unmap_unlock(ptep, ptl);
> > -	wait_on_page_locked(page);
> > -	put_page(page);
> > -	return;
> > +	/*
> > +	 * page count might be set to zero by page_freeze_refs()
> > +	 * in migrate_page_move_mapping().
> > +	 */
> > +	if (get_page_unless_zero(page)) {
> > +		pte_unmap_unlock(ptep, ptl);
> > +		wait_on_page_locked(page);
> > +		put_page(page);
> > +		return;
> > +	} else {
> > +		pte_unmap_unlock(ptep, ptl);
> > +		goto retry;
> > +	}
> > +
> 
> I'm not sure about this part.  If it IS needed, I think it would be
> needed independently of the unevictable/putback_lru_page() changes, as
> this race must have already existed.
> 
> However, unmap_and_move() replaced the migration entries with bona fide
> pte's referencing the new page before freeing the old page, so I think
> we're OK without this change.
> 

Without this part, I can easily hit the VM_BUG_ON in get_page(),
even when the only processes in the cpusets are bash.

---
kernel BUG at include/linux/mm.h:297!
 :
Call Trace:
 [<ffffffff80280d82>] ? handle_mm_fault+0x3e5/0x782
 [<ffffffff8048c8bf>] ? do_page_fault+0x3d0/0x7a7
 [<ffffffff80263ed0>] ? audit_syscall_exit+0x2e4/0x303
 [<ffffffff8048a989>] ? error_exit+0x0/0x51
 Code: b8 00 00 00 00 00 e2 ff ff 48 8d 1c 02 48 8b 13 f6 c
2 01 75 04 0f 0b eb fe 80 e6 40 48 89 d8 74 04 48 8b 43 10 83 78 08 00 75 04 <0f> 0b eb fe
 f0 ff 40 08 fe 45 00 f6 03 01 74 0a 31 f6 48 89 df
 RIP  [<ffffffff8029c309>] migration_entry_wait+0xcb/0xfa
 RSP <ffff81062cc6fe58>
---

I agree that this part should be fixed independently, and
Kamezawa-san has already posted a patch for this.
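
For readers following the fix: get_page_unless_zero() avoids the crash
because it is built on atomic_inc_not_zero(), so it never takes a frozen
(zero) refcount back to one. A minimal sketch, roughly as defined in
include/linux/mm.h of this era (simplified; the tail-page check is omitted):

	/*
	 * Try to take a reference, but refuse to resurrect a page whose
	 * refcount is zero (freed, or frozen by page_freeze_refs()).
	 */
	static inline int get_page_unless_zero(struct page *page)
	{
		return atomic_inc_not_zero(&page->_count);
	}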


Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] unevictable mlocked pages:  initialize mm member of munlock mm_walk structure
  2008-06-17 20:00             ` Lee Schermerhorn
  (?)
@ 2008-06-18  3:33               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-18  3:33 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: kosaki.motohiro, Andrew Morton, KAMEZAWA Hiroyuki,
	Daisuke Nishimura, Rik van Riel, Nick Piggin, linux-mm,
	linux-kernel, kernel-testers

> PATCH:  fix munlock page table walk - now requires 'mm'
> 
> Against 2.6.26-rc5-mm3.
> 
> Incremental fix for: mlock-mlocked-pages-are-unevictable-fix.patch 
> 
> Initialize the 'mm' member of the mm_walk structure, else the
> page table walk doesn't occur, and mlocked pages will not be
> munlocked.  This is visible in the vmstats:  

Yup, Dave Hansen changed the page-walk interface recently,
so his patch and ours conflict ;)

The patch below is just a nit cleanup.


===========================================
From: Lee Schermerhorn <lee.schermerhorn@hp.com>

This [freeing of mlocked pages] also occurs in unpatched 26-rc5-mm3.

Fixed by the following:

PATCH:  fix munlock page table walk - now requires 'mm'

Against 2.6.26-rc5-mm3.

Incremental fix for: mlock-mlocked-pages-are-unevictable-fix.patch 

Initialize the 'mm' member of the mm_walk structure, else the
page table walk doesn't occur, and mlocked pages will not be
munlocked.  This is visible in the vmstats:  

	noreclaim_pgs_munlocked - should equal noreclaim_pgs_mlocked
	  less (nr_mlock + noreclaim_pgs_cleared), but is always zero 
	  [munlock_vma_page() never called]

	noreclaim_pgs_mlockfreed - should be zero [for debug only],
	  but == noreclaim_pgs_mlocked - (nr_mlock + noreclaim_pgs_cleared)


Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

 mm/mlock.c |    1 +
 1 file changed, 1 insertion(+)

Index: b/mm/mlock.c
===================================================================
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -310,6 +310,7 @@ static void __munlock_vma_pages_range(st
 		.pmd_entry = __munlock_pmd_handler,
 		.pte_entry = __munlock_pte_handler,
 		.private = &mpw,
+		.mm = mm,
 	};
 
 	VM_BUG_ON(start & ~PAGE_MASK || end & ~PAGE_MASK);
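
For context, a sketch of the page-walk interface after Dave Hansen's
rework, as I understand the 2.6.26-rc5-mm3 code (simplified; most callback
levels omitted). walk_page_range() refuses to walk when walk->mm is NULL,
which is why the missing initializer silently skipped the munlock walk:

	struct mm_walk {
		int (*pmd_entry)(pmd_t *, unsigned long, unsigned long,
				 struct mm_walk *);
		int (*pte_entry)(pte_t *, unsigned long, unsigned long,
				 struct mm_walk *);
		/* pgd/pud/pte_hole callbacks omitted */
		struct mm_struct *mm;	/* must be set, or nothing is walked */
		void *private;
	};

	int walk_page_range(unsigned long addr, unsigned long end,
			    struct mm_walk *walk)
	{
		if (!walk->mm)	/* uninitialized 'mm': bail out early */
			return -EINVAL;
		/* ... walk the levels, invoking the callbacks ... */
		return 0;
	}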



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-18  1:54       ` Daisuke Nishimura
  (?)
@ 2008-06-18  4:41         ` Daisuke Nishimura
  -1 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-18  4:41 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Nick Piggin,
	linux-mm, linux-kernel, kernel-testers

On Wed, 18 Jun 2008 10:54:00 +0900, Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> On Wed, 18 Jun 2008 00:33:18 +0900, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > > @@ -715,13 +725,7 @@ unlock:
> > >   		 * restored.
> > >   		 */
> > >   		list_del(&page->lru);
> > > -		if (!page->mapping) {
> > > -			VM_BUG_ON(page_count(page) != 1);
> > > -			unlock_page(page);
> > > -			put_page(page);		/* just free the old page */
> > > -			goto end_migration;
> > > -		} else
> > > -			unlock = putback_lru_page(page);
> > > +		unlock = putback_lru_page(page);
> > >  	}
> > >  
> > >  	if (unlock)
> > 
> > Is this part really necessary?
> > I tried to remove it, but no problem happened.
> > 
> I made this part first, and added the fix for migration_entry_wait later.
> 
> So I haven't tested without this part, and I think the
> VM_BUG_ON() here will trigger without it.
> 
> Anyway, I will test it.
> 
I got this VM_BUG_ON() as expected only by doing:

  # echo $$ >/cgroup/cpuset/02/tasks

So I believe that both fixes, for migration_entry_wait and
unmap_and_move (and, of course, the removal of the VM_BUG_ON from
putback_lru_page), are needed.


Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-18  4:41         ` Daisuke Nishimura
  (?)
@ 2008-06-18  4:59           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18  4:59 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Wed, 18 Jun 2008 13:41:28 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Wed, 18 Jun 2008 10:54:00 +0900, Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > On Wed, 18 Jun 2008 00:33:18 +0900, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > > > @@ -715,13 +725,7 @@ unlock:
> > > >   		 * restored.
> > > >   		 */
> > > >   		list_del(&page->lru);
> > > > -		if (!page->mapping) {
> > > > -			VM_BUG_ON(page_count(page) != 1);
> > > > -			unlock_page(page);
> > > > -			put_page(page);		/* just free the old page */
> > > > -			goto end_migration;
> > > > -		} else
> > > > -			unlock = putback_lru_page(page);
> > > > +		unlock = putback_lru_page(page);
> > > >  	}
> > > >  
> > > >  	if (unlock)
> > > 
> > > Is this part really necessary?
> > > I tried to remove it, but no problem happened.
> > > 
> > I made this part first, and added the fix for migration_entry_wait later.
> > 
> > So I haven't tested without this part, and I think the
> > VM_BUG_ON() here will trigger without it.
> > 
> > Anyway, I will test it.
> > 
> I got this VM_BUG_ON() as expected only by doing:
> 
>   # echo $$ >/cgroup/cpuset/02/tasks
> 
> So I believe that both fixes, for migration_entry_wait and
> unmap_and_move (and, of course, the removal of the VM_BUG_ON from
> putback_lru_page), are needed.
> 
> 
Yes, but I'm now trying to rewrite putback_lru_page(), to avoid further complication.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3
  2008-06-17 19:28         ` Lee Schermerhorn
  (?)
@ 2008-06-18  5:19           ` Nick Piggin
  -1 siblings, 0 replies; 290+ messages in thread
From: Nick Piggin @ 2008-06-18  5:19 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Hugh Dickins, Daisuke Nishimura, Andrew Morton, Rik van Riel,
	Kosaki Motohiro, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers

On Wednesday 18 June 2008 05:28, Lee Schermerhorn wrote:
> On Tue, 2008-06-17 at 19:33 +0100, Hugh Dickins wrote:
> > On Tue, 17 Jun 2008, Lee Schermerhorn wrote:
> > > Now I wonder if the assertion that newpage count == 1 could be
> > > violated? I don't see how.  We've just allocated and filled it and
> > > haven't unlocked it yet, so we should hold the only reference.  Do you
> > > agree?
> >
> > Disagree: IIRC, excellent example of the kind of assumption
> > that becomes invalid with Nick's speculative page references.
> >
> > Someone interested in the previous use of the page may have
> > incremented the refcount, and in due course will find that
> > it's got reused for something else, and will then back off.
>
> Yeah.  Kosaki-san mentioned that we'd need some rework for the
> speculative page cache work.  Looks like we'll need to drop the
> VM_BUG_ON().
>
> I need to go read up on the new invariants we can trust with the
> speculative page cache.

I don't know if I've written a summary of this anywhere; that is
something I should do.

The best thing to do is never use page_count, but just use get
and put to refcount it. If you really must use it:

- If there are X references to a page, page_count will return >= X.
- If page_count returns Y, there are no more than Y references to the page.
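
To illustrate the preferred pattern (use_page_safely() is a hypothetical
helper, not from any patch in this thread): take the reference first, and
treat failure as "the page is going away":

	static void use_page_safely(struct page *page)
	{
		/*
		 * Never test page_count() and then get_page(); take the
		 * reference up front and back off on failure.
		 */
		if (!get_page_unless_zero(page))
			return;	/* count was zero: page is freed/frozen */
		/* ... the page cannot be freed while we hold the ref ... */
		put_page(page);
	}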


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] migration_entry_wait fix.
  2008-06-18  1:54       ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-18  5:26         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-18  5:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, Daisuke Nishimura, Andrew Morton, Rik van Riel,
	Lee Schermerhorn, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers, hugh

> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/migrate.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: test-2.6.26-rc5-mm3/mm/migrate.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/migrate.c
> +++ test-2.6.26-rc5-mm3/mm/migrate.c
> @@ -243,7 +243,8 @@ void migration_entry_wait(struct mm_stru
>  
>  	page = migration_entry_to_page(entry);
>  
> -	get_page(page);
> +	if (!page_cache_get_speculative(page))
> +		goto out;
>  	pte_unmap_unlock(ptep, ptl);
>  	wait_on_page_locked(page);
>  	put_page(page);

Sorry for the late response.

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>




^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] migration_entry_wait fix.
  2008-06-18  1:54       ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-18  5:35         ` Nick Piggin
  -1 siblings, 0 replies; 290+ messages in thread
From: Nick Piggin @ 2008-06-18  5:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Kosaki Motohiro, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers, hugh

On Wednesday 18 June 2008 11:54, KAMEZAWA Hiroyuki wrote:
> On Wed, 18 Jun 2008 10:13:49 +0900
>
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > +	if (!page_cache_get_speculative())
> > +		goto out;
>
> This is obviously buggy....sorry..quilt refresh miss..
>
> ==
> In speculative page cache lookup protocol, page_count(page) is set to 0
> while radix-tree modification is going on, truncation, migration, etc...

These tend to all happen while the page is locked, and in particular
while the page does not have any references other than the current
code path and the pagecache. So no page tables should point to it.

So migration_entry_wait should not find pages with a refcount of zero.


> While page migration, a page fault to page under migration should wait
> unlock_page() and migration_entry_wait() waits for the page from its
> pte entry. It does get_page() -> wait_on_page_locked() -> put_page() now.
>
> In page migration, page_freeze_refs() -> page_unfreeze_refs() is called.
>
> Here, page_unfreeze_refs() expects page_count(page) == 0 and panics
> if page_count(page) != 0. To avoid this, we shouldn't touch page_count()
> if it is zero. This patch uses page_cache_get_speculative() to avoid
> the panic.

At any rate, page_cache_get_speculative() should not be used for this
purpose, but for when we _really_ don't have any references to a page.
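
For reference, the frozen window the quoted changelog describes looks
roughly like this in migrate_page_move_mapping() (a simplified sketch from
memory; locking and error paths trimmed):

	/*
	 * The old page's refcount is frozen to zero while the radix
	 * tree slot is switched over to the new page.
	 */
	if (!page_freeze_refs(page, expected_count)) {
		spin_unlock_irq(&mapping->tree_lock);
		return -EAGAIN;	/* someone else still holds a reference */
	}
	/*
	 * page_count(page) == 0 here: the window a racing
	 * migration_entry_wait() can observe.
	 */
	radix_tree_replace_slot(pslot, newpage);
	page_unfreeze_refs(page, expected_count);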

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] migration_entry_wait fix.
  2008-06-18  5:35         ` Nick Piggin
  (?)
@ 2008-06-18  6:04           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18  6:04 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Kosaki Motohiro, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers, hugh

On Wed, 18 Jun 2008 15:35:57 +1000
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> On Wednesday 18 June 2008 11:54, KAMEZAWA Hiroyuki wrote:
> > On Wed, 18 Jun 2008 10:13:49 +0900
> >
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > +	if (!page_cache_get_speculative())
> > > +		goto out;
> >
> > This is obviously buggy....sorry..quilt refresh miss..
> >
> > ==
> > In speculative page cache lookup protocol, page_count(page) is set to 0
> > while radix-tree modification is going on, truncation, migration, etc...
> 
> These tend to all happen while the page is locked, and in particular
> while the page does not have any references other than the current
> code path and the pagecache. So no page tables should point to it.
> 
> So migration_entry_wait should not find pages with a refcount of zero.
> 
> 
> > While page migration, a page fault to page under migration should wait
> > unlock_page() and migration_entry_wait() waits for the page from its
> > pte entry. It does get_page() -> wait_on_page_locked() -> put_page() now.
> >
> > In page migration, page_freeze_refs() -> page_unfreeze_refs() is called.
> >
> > Here, page_unfreeze_refs() expects page_count(page) == 0 and panics
> > if page_count(page) != 0. To avoid this, we shouldn't touch page_count()
> > if it is zero. This patch uses page_cache_get_speculative() to avoid
> > the panic.
> 
> At any rate, page_cache_get_speculative() should not be used for this
> purpose, but for when we _really_ don't have any references to a page.
> 
Then I got a NAK. What should I do?
(This fix is not related to the lock_page() problem.)

If I read your advice correctly, we shouldn't use lock_page() here.

Before the speculative page cache, the page table entry of a page under
migration encodes the pfn as a special (non-present) pte entry, and the
faulting task waits for the end of page migration via lock_page().

Maybe it is better to just go back to user-land and make the task take
the page fault again?
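
To make the fault-side flow concrete, it looks roughly like this (a
simplified sketch using the swapops.h helpers of this era):

	swp_entry_t entry = pte_to_swp_entry(pte); /* special, non-present pte */

	if (is_migration_entry(entry)) {
		struct page *page = migration_entry_to_page(entry);
		/* the faulting task sleeps until migration unlocks the page */
		wait_on_page_locked(page);
	}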
 
Thanks,
-Kame

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] migration_entry_wait fix.
  2008-06-18  6:04           ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-18  6:42             ` Nick Piggin
  -1 siblings, 0 replies; 290+ messages in thread
From: Nick Piggin @ 2008-06-18  6:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Kosaki Motohiro, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers, hugh

On Wednesday 18 June 2008 16:04, KAMEZAWA Hiroyuki wrote:
> On Wed, 18 Jun 2008 15:35:57 +1000
>
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > On Wednesday 18 June 2008 11:54, KAMEZAWA Hiroyuki wrote:
> > > On Wed, 18 Jun 2008 10:13:49 +0900
> > >
> > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > > +	if (!page_cache_get_speculative())
> > > > +		goto out;
> > >
> > > This is obviously buggy....sorry..quilt refresh miss..
> > >
> > > ==
> > > In speculative page cache lookup protocol, page_count(page) is set to 0
> > > while radix-tree modification is going on, truncation, migration,
> > > etc...
> >
> > These tend to all happen while the page is locked, and in particular
> > while the page does not have any references other than the current
> > code path and the pagecache. So no page tables should point to it.
> >
> > So migration_entry_wait should not find pages with a refcount of zero.
> >
> > > While page migration, a page fault to page under migration should wait
> > > unlock_page() and migration_entry_wait() waits for the page from its
> > > pte entry. It does get_page() -> wait_on_page_locked() -> put_page()
> > > now.
> > >
> > > In page migration, page_freeze_refs() -> page_unfreeze_refs() is
> > > called.
> > >
> > > Here, page_unfreeze_refs() expects page_count(page) == 0 and panics
> > > if page_count(page) != 0. To avoid this, we shouldn't touch
> > > page_count() if it is zero. This patch uses
> > > page_cache_get_speculative() to avoid the panic.
> >
> > At any rate, page_cache_get_speculative() should not be used for this
> > purpose, but for when we _really_ don't have any references to a page.
>
> Then, I got NAK. what should I do ?

Well, not a NAK as such; I just want to find out a bit more about
how this happens (I'm a little bit slow...)

> (This fix is not related to lock_page() problem.)
>
> If I read your advice correctly, we shouldn't use lock_page() here.
>
> Before speculative page cache, page_table_entry of a page under migration
> has a pte entry which encodes pfn as special pte entry. and wait for the
> end of page migration by lock_page().

What I don't think I understand is how we can have a page in the
page tables (and with the ptl held) but with a zero refcount... Oh,
it's not actually a page but a migration entry! I'm not quite so
familiar with that code.

Hmm, so we might possibly see a page there that has a zero refcount
due to page_freeze_refs? In which case, I think the direction of your
fix is good. Sorry for misunderstanding the problem, and thank
you for fixing up my code!

I would ask you to use get_page_unless_zero rather than
page_cache_get_speculative(), because it's not exactly a speculative
reference -- a speculative reference is one where we elevate _count
and then must recheck that the page we have is correct.
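
That is, the speculative pattern is elevate-then-verify; roughly (a
simplified sketch of a lockless pagecache lookup, assuming RCU protection
of the radix tree):

	rcu_read_lock();
	page = radix_tree_lookup(&mapping->page_tree, index);
	if (page) {
		if (!page_cache_get_speculative(page)) {
			/* count was zero: the page is being freed */
			page = NULL;
		} else if (page != radix_tree_lookup(&mapping->page_tree,
						     index)) {
			/* the slot was reused while we took the ref */
			put_page(page);
			page = NULL;
		}
	}
	rcu_read_unlock();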

Also, please add a comment. It would really be nicer to hide this
transiently-frozen state away from migration_entry_wait, but I can't
see any lock that would easily solve it.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] migration_entry_wait fix.
@ 2008-06-18  6:42             ` Nick Piggin
  0 siblings, 0 replies; 290+ messages in thread
From: Nick Piggin @ 2008-06-18  6:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Kosaki Motohiro, Nick Piggin, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	hugh-DTz5qymZ9yRBDgjK7y7TUQ

On Wednesday 18 June 2008 16:04, KAMEZAWA Hiroyuki wrote:
> On Wed, 18 Jun 2008 15:35:57 +1000
>
> Nick Piggin <nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org> wrote:
> > On Wednesday 18 June 2008 11:54, KAMEZAWA Hiroyuki wrote:
> > > On Wed, 18 Jun 2008 10:13:49 +0900
> > >
> > > KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org> wrote:
> > > > +	if (!page_cache_get_speculative())
> > > > +		goto out;
> > >
> > > This is obviously buggy....sorry..quilt refresh miss..
> > >
> > > ==
> > > In speculative page cache lookup protocol, page_count(page) is set to 0
> > > while radix-tree modification is going on, truncation, migration,
> > > etc...
> >
> > These tend to all happen while the page is locked, and in particular
> > while the page does not have any references other than the current
> > code path and the pagecache. So no page tables should point to it.
> >
> > So migration_entry_wait should not find pages with a refcount of zero.
> >
> > > While page migration, a page fault to page under migration should wait
> > > unlock_page() and migration_entry_wait() waits for the page from its
> > > pte entry. It does get_page() -> wait_on_page_locked() -> put_page()
> > > now.
> > >
> > > In page migration, page_freeze_refs() -> page_unfreeze_refs() is
> > > called.
> > >
> > > Here, page_unfreeze_refs() expects page_count(page) == 0 and panics
> > > if page_count(page) != 0. To avoid this, we shouldn't touch
> > > page_count() if it is zero. This patch uses
> > > page_cache_get_speculative() to avoid the panic.
> >
> > At any rate, page_cache_get_speculative() should not be used for this
> > purpose, but for when we _really_ don't have any references to a page.
>
> Then, I got NAK. what should I do ?

Well, not nack as such as just wanting to find out a bit more about
how this happens (I'm a little bit slow...)

> (This fix is not related to lock_page() problem.)
>
> If I read your advice correctly, we shouldn't use lock_page() here.
>
> Before speculative page cache, page_table_entry of a page under migration
> has a pte entry which encodes pfn as special pte entry. and wait for the
> end of page migration by lock_page().

What I don't think I understand, is how we can have a page in the
page tables (and with the ptl held) but with a zero refcount... Oh,
it's not actually a page but a migration entry! I'm not quite so
familiar with that code.

Hmm, so we might possibly see a page there that has a zero refcount
due to page_freeze_refs? In which case, I think the direction of you
fix is good. Sorry for my misunderstanding the problem, and thank
you for fixing up my code!

I would ask you to use get_page_unless_zero rather than
page_cache_get_speculative(), because it's not exactly a speculative
reference -- a speculative reference is one where we elevate _count
and then must recheck that the page we have is correct.

Also, please add a comment. It would really be nicer to hide this
transiently-frozen state away from migration_entry_wait, but I can't
see any lock that would easily solve it.

Thanks,
Nick
--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH] migration_entry_wait fix.
  2008-06-18  6:42             ` Nick Piggin
  (?)
@ 2008-06-18  6:52               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18  6:52 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Kosaki Motohiro, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers, hugh

On Wed, 18 Jun 2008 16:42:37 +1000
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > (This fix is not related to the lock_page() problem.)
> >
> > If I read your advice correctly, we shouldn't use lock_page() here.
> >
> > Even before the speculative page cache work, the page table entry of a
> > page under migration is a special pte entry which encodes the pfn, and
> > the faulting task waits for the end of page migration via lock_page().
> 
> What I don't think I understand is how we can have a page in the
> page tables (and with the ptl held) but with a zero refcount... Oh,
> it's not actually a page but a migration entry! I'm not quite so
> familiar with that code.
> 
> Hmm, so we might possibly see a page there that has a zero refcount
> due to page_freeze_refs? In which case, I think the direction of your
> fix is good. Sorry for misunderstanding the problem, and thank
> you for fixing up my code!
> 
> I would ask you to use get_page_unless_zero rather than
> page_cache_get_speculative(), because it's not exactly a speculative
> reference -- a speculative reference is one where we elevate _count
> and then must recheck that the page we have is correct.
> 
ok.

> Also, please add a comment. It would really be nicer to hide this
> transiently-frozen state away from migration_entry_wait, but I can't
> see any lock that would easily solve it.
> 
OK, I will add comments.

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH -mm][BUGFIX] migration_entry_wait fix. v2
  2008-06-18  7:29                 ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-18  7:26                   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-18  7:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, Nick Piggin, Daisuke Nishimura, Andrew Morton,
	Rik van Riel, Lee Schermerhorn, linux-mm, linux-kernel,
	kernel-testers, hugh

> In the speculative page cache lookup protocol, page_count(page) is set to 0
> while radix-tree modification (truncation, migration, etc.) is going on.
>
> During page migration, a page fault on a page under migration does:
>  - look up the page table
>  - find that it is a migration_entry_pte
>  - decode the pfn from the migration_entry_pte and get the page via
>    pfn_to_page(pfn)
>  - wait until the page is unlocked
>
> It does get_page() -> wait_on_page_locked() -> put_page() now.
>
> In page migration's radix-tree replacement, page_freeze_refs() ->
> page_unfreeze_refs() is called, and page_count(page) becomes zero
> and must be kept at zero while the radix-tree replacement is in progress.
>
> If get_page() is called on a page under radix-tree replacement,
> the kernel panics. To avoid this, we shouldn't increment page_count()
> if it is zero. This patch uses get_page_unless_zero().
>
> Even if get_page_unless_zero() fails, the caller just retries,
> but it will be a bit busier.

Great!
	Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 290+ messages in thread

* [PATCH -mm][BUGFIX] migration_entry_wait fix. v2
  2008-06-18  6:52               ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-18  7:29                 ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18  7:29 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Nick Piggin, Daisuke Nishimura, Andrew Morton, Rik van Riel,
	Lee Schermerhorn, Kosaki Motohiro, linux-mm, linux-kernel,
	kernel-testers, hugh

In the speculative page cache lookup protocol, page_count(page) is set to 0
while radix-tree modification (truncation, migration, etc.) is going on.

During page migration, a page fault on a page under migration does:
 - look up the page table
 - find that it is a migration_entry_pte
 - decode the pfn from the migration_entry_pte and get the page via
   pfn_to_page(pfn)
 - wait until the page is unlocked

It does get_page() -> wait_on_page_locked() -> put_page() now.

In page migration's radix-tree replacement, page_freeze_refs() ->
page_unfreeze_refs() is called, and page_count(page) becomes zero
and must be kept at zero while the radix-tree replacement is in progress.
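
A hedged sketch of that replacement window, modeled on
migrate_page_move_mapping() and the freeze/unfreeze helpers from
mm-speculative-page-references.patch (abridged, not the exact source):

	/* atomically cmpxchg _count from expected_count down to 0 */
	if (!page_freeze_refs(page, expected_count))
		return -EAGAIN;

	radix_tree_replace_slot(pslot, newpage);  /* _count must stay 0 */

	/* restores _count; VM_BUG_ON if someone raised it meanwhile */
	page_unfreeze_refs(page, expected_count);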

If get_page() is called on a page under radix-tree replacement,
the kernel panics. To avoid this, we shouldn't increment page_count()
if it is zero. This patch uses get_page_unless_zero().

Even if get_page_unless_zero() fails, the caller just retries,
but it will be a bit busier.
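
The retry is safe because the pte still contains the migration entry, so
the access simply faults again; compare the fault path in this era's
do_swap_page() (abridged):

	entry = pte_to_swp_entry(orig_pte);
	if (is_migration_entry(entry)) {
		/* may now return early if get_page_unless_zero() fails;
		 * the fault is then retried until migration completes */
		migration_entry_wait(mm, pmd, address);
		goto out;
	}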

Change log v1->v2:
 - rewrote the patch description and added comments.

From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/migrate.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Index: test-2.6.26-rc5-mm3/mm/migrate.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/migrate.c
+++ test-2.6.26-rc5-mm3/mm/migrate.c
@@ -242,8 +242,15 @@ void migration_entry_wait(struct mm_stru
 		goto out;
 
 	page = migration_entry_to_page(entry);
-
-	get_page(page);
+	/*
+	 * Once the radix-tree replacement in page migration has started,
+	 * page_count *must* be zero, and we don't want to call
+	 * wait_on_page_locked() on a page without holding a get_page()
+	 * reference. So we use get_page_unless_zero() here; even if it
+	 * fails, the page fault will simply occur again.
+	 */
+	if (!get_page_unless_zero(page))
+		goto out;
 	pte_unmap_unlock(ptep, ptl);
 	wait_on_page_locked(page);
 	put_page(page);


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH -mm][BUGFIX] migration_entry_wait fix. v2
  2008-06-18  7:29                 ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-18  7:40                   ` Nick Piggin
  -1 siblings, 0 replies; 290+ messages in thread
From: Nick Piggin @ 2008-06-18  7:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Kosaki Motohiro, linux-mm, linux-kernel, kernel-testers, hugh

On Wednesday 18 June 2008 17:29, KAMEZAWA Hiroyuki wrote:
> In the speculative page cache lookup protocol, page_count(page) is set to 0
> while radix-tree modification (truncation, migration, etc.) is going on.
>
> During page migration, a page fault on a page under migration does:
>  - look up the page table
>  - find that it is a migration_entry_pte
>  - decode the pfn from the migration_entry_pte and get the page via
>    pfn_to_page(pfn)
>  - wait until the page is unlocked
>
> It does get_page() -> wait_on_page_locked() -> put_page() now.
>
> In page migration's radix-tree replacement, page_freeze_refs() ->
> page_unfreeze_refs() is called, and page_count(page) becomes zero
> and must be kept at zero while the radix-tree replacement is in progress.
>
> If get_page() is called on a page under radix-tree replacement,
> the kernel panics. To avoid this, we shouldn't increment page_count()
> if it is zero. This patch uses get_page_unless_zero().
>
> Even if get_page_unless_zero() fails, the caller just retries,
> but it will be a bit busier.
>
> Change log v1->v2:
>  - rewrote the patch description and added comments.
>

Thanks

Acked-by: Nick Piggin <npiggin@suse.de>

Andrew, this is a bugfix to mm-speculative-page-references.patch

> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/migrate.c |   11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> Index: test-2.6.26-rc5-mm3/mm/migrate.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/migrate.c
> +++ test-2.6.26-rc5-mm3/mm/migrate.c
> @@ -242,8 +242,15 @@ void migration_entry_wait(struct mm_stru
>  		goto out;
>
>  	page = migration_entry_to_page(entry);
> -
> -	get_page(page);
> +	/*
> +	 * Once the radix-tree replacement in page migration has started,
> +	 * page_count *must* be zero, and we don't want to call
> +	 * wait_on_page_locked() on a page without holding a get_page()
> +	 * reference. So we use get_page_unless_zero() here; even if it
> +	 * fails, the page fault will simply occur again.
> +	 */
> +	if (!get_page_unless_zero(page))
> +		goto out;
>  	pte_unmap_unlock(ptep, ptl);
>  	wait_on_page_locked(page);
>  	put_page(page);

^ permalink raw reply	[flat|nested] 290+ messages in thread

* [PATCH][-mm] remove redundant page->mapping check
  2008-06-18  4:41         ` Daisuke Nishimura
  (?)
@ 2008-06-18  7:54           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-18  7:54 UTC (permalink / raw)
  To: Daisuke Nishimura, Andrew Morton
  Cc: kosaki.motohiro, Rik van Riel, Lee Schermerhorn, Nick Piggin,
	linux-mm, linux-kernel, kernel-testers

> > > Is this part really necessary?
> > > I tried to remove it, but no problem happened.
> > >
> > I made this part first, and added a fix for migration_entry_wait later.
> >
> > So I haven't tested without this part, and I think it will trigger the
> > VM_BUG_ON() here without it.
> >
> > Anyway, I will test it.
> >
> I got this VM_BUG_ON() as expected just by doing:
>
>   # echo $$ >/cgroup/cpuset/02/tasks
>
> So I believe that both fixes, for migration_entry_wait and
> unmap_and_move (and, of course, the removal of the VM_BUG_ON from
> putback_lru_page), are needed.

OK, I confirmed this part.

Andrew, please pick.


==================================================

Against: 2.6.26-rc5-mm3

Remove the redundant mapping check.

We'd be doing exactly what putback_lru_page() is doing, so this code
was always unnecessary, duplicated code. Just let putback_lru_page()
handle this condition and conditionally unlock_page().
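
For context, the branch being removed duplicates the truncated-page
handling that putback_lru_page() itself performs in this tree's
mm/vmscan.c (abridged):

	if (unlikely(!page->mapping)) {
		/* page truncated: drop the lock, put_page() frees it */
		VM_BUG_ON(page_count(page) != 1);
		unlock_page(page);
		ret = 0;
	}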


Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by:      Lee Schermerhorn <Lee.Schermerhorn@hp.com>

---
 mm/migrate.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

Index: b/mm/migrate.c
===================================================================
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -716,13 +716,7 @@ unlock:
  		 * restored.
  		 */
  		list_del(&page->lru);
-		if (!page->mapping) {
-			VM_BUG_ON(page_count(page) != 1);
-			unlock_page(page);
-			put_page(page);		/* just free the old page */
-			goto end_migration;
-		} else
-			unlock = putback_lru_page(page);
+		unlock = putback_lru_page(page);
 	}
 
 	if (unlock)




^ permalink raw reply	[flat|nested] 290+ messages in thread

* [Experimental][PATCH] putback_lru_page rework
  2008-06-17  7:47     ` Daisuke Nishimura
  (?)
@ 2008-06-18  9:40       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18  9:40 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

Lee-san, how about this?
Tested on x86-64, and with Nishimura-san's test case among others; it works well now.
-Kame
==
putback_lru_page()/unevictable page handling rework.

Currently, putback_lru_page() requires that the page is locked,
and in some special cases it implicitly unlocks it.

This patch tries to make putback_lru_pages() lock_page() free.
(Of course, some callers must still take the lock.)

The main reason putback_lru_page() assumes the page is locked is to
avoid the page's status changing between Mlocked and Not-Mlocked.

Once a page is added to the unevictable list, it is removed from that
list only when it is munlocked. (There are other special cases, but we
ignore them here.) So a status change during putback_lru_page() is
fatal, and the page has to be locked.

putback_lru_page() in this patch has a new concept: when it adds a page
to the unevictable list, it re-checks whether the page's status has
changed; if so, it retries the putback.
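
The race this re-check closes, as a timeline (a reconstruction from the
description above, not kernel source):

	/*
	 * CPU0: putback_lru_page(page)      CPU1: munlock path
	 *   page_evictable() -> false
	 *                                     clears the page's mlocked
	 *                                     status (page is on no list
	 *                                     yet, so nothing is rescued)
	 *   add_page_to_unevictable_list()
	 *   re-check page_evictable():
	 *     now true -> isolate the page and redo the putback,
	 *     instead of stranding an evictable page forever
	 */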

This patch also changes the caller side, cleaning up lock/unlock_page() pairs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/internal.h |    2 -
 mm/migrate.c  |   23 +++----------
 mm/mlock.c    |   24 +++++++-------
 mm/vmscan.c   |   96 +++++++++++++++++++++++++---------------------------------
 4 files changed, 61 insertions(+), 84 deletions(-)

Index: test-2.6.26-rc5-mm3/mm/vmscan.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/vmscan.c
+++ test-2.6.26-rc5-mm3/mm/vmscan.c
@@ -486,73 +486,63 @@ int remove_mapping(struct address_space 
  * Page may still be unevictable for other reasons.
  *
  * lru_lock must not be held, interrupts must be enabled.
- * Must be called with page locked.
- *
- * return 1 if page still locked [not truncated], else 0
  */
-int putback_lru_page(struct page *page)
+#ifdef CONFIG_UNEVICTABLE_LRU
+void putback_lru_page(struct page *page)
 {
 	int lru;
-	int ret = 1;
 	int was_unevictable;
 
-	VM_BUG_ON(!PageLocked(page));
 	VM_BUG_ON(PageLRU(page));
 
-	lru = !!TestClearPageActive(page);
 	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
 
-	if (unlikely(!page->mapping)) {
-		/*
-		 * page truncated.  drop lock as put_page() will
-		 * free the page.
-		 */
-		VM_BUG_ON(page_count(page) != 1);
-		unlock_page(page);
-		ret = 0;
-	} else if (page_evictable(page, NULL)) {
-		/*
-		 * For evictable pages, we can use the cache.
-		 * In event of a race, worst case is we end up with an
-		 * unevictable page on [in]active list.
-		 * We know how to handle that.
-		 */
+redo:
+	lru = !!TestClearPageActive(page);
+	if (page_evictable(page, NULL)) {
 		lru += page_is_file_cache(page);
 		lru_cache_add_lru(page, lru);
-		mem_cgroup_move_lists(page, lru);
-#ifdef CONFIG_UNEVICTABLE_LRU
-		if (was_unevictable)
-			count_vm_event(NORECL_PGRESCUED);
-#endif
 	} else {
-		/*
-		 * Put unevictable pages directly on zone's unevictable
-		 * list.
-		 */
+		lru = LRU_UNEVICTABLE;
 		add_page_to_unevictable_list(page);
-		mem_cgroup_move_lists(page, LRU_UNEVICTABLE);
-#ifdef CONFIG_UNEVICTABLE_LRU
-		if (!was_unevictable)
-			count_vm_event(NORECL_PGCULLED);
-#endif
 	}
+	mem_cgroup_move_lists(page, lru);
+
+	/*
+	 * The page's status can change while we move it among the LRU lists.
+	 * If an evictable page ends up on the unevictable list it will never
+	 * be freed, so check its status again after adding it to the list.
+	 */
+	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
+		if (!isolate_lru_page(page)) {
+			put_page(page);
+			goto redo;
+		}
+		/* This means someone else dropped this page from the LRU,
+		 * so it will be freed or put back to the LRU by them; there
+		 * is nothing to do here.
+		 */
+	}
+
+	if (was_unevictable && lru != LRU_UNEVICTABLE)
+		count_vm_event(NORECL_PGRESCUED);
+	else if (!was_unevictable && lru == LRU_UNEVICTABLE)
+		count_vm_event(NORECL_PGCULLED);
 
 	put_page(page);		/* drop ref from isolate */
-	return ret;		/* ret => "page still locked" */
 }
-
-/*
- * Cull page that shrink_*_list() has detected to be unevictable
- * under page lock to close races with other tasks that might be making
- * the page evictable.  Avoid stranding an evictable page on the
- * unevictable list.
- */
-static void cull_unevictable_page(struct page *page)
+#else
+void putback_lru_page(struct page *page)
 {
-	lock_page(page);
-	if (putback_lru_page(page))
-		unlock_page(page);
+	int lru;
+	VM_BUG_ON(PageLRU(page));
+
+	lru = !!TestClearPageActive(page) + page_is_file_cache(page);
+	lru_cache_add_lru(page, lru);
+	mem_cgroup_move_lists(page, lru);
+	put_page(page);
 }
+#endif
 
 /*
  * shrink_page_list() returns the number of reclaimed pages
@@ -746,8 +736,8 @@ free_it:
 		continue;
 
 cull_mlocked:
-		if (putback_lru_page(page))
-			unlock_page(page);
+		unlock_page(page);
+		putback_lru_page(page);
 		continue;
 
 activate_locked:
@@ -1127,7 +1117,7 @@ static unsigned long shrink_inactive_lis
 			list_del(&page->lru);
 			if (unlikely(!page_evictable(page, NULL))) {
 				spin_unlock_irq(&zone->lru_lock);
-				cull_unevictable_page(page);
+				putback_lru_page(page);
 				spin_lock_irq(&zone->lru_lock);
 				continue;
 			}
@@ -1231,7 +1221,7 @@ static void shrink_active_list(unsigned 
 		list_del(&page->lru);
 
 		if (unlikely(!page_evictable(page, NULL))) {
-			cull_unevictable_page(page);
+			putback_lru_page(page);
 			continue;
 		}
 
@@ -2393,8 +2383,6 @@ int zone_reclaim(struct zone *zone, gfp_
 int page_evictable(struct page *page, struct vm_area_struct *vma)
 {
 
-	VM_BUG_ON(PageUnevictable(page));
-
 	if (mapping_unevictable(page_mapping(page)))
 		return 0;
 
Index: test-2.6.26-rc5-mm3/mm/mlock.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/mlock.c
+++ test-2.6.26-rc5-mm3/mm/mlock.c
@@ -55,7 +55,6 @@ EXPORT_SYMBOL(can_do_mlock);
  */
 void __clear_page_mlock(struct page *page)
 {
-	VM_BUG_ON(!PageLocked(page));	/* for LRU isolate/putback */
 
 	dec_zone_page_state(page, NR_MLOCK);
 	count_vm_event(NORECL_PGCLEARED);
@@ -79,7 +78,6 @@ void __clear_page_mlock(struct page *pag
  */
 void mlock_vma_page(struct page *page)
 {
-	BUG_ON(!PageLocked(page));
 
 	if (!TestSetPageMlocked(page)) {
 		inc_zone_page_state(page, NR_MLOCK);
@@ -109,7 +107,6 @@ void mlock_vma_page(struct page *page)
  */
 static void munlock_vma_page(struct page *page)
 {
-	BUG_ON(!PageLocked(page));
 
 	if (TestClearPageMlocked(page)) {
 		dec_zone_page_state(page, NR_MLOCK);
@@ -169,7 +166,8 @@ static int __mlock_vma_pages_range(struc
 
 		/*
 		 * get_user_pages makes pages present if we are
-		 * setting mlock.
+		 * setting mlock, and this extra reference count will
+		 * prevent migration of this page.
 		 */
 		ret = get_user_pages(current, mm, addr,
 				min_t(int, nr_pages, ARRAY_SIZE(pages)),
@@ -197,14 +195,8 @@ static int __mlock_vma_pages_range(struc
 		for (i = 0; i < ret; i++) {
 			struct page *page = pages[i];
 
-			/*
-			 * page might be truncated or migrated out from under
-			 * us.  Check after acquiring page lock.
-			 */
-			lock_page(page);
-			if (page->mapping)
+			if (page_mapcount(page))
 				mlock_vma_page(page);
-			unlock_page(page);
 			put_page(page);		/* ref from get_user_pages() */
 
 			/*
@@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
 	struct page *page;
 	pte_t pte;
 
+	/*
+	 * The page is never unmapped by page reclaim; we lock the page below.
+	 */
 retry:
 	pte = *ptep;
 	/*
@@ -261,7 +256,15 @@ retry:
 		goto out;
 
 	lock_page(page);
-	if (!page->mapping) {
+	/*
+	 * Because we only take the page lock here, we have to check two cases:
+	 * - the page was migrated
+	 * - the page was truncated (file cache only)
+	 * Note: an anonymous page doesn't clear page->mapping even when
+	 * it is removed from the rmap.
+	 */
+	if (!page->mapping ||
+	     (PageAnon(page) && !page_mapcount(page))) {
 		unlock_page(page);
 		goto retry;
 	}
Index: test-2.6.26-rc5-mm3/mm/migrate.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/migrate.c
+++ test-2.6.26-rc5-mm3/mm/migrate.c
@@ -67,9 +67,7 @@ int putback_lru_pages(struct list_head *
 
 	list_for_each_entry_safe(page, page2, l, lru) {
 		list_del(&page->lru);
-		lock_page(page);
-		if (putback_lru_page(page))
-			unlock_page(page);
+		putback_lru_page(page);
 		count++;
 	}
 	return count;
@@ -571,7 +569,6 @@ static int fallback_migrate_page(struct 
 static int move_to_new_page(struct page *newpage, struct page *page)
 {
 	struct address_space *mapping;
-	int unlock = 1;
 	int rc;
 
 	/*
@@ -610,12 +607,11 @@ static int move_to_new_page(struct page 
 		 * Put back on LRU while holding page locked to
 		 * handle potential race with, e.g., munlock()
 		 */
-		unlock = putback_lru_page(newpage);
+		putback_lru_page(newpage);
 	} else
 		newpage->mapping = NULL;
 
-	if (unlock)
-		unlock_page(newpage);
+	unlock_page(newpage);
 
 	return rc;
 }
@@ -632,7 +628,6 @@ static int unmap_and_move(new_page_t get
 	struct page *newpage = get_new_page(page, private, &result);
 	int rcu_locked = 0;
 	int charge = 0;
-	int unlock = 1;
 
 	if (!newpage)
 		return -ENOMEM;
@@ -713,6 +708,7 @@ rcu_unlock:
 		rcu_read_unlock();
 
 unlock:
+	unlock_page(page);
 
 	if (rc != -EAGAIN) {
  		/*
@@ -722,18 +718,9 @@ unlock:
  		 * restored.
  		 */
  		list_del(&page->lru);
-		if (!page->mapping) {
-			VM_BUG_ON(page_count(page) != 1);
-			unlock_page(page);
-			put_page(page);		/* just free the old page */
-			goto end_migration;
-		} else
-			unlock = putback_lru_page(page);
+		putback_lru_page(page);
 	}
 
-	if (unlock)
-		unlock_page(page);
-
 end_migration:
 	if (!charge)
 		mem_cgroup_end_migration(newpage);
Index: test-2.6.26-rc5-mm3/mm/internal.h
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/internal.h
+++ test-2.6.26-rc5-mm3/mm/internal.h
@@ -43,7 +43,7 @@ static inline void __put_page(struct pag
  * in mm/vmscan.c:
  */
 extern int isolate_lru_page(struct page *page);
-extern int putback_lru_page(struct page *page);
+extern void putback_lru_page(struct page *page);
 
 /*
  * in mm/page_alloc.c


^ permalink raw reply	[flat|nested] 290+ messages in thread

* [Experimental][PATCH] putback_lru_page rework
@ 2008-06-18  9:40       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18  9:40 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

Lee-san, how about this?
Tested on x86-64, and with Nishimura-san's test case among others; it works well now.
-Kame
==
putback_lru_page()/unevictable page handling rework.

Currently, putback_lru_page() requires that the page is locked,
and in some special cases it implicitly unlocks it.

This patch tries to make putback_lru_pages() lock_page() free.
(Of course, some callers must still take the lock.)

The main reason putback_lru_page() assumes the page is locked is to
avoid the page's status changing between Mlocked and Not-Mlocked.

Once a page is added to the unevictable list, it is removed from that
list only when it is munlocked. (There are other special cases, but we
ignore them here.) So a status change during putback_lru_page() is
fatal, and the page has to be locked.

putback_lru_page() in this patch has a new concept: when it adds a page
to the unevictable list, it re-checks whether the page's status has
changed; if so, it retries the putback.

This patch also changes the caller side, cleaning up lock/unlock_page() pairs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/internal.h |    2 -
 mm/migrate.c  |   23 +++----------
 mm/mlock.c    |   24 +++++++-------
 mm/vmscan.c   |   96 +++++++++++++++++++++++++---------------------------------
 4 files changed, 61 insertions(+), 84 deletions(-)

Index: test-2.6.26-rc5-mm3/mm/vmscan.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/vmscan.c
+++ test-2.6.26-rc5-mm3/mm/vmscan.c
@@ -486,73 +486,63 @@ int remove_mapping(struct address_space 
  * Page may still be unevictable for other reasons.
  *
  * lru_lock must not be held, interrupts must be enabled.
- * Must be called with page locked.
- *
- * return 1 if page still locked [not truncated], else 0
  */
-int putback_lru_page(struct page *page)
+#ifdef CONFIG_UNEVICTABLE_LRU
+void putback_lru_page(struct page *page)
 {
 	int lru;
-	int ret = 1;
 	int was_unevictable;
 
-	VM_BUG_ON(!PageLocked(page));
 	VM_BUG_ON(PageLRU(page));
 
-	lru = !!TestClearPageActive(page);
 	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
 
-	if (unlikely(!page->mapping)) {
-		/*
-		 * page truncated.  drop lock as put_page() will
-		 * free the page.
-		 */
-		VM_BUG_ON(page_count(page) != 1);
-		unlock_page(page);
-		ret = 0;
-	} else if (page_evictable(page, NULL)) {
-		/*
-		 * For evictable pages, we can use the cache.
-		 * In event of a race, worst case is we end up with an
-		 * unevictable page on [in]active list.
-		 * We know how to handle that.
-		 */
+redo:
+	lru = !!TestClearPageActive(page);
+	if (page_evictable(page, NULL)) {
 		lru += page_is_file_cache(page);
 		lru_cache_add_lru(page, lru);
-		mem_cgroup_move_lists(page, lru);
-#ifdef CONFIG_UNEVICTABLE_LRU
-		if (was_unevictable)
-			count_vm_event(NORECL_PGRESCUED);
-#endif
 	} else {
-		/*
-		 * Put unevictable pages directly on zone's unevictable
-		 * list.
-		 */
+		lru = LRU_UNEVICTABLE;
 		add_page_to_unevictable_list(page);
-		mem_cgroup_move_lists(page, LRU_UNEVICTABLE);
-#ifdef CONFIG_UNEVICTABLE_LRU
-		if (!was_unevictable)
-			count_vm_event(NORECL_PGCULLED);
-#endif
 	}
+	mem_cgroup_move_lists(page, lru);
+
+	/*
+	 * The page's status can change while we move it among the LRU lists.
+	 * If an evictable page ends up on the unevictable list it will never
+	 * be freed, so check its status again after adding it to the list.
+	 */
+	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
+		if (!isolate_lru_page(page)) {
+			put_page(page);
+			goto redo;
+		}
+		/* This means someone else dropped this page from the LRU,
+		 * so it will be freed or put back to the LRU by them; there
+		 * is nothing to do here.
+		 */
+	}
+
+	if (was_unevictable && lru != LRU_UNEVICTABLE)
+		count_vm_event(NORECL_PGRESCUED);
+	else if (!was_unevictable && lru == LRU_UNEVICTABLE)
+		count_vm_event(NORECL_PGCULLED);
 
 	put_page(page);		/* drop ref from isolate */
-	return ret;		/* ret => "page still locked" */
 }
-
-/*
- * Cull page that shrink_*_list() has detected to be unevictable
- * under page lock to close races with other tasks that might be making
- * the page evictable.  Avoid stranding an evictable page on the
- * unevictable list.
- */
-static void cull_unevictable_page(struct page *page)
+#else
+void putback_lru_page(struct page *page)
 {
-	lock_page(page);
-	if (putback_lru_page(page))
-		unlock_page(page);
+	int lru;
+	VM_BUG_ON(PageLRU(page));
+
+	lru = !!TestClearPageActive(page) + page_is_file_cache(page);
+	lru_cache_add_lru(page, lru);
+	mem_cgroup_move_lists(page, lru);
+	put_page(page);
 }
+#endif
 
 /*
  * shrink_page_list() returns the number of reclaimed pages
@@ -746,8 +736,8 @@ free_it:
 		continue;
 
 cull_mlocked:
-		if (putback_lru_page(page))
-			unlock_page(page);
+		unlock_page(page);
+		putback_lru_page(page);
 		continue;
 
 activate_locked:
@@ -1127,7 +1117,7 @@ static unsigned long shrink_inactive_lis
 			list_del(&page->lru);
 			if (unlikely(!page_evictable(page, NULL))) {
 				spin_unlock_irq(&zone->lru_lock);
-				cull_unevictable_page(page);
+				putback_lru_page(page);
 				spin_lock_irq(&zone->lru_lock);
 				continue;
 			}
@@ -1231,7 +1221,7 @@ static void shrink_active_list(unsigned 
 		list_del(&page->lru);
 
 		if (unlikely(!page_evictable(page, NULL))) {
-			cull_unevictable_page(page);
+			putback_lru_page(page);
 			continue;
 		}
 
@@ -2393,8 +2383,6 @@ int zone_reclaim(struct zone *zone, gfp_
 int page_evictable(struct page *page, struct vm_area_struct *vma)
 {
 
-	VM_BUG_ON(PageUnevictable(page));
-
 	if (mapping_unevictable(page_mapping(page)))
 		return 0;
 
Index: test-2.6.26-rc5-mm3/mm/mlock.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/mlock.c
+++ test-2.6.26-rc5-mm3/mm/mlock.c
@@ -55,7 +55,6 @@ EXPORT_SYMBOL(can_do_mlock);
  */
 void __clear_page_mlock(struct page *page)
 {
-	VM_BUG_ON(!PageLocked(page));	/* for LRU isolate/putback */
 
 	dec_zone_page_state(page, NR_MLOCK);
 	count_vm_event(NORECL_PGCLEARED);
@@ -79,7 +78,6 @@ void __clear_page_mlock(struct page *pag
  */
 void mlock_vma_page(struct page *page)
 {
-	BUG_ON(!PageLocked(page));
 
 	if (!TestSetPageMlocked(page)) {
 		inc_zone_page_state(page, NR_MLOCK);
@@ -109,7 +107,6 @@ void mlock_vma_page(struct page *page)
  */
 static void munlock_vma_page(struct page *page)
 {
-	BUG_ON(!PageLocked(page));
 
 	if (TestClearPageMlocked(page)) {
 		dec_zone_page_state(page, NR_MLOCK);
@@ -169,7 +166,8 @@ static int __mlock_vma_pages_range(struc
 
 		/*
 		 * get_user_pages makes pages present if we are
-		 * setting mlock.
+		 * setting mlock, and this extra reference count will
+		 * prevent migration of this page.
 		 */
 		ret = get_user_pages(current, mm, addr,
 				min_t(int, nr_pages, ARRAY_SIZE(pages)),
@@ -197,14 +195,8 @@ static int __mlock_vma_pages_range(struc
 		for (i = 0; i < ret; i++) {
 			struct page *page = pages[i];
 
-			/*
-			 * page might be truncated or migrated out from under
-			 * us.  Check after acquiring page lock.
-			 */
-			lock_page(page);
-			if (page->mapping)
+			if (page_mapcount(page))
 				mlock_vma_page(page);
-			unlock_page(page);
 			put_page(page);		/* ref from get_user_pages() */
 
 			/*
@@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
 	struct page *page;
 	pte_t pte;
 
+	/*
+	 * The page is never unmapped by page reclaim; we take the page lock now.
+	 */
 retry:
 	pte = *ptep;
 	/*
@@ -261,7 +256,15 @@ retry:
 		goto out;
 
 	lock_page(page);
-	if (!page->mapping) {
+	/*
+	 * Because we lock the page here, we have to check two cases:
+	 * - the page has been migrated
+	 * - the page has been truncated (file cache only)
+	 * Note: an anonymous page doesn't clear page->mapping even if
+	 * it is removed from the rmap.
+	 */
+	if (!page->mapping ||
+	     (PageAnon(page) && !page_mapcount(page))) {
 		unlock_page(page);
 		goto retry;
 	}
Index: test-2.6.26-rc5-mm3/mm/migrate.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/migrate.c
+++ test-2.6.26-rc5-mm3/mm/migrate.c
@@ -67,9 +67,7 @@ int putback_lru_pages(struct list_head *
 
 	list_for_each_entry_safe(page, page2, l, lru) {
 		list_del(&page->lru);
-		lock_page(page);
-		if (putback_lru_page(page))
-			unlock_page(page);
+		putback_lru_page(page);
 		count++;
 	}
 	return count;
@@ -571,7 +569,6 @@ static int fallback_migrate_page(struct 
 static int move_to_new_page(struct page *newpage, struct page *page)
 {
 	struct address_space *mapping;
-	int unlock = 1;
 	int rc;
 
 	/*
@@ -610,12 +607,11 @@ static int move_to_new_page(struct page 
 		 * Put back on LRU while holding page locked to
 		 * handle potential race with, e.g., munlock()
 		 */
-		unlock = putback_lru_page(newpage);
+		putback_lru_page(newpage);
 	} else
 		newpage->mapping = NULL;
 
-	if (unlock)
-		unlock_page(newpage);
+	unlock_page(newpage);
 
 	return rc;
 }
@@ -632,7 +628,6 @@ static int unmap_and_move(new_page_t get
 	struct page *newpage = get_new_page(page, private, &result);
 	int rcu_locked = 0;
 	int charge = 0;
-	int unlock = 1;
 
 	if (!newpage)
 		return -ENOMEM;
@@ -713,6 +708,7 @@ rcu_unlock:
 		rcu_read_unlock();
 
 unlock:
+	unlock_page(page);
 
 	if (rc != -EAGAIN) {
  		/*
@@ -722,18 +718,9 @@ unlock:
  		 * restored.
  		 */
  		list_del(&page->lru);
-		if (!page->mapping) {
-			VM_BUG_ON(page_count(page) != 1);
-			unlock_page(page);
-			put_page(page);		/* just free the old page */
-			goto end_migration;
-		} else
-			unlock = putback_lru_page(page);
+		putback_lru_page(page);
 	}
 
-	if (unlock)
-		unlock_page(page);
-
 end_migration:
 	if (!charge)
 		mem_cgroup_end_migration(newpage);
Index: test-2.6.26-rc5-mm3/mm/internal.h
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/internal.h
+++ test-2.6.26-rc5-mm3/mm/internal.h
@@ -43,7 +43,7 @@ static inline void __put_page(struct pag
  * in mm/vmscan.c:
  */
 extern int isolate_lru_page(struct page *page);
-extern int putback_lru_page(struct page *page);
+extern void putback_lru_page(struct page *page);
 
 /*
  * in mm/page_alloc.c

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* [Experimental][PATCH] putback_lru_page rework
@ 2008-06-18  9:40       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18  9:40 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Andrew Morton, Rik van Riel, Lee Schermerhorn, Kosaki Motohiro,
	Nick Piggin, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

Lee-san, how about this?
Tested on x86-64, and tried Nishimura-san's test case among others; it works well now.
-Kame
==
putback_lru_page()/unevictable page handling rework.

Currently, putback_lru_page() requires that the page is locked,
and in some special cases it implicitly unlocks it.

This patch tries to make putback_lru_page() lock_page() free.
(Of course, some callers must still take the lock.)

The main reason putback_lru_page() assumes the page is locked is to
avoid races with changes in the page's Mlocked/Not-Mlocked status.

Once a page is added to the unevictable list, it is removed from that
list only when it is munlocked. (There are other special cases, but we
ignore them here.) So a status change during putback_lru_page() would
be fatal, which is why the page had to be locked.

putback_lru_page() in this patch uses a new concept: when it adds a
page to the unevictable list, it checks again whether the page's status
has changed; if it has, it retries the putback.

This patch also changes the caller side, cleaning up the
lock_page()/unlock_page() calls.
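
In rough terms the new flow is (a simplified sketch only -- the memcg
accounting and the vm-event counters are omitted; the real code is in
the diff below):
==
void putback_lru_page(struct page *page)	/* page came from isolation */
{
	int lru;
redo:
	lru = !!TestClearPageActive(page);
	if (page_evictable(page, NULL)) {
		/*
		 * Worst case in a race: an unevictable page lands on the
		 * [in]active list, and vmscan knows how to handle that.
		 */
		lru += page_is_file_cache(page);
		lru_cache_add_lru(page, lru);
	} else {
		lru = LRU_UNEVICTABLE;
		add_page_to_unevictable_list(page);
	}
	/*
	 * The page may have become evictable while we were adding it.
	 * Stranded on the unevictable list it would never be reclaimed,
	 * so re-check and retry if we can re-isolate it.
	 */
	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL) &&
	    !isolate_lru_page(page)) {
		put_page(page);		/* drop isolate_lru_page()'s ref */
		goto redo;
	}
	put_page(page);		/* drop the ref from the original isolate */
}
==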

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroy-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

---
 mm/internal.h |    2 -
 mm/migrate.c  |   23 +++----------
 mm/mlock.c    |   24 +++++++-------
 mm/vmscan.c   |   96 +++++++++++++++++++++++++---------------------------------
 4 files changed, 61 insertions(+), 84 deletions(-)

Index: test-2.6.26-rc5-mm3/mm/vmscan.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/vmscan.c
+++ test-2.6.26-rc5-mm3/mm/vmscan.c
@@ -486,73 +486,63 @@ int remove_mapping(struct address_space 
  * Page may still be unevictable for other reasons.
  *
  * lru_lock must not be held, interrupts must be enabled.
- * Must be called with page locked.
- *
- * return 1 if page still locked [not truncated], else 0
  */
-int putback_lru_page(struct page *page)
+#ifdef CONFIG_UNEVICTABLE_LRU
+void putback_lru_page(struct page *page)
 {
 	int lru;
-	int ret = 1;
 	int was_unevictable;
 
-	VM_BUG_ON(!PageLocked(page));
 	VM_BUG_ON(PageLRU(page));
 
-	lru = !!TestClearPageActive(page);
 	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
 
-	if (unlikely(!page->mapping)) {
-		/*
-		 * page truncated.  drop lock as put_page() will
-		 * free the page.
-		 */
-		VM_BUG_ON(page_count(page) != 1);
-		unlock_page(page);
-		ret = 0;
-	} else if (page_evictable(page, NULL)) {
-		/*
-		 * For evictable pages, we can use the cache.
-		 * In event of a race, worst case is we end up with an
-		 * unevictable page on [in]active list.
-		 * We know how to handle that.
-		 */
+redo:
+	lru = !!TestClearPageActive(page);
+	if (page_evictable(page, NULL)) {
 		lru += page_is_file_cache(page);
 		lru_cache_add_lru(page, lru);
-		mem_cgroup_move_lists(page, lru);
-#ifdef CONFIG_UNEVICTABLE_LRU
-		if (was_unevictable)
-			count_vm_event(NORECL_PGRESCUED);
-#endif
 	} else {
-		/*
-		 * Put unevictable pages directly on zone's unevictable
-		 * list.
-		 */
+		lru = LRU_UNEVICTABLE;
 		add_page_to_unevictable_list(page);
-		mem_cgroup_move_lists(page, LRU_UNEVICTABLE);
-#ifdef CONFIG_UNEVICTABLE_LRU
-		if (!was_unevictable)
-			count_vm_event(NORECL_PGCULLED);
-#endif
 	}
+	mem_cgroup_move_lists(page, lru);
+
+	/*
+	 * The page's status can change while we move it among the LRU lists.
+	 * If an evictable page ends up on the unevictable list, it will never
+	 * be freed. To avoid that, check its status again after adding it.
+	 */
+	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
+		if (!isolate_lru_page(page)) {
+			put_page(page);
+			goto redo;
+		}
+		/* This means someone else dropped this page from the LRU,
+		 * so it will be freed or put back to the LRU again. There
+		 * is nothing to do here.
+		 */
+	}
+
+	if (was_unevictable && lru != LRU_UNEVICTABLE)
+		count_vm_event(NORECL_PGRESCUED);
+	else if (!was_unevictable && lru == LRU_UNEVICTABLE)
+		count_vm_event(NORECL_PGCULLED);
 
 	put_page(page);		/* drop ref from isolate */
-	return ret;		/* ret => "page still locked" */
 }
-
-/*
- * Cull page that shrink_*_list() has detected to be unevictable
- * under page lock to close races with other tasks that might be making
- * the page evictable.  Avoid stranding an evictable page on the
- * unevictable list.
- */
-static void cull_unevictable_page(struct page *page)
+#else
+void putback_lru_page(struct page *page)
 {
-	lock_page(page);
-	if (putback_lru_page(page))
-		unlock_page(page);
+	int lru;
+	VM_BUG_ON(PageLRU(page));
+
+	lru = !!TestClearPageActive(page) + page_is_file_cache(page);
+	lru_cache_add_lru(page, lru);
+	mem_cgroup_move_lists(page, lru);
+	put_page(page);
 }
+#endif
 
 /*
  * shrink_page_list() returns the number of reclaimed pages
@@ -746,8 +736,8 @@ free_it:
 		continue;
 
 cull_mlocked:
-		if (putback_lru_page(page))
-			unlock_page(page);
+		unlock_page(page);
+		putback_lru_page(page);
 		continue;
 
 activate_locked:
@@ -1127,7 +1117,7 @@ static unsigned long shrink_inactive_lis
 			list_del(&page->lru);
 			if (unlikely(!page_evictable(page, NULL))) {
 				spin_unlock_irq(&zone->lru_lock);
-				cull_unevictable_page(page);
+				putback_lru_page(page);
 				spin_lock_irq(&zone->lru_lock);
 				continue;
 			}
@@ -1231,7 +1221,7 @@ static void shrink_active_list(unsigned 
 		list_del(&page->lru);
 
 		if (unlikely(!page_evictable(page, NULL))) {
-			cull_unevictable_page(page);
+			putback_lru_page(page);
 			continue;
 		}
 
@@ -2393,8 +2383,6 @@ int zone_reclaim(struct zone *zone, gfp_
 int page_evictable(struct page *page, struct vm_area_struct *vma)
 {
 
-	VM_BUG_ON(PageUnevictable(page));
-
 	if (mapping_unevictable(page_mapping(page)))
 		return 0;
 
Index: test-2.6.26-rc5-mm3/mm/mlock.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/mlock.c
+++ test-2.6.26-rc5-mm3/mm/mlock.c
@@ -55,7 +55,6 @@ EXPORT_SYMBOL(can_do_mlock);
  */
 void __clear_page_mlock(struct page *page)
 {
-	VM_BUG_ON(!PageLocked(page));	/* for LRU isolate/putback */
 
 	dec_zone_page_state(page, NR_MLOCK);
 	count_vm_event(NORECL_PGCLEARED);
@@ -79,7 +78,6 @@ void __clear_page_mlock(struct page *pag
  */
 void mlock_vma_page(struct page *page)
 {
-	BUG_ON(!PageLocked(page));
 
 	if (!TestSetPageMlocked(page)) {
 		inc_zone_page_state(page, NR_MLOCK);
@@ -109,7 +107,6 @@ void mlock_vma_page(struct page *page)
  */
 static void munlock_vma_page(struct page *page)
 {
-	BUG_ON(!PageLocked(page));
 
 	if (TestClearPageMlocked(page)) {
 		dec_zone_page_state(page, NR_MLOCK);
@@ -169,7 +166,8 @@ static int __mlock_vma_pages_range(struc
 
 		/*
 		 * get_user_pages makes pages present if we are
-		 * setting mlock.
+		 * setting mlock, and this extra reference count will
+		 * prevent migration of this page.
 		 */
 		ret = get_user_pages(current, mm, addr,
 				min_t(int, nr_pages, ARRAY_SIZE(pages)),
@@ -197,14 +195,8 @@ static int __mlock_vma_pages_range(struc
 		for (i = 0; i < ret; i++) {
 			struct page *page = pages[i];
 
-			/*
-			 * page might be truncated or migrated out from under
-			 * us.  Check after acquiring page lock.
-			 */
-			lock_page(page);
-			if (page->mapping)
+			if (page_mapcount(page))
 				mlock_vma_page(page);
-			unlock_page(page);
 			put_page(page);		/* ref from get_user_pages() */
 
 			/*
@@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
 	struct page *page;
 	pte_t pte;
 
+	/*
+	 * The page is never unmapped by page reclaim; we take the page lock now.
+	 */
 retry:
 	pte = *ptep;
 	/*
@@ -261,7 +256,15 @@ retry:
 		goto out;
 
 	lock_page(page);
-	if (!page->mapping) {
+	/*
+	 * Because we lock the page here, we have to check two cases:
+	 * - the page has been migrated
+	 * - the page has been truncated (file cache only)
+	 * Note: an anonymous page doesn't clear page->mapping even if
+	 * it is removed from the rmap.
+	 */
+	if (!page->mapping ||
+	     (PageAnon(page) && !page_mapcount(page))) {
 		unlock_page(page);
 		goto retry;
 	}
Index: test-2.6.26-rc5-mm3/mm/migrate.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/migrate.c
+++ test-2.6.26-rc5-mm3/mm/migrate.c
@@ -67,9 +67,7 @@ int putback_lru_pages(struct list_head *
 
 	list_for_each_entry_safe(page, page2, l, lru) {
 		list_del(&page->lru);
-		lock_page(page);
-		if (putback_lru_page(page))
-			unlock_page(page);
+		putback_lru_page(page);
 		count++;
 	}
 	return count;
@@ -571,7 +569,6 @@ static int fallback_migrate_page(struct 
 static int move_to_new_page(struct page *newpage, struct page *page)
 {
 	struct address_space *mapping;
-	int unlock = 1;
 	int rc;
 
 	/*
@@ -610,12 +607,11 @@ static int move_to_new_page(struct page 
 		 * Put back on LRU while holding page locked to
 		 * handle potential race with, e.g., munlock()
 		 */
-		unlock = putback_lru_page(newpage);
+		putback_lru_page(newpage);
 	} else
 		newpage->mapping = NULL;
 
-	if (unlock)
-		unlock_page(newpage);
+	unlock_page(newpage);
 
 	return rc;
 }
@@ -632,7 +628,6 @@ static int unmap_and_move(new_page_t get
 	struct page *newpage = get_new_page(page, private, &result);
 	int rcu_locked = 0;
 	int charge = 0;
-	int unlock = 1;
 
 	if (!newpage)
 		return -ENOMEM;
@@ -713,6 +708,7 @@ rcu_unlock:
 		rcu_read_unlock();
 
 unlock:
+	unlock_page(page);
 
 	if (rc != -EAGAIN) {
  		/*
@@ -722,18 +718,9 @@ unlock:
  		 * restored.
  		 */
  		list_del(&page->lru);
-		if (!page->mapping) {
-			VM_BUG_ON(page_count(page) != 1);
-			unlock_page(page);
-			put_page(page);		/* just free the old page */
-			goto end_migration;
-		} else
-			unlock = putback_lru_page(page);
+		putback_lru_page(page);
 	}
 
-	if (unlock)
-		unlock_page(page);
-
 end_migration:
 	if (!charge)
 		mem_cgroup_end_migration(newpage);
Index: test-2.6.26-rc5-mm3/mm/internal.h
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/internal.h
+++ test-2.6.26-rc5-mm3/mm/internal.h
@@ -43,7 +43,7 @@ static inline void __put_page(struct pag
  * in mm/vmscan.c:
  */
 extern int isolate_lru_page(struct page *page);
-extern int putback_lru_page(struct page *page);
+extern void putback_lru_page(struct page *page);
 
 /*
  * in mm/page_alloc.c

--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3)
  2008-06-18  2:32         ` Daisuke Nishimura
  (?)
@ 2008-06-18 10:20           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-18 10:20 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: kosaki.motohiro, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

> > > I got bad_page after hundreds times of page migration.
> > > It seems that a locked page is being freed.
> > 
> > I can't reproduce this bad page.
> > I'll try again tomorrow ;)
> 
> OK. I'll report on my test more precisely.

Thank you for the verbose explanation.
I ran the testcase for more than three hours today,
but unfortunately I couldn't reproduce it.

Hmm...



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-18  9:40       ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-18 11:36         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-18 11:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, Daisuke Nishimura, Andrew Morton, Rik van Riel,
	Lee Schermerhorn, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers

Hi kame-san,

> putback_lru_page() in this patch has a new concepts.
> When it adds page to unevictable list, it checks the status is 
> changed or not again. if changed, retry to putback.

It seems like a good idea :)
This patch can reduce lock_page() calls; see the before/after pattern below.
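
(For reference, the caller pattern the patch removes at each call site:
==
	/* before */
	lock_page(page);
	if (putback_lru_page(page))
		unlock_page(page);

	/* after */
	putback_lru_page(page);
==
)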


> -	} else if (page_evictable(page, NULL)) {
> -		/*
> -		 * For evictable pages, we can use the cache.
> -		 * In event of a race, worst case is we end up with an
> -		 * unevictable page on [in]active list.
> -		 * We know how to handle that.
> -		 */

I think this comment is useful.
Why do you want to kill it?


> +redo:
> +	lru = !!TestClearPageActive(page);
> +	if (page_evictable(page, NULL)) {
>  		lru += page_is_file_cache(page);
>  		lru_cache_add_lru(page, lru);
> -		mem_cgroup_move_lists(page, lru);
> -#ifdef CONFIG_UNEVICTABLE_LRU
> -		if (was_unevictable)
> -			count_vm_event(NORECL_PGRESCUED);
> -#endif
>  	} else {
> -		/*
> -		 * Put unevictable pages directly on zone's unevictable
> -		 * list.
> -		 */

ditto.

> +		lru = LRU_UNEVICTABLE;
>  		add_page_to_unevictable_list(page);
> -		mem_cgroup_move_lists(page, LRU_UNEVICTABLE);
> -#ifdef CONFIG_UNEVICTABLE_LRU
> -		if (!was_unevictable)
> -			count_vm_event(NORECL_PGCULLED);
> -#endif
>  	}
> +	mem_cgroup_move_lists(page, lru);
> +
> +	/*
> +	 * page's status can change while we move it among lru. If an evictable
> +	 * page is on unevictable list, it never be freed. To avoid that,
> +	 * check after we added it to the list, again.
> +	 */
> +	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
> +		if (!isolate_lru_page(page)) {
> +			put_page(page);
> +			goto redo;

No.
We should also treat the unevictable -> unevictable move carefully.


> +		}
> +		/* This means someone else dropped this page from LRU
> +		 * So, it will be freed or putback to LRU again. There is
> +		 * nothing to do here.
> +		 */
> +	}
> +
> +	if (was_unevictable && lru != LRU_UNEVICTABLE)
> +		count_vm_event(NORECL_PGRESCUED);
> +	else if (!was_unevictable && lru == LRU_UNEVICTABLE)
> +		count_vm_event(NORECL_PGCULLED);
>  
>  	put_page(page);		/* drop ref from isolate */
> -	return ret;		/* ret => "page still locked" */
>  }
> -
> -/*
> - * Cull page that shrink_*_list() has detected to be unevictable
> - * under page lock to close races with other tasks that might be making
> - * the page evictable.  Avoid stranding an evictable page on the
> - * unevictable list.
> - */
> -static void cull_unevictable_page(struct page *page)
> +#else
> +void putback_lru_page(struct page *page)
>  {
> -	lock_page(page);
> -	if (putback_lru_page(page))
> -		unlock_page(page);
> +	int lru;
> +	VM_BUG_ON(PageLRU(page));
> +
> +	lru = !!TestClearPageActive(page) + page_is_file_cache(page);
> +	lru_cache_add_lru(page, lru);
> +	mem_cgroup_move_lists(page, lru);
> +	put_page(page);
>  }
> +#endif
>  
>  /*
>   * shrink_page_list() returns the number of reclaimed pages
> @@ -746,8 +736,8 @@ free_it:
>  		continue;
>  
>  cull_mlocked:
> -		if (putback_lru_page(page))
> -			unlock_page(page);
> +		unlock_page(page);
> +		putback_lru_page(page);
>  		continue;
>  
>  activate_locked:
> @@ -1127,7 +1117,7 @@ static unsigned long shrink_inactive_lis
>  			list_del(&page->lru);
>  			if (unlikely(!page_evictable(page, NULL))) {
>  				spin_unlock_irq(&zone->lru_lock);
> -				cull_unevictable_page(page);
> +				putback_lru_page(page);
>  				spin_lock_irq(&zone->lru_lock);
>  				continue;
>  			}
> @@ -1231,7 +1221,7 @@ static void shrink_active_list(unsigned 
>  		list_del(&page->lru);
>  
>  		if (unlikely(!page_evictable(page, NULL))) {
> -			cull_unevictable_page(page);
> +			putback_lru_page(page);
>  			continue;
>  		}
>  
> @@ -2393,8 +2383,6 @@ int zone_reclaim(struct zone *zone, gfp_
>  int page_evictable(struct page *page, struct vm_area_struct *vma)
>  {
>  
> -	VM_BUG_ON(PageUnevictable(page));
> -
>  	if (mapping_unevictable(page_mapping(page)))
>  		return 0;

Why did you remove this?




> @@ -169,7 +166,8 @@ static int __mlock_vma_pages_range(struc
>  
>  		/*
>  		 * get_user_pages makes pages present if we are
> -		 * setting mlock.
> +		 * setting mlock. and this extra reference count will
> +		 * disable migration of this page.
>  		 */
>  		ret = get_user_pages(current, mm, addr,
>  				min_t(int, nr_pages, ARRAY_SIZE(pages)),
> @@ -197,14 +195,8 @@ static int __mlock_vma_pages_range(struc
>  		for (i = 0; i < ret; i++) {
>  			struct page *page = pages[i];
>  
> -			/*
> -			 * page might be truncated or migrated out from under
> -			 * us.  Check after acquiring page lock.
> -			 */
> -			lock_page(page);
> -			if (page->mapping)
> +			if (page_mapcount(page))
>  				mlock_vma_page(page);
> -			unlock_page(page);
>  			put_page(page);		/* ref from get_user_pages() */
>  
>  			/*
> @@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
>  	struct page *page;
>  	pte_t pte;
>  
> +	/*
> +	 * page is never be unmapped by page-reclaim. we lock this page now.
> +	 */
>  retry:
>  	pte = *ptep;
>  	/*
> @@ -261,7 +256,15 @@ retry:
>  		goto out;
>  
>  	lock_page(page);
> -	if (!page->mapping) {
> +	/*
> +	 * Because we lock page here, we have to check 2 cases.
> +	 * - the page is migrated.
> +	 * - the page is truncated (file-cache only)
> +	 * Note: Anonymous page doesn't clear page->mapping even if it
> +	 * is removed from rmap.
> +	 */
> +	if (!page->mapping ||
> +	     (PageAnon(page) && !page_mapcount(page))) {
>  		unlock_page(page);
>  		goto retry;
>  	}
> Index: test-2.6.26-rc5-mm3/mm/migrate.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/migrate.c
> +++ test-2.6.26-rc5-mm3/mm/migrate.c
> @@ -67,9 +67,7 @@ int putback_lru_pages(struct list_head *
>  
>  	list_for_each_entry_safe(page, page2, l, lru) {
>  		list_del(&page->lru);
> -		lock_page(page);
> -		if (putback_lru_page(page))
> -			unlock_page(page);
> +		putback_lru_page(page);
>  		count++;
>  	}
>  	return count;
> @@ -571,7 +569,6 @@ static int fallback_migrate_page(struct 
>  static int move_to_new_page(struct page *newpage, struct page *page)
>  {
>  	struct address_space *mapping;
> -	int unlock = 1;
>  	int rc;
>  
>  	/*
> @@ -610,12 +607,11 @@ static int move_to_new_page(struct page 
>  		 * Put back on LRU while holding page locked to
>  		 * handle potential race with, e.g., munlock()
>  		 */

This comment is no longer true.

> -		unlock = putback_lru_page(newpage);
> +		putback_lru_page(newpage);
>  	} else
>  		newpage->mapping = NULL;

Originally, move_to_lru() was called in unmap_and_move().
The unevictable infrastructure patch moved it to this point so that
putback_lru_page() would be called under the page lock.

Your patch removes that page-lock dependency, so moving the call back
to unmap_and_move() is better.

It reduces the page lock holding time; a rough sketch follows below.
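
Roughly this shape (just an untested sketch of the idea, with error
paths and the rest of both functions elided):
==
static int move_to_new_page(struct page *newpage, struct page *page)
{
	...
	if (rc)
		newpage->mapping = NULL;
	/* no putback here; it no longer needs the page lock */
	unlock_page(newpage);
	return rc;
}

static int unmap_and_move(new_page_t get_new_page, ...)
{
	...
	rc = move_to_new_page(newpage, page);
	if (!rc)
		putback_lru_page(newpage);	/* now outside the page lock */
	...
}
==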

>  
> -	if (unlock)
> -		unlock_page(newpage);
> +	unlock_page(newpage);
>  
>  	return rc;
>  }
> @@ -632,7 +628,6 @@ static int unmap_and_move(new_page_t get
>  	struct page *newpage = get_new_page(page, private, &result);
>  	int rcu_locked = 0;
>  	int charge = 0;
> -	int unlock = 1;
>  
>  	if (!newpage)
>  		return -ENOMEM;
> @@ -713,6 +708,7 @@ rcu_unlock:
>  		rcu_read_unlock();
>  
>  unlock:
> +	unlock_page(page);
>  
>  	if (rc != -EAGAIN) {
>   		/*
> @@ -722,18 +718,9 @@ unlock:
>   		 * restored.
>   		 */
>   		list_del(&page->lru);
> -		if (!page->mapping) {
> -			VM_BUG_ON(page_count(page) != 1);
> -			unlock_page(page);
> -			put_page(page);		/* just free the old page */
> -			goto end_migration;
> -		} else
> -			unlock = putback_lru_page(page);
> +		putback_lru_page(page);
>  	}
>  
> -	if (unlock)
> -		unlock_page(page);
> -
>  end_migration:
>  	if (!charge)
>  		mem_cgroup_end_migration(newpage);
> Index: test-2.6.26-rc5-mm3/mm/internal.h
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/internal.h
> +++ test-2.6.26-rc5-mm3/mm/internal.h
> @@ -43,7 +43,7 @@ static inline void __put_page(struct pag
>   * in mm/vmscan.c:
>   */
>  extern int isolate_lru_page(struct page *page);
> -extern int putback_lru_page(struct page *page);
> +extern void putback_lru_page(struct page *page);
>  
>  /*
>   * in mm/page_alloc.c
> 




^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-18 11:36         ` KOSAKI Motohiro
  (?)
@ 2008-06-18 11:55           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18 11:55 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Wed, 18 Jun 2008 20:36:52 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Hi kame-san,
> 
> > putback_lru_page() in this patch has a new concepts.
> > When it adds page to unevictable list, it checks the status is 
> > changed or not again. if changed, retry to putback.
> 
> it seems good idea :)
> this patch can reduce lock_page() call.
> 
yes.

> 
> > -	} else if (page_evictable(page, NULL)) {
> > -		/*
> > -		 * For evictable pages, we can use the cache.
> > -		 * In event of a race, worst case is we end up with an
> > -		 * unevictable page on [in]active list.
> > -		 * We know how to handle that.
> > -		 */
> 
> I think this comment is useful.
> Why do you want kill it?
> 
Oh, my mistake.



> > +	mem_cgroup_move_lists(page, lru);
> > +
> > +	/*
> > +	 * page's status can change while we move it among lru. If an evictable
> > +	 * page is on unevictable list, it never be freed. To avoid that,
> > +	 * check after we added it to the list, again.
> > +	 */
> > +	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
> > +		if (!isolate_lru_page(page)) {
> > +			put_page(page);
> > +			goto redo;
> 
> No.
> We should treat carefully unevictable -> unevictable moving too.
> 
This "lru" is the destination ;) so the unevictable -> unevictable case is already covered by the check.


> 
> > +		}
> > +		/* This means someone else dropped this page from LRU
> > +		 * So, it will be freed or putback to LRU again. There is
> > +		 * nothing to do here.
> > +		 */
> > +	}
> > +
> > +	if (was_unevictable && lru != LRU_UNEVICTABLE)
> > +		count_vm_event(NORECL_PGRESCUED);
> > +	else if (!was_unevictable && lru == LRU_UNEVICTABLE)
> > +		count_vm_event(NORECL_PGCULLED);
> >  
> >  	put_page(page);		/* drop ref from isolate */
> > -	return ret;		/* ret => "page still locked" */
> >  }
> > -
> > -/*
> > - * Cull page that shrink_*_list() has detected to be unevictable
> > - * under page lock to close races with other tasks that might be making
> > - * the page evictable.  Avoid stranding an evictable page on the
> > - * unevictable list.
> > - */
> > -static void cull_unevictable_page(struct page *page)
> > +#else
> > +void putback_lru_page(struct page *page)
> >  {
> > -	lock_page(page);
> > -	if (putback_lru_page(page))
> > -		unlock_page(page);
> > +	int lru;
> > +	VM_BUG_ON(PageLRU(page));
> > +
> > +	lru = !!TestClearPageActive(page) + page_is_file_cache(page);
> > +	lru_cache_add_lru(page, lru);
> > +	mem_cgroup_move_lists(page, lru);
> > +	put_page(page);
> >  }
> > +#endif
> >  
> >  /*
> >   * shrink_page_list() returns the number of reclaimed pages
> > @@ -746,8 +736,8 @@ free_it:
> >  		continue;
> >  
> >  cull_mlocked:
> > -		if (putback_lru_page(page))
> > -			unlock_page(page);
> > +		unlock_page(page);
> > +		putback_lru_page(page);
> >  		continue;
> >  
> >  activate_locked:
> > @@ -1127,7 +1117,7 @@ static unsigned long shrink_inactive_lis
> >  			list_del(&page->lru);
> >  			if (unlikely(!page_evictable(page, NULL))) {
> >  				spin_unlock_irq(&zone->lru_lock);
> > -				cull_unevictable_page(page);
> > +				putback_lru_page(page);
> >  				spin_lock_irq(&zone->lru_lock);
> >  				continue;
> >  			}
> > @@ -1231,7 +1221,7 @@ static void shrink_active_list(unsigned 
> >  		list_del(&page->lru);
> >  
> >  		if (unlikely(!page_evictable(page, NULL))) {
> > -			cull_unevictable_page(page);
> > +			putback_lru_page(page);
> >  			continue;
> >  		}
> >  
> > @@ -2393,8 +2383,6 @@ int zone_reclaim(struct zone *zone, gfp_
> >  int page_evictable(struct page *page, struct vm_area_struct *vma)
> >  {
> >  
> > -	VM_BUG_ON(PageUnevictable(page));
> > -
> >  	if (mapping_unevictable(page_mapping(page)))
> >  		return 0;
> 
> Why do you remove this?
> 
I caught a panic here ;)
Maybe the new
==
  if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL))
==
check is what triggered it.
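
(My reading of how it fired, as an illustration only -- this is not
code from the patch:
==
	/* add_page_to_unevictable_list() has set PG_unevictable */
	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
		/*
		 * page_evictable() used to begin with
		 *	VM_BUG_ON(PageUnevictable(page));
		 * so this re-check hit the BUG for every page we had
		 * just added to the unevictable list.
		 */
	}
==
hence the removal of that VM_BUG_ON().)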


> 
> 
> 
> > @@ -169,7 +166,8 @@ static int __mlock_vma_pages_range(struc
> >  
> >  		/*
> >  		 * get_user_pages makes pages present if we are
> > -		 * setting mlock.
> > +		 * setting mlock. and this extra reference count will
> > +		 * disable migration of this page.
> >  		 */
> >  		ret = get_user_pages(current, mm, addr,
> >  				min_t(int, nr_pages, ARRAY_SIZE(pages)),
> > @@ -197,14 +195,8 @@ static int __mlock_vma_pages_range(struc
> >  		for (i = 0; i < ret; i++) {
> >  			struct page *page = pages[i];
> >  
> > -			/*
> > -			 * page might be truncated or migrated out from under
> > -			 * us.  Check after acquiring page lock.
> > -			 */
> > -			lock_page(page);
> > -			if (page->mapping)
> > +			if (page_mapcount(page))
> >  				mlock_vma_page(page);
> > -			unlock_page(page);
> >  			put_page(page);		/* ref from get_user_pages() */
> >  
> >  			/*
> > @@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
> >  	struct page *page;
> >  	pte_t pte;
> >  
> > +	/*
> > +	 * page is never be unmapped by page-reclaim. we lock this page now.
> > +	 */
> >  retry:
> >  	pte = *ptep;
> >  	/*
> > @@ -261,7 +256,15 @@ retry:
> >  		goto out;
> >  
> >  	lock_page(page);
> > -	if (!page->mapping) {
> > +	/*
> > +	 * Because we lock page here, we have to check 2 cases.
> > +	 * - the page is migrated.
> > +	 * - the page is truncated (file-cache only)
> > +	 * Note: Anonymous page doesn't clear page->mapping even if it
> > +	 * is removed from rmap.
> > +	 */
> > +	if (!page->mapping ||
> > +	     (PageAnon(page) && !page_mapcount(page))) {
> >  		unlock_page(page);
> >  		goto retry;
> >  	}
> > Index: test-2.6.26-rc5-mm3/mm/migrate.c
> > ===================================================================
> > --- test-2.6.26-rc5-mm3.orig/mm/migrate.c
> > +++ test-2.6.26-rc5-mm3/mm/migrate.c
> > @@ -67,9 +67,7 @@ int putback_lru_pages(struct list_head *
> >  
> >  	list_for_each_entry_safe(page, page2, l, lru) {
> >  		list_del(&page->lru);
> > -		lock_page(page);
> > -		if (putback_lru_page(page))
> > -			unlock_page(page);
> > +		putback_lru_page(page);
> >  		count++;
> >  	}
> >  	return count;
> > @@ -571,7 +569,6 @@ static int fallback_migrate_page(struct 
> >  static int move_to_new_page(struct page *newpage, struct page *page)
> >  {
> >  	struct address_space *mapping;
> > -	int unlock = 1;
> >  	int rc;
> >  
> >  	/*
> > @@ -610,12 +607,11 @@ static int move_to_new_page(struct page 
> >  		 * Put back on LRU while holding page locked to
> >  		 * handle potential race with, e.g., munlock()
> >  		 */
> 
> this comment isn't true.
> 
yes.


> > -		unlock = putback_lru_page(newpage);
> > +		putback_lru_page(newpage);
> >  	} else
> >  		newpage->mapping = NULL;
> 
> originally move_to_lru() called in unmap_and_move().
> unevictable infrastructure patch move to this point for 
> calling putback_lru_page() under page locked.
> 
> So, your patch remove page locked dependency.
> move to unmap_and_move() again is better.
> 
> it become page lock holding time reducing.
> 
OK, I will look into it again.

Thanks,
-Kame


> >  
> > -	if (unlock)
> > -		unlock_page(newpage);
> > +	unlock_page(newpage);
> >  
> >  	return rc;
> >  }
> > @@ -632,7 +628,6 @@ static int unmap_and_move(new_page_t get
> >  	struct page *newpage = get_new_page(page, private, &result);
> >  	int rcu_locked = 0;
> >  	int charge = 0;
> > -	int unlock = 1;
> >  
> >  	if (!newpage)
> >  		return -ENOMEM;
> > @@ -713,6 +708,7 @@ rcu_unlock:
> >  		rcu_read_unlock();
> >  
> >  unlock:
> > +	unlock_page(page);
> >  
> >  	if (rc != -EAGAIN) {
> >   		/*
> > @@ -722,18 +718,9 @@ unlock:
> >   		 * restored.
> >   		 */
> >   		list_del(&page->lru);
> > -		if (!page->mapping) {
> > -			VM_BUG_ON(page_count(page) != 1);
> > -			unlock_page(page);
> > -			put_page(page);		/* just free the old page */
> > -			goto end_migration;
> > -		} else
> > -			unlock = putback_lru_page(page);
> > +		putback_lru_page(page);
> >  	}
> >  
> > -	if (unlock)
> > -		unlock_page(page);
> > -
> >  end_migration:
> >  	if (!charge)
> >  		mem_cgroup_end_migration(newpage);
> > Index: test-2.6.26-rc5-mm3/mm/internal.h
> > ===================================================================
> > --- test-2.6.26-rc5-mm3.orig/mm/internal.h
> > +++ test-2.6.26-rc5-mm3/mm/internal.h
> > @@ -43,7 +43,7 @@ static inline void __put_page(struct pag
> >   * in mm/vmscan.c:
> >   */
> >  extern int isolate_lru_page(struct page *page);
> > -extern int putback_lru_page(struct page *page);
> > +extern void putback_lru_page(struct page *page);
> >  
> >  /*
> >   * in mm/page_alloc.c
> > 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
@ 2008-06-18 11:55           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-18 11:55 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Wed, 18 Jun 2008 20:36:52 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Hi kame-san,
> 
> > putback_lru_page() in this patch has a new concepts.
> > When it adds page to unevictable list, it checks the status is 
> > changed or not again. if changed, retry to putback.
> 
> it seems good idea :)
> this patch can reduce lock_page() call.
> 
yes.

> 
> > -	} else if (page_evictable(page, NULL)) {
> > -		/*
> > -		 * For evictable pages, we can use the cache.
> > -		 * In event of a race, worst case is we end up with an
> > -		 * unevictable page on [in]active list.
> > -		 * We know how to handle that.
> > -		 */
> 
> I think this comment is useful.
> Why do you want kill it?
> 
Oh, my mistake.



> > +	mem_cgroup_move_lists(page, lru);
> > +
> > +	/*
> > +	 * page's status can change while we move it among lru. If an evictable
> > +	 * page is on unevictable list, it never be freed. To avoid that,
> > +	 * check after we added it to the list, again.
> > +	 */
> > +	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
> > +		if (!isolate_lru_page(page)) {
> > +			put_page(page);
> > +			goto redo;
> 
> No.
> We should treat carefully unevictable -> unevictable moving too.
> 
This "lru" is the destination ;) so the unevictable -> unevictable case is already covered by the check.


> 
> > +		}
> > +		/* This means someone else dropped this page from LRU
> > +		 * So, it will be freed or putback to LRU again. There is
> > +		 * nothing to do here.
> > +		 */
> > +	}
> > +
> > +	if (was_unevictable && lru != LRU_UNEVICTABLE)
> > +		count_vm_event(NORECL_PGRESCUED);
> > +	else if (!was_unevictable && lru == LRU_UNEVICTABLE)
> > +		count_vm_event(NORECL_PGCULLED);
> >  
> >  	put_page(page);		/* drop ref from isolate */
> > -	return ret;		/* ret => "page still locked" */
> >  }
> > -
> > -/*
> > - * Cull page that shrink_*_list() has detected to be unevictable
> > - * under page lock to close races with other tasks that might be making
> > - * the page evictable.  Avoid stranding an evictable page on the
> > - * unevictable list.
> > - */
> > -static void cull_unevictable_page(struct page *page)
> > +#else
> > +void putback_lru_page(struct page *page)
> >  {
> > -	lock_page(page);
> > -	if (putback_lru_page(page))
> > -		unlock_page(page);
> > +	int lru;
> > +	VM_BUG_ON(PageLRU(page));
> > +
> > +	lru = !!TestClearPageActive(page) + page_is_file_cache(page);
> > +	lru_cache_add_lru(page, lru);
> > +	mem_cgroup_move_lists(page, lru);
> > +	put_page(page);
> >  }
> > +#endif
> >  
> >  /*
> >   * shrink_page_list() returns the number of reclaimed pages
> > @@ -746,8 +736,8 @@ free_it:
> >  		continue;
> >  
> >  cull_mlocked:
> > -		if (putback_lru_page(page))
> > -			unlock_page(page);
> > +		unlock_page(page);
> > +		putback_lru_page(page);
> >  		continue;
> >  
> >  activate_locked:
> > @@ -1127,7 +1117,7 @@ static unsigned long shrink_inactive_lis
> >  			list_del(&page->lru);
> >  			if (unlikely(!page_evictable(page, NULL))) {
> >  				spin_unlock_irq(&zone->lru_lock);
> > -				cull_unevictable_page(page);
> > +				putback_lru_page(page);
> >  				spin_lock_irq(&zone->lru_lock);
> >  				continue;
> >  			}
> > @@ -1231,7 +1221,7 @@ static void shrink_active_list(unsigned 
> >  		list_del(&page->lru);
> >  
> >  		if (unlikely(!page_evictable(page, NULL))) {
> > -			cull_unevictable_page(page);
> > +			putback_lru_page(page);
> >  			continue;
> >  		}
> >  
> > @@ -2393,8 +2383,6 @@ int zone_reclaim(struct zone *zone, gfp_
> >  int page_evictable(struct page *page, struct vm_area_struct *vma)
> >  {
> >  
> > -	VM_BUG_ON(PageUnevictable(page));
> > -
> >  	if (mapping_unevictable(page_mapping(page)))
> >  		return 0;
> 
> Why do you remove this?
> 
I caught a panic here ;)
Probably the new
==
  if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL))
==
check is what triggers it: it calls page_evictable() on a page that
still has PageUnevictable set.
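
(Sketch of the interaction, pieced together from the hunks quoted
above rather than taken from the patch itself:)

	/* Simplified pre-patch page_evictable(): */
	int page_evictable(struct page *page, struct vm_area_struct *vma)
	{
		VM_BUG_ON(PageUnevictable(page));	/* removed by the patch */

		if (mapping_unevictable(page_mapping(page)))
			return 0;
		/* ... mlock/VM_LOCKED checks elided ... */
		return 1;
	}

	/*
	 * The recheck added to putback_lru_page() calls page_evictable()
	 * on a page that was just put on the unevictable list, i.e. with
	 * PageUnevictable still set, so the old VM_BUG_ON would fire.
	 */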


> 
> 
> 
> > @@ -169,7 +166,8 @@ static int __mlock_vma_pages_range(struc
> >  
> >  		/*
> >  		 * get_user_pages makes pages present if we are
> > -		 * setting mlock.
> > +		 * setting mlock. And this extra reference count will
> > +		 * prevent this page from being migrated.
> >  		 */
> >  		ret = get_user_pages(current, mm, addr,
> >  				min_t(int, nr_pages, ARRAY_SIZE(pages)),
> > @@ -197,14 +195,8 @@ static int __mlock_vma_pages_range(struc
> >  		for (i = 0; i < ret; i++) {
> >  			struct page *page = pages[i];
> >  
> > -			/*
> > -			 * page might be truncated or migrated out from under
> > -			 * us.  Check after acquiring page lock.
> > -			 */
> > -			lock_page(page);
> > -			if (page->mapping)
> > +			if (page_mapcount(page))
> >  				mlock_vma_page(page);
> > -			unlock_page(page);
> >  			put_page(page);		/* ref from get_user_pages() */
> >  
> >  			/*
> > @@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
> >  	struct page *page;
> >  	pte_t pte;
> >  
> > +	/*
> > +	 * The page is never unmapped by page-reclaim. We lock this page now.
> > +	 */
> >  retry:
> >  	pte = *ptep;
> >  	/*
> > @@ -261,7 +256,15 @@ retry:
> >  		goto out;
> >  
> >  	lock_page(page);
> > -	if (!page->mapping) {
> > +	/*
> > +	 * Because we lock page here, we have to check 2 cases.
> > +	 * - the page is migrated.
> > +	 * - the page is truncated (file-cache only)
> > +	 * Note: Anonymous page doesn't clear page->mapping even if it
> > +	 * is removed from rmap.
> > +	 */
> > +	if (!page->mapping ||
> > +	     (PageAnon(page) && !page_mapcount(page))) {
> >  		unlock_page(page);
> >  		goto retry;
> >  	}
> > Index: test-2.6.26-rc5-mm3/mm/migrate.c
> > ===================================================================
> > --- test-2.6.26-rc5-mm3.orig/mm/migrate.c
> > +++ test-2.6.26-rc5-mm3/mm/migrate.c
> > @@ -67,9 +67,7 @@ int putback_lru_pages(struct list_head *
> >  
> >  	list_for_each_entry_safe(page, page2, l, lru) {
> >  		list_del(&page->lru);
> > -		lock_page(page);
> > -		if (putback_lru_page(page))
> > -			unlock_page(page);
> > +		putback_lru_page(page);
> >  		count++;
> >  	}
> >  	return count;
> > @@ -571,7 +569,6 @@ static int fallback_migrate_page(struct 
> >  static int move_to_new_page(struct page *newpage, struct page *page)
> >  {
> >  	struct address_space *mapping;
> > -	int unlock = 1;
> >  	int rc;
> >  
> >  	/*
> > @@ -610,12 +607,11 @@ static int move_to_new_page(struct page 
> >  		 * Put back on LRU while holding page locked to
> >  		 * handle potential race with, e.g., munlock()
> >  		 */
> 
> this comment isn't true.
> 
yes.


> > -		unlock = putback_lru_page(newpage);
> > +		putback_lru_page(newpage);
> >  	} else
> >  		newpage->mapping = NULL;
> 
> Originally, move_to_lru() was called in unmap_and_move().
> The unevictable infrastructure patch moved it to this point so that
> putback_lru_page() is called with the page locked.
> 
> Since your patch removes the page-lock dependency, moving it back
> to unmap_and_move() would be better.
> 
> It would reduce the page lock holding time.
> 
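
(A sketch of what this suggestion amounts to -- an illustration of
moving the call, not a tested hunk:)

	@@ static int move_to_new_page(struct page *newpage, struct page *page)
	-		putback_lru_page(newpage);
	 	} else
	 		newpage->mapping = NULL;
	 	unlock_page(newpage);

	@@ static int unmap_and_move(new_page_t get_new_page, ...)
	 	rc = move_to_new_page(newpage, page);
	+	if (!rc)
	+		putback_lru_page(newpage);	/* page lock already dropped */
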
ok, will look into it again.

Thanks,
-Kame


> >  
> > -	if (unlock)
> > -		unlock_page(newpage);
> > +	unlock_page(newpage);
> >  
> >  	return rc;
> >  }
> > @@ -632,7 +628,6 @@ static int unmap_and_move(new_page_t get
> >  	struct page *newpage = get_new_page(page, private, &result);
> >  	int rcu_locked = 0;
> >  	int charge = 0;
> > -	int unlock = 1;
> >  
> >  	if (!newpage)
> >  		return -ENOMEM;
> > @@ -713,6 +708,7 @@ rcu_unlock:
> >  		rcu_read_unlock();
> >  
> >  unlock:
> > +	unlock_page(page);
> >  
> >  	if (rc != -EAGAIN) {
> >   		/*
> > @@ -722,18 +718,9 @@ unlock:
> >   		 * restored.
> >   		 */
> >   		list_del(&page->lru);
> > -		if (!page->mapping) {
> > -			VM_BUG_ON(page_count(page) != 1);
> > -			unlock_page(page);
> > -			put_page(page);		/* just free the old page */
> > -			goto end_migration;
> > -		} else
> > -			unlock = putback_lru_page(page);
> > +		putback_lru_page(page);
> >  	}
> >  
> > -	if (unlock)
> > -		unlock_page(page);
> > -
> >  end_migration:
> >  	if (!charge)
> >  		mem_cgroup_end_migration(newpage);
> > Index: test-2.6.26-rc5-mm3/mm/internal.h
> > ===================================================================
> > --- test-2.6.26-rc5-mm3.orig/mm/internal.h
> > +++ test-2.6.26-rc5-mm3/mm/internal.h
> > @@ -43,7 +43,7 @@ static inline void __put_page(struct pag
> >   * in mm/vmscan.c:
> >   */
> >  extern int isolate_lru_page(struct page *page);
> > -extern int putback_lru_page(struct page *page);
> > +extern void putback_lru_page(struct page *page);
> >  
> >  /*
> >   * in mm/page_alloc.c
> > 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-18  9:40       ` KAMEZAWA Hiroyuki
@ 2008-06-18 14:50         ` Daisuke Nishimura
  -1 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-18 14:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Kosaki Motohiro, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers

Hi, Kamezawa-san.

Sorry for my late reply, and thank you for your patch.

> This patch tries to make putback_lru_pages() lock_page() free.
> (Of course, some callers must take the lock.)
> 
I like this idea.

I'll test it tomorrow.


Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: 2.6.26-rc5-mm3
  2008-06-12 23:32   ` 2.6.26-rc5-mm3 Byron Bradley
  (?)
@ 2008-06-18 17:55     ` Daniel Walker
  -1 siblings, 0 replies; 290+ messages in thread
From: Daniel Walker @ 2008-06-18 17:55 UTC (permalink / raw)
  To: Byron Bradley
  Cc: Andrew Morton, linux-kernel, kernel-testers, linux-mm, Hua Zhong,
	Ingo Molnar


On Fri, 2008-06-13 at 00:32 +0100, Byron Bradley wrote:
> Looks like x86 and ARM both fail to boot if PROFILE_LIKELY, FTRACE and 
> DYNAMIC_FTRACE are selected. If any one of those three are disabled it 
> boots (or fails in some other way which I'm looking at now). The serial 
> console output from both machines when they fail to boot is below, let me 
> know if there is any other information I can provide.

I was able to reproduce a hang on x86 with those options. The patch
below is a potential fix. I think we don't want to trace
do_check_likely(), since the ftrace internals might use the
likely/unlikely macros, which would just cause recursion back into
do_check_likely().

Signed-off-by: Daniel Walker <dwalker@mvista.com>

---
 lib/likely_prof.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.25/lib/likely_prof.c
===================================================================
--- linux-2.6.25.orig/lib/likely_prof.c
+++ linux-2.6.25/lib/likely_prof.c
@@ -22,7 +22,7 @@
 
 static struct likeliness *likeliness_head;
 
-int do_check_likely(struct likeliness *likeliness, unsigned int ret)
+int notrace do_check_likely(struct likeliness *likeliness, unsigned int ret)
 {
 	static unsigned long likely_lock;
 



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-18  9:40       ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-18 18:21         ` Lee Schermerhorn
  -1 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-18 18:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Wed, 2008-06-18 at 18:40 +0900, KAMEZAWA Hiroyuki wrote:
> Lee-san, how about this?
> Tested on x86-64 and also tried Nishimura-san's tests; works well now.

I have been testing with my workload on both ia64 and x86_64, and it
seems to be working well.  I'll let them run for a day or so.

> -Kame
> ==
> putback_lru_page()/unevictable page handling rework.
> 
> Now, putback_lru_page() requires that the page is locked,
> and in some special cases it implicitly unlocks it.
> 
> This patch tries to make putback_lru_pages() lock_page() free.
> (Of course, some callers must take the lock.)
> 
> The main reason putback_lru_page() assumes the page is locked
> is to avoid races with changes in the page's Mlocked/Not-Mlocked status.
> 
> Once added to the unevictable list, a page is removed from that list
> only when it is munlocked. (There are other special cases, but we
> ignore them here.)
> So a status change during putback_lru_page() is fatal, and the page
> should be locked.
> 
> putback_lru_page() in this patch has a new concept.
> When it adds a page to the unevictable list, it checks again whether
> the page's status has changed; if it has, it retries the putback.

Given that the race that would activate this retry is likely quite rare,
this approach makes sense.  
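
(Illustration of the race window the retry closes -- reconstructed from
the description above, not taken from the patch:

	CPU A: putback_lru_page()          CPU B: munlock()
	  page_evictable() -> false
	                                     clears PageMlocked
	  add_page_to_unevictable_list()
	  page_evictable() -> true           page is evictable now
	  isolate_lru_page(); goto redo

Without the recheck, the now-evictable page would sit on the
unevictable list until something else happened to touch it.)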

> 
> This patch also changes the caller side and cleans up the
> lock_page()/unlock_page() calls.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroy@jp.fujitsu.com>

A couple of minor comments below, but:

Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

> 
> ---
>  mm/internal.h |    2 -
>  mm/migrate.c  |   23 +++----------
>  mm/mlock.c    |   24 +++++++-------
>  mm/vmscan.c   |   96 +++++++++++++++++++++++++---------------------------------
>  4 files changed, 61 insertions(+), 84 deletions(-)
> 
> Index: test-2.6.26-rc5-mm3/mm/vmscan.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/vmscan.c
> +++ test-2.6.26-rc5-mm3/mm/vmscan.c
> @@ -486,73 +486,63 @@ int remove_mapping(struct address_space 
>   * Page may still be unevictable for other reasons.
>   *
>   * lru_lock must not be held, interrupts must be enabled.
> - * Must be called with page locked.
> - *
> - * return 1 if page still locked [not truncated], else 0
>   */
> -int putback_lru_page(struct page *page)
> +#ifdef CONFIG_UNEVICTABLE_LRU
> +void putback_lru_page(struct page *page)
>  {
>  	int lru;
> -	int ret = 1;
>  	int was_unevictable;
>  
> -	VM_BUG_ON(!PageLocked(page));
>  	VM_BUG_ON(PageLRU(page));
>  
> -	lru = !!TestClearPageActive(page);
>  	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
>  
> -	if (unlikely(!page->mapping)) {
> -		/*
> -		 * page truncated.  drop lock as put_page() will
> -		 * free the page.
> -		 */
> -		VM_BUG_ON(page_count(page) != 1);
> -		unlock_page(page);
> -		ret = 0;
> -	} else if (page_evictable(page, NULL)) {
> -		/*
> -		 * For evictable pages, we can use the cache.
> -		 * In event of a race, worst case is we end up with an
> -		 * unevictable page on [in]active list.
> -		 * We know how to handle that.
> -		 */
> +redo:
> +	lru = !!TestClearPageActive(page);
> +	if (page_evictable(page, NULL)) {
>  		lru += page_is_file_cache(page);
>  		lru_cache_add_lru(page, lru);
> -		mem_cgroup_move_lists(page, lru);
> -#ifdef CONFIG_UNEVICTABLE_LRU
> -		if (was_unevictable)
> -			count_vm_event(NORECL_PGRESCUED);
> -#endif
>  	} else {
> -		/*
> -		 * Put unevictable pages directly on zone's unevictable
> -		 * list.
> -		 */
> +		lru = LRU_UNEVICTABLE;
>  		add_page_to_unevictable_list(page);
> -		mem_cgroup_move_lists(page, LRU_UNEVICTABLE);
> -#ifdef CONFIG_UNEVICTABLE_LRU
> -		if (!was_unevictable)
> -			count_vm_event(NORECL_PGCULLED);
> -#endif
>  	}
> +	mem_cgroup_move_lists(page, lru);
> +
> +	/*
> +	 * The page's status can change while we move it among the LRU
> +	 * lists. If an evictable page sits on the unevictable list, it
> +	 * will never be freed. To avoid that, check the status again
> +	 * after adding the page to the list.
> +	 */
> +	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
> +		if (!isolate_lru_page(page)) {
> +			put_page(page);
> +			goto redo;
> +		}
> +		/* This means someone else dropped this page from LRU
> +		 * So, it will be freed or putback to LRU again. There is
> +		 * nothing to do here.
> +		 */
> +	}
> +
> +	if (was_unevictable && lru != LRU_UNEVICTABLE)
> +		count_vm_event(NORECL_PGRESCUED);
> +	else if (!was_unevictable && lru == LRU_UNEVICTABLE)
> +		count_vm_event(NORECL_PGCULLED);
>  
>  	put_page(page);		/* drop ref from isolate */
> -	return ret;		/* ret => "page still locked" */
>  }
> -
> -/*
> - * Cull page that shrink_*_list() has detected to be unevictable
> - * under page lock to close races with other tasks that might be making
> - * the page evictable.  Avoid stranding an evictable page on the
> - * unevictable list.
> - */
> -static void cull_unevictable_page(struct page *page)
> +#else
> +void putback_lru_page(struct page *page)
>  {
> -	lock_page(page);
> -	if (putback_lru_page(page))
> -		unlock_page(page);
> +	int lru;
> +	VM_BUG_ON(PageLRU(page));
> +
> +	lru = !!TestClearPageActive(page) + page_is_file_cache(page);
> +	lru_cache_add_lru(page, lru);
> +	mem_cgroup_move_lists(page, lru);
> +	put_page(page);
>  }
> +#endif
>  
>  /*
>   * shrink_page_list() returns the number of reclaimed pages
> @@ -746,8 +736,8 @@ free_it:
>  		continue;
>  
>  cull_mlocked:
> -		if (putback_lru_page(page))
> -			unlock_page(page);
> +		unlock_page(page);
> +		putback_lru_page(page);
>  		continue;
>  
>  activate_locked:
> @@ -1127,7 +1117,7 @@ static unsigned long shrink_inactive_lis
>  			list_del(&page->lru);
>  			if (unlikely(!page_evictable(page, NULL))) {
>  				spin_unlock_irq(&zone->lru_lock);
> -				cull_unevictable_page(page);
> +				putback_lru_page(page);
>  				spin_lock_irq(&zone->lru_lock);
>  				continue;
>  			}
> @@ -1231,7 +1221,7 @@ static void shrink_active_list(unsigned 
>  		list_del(&page->lru);
>  
>  		if (unlikely(!page_evictable(page, NULL))) {
> -			cull_unevictable_page(page);
> +			putback_lru_page(page);
>  			continue;
>  		}
>  
> @@ -2393,8 +2383,6 @@ int zone_reclaim(struct zone *zone, gfp_
>  int page_evictable(struct page *page, struct vm_area_struct *vma)
>  {
>  
> -	VM_BUG_ON(PageUnevictable(page));
> -
>  	if (mapping_unevictable(page_mapping(page)))
>  		return 0;
>  
> Index: test-2.6.26-rc5-mm3/mm/mlock.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/mlock.c
> +++ test-2.6.26-rc5-mm3/mm/mlock.c
> @@ -55,7 +55,6 @@ EXPORT_SYMBOL(can_do_mlock);
>   */
>  void __clear_page_mlock(struct page *page)
>  {
> -	VM_BUG_ON(!PageLocked(page));	/* for LRU isolate/putback */
>  
>  	dec_zone_page_state(page, NR_MLOCK);
>  	count_vm_event(NORECL_PGCLEARED);
> @@ -79,7 +78,6 @@ void __clear_page_mlock(struct page *pag
>   */
>  void mlock_vma_page(struct page *page)
>  {
> -	BUG_ON(!PageLocked(page));
>  
>  	if (!TestSetPageMlocked(page)) {
>  		inc_zone_page_state(page, NR_MLOCK);
> @@ -109,7 +107,6 @@ void mlock_vma_page(struct page *page)
>   */
>  static void munlock_vma_page(struct page *page)
>  {
> -	BUG_ON(!PageLocked(page));
>  
>  	if (TestClearPageMlocked(page)) {
>  		dec_zone_page_state(page, NR_MLOCK);
> @@ -169,7 +166,8 @@ static int __mlock_vma_pages_range(struc
>  
>  		/*
>  		 * get_user_pages makes pages present if we are
> -		 * setting mlock.
> +		 * setting mlock. And this extra reference count will
> +		 * prevent this page from being migrated.
>  		 */
>  		ret = get_user_pages(current, mm, addr,
>  				min_t(int, nr_pages, ARRAY_SIZE(pages)),
> @@ -197,14 +195,8 @@ static int __mlock_vma_pages_range(struc
>  		for (i = 0; i < ret; i++) {
>  			struct page *page = pages[i];
>  
> -			/*
> -			 * page might be truncated or migrated out from under
> -			 * us.  Check after acquiring page lock.
> -			 */
> -			lock_page(page);

Hmmm.  Still thinking about this.  No need to protect against in-flight
truncation or migration?

> -			if (page->mapping)
> +			if (page_mapcount(page))
>  				mlock_vma_page(page);
> -			unlock_page(page);
>  			put_page(page);		/* ref from get_user_pages() */
>  
>  			/*
> @@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
>  	struct page *page;
>  	pte_t pte;
>  
> +	/*
> +	 * The page is never unmapped by page-reclaim. We lock this page now.
> +	 */

I don't understand what you're trying to say here.  That is, what the
point of this comment is...

>  retry:
>  	pte = *ptep;
>  	/*
> @@ -261,7 +256,15 @@ retry:
>  		goto out;
>  
>  	lock_page(page);
> -	if (!page->mapping) {
> +	/*
> +	 * Because we lock page here, we have to check 2 cases.
> +	 * - the page is migrated.
> +	 * - the page is truncated (file-cache only)
> +	 * Note: Anonymous page doesn't clear page->mapping even if it
> +	 * is removed from rmap.
> +	 */
> +	if (!page->mapping ||
> +	     (PageAnon(page) && !page_mapcount(page))) {
>  		unlock_page(page);
>  		goto retry;
>  	}
> Index: test-2.6.26-rc5-mm3/mm/migrate.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/migrate.c
> +++ test-2.6.26-rc5-mm3/mm/migrate.c
> @@ -67,9 +67,7 @@ int putback_lru_pages(struct list_head *
>  
>  	list_for_each_entry_safe(page, page2, l, lru) {
>  		list_del(&page->lru);
> -		lock_page(page);
> -		if (putback_lru_page(page))
> -			unlock_page(page);
> +		putback_lru_page(page);
>  		count++;
>  	}
>  	return count;
> @@ -571,7 +569,6 @@ static int fallback_migrate_page(struct 
>  static int move_to_new_page(struct page *newpage, struct page *page)
>  {
>  	struct address_space *mapping;
> -	int unlock = 1;
>  	int rc;
>  
>  	/*
> @@ -610,12 +607,11 @@ static int move_to_new_page(struct page 
>  		 * Put back on LRU while holding page locked to
>  		 * handle potential race with, e.g., munlock()
>  		 */
> -		unlock = putback_lru_page(newpage);
> +		putback_lru_page(newpage);
>  	} else
>  		newpage->mapping = NULL;
>  
> -	if (unlock)
> -		unlock_page(newpage);
> +	unlock_page(newpage);
>  
>  	return rc;
>  }
> @@ -632,7 +628,6 @@ static int unmap_and_move(new_page_t get
>  	struct page *newpage = get_new_page(page, private, &result);
>  	int rcu_locked = 0;
>  	int charge = 0;
> -	int unlock = 1;
>  
>  	if (!newpage)
>  		return -ENOMEM;
> @@ -713,6 +708,7 @@ rcu_unlock:
>  		rcu_read_unlock();
>  
>  unlock:
> +	unlock_page(page);
>  
>  	if (rc != -EAGAIN) {
>   		/*
> @@ -722,18 +718,9 @@ unlock:
>   		 * restored.
>   		 */
>   		list_del(&page->lru);
> -		if (!page->mapping) {
> -			VM_BUG_ON(page_count(page) != 1);
> -			unlock_page(page);
> -			put_page(page);		/* just free the old page */
> -			goto end_migration;
> -		} else
> -			unlock = putback_lru_page(page);
> +		putback_lru_page(page);
>  	}
>  
> -	if (unlock)
> -		unlock_page(page);
> -
>  end_migration:
>  	if (!charge)
>  		mem_cgroup_end_migration(newpage);
> Index: test-2.6.26-rc5-mm3/mm/internal.h
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/internal.h
> +++ test-2.6.26-rc5-mm3/mm/internal.h
> @@ -43,7 +43,7 @@ static inline void __put_page(struct pag
>   * in mm/vmscan.c:
>   */
>  extern int isolate_lru_page(struct page *page);
> -extern int putback_lru_page(struct page *page);
> +extern void putback_lru_page(struct page *page);
>  
>  /*
>   * in mm/page_alloc.c
> 


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
@ 2008-06-18 18:21         ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-18 18:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Wed, 2008-06-18 at 18:40 +0900, KAMEZAWA Hiroyuki wrote:
> Lee-san, how about this ?
> Tested on x86-64 and tried Nisimura-san's test at el. works good now.

I have been testing with my work load on both ia64 and x86_64 and it
seems to be working well.  I'll let them run for a day or so.

> -Kame
> ==
> putback_lru_page()/unevictable page handling rework.
> 
> Now, putback_lru_page() requires that the page is locked.
> And in some special case, implicitly unlock it.
> 
> This patch tries to make putback_lru_pages() to be lock_page() free.
> (Of course, some callers must take the lock.)
> 
> The main reason that putback_lru_page() assumes that page is locked
> is to avoid the change in page's status among Mlocked/Not-Mlocked.
> 
> Once it is added to unevictable list, the page is removed from
> unevictable list only when page is munlocked. (there are other special
> case. but we ignore the special case.)
> So, status change during putback_lru_page() is fatal and page should 
> be locked.
> 
> putback_lru_page() in this patch has a new concepts.
> When it adds page to unevictable list, it checks the status is 
> changed or not again. if changed, retry to putback.

Given that the race that would activate this retry is likely quite rare,
this approach makes sense.  

> 
> This patche changes also caller side and cleaning up lock/unlock_page().
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroy@jp.fujitsu.com>

A couple of minor comments below, but:

Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

> 
> ---
>  mm/internal.h |    2 -
>  mm/migrate.c  |   23 +++----------
>  mm/mlock.c    |   24 +++++++-------
>  mm/vmscan.c   |   96 +++++++++++++++++++++++++---------------------------------
>  4 files changed, 61 insertions(+), 84 deletions(-)
> 
> Index: test-2.6.26-rc5-mm3/mm/vmscan.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/vmscan.c
> +++ test-2.6.26-rc5-mm3/mm/vmscan.c
> @@ -486,73 +486,63 @@ int remove_mapping(struct address_space 
>   * Page may still be unevictable for other reasons.
>   *
>   * lru_lock must not be held, interrupts must be enabled.
> - * Must be called with page locked.
> - *
> - * return 1 if page still locked [not truncated], else 0
>   */
> -int putback_lru_page(struct page *page)
> +#ifdef CONFIG_UNEVICTABLE_LRU
> +void putback_lru_page(struct page *page)
>  {
>  	int lru;
> -	int ret = 1;
>  	int was_unevictable;
>  
> -	VM_BUG_ON(!PageLocked(page));
>  	VM_BUG_ON(PageLRU(page));
>  
> -	lru = !!TestClearPageActive(page);
>  	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
>  
> -	if (unlikely(!page->mapping)) {
> -		/*
> -		 * page truncated.  drop lock as put_page() will
> -		 * free the page.
> -		 */
> -		VM_BUG_ON(page_count(page) != 1);
> -		unlock_page(page);
> -		ret = 0;
> -	} else if (page_evictable(page, NULL)) {
> -		/*
> -		 * For evictable pages, we can use the cache.
> -		 * In event of a race, worst case is we end up with an
> -		 * unevictable page on [in]active list.
> -		 * We know how to handle that.
> -		 */
> +redo:
> +	lru = !!TestClearPageActive(page);
> +	if (page_evictable(page, NULL)) {
>  		lru += page_is_file_cache(page);
>  		lru_cache_add_lru(page, lru);
> -		mem_cgroup_move_lists(page, lru);
> -#ifdef CONFIG_UNEVICTABLE_LRU
> -		if (was_unevictable)
> -			count_vm_event(NORECL_PGRESCUED);
> -#endif
>  	} else {
> -		/*
> -		 * Put unevictable pages directly on zone's unevictable
> -		 * list.
> -		 */
> +		lru = LRU_UNEVICTABLE;
>  		add_page_to_unevictable_list(page);
> -		mem_cgroup_move_lists(page, LRU_UNEVICTABLE);
> -#ifdef CONFIG_UNEVICTABLE_LRU
> -		if (!was_unevictable)
> -			count_vm_event(NORECL_PGCULLED);
> -#endif
>  	}
> +	mem_cgroup_move_lists(page, lru);
> +
> +	/*
> +	 * page's status can change while we move it among lru. If an evictable
> +	 * page is on unevictable list, it never be freed. To avoid that,
> +	 * check after we added it to the list, again.
> +	 */
> +	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
> +		if (!isolate_lru_page(page)) {
> +			put_page(page);
> +			goto redo;
> +		}
> +		/* This means someone else dropped this page from LRU
> +		 * So, it will be freed or putback to LRU again. There is
> +		 * nothing to do here.
> +		 */
> +	}
> +
> +	if (was_unevictable && lru != LRU_UNEVICTABLE)
> +		count_vm_event(NORECL_PGRESCUED);
> +	else if (!was_unevictable && lru == LRU_UNEVICTABLE)
> +		count_vm_event(NORECL_PGCULLED);
>  
>  	put_page(page);		/* drop ref from isolate */
> -	return ret;		/* ret => "page still locked" */
>  }
> -
> -/*
> - * Cull page that shrink_*_list() has detected to be unevictable
> - * under page lock to close races with other tasks that might be making
> - * the page evictable.  Avoid stranding an evictable page on the
> - * unevictable list.
> - */
> -static void cull_unevictable_page(struct page *page)
> +#else
> +void putback_lru_page(struct page *page)
>  {
> -	lock_page(page);
> -	if (putback_lru_page(page))
> -		unlock_page(page);
> +	int lru;
> +	VM_BUG_ON(PageLRU(page));
> +
> +	lru = !!TestClearPageActive(page) + page_is_file_cache(page);
> +	lru_cache_add_lru(page, lru);
> +	mem_cgroup_move_lists(page, lru);
> +	put_page(page);
>  }
> +#endif
>  
>  /*
>   * shrink_page_list() returns the number of reclaimed pages
> @@ -746,8 +736,8 @@ free_it:
>  		continue;
>  
>  cull_mlocked:
> -		if (putback_lru_page(page))
> -			unlock_page(page);
> +		unlock_page(page);
> +		putback_lru_page(page);
>  		continue;
>  
>  activate_locked:
> @@ -1127,7 +1117,7 @@ static unsigned long shrink_inactive_lis
>  			list_del(&page->lru);
>  			if (unlikely(!page_evictable(page, NULL))) {
>  				spin_unlock_irq(&zone->lru_lock);
> -				cull_unevictable_page(page);
> +				putback_lru_page(page);
>  				spin_lock_irq(&zone->lru_lock);
>  				continue;
>  			}
> @@ -1231,7 +1221,7 @@ static void shrink_active_list(unsigned 
>  		list_del(&page->lru);
>  
>  		if (unlikely(!page_evictable(page, NULL))) {
> -			cull_unevictable_page(page);
> +			putback_lru_page(page);
>  			continue;
>  		}
>  
> @@ -2393,8 +2383,6 @@ int zone_reclaim(struct zone *zone, gfp_
>  int page_evictable(struct page *page, struct vm_area_struct *vma)
>  {
>  
> -	VM_BUG_ON(PageUnevictable(page));
> -
>  	if (mapping_unevictable(page_mapping(page)))
>  		return 0;
>  
> Index: test-2.6.26-rc5-mm3/mm/mlock.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/mlock.c
> +++ test-2.6.26-rc5-mm3/mm/mlock.c
> @@ -55,7 +55,6 @@ EXPORT_SYMBOL(can_do_mlock);
>   */
>  void __clear_page_mlock(struct page *page)
>  {
> -	VM_BUG_ON(!PageLocked(page));	/* for LRU isolate/putback */
>  
>  	dec_zone_page_state(page, NR_MLOCK);
>  	count_vm_event(NORECL_PGCLEARED);
> @@ -79,7 +78,6 @@ void __clear_page_mlock(struct page *pag
>   */
>  void mlock_vma_page(struct page *page)
>  {
> -	BUG_ON(!PageLocked(page));
>  
>  	if (!TestSetPageMlocked(page)) {
>  		inc_zone_page_state(page, NR_MLOCK);
> @@ -109,7 +107,6 @@ void mlock_vma_page(struct page *page)
>   */
>  static void munlock_vma_page(struct page *page)
>  {
> -	BUG_ON(!PageLocked(page));
>  
>  	if (TestClearPageMlocked(page)) {
>  		dec_zone_page_state(page, NR_MLOCK);
> @@ -169,7 +166,8 @@ static int __mlock_vma_pages_range(struc
>  
>  		/*
>  		 * get_user_pages makes pages present if we are
> -		 * setting mlock.
> +		 * setting mlock. and this extra reference count will
> +		 * disable migration of this page.
>  		 */
>  		ret = get_user_pages(current, mm, addr,
>  				min_t(int, nr_pages, ARRAY_SIZE(pages)),
> @@ -197,14 +195,8 @@ static int __mlock_vma_pages_range(struc
>  		for (i = 0; i < ret; i++) {
>  			struct page *page = pages[i];
>  
> -			/*
> -			 * page might be truncated or migrated out from under
> -			 * us.  Check after acquiring page lock.
> -			 */
> -			lock_page(page);

Hmmm.  Still thinking about this.  No need to protect against in flight
truncation or migration?

> -			if (page->mapping)
> +			if (page_mapcount(page))
>  				mlock_vma_page(page);
> -			unlock_page(page);
>  			put_page(page);		/* ref from get_user_pages() */
>  
>  			/*
> @@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
>  	struct page *page;
>  	pte_t pte;
>  
> +	/*
> +	 * page is never be unmapped by page-reclaim. we lock this page now.
> +	 */

I don't understand what you're trying to say here.  That is, what the
point of this comment is...

>  retry:
>  	pte = *ptep;
>  	/*
> @@ -261,7 +256,15 @@ retry:
>  		goto out;
>  
>  	lock_page(page);
> -	if (!page->mapping) {
> +	/*
> +	 * Because we lock page here, we have to check 2 cases.
> +	 * - the page is migrated.
> +	 * - the page is truncated (file-cache only)
> +	 * Note: Anonymous page doesn't clear page->mapping even if it
> +	 * is removed from rmap.
> +	 */
> +	if (!page->mapping ||
> +	     (PageAnon(page) && !page_mapcount(page))) {
>  		unlock_page(page);
>  		goto retry;
>  	}
> Index: test-2.6.26-rc5-mm3/mm/migrate.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/migrate.c
> +++ test-2.6.26-rc5-mm3/mm/migrate.c
> @@ -67,9 +67,7 @@ int putback_lru_pages(struct list_head *
>  
>  	list_for_each_entry_safe(page, page2, l, lru) {
>  		list_del(&page->lru);
> -		lock_page(page);
> -		if (putback_lru_page(page))
> -			unlock_page(page);
> +		putback_lru_page(page);
>  		count++;
>  	}
>  	return count;
> @@ -571,7 +569,6 @@ static int fallback_migrate_page(struct 
>  static int move_to_new_page(struct page *newpage, struct page *page)
>  {
>  	struct address_space *mapping;
> -	int unlock = 1;
>  	int rc;
>  
>  	/*
> @@ -610,12 +607,11 @@ static int move_to_new_page(struct page 
>  		 * Put back on LRU while holding page locked to
>  		 * handle potential race with, e.g., munlock()
>  		 */
> -		unlock = putback_lru_page(newpage);
> +		putback_lru_page(newpage);
>  	} else
>  		newpage->mapping = NULL;
>  
> -	if (unlock)
> -		unlock_page(newpage);
> +	unlock_page(newpage);
>  
>  	return rc;
>  }
> @@ -632,7 +628,6 @@ static int unmap_and_move(new_page_t get
>  	struct page *newpage = get_new_page(page, private, &result);
>  	int rcu_locked = 0;
>  	int charge = 0;
> -	int unlock = 1;
>  
>  	if (!newpage)
>  		return -ENOMEM;
> @@ -713,6 +708,7 @@ rcu_unlock:
>  		rcu_read_unlock();
>  
>  unlock:
> +	unlock_page(page);
>  
>  	if (rc != -EAGAIN) {
>   		/*
> @@ -722,18 +718,9 @@ unlock:
>   		 * restored.
>   		 */
>   		list_del(&page->lru);
> -		if (!page->mapping) {
> -			VM_BUG_ON(page_count(page) != 1);
> -			unlock_page(page);
> -			put_page(page);		/* just free the old page */
> -			goto end_migration;
> -		} else
> -			unlock = putback_lru_page(page);
> +		putback_lru_page(page);
>  	}
>  
> -	if (unlock)
> -		unlock_page(page);
> -
>  end_migration:
>  	if (!charge)
>  		mem_cgroup_end_migration(newpage);
> Index: test-2.6.26-rc5-mm3/mm/internal.h
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/internal.h
> +++ test-2.6.26-rc5-mm3/mm/internal.h
> @@ -43,7 +43,7 @@ static inline void __put_page(struct pag
>   * in mm/vmscan.c:
>   */
>  extern int isolate_lru_page(struct page *page);
> -extern int putback_lru_page(struct page *page);
> +extern void putback_lru_page(struct page *page);
>  
>  /*
>   * in mm/page_alloc.c
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
@ 2008-06-18 18:21         ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-18 18:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Wed, 2008-06-18 at 18:40 +0900, KAMEZAWA Hiroyuki wrote:
> Lee-san, how about this ?
> Tested on x86-64 and tried Nisimura-san's test at el. works good now.

I have been testing with my work load on both ia64 and x86_64 and it
seems to be working well.  I'll let them run for a day or so.

> -Kame
> ==
> putback_lru_page()/unevictable page handling rework.
> 
> Now, putback_lru_page() requires that the page is locked.
> And in some special case, implicitly unlock it.
> 
> This patch tries to make putback_lru_pages() to be lock_page() free.
> (Of course, some callers must take the lock.)
> 
> The main reason that putback_lru_page() assumes that page is locked
> is to avoid the change in page's status among Mlocked/Not-Mlocked.
> 
> Once it is added to unevictable list, the page is removed from
> unevictable list only when page is munlocked. (there are other special
> case. but we ignore the special case.)
> So, status change during putback_lru_page() is fatal and page should 
> be locked.
> 
> putback_lru_page() in this patch has a new concepts.
> When it adds page to unevictable list, it checks the status is 
> changed or not again. if changed, retry to putback.

Given that the race that would activate this retry is likely quite rare,
this approach makes sense.  

> 
> This patche changes also caller side and cleaning up lock/unlock_page().
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroy-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

A couple of minor comments below, but:

Acked-by: Lee Schermerhorn <lee.schermerhorn-VXdhtT5mjnY@public.gmane.org>

> 
> ---
>  mm/internal.h |    2 -
>  mm/migrate.c  |   23 +++----------
>  mm/mlock.c    |   24 +++++++-------
>  mm/vmscan.c   |   96 +++++++++++++++++++++++++---------------------------------
>  4 files changed, 61 insertions(+), 84 deletions(-)
> 
> Index: test-2.6.26-rc5-mm3/mm/vmscan.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/vmscan.c
> +++ test-2.6.26-rc5-mm3/mm/vmscan.c
> @@ -486,73 +486,63 @@ int remove_mapping(struct address_space 
>   * Page may still be unevictable for other reasons.
>   *
>   * lru_lock must not be held, interrupts must be enabled.
> - * Must be called with page locked.
> - *
> - * return 1 if page still locked [not truncated], else 0
>   */
> -int putback_lru_page(struct page *page)
> +#ifdef CONFIG_UNEVICTABLE_LRU
> +void putback_lru_page(struct page *page)
>  {
>  	int lru;
> -	int ret = 1;
>  	int was_unevictable;
>  
> -	VM_BUG_ON(!PageLocked(page));
>  	VM_BUG_ON(PageLRU(page));
>  
> -	lru = !!TestClearPageActive(page);
>  	was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
>  
> -	if (unlikely(!page->mapping)) {
> -		/*
> -		 * page truncated.  drop lock as put_page() will
> -		 * free the page.
> -		 */
> -		VM_BUG_ON(page_count(page) != 1);
> -		unlock_page(page);
> -		ret = 0;
> -	} else if (page_evictable(page, NULL)) {
> -		/*
> -		 * For evictable pages, we can use the cache.
> -		 * In event of a race, worst case is we end up with an
> -		 * unevictable page on [in]active list.
> -		 * We know how to handle that.
> -		 */
> +redo:
> +	lru = !!TestClearPageActive(page);
> +	if (page_evictable(page, NULL)) {
>  		lru += page_is_file_cache(page);
>  		lru_cache_add_lru(page, lru);
> -		mem_cgroup_move_lists(page, lru);
> -#ifdef CONFIG_UNEVICTABLE_LRU
> -		if (was_unevictable)
> -			count_vm_event(NORECL_PGRESCUED);
> -#endif
>  	} else {
> -		/*
> -		 * Put unevictable pages directly on zone's unevictable
> -		 * list.
> -		 */
> +		lru = LRU_UNEVICTABLE;
>  		add_page_to_unevictable_list(page);
> -		mem_cgroup_move_lists(page, LRU_UNEVICTABLE);
> -#ifdef CONFIG_UNEVICTABLE_LRU
> -		if (!was_unevictable)
> -			count_vm_event(NORECL_PGCULLED);
> -#endif
>  	}
> +	mem_cgroup_move_lists(page, lru);
> +
> +	/*
> +	 * page's status can change while we move it among lru. If an evictable
> +	 * page is on unevictable list, it never be freed. To avoid that,
> +	 * check after we added it to the list, again.
> +	 */
> +	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
> +		if (!isolate_lru_page(page)) {
> +			put_page(page);
> +			goto redo;
> +		}
> +		/* This means someone else dropped this page from LRU
> +		 * So, it will be freed or putback to LRU again. There is
> +		 * nothing to do here.
> +		 */
> +	}
> +
> +	if (was_unevictable && lru != LRU_UNEVICTABLE)
> +		count_vm_event(NORECL_PGRESCUED);
> +	else if (!was_unevictable && lru == LRU_UNEVICTABLE)
> +		count_vm_event(NORECL_PGCULLED);
>  
>  	put_page(page);		/* drop ref from isolate */
> -	return ret;		/* ret => "page still locked" */
>  }
> -
> -/*
> - * Cull page that shrink_*_list() has detected to be unevictable
> - * under page lock to close races with other tasks that might be making
> - * the page evictable.  Avoid stranding an evictable page on the
> - * unevictable list.
> - */
> -static void cull_unevictable_page(struct page *page)
> +#else
> +void putback_lru_page(struct page *page)
>  {
> -	lock_page(page);
> -	if (putback_lru_page(page))
> -		unlock_page(page);
> +	int lru;
> +	VM_BUG_ON(PageLRU(page));
> +
> +	lru = !!TestClearPageActive(page) + page_is_file_cache(page);
> +	lru_cache_add_lru(page, lru);
> +	mem_cgroup_move_lists(page, lru);
> +	put_page(page);
>  }
> +#endif
>  
>  /*
>   * shrink_page_list() returns the number of reclaimed pages
> @@ -746,8 +736,8 @@ free_it:
>  		continue;
>  
>  cull_mlocked:
> -		if (putback_lru_page(page))
> -			unlock_page(page);
> +		unlock_page(page);
> +		putback_lru_page(page);
>  		continue;
>  
>  activate_locked:
> @@ -1127,7 +1117,7 @@ static unsigned long shrink_inactive_lis
>  			list_del(&page->lru);
>  			if (unlikely(!page_evictable(page, NULL))) {
>  				spin_unlock_irq(&zone->lru_lock);
> -				cull_unevictable_page(page);
> +				putback_lru_page(page);
>  				spin_lock_irq(&zone->lru_lock);
>  				continue;
>  			}
> @@ -1231,7 +1221,7 @@ static void shrink_active_list(unsigned 
>  		list_del(&page->lru);
>  
>  		if (unlikely(!page_evictable(page, NULL))) {
> -			cull_unevictable_page(page);
> +			putback_lru_page(page);
>  			continue;
>  		}
>  
> @@ -2393,8 +2383,6 @@ int zone_reclaim(struct zone *zone, gfp_
>  int page_evictable(struct page *page, struct vm_area_struct *vma)
>  {
>  
> -	VM_BUG_ON(PageUnevictable(page));
> -
>  	if (mapping_unevictable(page_mapping(page)))
>  		return 0;
>  
> Index: test-2.6.26-rc5-mm3/mm/mlock.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/mlock.c
> +++ test-2.6.26-rc5-mm3/mm/mlock.c
> @@ -55,7 +55,6 @@ EXPORT_SYMBOL(can_do_mlock);
>   */
>  void __clear_page_mlock(struct page *page)
>  {
> -	VM_BUG_ON(!PageLocked(page));	/* for LRU isolate/putback */
>  
>  	dec_zone_page_state(page, NR_MLOCK);
>  	count_vm_event(NORECL_PGCLEARED);
> @@ -79,7 +78,6 @@ void __clear_page_mlock(struct page *pag
>   */
>  void mlock_vma_page(struct page *page)
>  {
> -	BUG_ON(!PageLocked(page));
>  
>  	if (!TestSetPageMlocked(page)) {
>  		inc_zone_page_state(page, NR_MLOCK);
> @@ -109,7 +107,6 @@ void mlock_vma_page(struct page *page)
>   */
>  static void munlock_vma_page(struct page *page)
>  {
> -	BUG_ON(!PageLocked(page));
>  
>  	if (TestClearPageMlocked(page)) {
>  		dec_zone_page_state(page, NR_MLOCK);
> @@ -169,7 +166,8 @@ static int __mlock_vma_pages_range(struc
>  
>  		/*
>  		 * get_user_pages makes pages present if we are
> -		 * setting mlock.
> +		 * setting mlock. and this extra reference count will
> +		 * disable migration of this page.
>  		 */
>  		ret = get_user_pages(current, mm, addr,
>  				min_t(int, nr_pages, ARRAY_SIZE(pages)),
> @@ -197,14 +195,8 @@ static int __mlock_vma_pages_range(struc
>  		for (i = 0; i < ret; i++) {
>  			struct page *page = pages[i];
>  
> -			/*
> -			 * page might be truncated or migrated out from under
> -			 * us.  Check after acquiring page lock.
> -			 */
> -			lock_page(page);

Hmmm.  Still thinking about this.  No need to protect against in flight
truncation or migration?

> -			if (page->mapping)
> +			if (page_mapcount(page))
>  				mlock_vma_page(page);
> -			unlock_page(page);
>  			put_page(page);		/* ref from get_user_pages() */
>  
>  			/*
> @@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
>  	struct page *page;
>  	pte_t pte;
>  
> +	/*
> +	 * page is never be unmapped by page-reclaim. we lock this page now.
> +	 */

I don't understand what you're trying to say here.  That is, what the
point of this comment is...

>  retry:
>  	pte = *ptep;
>  	/*
> @@ -261,7 +256,15 @@ retry:
>  		goto out;
>  
>  	lock_page(page);
> -	if (!page->mapping) {
> +	/*
> +	 * Because we lock page here, we have to check 2 cases.
> +	 * - the page is migrated.
> +	 * - the page is truncated (file-cache only)
> +	 * Note: Anonymous page doesn't clear page->mapping even if it
> +	 * is removed from rmap.
> +	 */
> +	if (!page->mapping ||
> +	     (PageAnon(page) && !page_mapcount(page))) {
>  		unlock_page(page);
>  		goto retry;
>  	}
> Index: test-2.6.26-rc5-mm3/mm/migrate.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/migrate.c
> +++ test-2.6.26-rc5-mm3/mm/migrate.c
> @@ -67,9 +67,7 @@ int putback_lru_pages(struct list_head *
>  
>  	list_for_each_entry_safe(page, page2, l, lru) {
>  		list_del(&page->lru);
> -		lock_page(page);
> -		if (putback_lru_page(page))
> -			unlock_page(page);
> +		putback_lru_page(page);
>  		count++;
>  	}
>  	return count;
> @@ -571,7 +569,6 @@ static int fallback_migrate_page(struct 
>  static int move_to_new_page(struct page *newpage, struct page *page)
>  {
>  	struct address_space *mapping;
> -	int unlock = 1;
>  	int rc;
>  
>  	/*
> @@ -610,12 +607,11 @@ static int move_to_new_page(struct page 
>  		 * Put back on LRU while holding page locked to
>  		 * handle potential race with, e.g., munlock()
>  		 */
> -		unlock = putback_lru_page(newpage);
> +		putback_lru_page(newpage);
>  	} else
>  		newpage->mapping = NULL;
>  
> -	if (unlock)
> -		unlock_page(newpage);
> +	unlock_page(newpage);
>  
>  	return rc;
>  }
> @@ -632,7 +628,6 @@ static int unmap_and_move(new_page_t get
>  	struct page *newpage = get_new_page(page, private, &result);
>  	int rcu_locked = 0;
>  	int charge = 0;
> -	int unlock = 1;
>  
>  	if (!newpage)
>  		return -ENOMEM;
> @@ -713,6 +708,7 @@ rcu_unlock:
>  		rcu_read_unlock();
>  
>  unlock:
> +	unlock_page(page);
>  
>  	if (rc != -EAGAIN) {
>   		/*
> @@ -722,18 +718,9 @@ unlock:
>   		 * restored.
>   		 */
>   		list_del(&page->lru);
> -		if (!page->mapping) {
> -			VM_BUG_ON(page_count(page) != 1);
> -			unlock_page(page);
> -			put_page(page);		/* just free the old page */
> -			goto end_migration;
> -		} else
> -			unlock = putback_lru_page(page);
> +		putback_lru_page(page);
>  	}
>  
> -	if (unlock)
> -		unlock_page(page);
> -
>  end_migration:
>  	if (!charge)
>  		mem_cgroup_end_migration(newpage);
> Index: test-2.6.26-rc5-mm3/mm/internal.h
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/internal.h
> +++ test-2.6.26-rc5-mm3/mm/internal.h
> @@ -43,7 +43,7 @@ static inline void __put_page(struct pag
>   * in mm/vmscan.c:
>   */
>  extern int isolate_lru_page(struct page *page);
> -extern int putback_lru_page(struct page *page);
> +extern void putback_lru_page(struct page *page);
>  
>  /*
>   * in mm/page_alloc.c
> 

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-18 18:21         ` Lee Schermerhorn
  (?)
@ 2008-06-19  0:22           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-19  0:22 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Wed, 18 Jun 2008 14:21:06 -0400
Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:

> On Wed, 2008-06-18 at 18:40 +0900, KAMEZAWA Hiroyuki wrote:
> > Lee-san, how about this?
> > Tested on x86-64, and tried Nishimura-san's tests as well; it works
> > well now.
> 
> I have been testing with my workload on both ia64 and x86_64, and it
> seems to be working well.  I'll let them run for a day or so.
> 
thank you.
<snip>

> > @@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
> >  	struct page *page;
> >  	pte_t pte;
> >  
> > +	/*
> > +	 * page is never be unmapped by page-reclaim. we lock this page now.
> > +	 */
> 
> I don't understand what you're trying to say here.  That is, what the
> point of this comment is...
> 
We access the page table without taking the pte_lock.  But this VMA is
mlocked and the migration race is handled, so we don't need to be too
nervous about accessing the pte.  I'll come up with more meaningful wording.
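
For reference, the recheck this relies on, condensed from the
__munlock_pte_handler() hunk quoted above (a sketch of the quoted code,
with the comment reworded):

	lock_page(page);
	/*
	 * Revalidate after sleeping for the page lock: the page may
	 * have been migrated or truncated meanwhile.  An anonymous
	 * page keeps page->mapping even after it is removed from the
	 * rmap, hence the extra mapcount test.
	 */
	if (!page->mapping ||
	    (PageAnon(page) && !page_mapcount(page))) {
		unlock_page(page);
		goto retry;	/* re-read the pte and start over */
	}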

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 290+ messages in thread

* [BUG][PATCH -mm] avoid BUG() in __stop_machine_run()
  2008-06-12  5:59 ` 2.6.26-rc5-mm3 Andrew Morton
@ 2008-06-19  6:59   ` Hidehiro Kawai
  -1 siblings, 0 replies; 290+ messages in thread
From: Hidehiro Kawai @ 2008-06-19  6:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, kernel-testers, linux-mm, sugita, Satoshi OSHIMA, rusty

When a process loads a kernel module, __stop_machine_run() is called, and
it calls sched_setscheduler() to give the newly created kernel threads the
highest priority.  However, the process may lack CAP_SYS_NICE, which is
required for sched_setscheduler() to increase the priority.  For example,
SystemTap loads its module with only CAP_SYS_MODULE.  In this case,
sched_setscheduler() returns -EPERM and BUG() is called.

A failure of sched_setscheduler() wouldn't be a real problem, so this
patch just ignores it.
Or should we grant the CAP_SYS_NICE capability temporarily?

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
---
 kernel/stop_machine.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6.26-rc5-mm3/kernel/stop_machine.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/kernel/stop_machine.c
+++ linux-2.6.26-rc5-mm3/kernel/stop_machine.c
@@ -143,8 +143,7 @@ int __stop_machine_run(int (*fn)(void *)
 		kthread_bind(threads[i], i);
 
 		/* Make it highest prio. */
-		if (sched_setscheduler(threads[i], SCHED_FIFO, &param) != 0)
-			BUG();
+		sched_setscheduler(threads[i], SCHED_FIFO, &param);
 	}
 
 	/* We've created all the threads.  Wake them all: hold this CPU so one



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-18 11:55           ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-19  8:00             ` Daisuke Nishimura
  -1 siblings, 0 replies; 290+ messages in thread
From: Daisuke Nishimura @ 2008-06-19  8:00 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

> > > -		unlock = putback_lru_page(newpage);
> > > +		putback_lru_page(newpage);
> > >  	} else
> > >  		newpage->mapping = NULL;
> > 
> > Originally, move_to_lru() was called in unmap_and_move().
> > The unevictable infrastructure patch moved it to this point so that
> > putback_lru_page() is called with the page locked.
> > 
> > So your patch removes the page-lock dependency; moving the call back
> > to unmap_and_move() is better.
> > 
> > It reduces the page lock holding time.
> > 
> OK, I will look into it again.
> 

I agree with Kosaki-san.

And the VM_BUG_ON(page_count(newpage) != 1) in unmap_and_move()
is still not correct, IMHO.
I actually hit this BUG when testing this patch (with the
migration_entry_wait fix).

unmap_and_move()
	move_to_new_page()
		migrate_page()
		remove_migration_ptes()
		putback_lru_page()			(*1)
	  :
        if (!newpage->mapping)				(*2)
		VM_BUG_ON(page_count(newpage) != 1)

If an anonymous page (without mapping) is migrated successfully,
the page is moved back to the LRU by putback_lru_page() (*1),
and the page count becomes 1 (pte only).

At the same time (between *1 and *2), if the process
that owns this page is freeing it, the page count
becomes 0 and ->mapping becomes NULL via free_hot_cold_page(),
so this BUG is triggered.

I've not seen this BUG on real HW yet (seen twice on a fake-NUMA
HVM guest of Xen), but I think it can happen in theory.
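
To make the window concrete, a hand-written interleaving of the above
(illustration only, not code from the patch):

	migration path				owner process
	--------------				-------------
	putback_lru_page(newpage);
	  /* drops the migration ref:
	     page_count() == 1 (pte only) */
						page_remove_rmap(newpage);
						put_page(newpage);
						  /* count -> 0 */
						free_hot_cold_page(newpage);
						  /* ->mapping = NULL */
	if (!newpage->mapping)			(*2)
		VM_BUG_ON(page_count(newpage) != 1);	/* fires */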


Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-19  8:00             ` Daisuke Nishimura
  (?)
@ 2008-06-19  8:24               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-19  8:24 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Lee Schermerhorn,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Thu, 19 Jun 2008 17:00:59 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> > > > -		unlock = putback_lru_page(newpage);
> > > > +		putback_lru_page(newpage);
> > > >  	} else
> > > >  		newpage->mapping = NULL;
> > > 
> > > Originally, move_to_lru() was called in unmap_and_move().
> > > The unevictable infrastructure patch moved it to this point so that
> > > putback_lru_page() is called with the page locked.
> > > 
> > > So your patch removes the page-lock dependency; moving the call back
> > > to unmap_and_move() is better.
> > > 
> > > It reduces the page lock holding time.
> > > 
> > OK, I will look into it again.
> > 
> 
> I agree with Kosaki-san.
> 
> And the VM_BUG_ON(page_count(newpage) != 1) in unmap_and_move()
> is still not correct, IMHO.
> I actually hit this BUG when testing this patch (with the
> migration_entry_wait fix).
> 
> unmap_and_move()
> 	move_to_new_page()
> 		migrate_page()
> 		remove_migration_ptes()
> 		putback_lru_page()			(*1)
> 	  :
>         if (!newpage->mapping)				(*2)
> 		VM_BUG_ON(page_count(newpage) != 1)
> 
> If an anonymous page (without mapping) is migrated successfully,
> the page is moved back to the LRU by putback_lru_page() (*1),
> and the page count becomes 1 (pte only).
> 
yes.

> At the same time (between *1 and *2), if the process
> that owns this page is freeing it, the page count
> becomes 0 and ->mapping becomes NULL via free_hot_cold_page(),
> so this BUG is triggered.
> 
Agree, I see.

> I've not seen this BUG on real HW yet (seen twice on a fake-NUMA
> HVM guest of Xen), but I think it can happen in theory.
> 
That's (maybe) because page->mapping is not cleared when the page is
removed from rmap (and there is a pagevec to delay the freeing...).
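
As a minimal illustration of that (my sketch, assuming mainline's anon
rmap behaviour), right after the last pte of an anonymous page has been
unmapped:

	BUG_ON(!PageAnon(page));
	BUG_ON(!page->mapping);		/* still anon_vma | PAGE_MAPPING_ANON */
	BUG_ON(page_mapcount(page));	/* but no pte maps the page any more */

page->mapping is cleared only in the final free path
(free_hot_cold_page()), which is why the !newpage->mapping test so rarely
sees the race.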

But OK, I see your point.  KOSAKI-san is now writing a patch set to
fix the whole thing; please take a look at it.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: 2.6.26-rc5-mm3
  2008-06-18 17:55     ` 2.6.26-rc5-mm3 Daniel Walker
  (?)
@ 2008-06-19  9:13       ` Ingo Molnar
  -1 siblings, 0 replies; 290+ messages in thread
From: Ingo Molnar @ 2008-06-19  9:13 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Byron Bradley, Andrew Morton, linux-kernel, kernel-testers,
	linux-mm, Hua Zhong


* Daniel Walker <dwalker@mvista.com> wrote:

> 
> On Fri, 2008-06-13 at 00:32 +0100, Byron Bradley wrote:
> > Looks like x86 and ARM both fail to boot if PROFILE_LIKELY, FTRACE and 
> > DYNAMIC_FTRACE are selected. If any one of those three are disabled it 
> > boots (or fails in some other way which I'm looking at now). The serial 
> > console output from both machines when they fail to boot is below, let me 
> > know if there is any other information I can provide.
> 
> I was able to reproduce a hang on x86 with those options. The patch
> below is a potential fix. I think we don't want to trace
> do_check_likely(), since the ftrace internals might use likely/unlikely
> macro's which will just cause recursion back to do_check_likely()..
> 
> Signed-off-by: Daniel Walker <dwalker@mvista.com>
> 
> ---
>  lib/likely_prof.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6.25/lib/likely_prof.c
> ===================================================================
> --- linux-2.6.25.orig/lib/likely_prof.c
> +++ linux-2.6.25/lib/likely_prof.c
> @@ -22,7 +22,7 @@
>  
>  static struct likeliness *likeliness_head;
>  
> -int do_check_likely(struct likeliness *likeliness, unsigned int ret)
> +int notrace do_check_likely(struct likeliness *likeliness, unsigned int ret)

the better fix would be to add likely_prof.o to this list of exceptions 
in lib/Makefile:

 ifdef CONFIG_FTRACE
 # Do not profile string.o, since it may be used in early boot or vdso
 CFLAGS_REMOVE_string.o = -pg
 # Also do not profile any debug utilities
 CFLAGS_REMOVE_spinlock_debug.o = -pg
 CFLAGS_REMOVE_list_debug.o = -pg
 CFLAGS_REMOVE_debugobjects.o = -pg
 endif

instead of adding notrace to the source.

	Ingo

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG][PATCH -mm] avoid BUG() in __stop_machine_run()
  2008-06-19  6:59   ` Hidehiro Kawai
  (?)
@ 2008-06-19 10:12     ` Rusty Russell
  -1 siblings, 0 replies; 290+ messages in thread
From: Rusty Russell @ 2008-06-19 10:12 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Andrew Morton, linux-kernel, kernel-testers, linux-mm, sugita,
	Satoshi OSHIMA, Ingo Molnar

On Thursday 19 June 2008 16:59:50 Hidehiro Kawai wrote:
> When a process loads a kernel module, __stop_machine_run() is called, and
> it calls sched_setscheduler() to give the newly created kernel threads the
> highest priority.  However, the process may lack CAP_SYS_NICE, which is
> required for sched_setscheduler() to increase the priority.  For example,
> SystemTap loads its module with only CAP_SYS_MODULE.  In this case,
> sched_setscheduler() returns -EPERM and BUG() is called.

Hi Hidehiro,

	Nice catch.  This can happen in the current code, it just doesn't
BUG().

> A failure of sched_setscheduler() wouldn't be a real problem, so this
> patch just ignores it.

	Well, it can mean that the stop_machine blocks indefinitely.  Better
than a BUG(), but we should aim higher.

> Or should we grant the CAP_SYS_NICE capability temporarily?

        I don't think so.  The raised capability could be seen from another
thread, which in theory should not observe something random.  Worse, another
thread could change it.

How's this?

sched_setscheduler: add a flag to control access checks

Hidehiro Kawai noticed that sched_setscheduler() can fail in
stop_machine: it calls sched_setscheduler() from insmod, which can
have CAP_SYS_MODULE without CAP_SYS_NICE.

This simply introduces a flag to allow us to disable the capability
checks for internal callers (this is simpler than splitting the
sched_setscheduler() function, since it loops checking permissions).
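
Concretely, call sites then look like this (distilled from the diff below):

	/* a kernel-internal caller, e.g. a kthread: skip the checks */
	sched_setscheduler(p, SCHED_FIFO, &param, false);

	/* acting on behalf of userspace, as in do_sched_setscheduler():
	   keep the checks */
	retval = sched_setscheduler(p, policy, &lparam, true);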

The flag is only "false" (ie. no check) for the following cases, where
it shouldn't matter:
  drivers/input/touchscreen/ucb1400_ts.c:ucb1400_ts_thread()
	- it's a kthread
  drivers/mmc/core/sdio_irq.c:sdio_irq_thread()
	- also a kthread
  kernel/kthread.c:create_kthread()
	- making a kthread (from kthreadd)
  kernel/softlockup.c:watchdog()
	- also a kthread

And these cases could have failed before:
  kernel/softirq.c:cpu_callback()
	- CPU hotplug callback
  kernel/stop_machine.c:__stop_machine_run()
	- Called from various places, including modprobe()

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r 509f0724da6b drivers/input/touchscreen/ucb1400_ts.c
--- a/drivers/input/touchscreen/ucb1400_ts.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/drivers/input/touchscreen/ucb1400_ts.c	Thu Jun 19 19:36:40 2008 +1000
@@ -287,7 +287,7 @@ static int ucb1400_ts_thread(void *_ucb)
 	int valid = 0;
 	struct sched_param param = { .sched_priority = 1 };
 
-	sched_setscheduler(tsk, SCHED_FIFO, &param);
+	sched_setscheduler(tsk, SCHED_FIFO, &param, false);
 
 	set_freezable();
 	while (!kthread_should_stop()) {
diff -r 509f0724da6b drivers/mmc/core/sdio_irq.c
--- a/drivers/mmc/core/sdio_irq.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/drivers/mmc/core/sdio_irq.c	Thu Jun 19 19:36:40 2008 +1000
@@ -70,7 +70,7 @@ static int sdio_irq_thread(void *_host)
 	unsigned long period, idle_period;
 	int ret;
 
-	sched_setscheduler(current, SCHED_FIFO, &param);
+	sched_setscheduler(current, SCHED_FIFO, &param, false);
 
 	/*
 	 * We want to allow for SDIO cards to work even on non SDIO
diff -r 509f0724da6b include/linux/sched.h
--- a/include/linux/sched.h	Thu Jun 19 17:06:30 2008 +1000
+++ b/include/linux/sched.h	Thu Jun 19 19:36:40 2008 +1000
@@ -1654,7 +1654,8 @@ extern int can_nice(const struct task_st
 extern int can_nice(const struct task_struct *p, const int nice);
 extern int task_curr(const struct task_struct *p);
 extern int idle_cpu(int cpu);
-extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
+extern int sched_setscheduler(struct task_struct *, int, struct sched_param *,
+			      bool);
 extern struct task_struct *idle_task(int cpu);
 extern struct task_struct *curr_task(int cpu);
 extern void set_curr_task(int cpu, struct task_struct *p);
diff -r 509f0724da6b kernel/kthread.c
--- a/kernel/kthread.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/kthread.c	Thu Jun 19 19:36:40 2008 +1000
@@ -104,7 +104,7 @@ static void create_kthread(struct kthrea
 		 * root may have changed our (kthreadd's) priority or CPU mask.
 		 * The kernel thread should not inherit these properties.
 		 */
-		sched_setscheduler(create->result, SCHED_NORMAL, &param);
+		sched_setscheduler(create->result, SCHED_NORMAL, &param, false);
 		set_user_nice(create->result, KTHREAD_NICE_LEVEL);
 		set_cpus_allowed(create->result, CPU_MASK_ALL);
 	}
diff -r 509f0724da6b kernel/rtmutex-tester.c
--- a/kernel/rtmutex-tester.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/rtmutex-tester.c	Thu Jun 19 19:36:40 2008 +1000
@@ -327,7 +327,8 @@ static ssize_t sysfs_test_command(struct
 	switch (op) {
 	case RTTEST_SCHEDOT:
 		schedpar.sched_priority = 0;
-		ret = sched_setscheduler(threads[tid], SCHED_NORMAL, &schedpar);
+		ret = sched_setscheduler(threads[tid], SCHED_NORMAL, &schedpar,
+					 true);
 		if (ret)
 			return ret;
 		set_user_nice(current, 0);
@@ -335,7 +336,8 @@ static ssize_t sysfs_test_command(struct
 
 	case RTTEST_SCHEDRT:
 		schedpar.sched_priority = dat;
-		ret = sched_setscheduler(threads[tid], SCHED_FIFO, &schedpar);
+		ret = sched_setscheduler(threads[tid], SCHED_FIFO, &schedpar,
+					 true);
 		if (ret)
 			return ret;
 		break;
diff -r 509f0724da6b kernel/sched.c
--- a/kernel/sched.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/sched.c	Thu Jun 19 19:36:40 2008 +1000
@@ -4749,11 +4749,12 @@ __setscheduler(struct rq *rq, struct tas
  * @p: the task in question.
  * @policy: new policy.
  * @param: structure containing the new RT priority.
+ * @user: do checks to ensure this thread has permission
  *
  * NOTE that the task may be already dead.
  */
 int sched_setscheduler(struct task_struct *p, int policy,
-		       struct sched_param *param)
+		       struct sched_param *param, bool user)
 {
 	int retval, oldprio, oldpolicy = -1, on_rq, running;
 	unsigned long flags;
@@ -4785,7 +4786,7 @@ recheck:
 	/*
 	 * Allow unprivileged RT tasks to decrease priority:
 	 */
-	if (!capable(CAP_SYS_NICE)) {
+	if (user && !capable(CAP_SYS_NICE)) {
 		if (rt_policy(policy)) {
 			unsigned long rlim_rtprio;
 
@@ -4821,7 +4822,8 @@ recheck:
 	 * Do not allow realtime tasks into groups that have no runtime
 	 * assigned.
 	 */
-	if (rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0)
+	if (user
+	    && rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0)
 		return -EPERM;
 #endif
 
@@ -4888,7 +4890,7 @@ do_sched_setscheduler(pid_t pid, int pol
 	retval = -ESRCH;
 	p = find_process_by_pid(pid);
 	if (p != NULL)
-		retval = sched_setscheduler(p, policy, &lparam);
+		retval = sched_setscheduler(p, policy, &lparam, true);
 	rcu_read_unlock();
 
 	return retval;
diff -r 509f0724da6b kernel/softirq.c
--- a/kernel/softirq.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/softirq.c	Thu Jun 19 19:36:40 2008 +1000
@@ -645,7 +645,7 @@ static int __cpuinit cpu_callback(struct
 
 		p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
-		sched_setscheduler(p, SCHED_FIFO, &param);
+		sched_setscheduler(p, SCHED_FIFO, &param, false);
 		kthread_stop(p);
 		takeover_tasklets(hotcpu);
 		break;
diff -r 509f0724da6b kernel/softlockup.c
--- a/kernel/softlockup.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/softlockup.c	Thu Jun 19 19:36:40 2008 +1000
@@ -211,7 +211,7 @@ static int watchdog(void *__bind_cpu)
 	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 	int this_cpu = (long)__bind_cpu;
 
-	sched_setscheduler(current, SCHED_FIFO, &param);
+	sched_setscheduler(current, SCHED_FIFO, &param, false);
 
 	/* initialize timestamp */
 	touch_softlockup_watchdog();
diff -r 509f0724da6b kernel/stop_machine.c
--- a/kernel/stop_machine.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/stop_machine.c	Thu Jun 19 19:36:40 2008 +1000
@@ -187,7 +187,7 @@ struct task_struct *__stop_machine_run(i
 		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 
 		/* One high-prio thread per cpu.  We'll do this one. */
-		sched_setscheduler(p, SCHED_FIFO, &param);
+		sched_setscheduler(p, SCHED_FIFO, &param, false);
 		kthread_bind(p, cpu);
 		wake_up_process(p);
 		wait_for_completion(&smdata.done);

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG][PATCH -mm] avoid BUG() in __stop_machine_run()
@ 2008-06-19 10:12     ` Rusty Russell
  0 siblings, 0 replies; 290+ messages in thread
From: Rusty Russell @ 2008-06-19 10:12 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Andrew Morton, linux-kernel, kernel-testers, linux-mm, sugita,
	Satoshi OSHIMA, Ingo Molnar

On Thursday 19 June 2008 16:59:50 Hidehiro Kawai wrote:
> When a process loads a kernel module, __stop_machine_run() is called, and
> it calls sched_setscheduler() to give newly created kernel threads highest
> priority.  However, the process can have no CAP_SYS_NICE which required
> for sched_setscheduler() to increase the priority.  For example, SystemTap
> loads its module with only CAP_SYS_MODULE.  In this case,
> sched_setscheduler() returns -EPERM, then BUG() is called.

Hi Hidehiro,

	Nice catch.  This can happen in the current code, it just doesn't
BUG().

> Failure of sched_setscheduler() wouldn't be a real problem, so this
> patch just ignores it.

	Well, it can mean that the stop_machine blocks indefinitely.  Better
than a BUG(), but we should aim higher.

> Or, should we give the CAP_SYS_NICE capability temporarily?

        I don't think so.  It can be seen from another thread, and in theory
that should not see something random.  Worse, they can change it from
another thread.

How's this?

sched_setscheduler: add a flag to control access checks

Hidehiro Kawai noticed that sched_setscheduler() can fail in
stop_machine: it calls sched_setscheduler() from insmod, which can
have CAP_SYS_MODULE without CAP_SYS_NICE.

This simply introduces a flag to allow us to disable the capability
checks for internal callers (this is simpler than splitting the
sched_setscheduler() function, since it loops checking permissions).

The flag is only "false" (ie. no check) for the following cases, where
it shouldn't matter:
  drivers/input/touchscreen/ucb1400_ts.c:ucb1400_ts_thread()
	- it's a kthread
  drivers/mmc/core/sdio_irq.c:sdio_irq_thread()
	- also a kthread
  kernel/kthread.c:create_kthread()
	- making a kthread (from kthreadd)
  kernel/softlockup.c:watchdog()
	- also a kthread

And these cases could have failed before:
  kernel/softirq.c:cpu_callback()
	- CPU hotplug callback
  kernel/stop_machine.c:__stop_machine_run()
	- Called from various places, including modprobe()

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r 509f0724da6b drivers/input/touchscreen/ucb1400_ts.c
--- a/drivers/input/touchscreen/ucb1400_ts.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/drivers/input/touchscreen/ucb1400_ts.c	Thu Jun 19 19:36:40 2008 +1000
@@ -287,7 +287,7 @@ static int ucb1400_ts_thread(void *_ucb)
 	int valid = 0;
 	struct sched_param param = { .sched_priority = 1 };
 
-	sched_setscheduler(tsk, SCHED_FIFO, &param);
+	sched_setscheduler(tsk, SCHED_FIFO, &param, false);
 
 	set_freezable();
 	while (!kthread_should_stop()) {
diff -r 509f0724da6b drivers/mmc/core/sdio_irq.c
--- a/drivers/mmc/core/sdio_irq.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/drivers/mmc/core/sdio_irq.c	Thu Jun 19 19:36:40 2008 +1000
@@ -70,7 +70,7 @@ static int sdio_irq_thread(void *_host)
 	unsigned long period, idle_period;
 	int ret;
 
-	sched_setscheduler(current, SCHED_FIFO, &param);
+	sched_setscheduler(current, SCHED_FIFO, &param, false);
 
 	/*
 	 * We want to allow for SDIO cards to work even on non SDIO
diff -r 509f0724da6b include/linux/sched.h
--- a/include/linux/sched.h	Thu Jun 19 17:06:30 2008 +1000
+++ b/include/linux/sched.h	Thu Jun 19 19:36:40 2008 +1000
@@ -1654,7 +1654,8 @@ extern int can_nice(const struct task_st
 extern int can_nice(const struct task_struct *p, const int nice);
 extern int task_curr(const struct task_struct *p);
 extern int idle_cpu(int cpu);
-extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
+extern int sched_setscheduler(struct task_struct *, int, struct sched_param *,
+			      bool);
 extern struct task_struct *idle_task(int cpu);
 extern struct task_struct *curr_task(int cpu);
 extern void set_curr_task(int cpu, struct task_struct *p);
diff -r 509f0724da6b kernel/kthread.c
--- a/kernel/kthread.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/kthread.c	Thu Jun 19 19:36:40 2008 +1000
@@ -104,7 +104,7 @@ static void create_kthread(struct kthrea
 		 * root may have changed our (kthreadd's) priority or CPU mask.
 		 * The kernel thread should not inherit these properties.
 		 */
-		sched_setscheduler(create->result, SCHED_NORMAL, &param);
+		sched_setscheduler(create->result, SCHED_NORMAL, &param, false);
 		set_user_nice(create->result, KTHREAD_NICE_LEVEL);
 		set_cpus_allowed(create->result, CPU_MASK_ALL);
 	}
diff -r 509f0724da6b kernel/rtmutex-tester.c
--- a/kernel/rtmutex-tester.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/rtmutex-tester.c	Thu Jun 19 19:36:40 2008 +1000
@@ -327,7 +327,8 @@ static ssize_t sysfs_test_command(struct
 	switch (op) {
 	case RTTEST_SCHEDOT:
 		schedpar.sched_priority = 0;
-		ret = sched_setscheduler(threads[tid], SCHED_NORMAL, &schedpar);
+		ret = sched_setscheduler(threads[tid], SCHED_NORMAL, &schedpar,
+					 true);
 		if (ret)
 			return ret;
 		set_user_nice(current, 0);
@@ -335,7 +336,8 @@ static ssize_t sysfs_test_command(struct
 
 	case RTTEST_SCHEDRT:
 		schedpar.sched_priority = dat;
-		ret = sched_setscheduler(threads[tid], SCHED_FIFO, &schedpar);
+		ret = sched_setscheduler(threads[tid], SCHED_FIFO, &schedpar,
+					 true);
 		if (ret)
 			return ret;
 		break;
diff -r 509f0724da6b kernel/sched.c
--- a/kernel/sched.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/sched.c	Thu Jun 19 19:36:40 2008 +1000
@@ -4749,11 +4749,12 @@ __setscheduler(struct rq *rq, struct tas
  * @p: the task in question.
  * @policy: new policy.
  * @param: structure containing the new RT priority.
+ * @user: do checks to ensure this thread has permission
  *
  * NOTE that the task may be already dead.
  */
 int sched_setscheduler(struct task_struct *p, int policy,
-		       struct sched_param *param)
+		       struct sched_param *param, bool user)
 {
 	int retval, oldprio, oldpolicy = -1, on_rq, running;
 	unsigned long flags;
@@ -4785,7 +4786,7 @@ recheck:
 	/*
 	 * Allow unprivileged RT tasks to decrease priority:
 	 */
-	if (!capable(CAP_SYS_NICE)) {
+	if (user && !capable(CAP_SYS_NICE)) {
 		if (rt_policy(policy)) {
 			unsigned long rlim_rtprio;
 
@@ -4821,7 +4822,8 @@ recheck:
 	 * Do not allow realtime tasks into groups that have no runtime
 	 * assigned.
 	 */
-	if (rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0)
+	if (user
+	    && rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0)
 		return -EPERM;
 #endif
 
@@ -4888,7 +4890,7 @@ do_sched_setscheduler(pid_t pid, int pol
 	retval = -ESRCH;
 	p = find_process_by_pid(pid);
 	if (p != NULL)
-		retval = sched_setscheduler(p, policy, &lparam);
+		retval = sched_setscheduler(p, policy, &lparam, true);
 	rcu_read_unlock();
 
 	return retval;
diff -r 509f0724da6b kernel/softirq.c
--- a/kernel/softirq.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/softirq.c	Thu Jun 19 19:36:40 2008 +1000
@@ -645,7 +645,7 @@ static int __cpuinit cpu_callback(struct
 
 		p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
-		sched_setscheduler(p, SCHED_FIFO, &param);
+		sched_setscheduler(p, SCHED_FIFO, &param, false);
 		kthread_stop(p);
 		takeover_tasklets(hotcpu);
 		break;
diff -r 509f0724da6b kernel/softlockup.c
--- a/kernel/softlockup.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/softlockup.c	Thu Jun 19 19:36:40 2008 +1000
@@ -211,7 +211,7 @@ static int watchdog(void *__bind_cpu)
 	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 	int this_cpu = (long)__bind_cpu;
 
-	sched_setscheduler(current, SCHED_FIFO, &param);
+	sched_setscheduler(current, SCHED_FIFO, &param, false);
 
 	/* initialize timestamp */
 	touch_softlockup_watchdog();
diff -r 509f0724da6b kernel/stop_machine.c
--- a/kernel/stop_machine.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/stop_machine.c	Thu Jun 19 19:36:40 2008 +1000
@@ -187,7 +187,7 @@ struct task_struct *__stop_machine_run(i
 		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 
 		/* One high-prio thread per cpu.  We'll do this one. */
-		sched_setscheduler(p, SCHED_FIFO, &param);
+		sched_setscheduler(p, SCHED_FIFO, &param, false);
 		kthread_bind(p, cpu);
 		wake_up_process(p);
 		wait_for_completion(&smdata.done);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG][PATCH -mm] avoid BUG() in __stop_machine_run()
@ 2008-06-19 10:12     ` Rusty Russell
  0 siblings, 0 replies; 290+ messages in thread
From: Rusty Russell @ 2008-06-19 10:12 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Andrew Morton, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, sugita, Satoshi OSHIMA,
	Ingo Molnar

On Thursday 19 June 2008 16:59:50 Hidehiro Kawai wrote:
> When a process loads a kernel module, __stop_machine_run() is called, and
> it calls sched_setscheduler() to give newly created kernel threads highest
> priority.  However, the process can have no CAP_SYS_NICE which required
> for sched_setscheduler() to increase the priority.  For example, SystemTap
> loads its module with only CAP_SYS_MODULE.  In this case,
> sched_setscheduler() returns -EPERM, then BUG() is called.

Hi Hidehiro,

	Nice catch.  This can happen in the current code, it just doesn't
BUG().

> Failure of sched_setscheduler() wouldn't be a real problem, so this
> patch just ignores it.

	Well, it can mean that the stop_machine blocks indefinitely.  Better
than a BUG(), but we should aim higher.

> Or, should we give the CAP_SYS_NICE capability temporarily?

        I don't think so.  It can be seen from another thread, and in theory
that should not see something random.  Worse, they can change it from
another thread.

How's this?

sched_setscheduler: add a flag to control access checks

Hidehiro Kawai noticed that sched_setscheduler() can fail in
stop_machine: it calls sched_setscheduler() from insmod, which can
have CAP_SYS_MODULE without CAP_SYS_NICE.

This simply introduces a flag to allow us to disable the capability
checks for internal callers (this is simpler than splitting the
sched_setscheduler() function, since it loops checking permissions).

The flag is only "false" (ie. no check) for the following cases, where
it shouldn't matter:
  drivers/input/touchscreen/ucb1400_ts.c:ucb1400_ts_thread()
	- it's a kthread
  drivers/mmc/core/sdio_irq.c:sdio_irq_thread()
	- also a kthread
  kernel/kthread.c:create_kthread()
	- making a kthread (from kthreadd)
  kernel/softlockup.c:watchdog()
	- also a kthread

And these cases could have failed before:
  kernel/softirq.c:cpu_callback()
	- CPU hotplug callback
  kernel/stop_machine.c:__stop_machine_run()
	- Called from various places, including modprobe()

Signed-off-by: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>

diff -r 509f0724da6b drivers/input/touchscreen/ucb1400_ts.c
--- a/drivers/input/touchscreen/ucb1400_ts.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/drivers/input/touchscreen/ucb1400_ts.c	Thu Jun 19 19:36:40 2008 +1000
@@ -287,7 +287,7 @@ static int ucb1400_ts_thread(void *_ucb)
 	int valid = 0;
 	struct sched_param param = { .sched_priority = 1 };
 
-	sched_setscheduler(tsk, SCHED_FIFO, &param);
+	sched_setscheduler(tsk, SCHED_FIFO, &param, false);
 
 	set_freezable();
 	while (!kthread_should_stop()) {
diff -r 509f0724da6b drivers/mmc/core/sdio_irq.c
--- a/drivers/mmc/core/sdio_irq.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/drivers/mmc/core/sdio_irq.c	Thu Jun 19 19:36:40 2008 +1000
@@ -70,7 +70,7 @@ static int sdio_irq_thread(void *_host)
 	unsigned long period, idle_period;
 	int ret;
 
-	sched_setscheduler(current, SCHED_FIFO, &param);
+	sched_setscheduler(current, SCHED_FIFO, &param, false);
 
 	/*
 	 * We want to allow for SDIO cards to work even on non SDIO
diff -r 509f0724da6b include/linux/sched.h
--- a/include/linux/sched.h	Thu Jun 19 17:06:30 2008 +1000
+++ b/include/linux/sched.h	Thu Jun 19 19:36:40 2008 +1000
@@ -1654,7 +1654,8 @@ extern int can_nice(const struct task_st
 extern int can_nice(const struct task_struct *p, const int nice);
 extern int task_curr(const struct task_struct *p);
 extern int idle_cpu(int cpu);
-extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
+extern int sched_setscheduler(struct task_struct *, int, struct sched_param *,
+			      bool);
 extern struct task_struct *idle_task(int cpu);
 extern struct task_struct *curr_task(int cpu);
 extern void set_curr_task(int cpu, struct task_struct *p);
diff -r 509f0724da6b kernel/kthread.c
--- a/kernel/kthread.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/kthread.c	Thu Jun 19 19:36:40 2008 +1000
@@ -104,7 +104,7 @@ static void create_kthread(struct kthrea
 		 * root may have changed our (kthreadd's) priority or CPU mask.
 		 * The kernel thread should not inherit these properties.
 		 */
-		sched_setscheduler(create->result, SCHED_NORMAL, &param);
+		sched_setscheduler(create->result, SCHED_NORMAL, &param, false);
 		set_user_nice(create->result, KTHREAD_NICE_LEVEL);
 		set_cpus_allowed(create->result, CPU_MASK_ALL);
 	}
diff -r 509f0724da6b kernel/rtmutex-tester.c
--- a/kernel/rtmutex-tester.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/rtmutex-tester.c	Thu Jun 19 19:36:40 2008 +1000
@@ -327,7 +327,8 @@ static ssize_t sysfs_test_command(struct
 	switch (op) {
 	case RTTEST_SCHEDOT:
 		schedpar.sched_priority = 0;
-		ret = sched_setscheduler(threads[tid], SCHED_NORMAL, &schedpar);
+		ret = sched_setscheduler(threads[tid], SCHED_NORMAL, &schedpar,
+					 true);
 		if (ret)
 			return ret;
 		set_user_nice(current, 0);
@@ -335,7 +336,8 @@ static ssize_t sysfs_test_command(struct
 
 	case RTTEST_SCHEDRT:
 		schedpar.sched_priority = dat;
-		ret = sched_setscheduler(threads[tid], SCHED_FIFO, &schedpar);
+		ret = sched_setscheduler(threads[tid], SCHED_FIFO, &schedpar,
+					 true);
 		if (ret)
 			return ret;
 		break;
diff -r 509f0724da6b kernel/sched.c
--- a/kernel/sched.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/sched.c	Thu Jun 19 19:36:40 2008 +1000
@@ -4749,11 +4749,12 @@ __setscheduler(struct rq *rq, struct tas
  * @p: the task in question.
  * @policy: new policy.
  * @param: structure containing the new RT priority.
+ * @user: do checks to ensure this thread has permission
  *
  * NOTE that the task may be already dead.
  */
 int sched_setscheduler(struct task_struct *p, int policy,
-		       struct sched_param *param)
+		       struct sched_param *param, bool user)
 {
 	int retval, oldprio, oldpolicy = -1, on_rq, running;
 	unsigned long flags;
@@ -4785,7 +4786,7 @@ recheck:
 	/*
 	 * Allow unprivileged RT tasks to decrease priority:
 	 */
-	if (!capable(CAP_SYS_NICE)) {
+	if (user && !capable(CAP_SYS_NICE)) {
 		if (rt_policy(policy)) {
 			unsigned long rlim_rtprio;
 
@@ -4821,7 +4822,8 @@ recheck:
 	 * Do not allow realtime tasks into groups that have no runtime
 	 * assigned.
 	 */
-	if (rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0)
+	if (user
+	    && rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0)
 		return -EPERM;
 #endif
 
@@ -4888,7 +4890,7 @@ do_sched_setscheduler(pid_t pid, int pol
 	retval = -ESRCH;
 	p = find_process_by_pid(pid);
 	if (p != NULL)
-		retval = sched_setscheduler(p, policy, &lparam);
+		retval = sched_setscheduler(p, policy, &lparam, true);
 	rcu_read_unlock();
 
 	return retval;
diff -r 509f0724da6b kernel/softirq.c
--- a/kernel/softirq.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/softirq.c	Thu Jun 19 19:36:40 2008 +1000
@@ -645,7 +645,7 @@ static int __cpuinit cpu_callback(struct
 
 		p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
-		sched_setscheduler(p, SCHED_FIFO, &param);
+		sched_setscheduler(p, SCHED_FIFO, &param, false);
 		kthread_stop(p);
 		takeover_tasklets(hotcpu);
 		break;
diff -r 509f0724da6b kernel/softlockup.c
--- a/kernel/softlockup.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/softlockup.c	Thu Jun 19 19:36:40 2008 +1000
@@ -211,7 +211,7 @@ static int watchdog(void *__bind_cpu)
 	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 	int this_cpu = (long)__bind_cpu;
 
-	sched_setscheduler(current, SCHED_FIFO, &param);
+	sched_setscheduler(current, SCHED_FIFO, &param, false);
 
 	/* initialize timestamp */
 	touch_softlockup_watchdog();
diff -r 509f0724da6b kernel/stop_machine.c
--- a/kernel/stop_machine.c	Thu Jun 19 17:06:30 2008 +1000
+++ b/kernel/stop_machine.c	Thu Jun 19 19:36:40 2008 +1000
@@ -187,7 +187,7 @@ struct task_struct *__stop_machine_run(i
 		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 
 		/* One high-prio thread per cpu.  We'll do this one. */
-		sched_setscheduler(p, SCHED_FIFO, &param);
+		sched_setscheduler(p, SCHED_FIFO, &param, false);
 		kthread_bind(p, cpu);
 		wake_up_process(p);
 		wait_for_completion(&smdata.done);

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: 2.6.26-rc5-mm3
  2008-06-19  9:13       ` 2.6.26-rc5-mm3 Ingo Molnar
@ 2008-06-19 14:39         ` Daniel Walker
  0 siblings, 0 replies; 290+ messages in thread
From: Daniel Walker @ 2008-06-19 14:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Byron Bradley, Andrew Morton, linux-kernel, kernel-testers,
	linux-mm, Hua Zhong


On Thu, 2008-06-19 at 11:13 +0200, Ingo Molnar wrote:

> the better fix would be to add likely_prof.o to this list of exceptions 
> in lib/Makefile:
> 
>  ifdef CONFIG_FTRACE
>  # Do not profile string.o, since it may be used in early boot or vdso
>  CFLAGS_REMOVE_string.o = -pg
>  # Also do not profile any debug utilities
>  CFLAGS_REMOVE_spinlock_debug.o = -pg
>  CFLAGS_REMOVE_list_debug.o = -pg
>  CFLAGS_REMOVE_debugobjects.o = -pg
>  endif
> 
> instead of adding notrace to the source.
> 
> 	Ingo

Here's the fix mentioned above.

--

Remove tracing from the likely profiling since it could cause recursion if
ftrace uses the likely()/unlikely() macros internally: the profiling hook
those macros expand to would, when itself compiled with -pg, call straight
back into ftrace.

Signed-off-by: Daniel Walker <dwalker@mvista.com>

---
 lib/Makefile |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6.25/lib/Makefile
===================================================================
--- linux-2.6.25.orig/lib/Makefile
+++ linux-2.6.25/lib/Makefile
@@ -15,6 +15,8 @@ CFLAGS_REMOVE_string.o = -pg
 CFLAGS_REMOVE_spinlock_debug.o = -pg
 CFLAGS_REMOVE_list_debug.o = -pg
 CFLAGS_REMOVE_debugobjects.o = -pg
+# likely profiling can cause recursion in ftrace, so don't trace it.
+CFLAGS_REMOVE_likely_prof.o = -pg
 endif
 
 lib-$(CONFIG_MMU) += ioremap.o



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-19  0:22           ` KAMEZAWA Hiroyuki
@ 2008-06-19 14:45             ` Lee Schermerhorn
  0 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-19 14:45 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Thu, 2008-06-19 at 09:22 +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 18 Jun 2008 14:21:06 -0400
> Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> 
> > On Wed, 2008-06-18 at 18:40 +0900, KAMEZAWA Hiroyuki wrote:
> > > Lee-san, how about this ?
> > > Tested on x86-64 and tried Nishimura-san's test et al.  Works well now.
> > 
> > I have been testing with my work load on both ia64 and x86_64 and it
> > seems to be working well.  I'll let them run for a day or so.
> > 
> thank you.
> <snip>

Update:

On x86_64 [32GB, 4x dual-core Opteron], my workload has run for ~20:40
hours.  Still running.

On ia64 [32GB, 16 cpus, 4 nodes], the system started going into softlockup
after ~7 hours.  The stack trace [below] indicates the zone-lru lock in
__page_cache_release(), called from put_page().  Either heavy contention
or a failure to unlock.  Note that in the previous run, with patches to
putback_lru_page() and unmap_and_move(), the same load ran for ~18 hours
before I shut it down to try these patches.

I'm going to try again with the collected patches posted by Kosaki-san
[for which, Thanks!].  If it occurs again, I'll deconfig the unevictable
lru feature and see if I can reproduce it there.  It may be unrelated to
the unevictable lru patches.

> 
> > > @@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
> > >  	struct page *page;
> > >  	pte_t pte;
> > >  
> > > +	/*
> > > +	 * page is never be unmapped by page-reclaim. we lock this page now.
> > > +	 */
> > 
> > I don't understand what you're trying to say here.  That is, what the
> > point of this comment is...
> > 
> We access the page-table without taking pte_lock. But this vm is MLOCKED
> and migration-race is handled. So we don't need to be too nervous to access
> the pte. I'll consider more meaningful words.

OK, so you just want to note that we're accessing the pte w/o locking
and that this is safe because the vma has been VM_LOCKED and all pages
should be mlocked?  

I'll note that the vma is NOT VM_LOCKED during the pte walk.
munlock_vma_pages_range() resets it so that try_to_unlock(), called from
munlock_vma_page(), won't try to re-mlock the page.  However, we hold
the mmap sem for write, so faults are held off--no need to worry about a
COW fault occurring between when the VM_LOCKED was cleared and before
the page is munlocked.  If that could occur, it could open a window
where a non-mlocked page is mapped in this vma, and page reclaim could
potentially unmap the page.  Shouldn't be an issue as long as we never
downgrade the semaphore to read during munlock.
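
A minimal sketch of the ordering described above (munlock_range() is a
made-up caller for illustration; munlock_vma_pages_range() is the function
from the unevictable-lru patches under discussion):

/*
 * mmap_sem held for write holds off faults, so no COW fault can map a
 * non-mlocked page in the window between clearing VM_LOCKED and
 * munlocking the pages.
 */
static void munlock_range(struct mm_struct *mm, struct vm_area_struct *vma,
			  unsigned long start, unsigned long end)
{
	down_write(&mm->mmap_sem);
	/* Clears VM_LOCKED first, so try_to_unlock() won't re-mlock the
	 * pages, then walks the ptes. */
	munlock_vma_pages_range(vma, start, end);
	up_write(&mm->mmap_sem);
}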

Lee

----------
softlockup stack trace for "usex" workload on ia64:

BUG: soft lockup - CPU#13 stuck for 61s! [usex:124359]
Modules linked in: ipv6 sunrpc dm_mirror dm_log dm_multipath scsi_dh dm_mod pci_slot fan dock thermal sg sr_mod processor button container ehci_hcd ohci_hcd uhci_hcd usbcore

Pid: 124359, CPU 13, comm:                 usex
psr : 00001010085a6010 ifs : 8000000000000000 ip  : [<a00000010000a1a0>]    Tainted: G      D   (2.6.26-rc5-mm3-kame-rework+mcl_inherit)
ip is at ia64_spinlock_contention+0x20/0x60
unat: 0000000000000000 pfs : 0000000000000081 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : a65955959a96e969
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001001264a0 b6  : a0000001006f0350 b7  : a00000010000b940
f6  : 0ffff8000000000000000 f7  : 1003ecf3cf3cf3cf3cf3d
f8  : 1003e0000000000000001 f9  : 1003e0000000000000015
f10 : 1003e000003a82aaab1fb f11 : 1003e0000000000000000
r1  : a000000100c03650 r2  : 000000000000038a r3  : 0000000000000001
r8  : 00000010085a6010 r9  : 0000000000080028 r10 : 000000000000000b
r11 : 0000000000000a80 r12 : e0000741aaac7d50 r13 : e0000741aaac0000
r14 : 0000000000000000 r15 : a000400741329148 r16 : e000074000060100
r17 : e000076000078e98 r18 : 0000000000000015 r19 : 0000000000000018
r20 : 0000000000000003 r21 : 0000000000000002 r22 : e000076000078e88
r23 : e000076000078e80 r24 : 0000000000000001 r25 : 0240000000080028
r26 : ffffffffffff04d8 r27 : 00000010085a6010 r28 : 7fe3382473f8b380
r29 : 9c00000000000000 r30 : 0000000000000001 r31 : e000074000061400

Call Trace:
 [<a000000100015e00>] show_stack+0x80/0xa0
                                sp=e0000741aaac79b0 bsp=e0000741aaac1528
 [<a000000100016700>] show_regs+0x880/0x8c0
                                sp=e0000741aaac7b80 bsp=e0000741aaac14d0
 [<a0000001000fbbe0>] softlockup_tick+0x2e0/0x340
                                sp=e0000741aaac7b80 bsp=e0000741aaac1480
 [<a0000001000a9400>] run_local_timers+0x40/0x60
                                sp=e0000741aaac7b80 bsp=e0000741aaac1468
 [<a0000001000a9460>] update_process_times+0x40/0xc0
                                sp=e0000741aaac7b80 bsp=e0000741aaac1438
 [<a00000010003ded0>] timer_interrupt+0x1b0/0x4a0
                                sp=e0000741aaac7b80 bsp=e0000741aaac13d0
 [<a0000001000fc480>] handle_IRQ_event+0x80/0x120
                                sp=e0000741aaac7b80 bsp=e0000741aaac1398
 [<a0000001000fc660>] __do_IRQ+0x140/0x440
                                sp=e0000741aaac7b80 bsp=e0000741aaac1338
 [<a0000001000136d0>] ia64_handle_irq+0x3f0/0x420
                                sp=e0000741aaac7b80 bsp=e0000741aaac12c0
 [<a00000010000c120>] ia64_native_leave_kernel+0x0/0x270
                                sp=e0000741aaac7b80 bsp=e0000741aaac12c0
 [<a00000010000a1a0>] ia64_spinlock_contention+0x20/0x60
                                sp=e0000741aaac7d50 bsp=e0000741aaac12c0
 [<a0000001006f0350>] _spin_lock_irqsave+0x50/0x60
                                sp=e0000741aaac7d50 bsp=e0000741aaac12b8

Probably zone lru_lock in __page_cache_release().

 [<a0000001001264a0>] put_page+0x100/0x300
                                sp=e0000741aaac7d50 bsp=e0000741aaac1280
 [<a000000100157170>] free_page_and_swap_cache+0x70/0xe0
                                sp=e0000741aaac7d50 bsp=e0000741aaac1260
 [<a000000100145a10>] exit_mmap+0x3b0/0x580
                                sp=e0000741aaac7d50 bsp=e0000741aaac1210
 [<a00000010008b420>] mmput+0x80/0x1c0
                                sp=e0000741aaac7e10 bsp=e0000741aaac11d8

NOTE:  all cpus show similar stack traces above here.  Some, however, get
here from do_exit()/exit_mm(), rather than via execve().

 [<a00000010019c2c0>] flush_old_exec+0x5a0/0x1520
                                sp=e0000741aaac7e10 bsp=e0000741aaac10f0
 [<a000000100213080>] load_elf_binary+0x7e0/0x2600
                                sp=e0000741aaac7e20 bsp=e0000741aaac0fb8
 [<a00000010019b7a0>] search_binary_handler+0x1a0/0x520
                                sp=e0000741aaac7e20 bsp=e0000741aaac0f30
 [<a00000010019e4e0>] do_execve+0x320/0x3e0
                                sp=e0000741aaac7e20 bsp=e0000741aaac0ed0
 [<a000000100014d00>] sys_execve+0x60/0xc0
                                sp=e0000741aaac7e30 bsp=e0000741aaac0e98
 [<a00000010000b690>] ia64_execve+0x30/0x140
                                sp=e0000741aaac7e30 bsp=e0000741aaac0e48
 [<a00000010000bfa0>] ia64_ret_from_syscall+0x0/0x20
                                sp=e0000741aaac7e30 bsp=e0000741aaac0e48
 [<a000000000010720>] __start_ivt_text+0xffffffff00010720/0x400
                                sp=e0000741aaac8000 bsp=e0000741aaac0e48




^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-19  0:22           ` KAMEZAWA Hiroyuki
@ 2008-06-19 15:32             ` kamezawa.hiroyu
  0 siblings, 0 replies; 290+ messages in thread
From: kamezawa.hiroyu @ 2008-06-19 15:32 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Andrew Morton,
	Rik van Riel, Kosaki Motohiro, Nick Piggin, linux-mm,
	linux-kernel, kernel-testers

----- Original Message -----
>Subject: Re: [Experimental][PATCH] putback_lru_page rework
>From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

>On Thu, 2008-06-19 at 09:22 +0900, KAMEZAWA Hiroyuki wrote:
>> On Wed, 18 Jun 2008 14:21:06 -0400
>> Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
>> 
>> > On Wed, 2008-06-18 at 18:40 +0900, KAMEZAWA Hiroyuki wrote:
>> > > Lee-san, how about this ?
>> > > Tested on x86-64 and tried Nishimura-san's test et al.  Works well now.
>> > 
>> > I have been testing with my work load on both ia64 and x86_64 and it
>> > seems to be working well.  I'll let them run for a day or so.
>> > 
>> thank you.
>> <snip>
>
>Update:
>
>On x86_64 [32GB, 4xdual-core Opteron], my work load has run for ~20:40
>hours.  Still running.
>
>On ia64 [32G, 16cpu, 4 node], the system started going into softlockup
>after ~7 hours.  Stack trace [below] indicates zone-lru lock in
>__page_cache_release() called from put_page().  Either heavy contention
>or failure to unlock.  Note that previous run, with patches to
>putback_lru_page() and unmap_and_move(), the same load ran for ~18 hours
>before I shut it down to try these patches.
>
Thanks, then there are more troubles that should be shot down.


>I'm going to try again with the collected patches posted by Kosaki-san
>[for which, Thanks!].  If it occurs again, I'll deconfig the unevictable
>lru feature and see if I can reproduce it there.  It may be unrelated to
>the unevictable lru patches.
>
I hope so...Hmm..I'll dig tomorrow. 


>> 
>> > > @@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
>> > >  	struct page *page;
>> > >  	pte_t pte;
>> > >  
>> > > +	/*
>> > > +	 * page is never be unmapped by page-reclaim. we lock this page now.
>> > > +	 */
>> > 
>> > I don't understand what you're trying to say here.  That is, what the
>> > point of this comment is...
>> > 
>> We access the page-table without taking pte_lock. But this vm is MLOCKED
>> and migration-race is handled. So we don't need to be too nervous to access
>> the pte. I'll consider more meaningful words.
>
>OK, so you just want to note that we're accessing the pte w/o locking
>and that this is safe because the vma has been VM_LOCKED and all pages
>should be mlocked?  
>
yes that was my thought.

>I'll note that the vma is NOT VM_LOCKED during the pte walk.
Ouch..
>munlock_vma_pages_range() resets it so that try_to_unlock(), called from
>munlock_vma_page(), won't try to re-mlock the page.  However, we hold
>the mmap sem for write, so faults are held off--no need to worry about a
>COW fault occurring between when the VM_LOCKED was cleared and before
>the page is munlocked. 
okay.

> If that could occur, it could open a window
>where a non-mlocked page is mapped in this vma, and page reclaim could
>potentially unmap the page.  Shouldn't be an issue as long as we never
>downgrade the semaphore to read during munlock.
>

Thank you for the clarification.  (So I will check Kosaki-san's comment
later.)


>
>Probably zone lru_lock in __page_cache_release().
>
> [<a0000001001264a0>] put_page+0x100/0x300
>                                sp=e0000741aaac7d50 bsp=e0000741aaac1280
> [<a000000100157170>] free_page_and_swap_cache+0x70/0xe0
>                                sp=e0000741aaac7d50 bsp=e0000741aaac1260
> [<a000000100145a10>] exit_mmap+0x3b0/0x580
>                                sp=e0000741aaac7d50 bsp=e0000741aaac1210
> [<a00000010008b420>] mmput+0x80/0x1c0
>                                sp=e0000741aaac7e10 bsp=e0000741aaac11d8
>
I think I have never seen this kind of deadlock related to zone->lock.
(Maybe that's because zone->lock has historically been used in a
straightforward way.)
I'll check around zone->lock. thanks.

Regards,
-Kame


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG][PATCH -mm] avoid BUG() in __stop_machine_run()
  2008-06-19 10:12     ` Rusty Russell
@ 2008-06-19 15:51       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 290+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-19 15:51 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Hidehiro Kawai, Andrew Morton, linux-kernel, kernel-testers,
	linux-mm, sugita, Satoshi OSHIMA, Ingo Molnar

Rusty Russell wrote:
> On Thursday 19 June 2008 16:59:50 Hidehiro Kawai wrote:
>   
>> When a process loads a kernel module, __stop_machine_run() is called, and
>> it calls sched_setscheduler() to give newly created kernel threads the
>> highest priority.  However, the process may have no CAP_SYS_NICE, which is
>> required for sched_setscheduler() to increase the priority.  For example,
>> SystemTap loads its module with only CAP_SYS_MODULE.  In this case,
>> sched_setscheduler() returns -EPERM, and then BUG() is called.
>>     
>
> Hi Hidehiro,
>
> 	Nice catch.  This can happen in the current code, it just doesn't
> BUG().
>
>   
>> Failure of sched_setscheduler() wouldn't be a real problem, so this
>> patch just ignores it.
>>     
>
> 	Well, it can mean that the stop_machine blocks indefinitely.  Better
> than a BUG(), but we should aim higher.
>
>   
>> Or, should we give the CAP_SYS_NICE capability temporarily?
>>     
>
>         I don't think so.  It can be seen from another thread, and in theory
> that should not see something random.  Worse, they can change it from
> another thread.
>
> How's this?
>
> sched_setscheduler: add a flag to control access checks
>
> Hidehiro Kawai noticed that sched_setscheduler() can fail in
> stop_machine: it calls sched_setscheduler() from insmod, which can
> have CAP_SYS_MODULE without CAP_SYS_NICE.
>
> This simply introduces a flag to allow us to disable the capability
> checks for internal callers (this is simpler than splitting the
> sched_setscheduler() function, since it loops checking permissions).
>   
What about?

int sched_setscheduler(struct task_struct *p, int policy,
		       struct sched_param *param)
{
	return __sched_setscheduler(p, policy, param, true);
}


int sched_setscheduler_nocheck(struct task_struct *p, int policy,
		               struct sched_param *param)
{
	return __sched_setscheduler(p, policy, param, false);
}


(With the appropriate transformation of sched_setscheduler -> __)

Better than scattering stray true/falses around the code.

    J
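
With that split, the kernel-internal call sites in the earlier patch lose
the extra argument entirely; a sketch, assuming the wrappers above
(watchdog_set_prio() is a made-up example):

static void watchdog_set_prio(void)
{
	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };

	/* Kernel-internal request: no capability check wanted. */
	sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
}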

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: 2.6.26-rc5-mm3: BUG large value for HugePages_Rsvd
  2008-06-12  5:59 ` 2.6.26-rc5-mm3 Andrew Morton
@ 2008-06-19 16:27   ` Jon Tollefson
  0 siblings, 0 replies; 290+ messages in thread
From: Jon Tollefson @ 2008-06-19 16:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, kernel-testers, linux-mm, Nick Piggin,
	Nishanth Aravamudan, Adam Litke


After running some of the libhugetlbfs tests, the HugePages_Rsvd value in
/proc/meminfo becomes really large.  It looks like it has wrapped
backwards past zero.
Below is the sequence I used to run one of the tests that causes this;
the test passes for what it is intended to test, but it leaves a large
value for reserved pages, which seemed strange to me.
The test was run on ppc64 with 16M huge pages.

cat /proc/meminfo
....
HugePages_Total:    25
HugePages_Free:     25
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:    16384 kB

mount -t hugetlbfs hugetlbfs /mnt

tundro4:~/libhugetlbfs-dev-20080516/tests # HUGETLBFS_VERBOSE=99 HUGETLBFS_DEBUG=y PATH="obj64:$PATH" LD_LIBRARY_PATH="$LD_LIBRARY_PATH:../obj64:obj64" truncate_above_4GB
Starting testcase "truncate_above_4GB", pid 3145
Mapping 3 hpages at offset 0x100000000...mapped at 0x3fffd000000
Replacing map at 0x3ffff000000 with map from offset 0x1000000...done
Truncating at 0x100000000...done
PASS

cat /proc/meminfo
....
HugePages_Total:    25
HugePages_Free:     25
HugePages_Rsvd:  18446744073709551614
HugePages_Surp:      0
Hugepagesize:    16384 kB


I put in some printks and see that the rsvd value goes mad in
'return_unused_surplus_pages'.
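
The value is consistent with an unsigned 64-bit counter decremented past
zero: 18446744073709551614 is 2^64 - 2, i.e. (unsigned long)-2.  A minimal
userspace demonstration (not kernel code):

#include <stdio.h>

int main(void)
{
	unsigned long resv = 0;

	resv -= 2;	/* wraps: 0 - 2 == 2^64 - 2 on 64-bit */
	printf("HugePages_Rsvd: %lu\n", resv);
	/* prints: HugePages_Rsvd: 18446744073709551614 */
	return 0;
}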

Debug output:

tundro4 kernel: mm/hugetlb.c:gather_surplus_pages:527; resv_huge_pages=0 delta=3
tundro4 kernel: Call Trace:
tundro4 kernel: [c000000287dff9a0] [c000000000010978] .show_stack+0x7c/0x1c4 (unreliable)
tundro4 kernel: [c000000287dffa50] [c0000000000d7c8c] .hugetlb_acct_memory+0xa4/0x448
tundro4 kernel: [c000000287dffb20] [c0000000000d85ec] .hugetlb_reserve_pages+0xec/0x16c
tundro4 kernel: [c000000287dffbc0] [c0000000001be7fc] .hugetlbfs_file_mmap+0xe0/0x154
tundro4 kernel: [c000000287dffc70] [c0000000000cbc78] .mmap_region+0x280/0x52c
tundro4 kernel: [c000000287dffd80] [c00000000000bfa0] .sys_mmap+0xa8/0x108
tundro4 kernel: [c000000287dffe30] [c0000000000086ac] syscall_exit+0x0/0x40
tundro4 kernel: mm/hugetlb.c:gather_surplus_pages:530; resv_huge_pages=3 delta=3
tundro4 kernel: mm/hugetlb.c:decrement_hugepage_resv_vma:147; resv_huge_pages=3
tundro4 kernel: mm/hugetlb.c:decrement_hugepage_resv_vma:149; resv_huge_pages=2
tundro4 kernel: mm/hugetlb.c:return_unused_surplus_pages:630; resv_huge_pages=2 unused_resv_pages=2
tundro4 kernel: Call Trace:
tundro4 kernel: [c000000287dff900] [c000000000010978] .show_stack+0x7c/0x1c4 (unreliable)
tundro4 kernel: [c000000287dff9b0] [c0000000000d7a10] .return_unused_surplus_pages+0x70/0x248
tundro4 kernel: [c000000287dffa50] [c0000000000d7fb8] .hugetlb_acct_memory+0x3d0/0x448
tundro4 kernel: [c000000287dffb20] [c0000000000c98fc] .remove_vma+0x64/0xe0
tundro4 kernel: [c000000287dffbb0] [c0000000000cb058] .do_munmap+0x30c/0x354
tundro4 kernel: [c000000287dffc70] [c0000000000cbad0] .mmap_region+0xd8/0x52c
tundro4 kernel: [c000000287dffd80] [c00000000000bfa0] .sys_mmap+0xa8/0x108
tundro4 kernel: [c000000287dffe30] [c0000000000086ac] syscall_exit+0x0/0x40
tundro4 kernel: mm/hugetlb.c:return_unused_surplus_pages:633; resv_huge_pages=0 unused_resv_pages=2
tundro4 kernel: mm/hugetlb.c:gather_surplus_pages:527; resv_huge_pages=0 delta=1
tundro4 kernel: Call Trace:
tundro4 kernel: [c000000287dff9a0] [c000000000010978] .show_stack+0x7c/0x1c4 (unreliable)
tundro4 kernel: [c000000287dffa50] [c0000000000d7c8c] .hugetlb_acct_memory+0xa4/0x448
tundro4 kernel: [c000000287dffb20] [c0000000000d85ec] .hugetlb_reserve_pages+0xec/0x16c
tundro4 kernel: [c000000287dffbc0] [c0000000001be7fc] .hugetlbfs_file_mmap+0xe0/0x154
tundro4 kernel: [c000000287dffc70] [c0000000000cbc78] .mmap_region+0x280/0x52c
tundro4 kernel: [c000000287dffd80] [c00000000000bfa0] .sys_mmap+0xa8/0x108
tundro4 kernel: [c000000287dffe30] [c0000000000086ac] syscall_exit+0x0/0x40
tundro4 kernel: mm/hugetlb.c:gather_surplus_pages:530; resv_huge_pages=1 delta=1
tundro4 kernel: mm/hugetlb.c:decrement_hugepage_resv_vma:147; resv_huge_pages=1
tundro4 kernel: mm/hugetlb.c:decrement_hugepage_resv_vma:149; resv_huge_pages=0
tundro4 kernel: mm/hugetlb.c:return_unused_surplus_pages:630; resv_huge_pages=0 unused_resv_pages=2
tundro4 kernel: Call Trace:
tundro4 kernel: [c000000287dff860] [c000000000010978] .show_stack+0x7c/0x1c4 (unreliable)
tundro4 kernel: [c000000287dff910] [c0000000000d7a10] .return_unused_surplus_pages+0x70/0x248
tundro4 kernel: [c000000287dff9b0] [c0000000000d7fb8] .hugetlb_acct_memory+0x3d0/0x448
tundro4 kernel: [c000000287dffa80] [c0000000000c98fc] .remove_vma+0x64/0xe0
tundro4 kernel: [c000000287dffb10] [c0000000000c9af0] .exit_mmap+0x178/0x1b8
tundro4 kernel: [c000000287dffbc0] [c000000000055ef0] .mmput+0x60/0x178
tundro4 kernel: [c000000287dffc50] [c00000000005add8] .exit_mm+0x130/0x154
tundro4 kernel: [c000000287dffce0] [c00000000005d598] .do_exit+0x2bc/0x778
tundro4 kernel: [c000000287dffda0] [c00000000005db38] .sys_exit_group+0x0/0x8
tundro4 kernel: [c000000287dffe30] [c0000000000086ac] syscall_exit+0x0/0x40
tundro4 kernel: mm/hugetlb.c:return_unused_surplus_pages:633; resv_huge_pages=18446744073709551614 unused_resv_pages=2

============the end===============
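
The last trace line shows the wrap directly: resv_huge_pages was 0 when
return_unused_surplus_pages() subtracted unused_resv_pages=2.  One
defensive pattern would clamp the subtraction (illustrative only, not a
claim about where the real fix belongs; the accounting imbalance above is
the underlying bug):

static void return_unused_reserve(unsigned long *resv, unsigned long unused)
{
	/* Never subtract more than is actually reserved. */
	*resv -= (unused < *resv) ? unused : *resv;
}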



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: 2.6.26-rc5-mm3: BUG large value for HugePages_Rsvd
  2008-06-19 16:27   ` Jon Tollefson
  (?)
@ 2008-06-19 17:16     ` Andy Whitcroft
  -1 siblings, 0 replies; 290+ messages in thread
From: Andy Whitcroft @ 2008-06-19 17:16 UTC (permalink / raw)
  To: Jon Tollefson
  Cc: Andrew Morton, linux-kernel, kernel-testers, linux-mm,
	Nick Piggin, Nishanth Aravamudan, Adam Litke

On Thu, Jun 19, 2008 at 11:27:47AM -0500, Jon Tollefson wrote:
> After running some of the libhugetlbfs tests the value for
> /proc/meminfo/HugePages_Rsvd becomes really large.  It looks like it has
> wrapped backwards from zero.
> Below is the sequence I used to run one of the tests that causes this;
> the tests passes for what it is intended to test but leaves a large
> value for reserved pages and that seemed strange to me.
> test run on ppc64 with 16M huge pages

Yes, Adam reported that here yesterday; he found it in his hugetlbfs testing.
I have done some investigation on it and it is being triggered by a bug in
the private reservation tracking patches, specifically by the hugetlb
test which causes some complex vma splits to occur on a private mapping.
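
The kind of sequence involved looks roughly like this (a sketch
reconstructed from Jon's report, not the test source itself; 'fd' is an
open hugetlbfs file and 'hpage_size' is the 16MB huge page size):

	#include <sys/mman.h>

	/* map 3 huge pages of a private hugetlbfs file, file offset > 4GB */
	char *p = mmap(NULL, 3 * hpage_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE, fd, 0x100000000UL);
	/* MAP_FIXED over the tail page splits the original private vma */
	mmap(p + 2 * hpage_size, hpage_size, PROT_READ | PROT_WRITE,
	     MAP_PRIVATE | MAP_FIXED, fd, 0x1000000UL);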

I believe I have the underlying problem nailed and have some nearly
complete patches for this; they should be in a postable state by
tomorrow.

-apw

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-19 14:45             ` Lee Schermerhorn
  (?)
@ 2008-06-20  0:47               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-20  0:47 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Thu, 19 Jun 2008 10:45:22 -0400
Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:

> On Thu, 2008-06-19 at 09:22 +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 18 Jun 2008 14:21:06 -0400
> > Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> > 
> > > On Wed, 2008-06-18 at 18:40 +0900, KAMEZAWA Hiroyuki wrote:
> > > > Lee-san, how about this ?
> > > > Tested on x86-64 and tried Nishimura-san's test et al.; works well now.
> > > 
> > > I have been testing with my work load on both ia64 and x86_64 and it
> > > seems to be working well.  I'll let them run for a day or so.
> > > 
> > thank you.
> > <snip>
> 
> Update:
> 
> On x86_64 [32GB, 4xdual-core Opteron], my work load has run for ~20:40
> hours.  Still running.
> 
> On ia64 [32G, 16cpu, 4 node], the system started going into softlockup
> after ~7 hours.  Stack trace [below] indicates zone-lru lock in
> __page_cache_release() called from put_page().  Either heavy contention
> or failure to unlock.  Note that previous run, with patches to
> putback_lru_page() and unmap_and_move(), the same load ran for ~18 hours
> before I shut it down to try these patches.
> 

On ia64, ia64_spinlock_contention() enables irqs while spinning, so
soft-lockup detection by the timer irq works well. On x86-64, irqs are
not enabled during the spin-wait, so the soft-lockup detection irq
cannot be handled until irqs are enabled again.
It seems, then, that some cpu drops into an infinite loop within
spin_lock_irqsave(&zone->lock, flags)....
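
For reference, a simplified sketch (not the actual implementation) of
what spin_lock_irqsave() amounts to on x86-64, using the zone->lru_lock
from the trace below:

	local_irq_save(flags);			/* irqs go off first ...       */
	while (!spin_trylock(&zone->lru_lock))
		cpu_relax();			/* ... then we spin with irqs  */
						/* off, so softlockup_tick()   */
						/* never runs on the spinning  */
						/* cpu itself                  */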

Then, "A" cpu doesn't report soft-lockup while others report ?

Hmm..

-Kame



> I'm going to try again with the collected patches posted by Kosaki-san
> [for which, Thanks!].  If it occurs again, I'll deconfig the unevictable
> lru feature and see if I can reproduce it there.  It may be unrelated to
> the unevictable lru patches.
> 
> > 
> > > > @@ -240,6 +232,9 @@ static int __munlock_pte_handler(pte_t *
> > > >  	struct page *page;
> > > >  	pte_t pte;
> > > >  
> > > > +	/*
> > > > +	 * page is never be unmapped by page-reclaim. we lock this page now.
> > > > +	 */
> > > 
> > > I don't understand what you're trying to say here.  That is, what the
> > > point of this comment is...
> > > 
> > We access the page-table without taking pte_lock. But this vm is MLOCKED
> > and migration-race is handled. So we don't need to be too nervous to access
> > the pte. I'll consider more meaningful words.
> 
> OK, so you just want to note that we're accessing the pte w/o locking
> and that this is safe because the vma has been VM_LOCKED and all pages
> should be mlocked?  
> 
> I'll note that the vma is NOT VM_LOCKED during the pte walk.
> munlock_vma_pages_range() resets it so that try_to_unlock(), called from
> munlock_vma_page(), won't try to re-mlock the page.  However, we hold
> the mmap sem for write, so faults are held off--no need to worry about a
> COW fault occurring between when the VM_LOCKED was cleared and before
> the page is munlocked.  If that could occur, it could open a window
> where a non-mlocked page is mapped in this vma, and page reclaim could
> potentially unmap the page.  Shouldn't be an issue as long as we never
> downgrade the semaphore to read during munlock.
> 
> Lee
> 
> ----------
> softlockup stack trace for "usex" workload on ia64:
> 
> BUG: soft lockup - CPU#13 stuck for 61s! [usex:124359]
> Modules linked in: ipv6 sunrpc dm_mirror dm_log dm_multipath scsi_dh dm_mod pci_slot fan dock thermal sg sr_mod processor button container ehci_hcd ohci_hcd uhci_hcd usbcore
> 
> Pid: 124359, CPU 13, comm:                 usex
> psr : 00001010085a6010 ifs : 8000000000000000 ip  : [<a00000010000a1a0>]    Tainted: G      D   (2.6.26-rc5-mm3-kame-rework+mcl_inherit)
> ip is at ia64_spinlock_contention+0x20/0x60
> unat: 0000000000000000 pfs : 0000000000000081 rsc : 0000000000000003
> rnat: 0000000000000000 bsps: 0000000000000000 pr  : a65955959a96e969
> ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
> csd : 0000000000000000 ssd : 0000000000000000
> b0  : a0000001001264a0 b6  : a0000001006f0350 b7  : a00000010000b940
> f6  : 0ffff8000000000000000 f7  : 1003ecf3cf3cf3cf3cf3d
> f8  : 1003e0000000000000001 f9  : 1003e0000000000000015
> f10 : 1003e000003a82aaab1fb f11 : 1003e0000000000000000
> r1  : a000000100c03650 r2  : 000000000000038a r3  : 0000000000000001
> r8  : 00000010085a6010 r9  : 0000000000080028 r10 : 000000000000000b
> r11 : 0000000000000a80 r12 : e0000741aaac7d50 r13 : e0000741aaac0000
> r14 : 0000000000000000 r15 : a000400741329148 r16 : e000074000060100
> r17 : e000076000078e98 r18 : 0000000000000015 r19 : 0000000000000018
> r20 : 0000000000000003 r21 : 0000000000000002 r22 : e000076000078e88
> r23 : e000076000078e80 r24 : 0000000000000001 r25 : 0240000000080028
> r26 : ffffffffffff04d8 r27 : 00000010085a6010 r28 : 7fe3382473f8b380
> r29 : 9c00000000000000 r30 : 0000000000000001 r31 : e000074000061400
> 
> Call Trace:
>  [<a000000100015e00>] show_stack+0x80/0xa0
>                                 sp=e0000741aaac79b0 bsp=e0000741aaac1528
>  [<a000000100016700>] show_regs+0x880/0x8c0
>                                 sp=e0000741aaac7b80 bsp=e0000741aaac14d0
>  [<a0000001000fbbe0>] softlockup_tick+0x2e0/0x340
>                                 sp=e0000741aaac7b80 bsp=e0000741aaac1480
>  [<a0000001000a9400>] run_local_timers+0x40/0x60
>                                 sp=e0000741aaac7b80 bsp=e0000741aaac1468
>  [<a0000001000a9460>] update_process_times+0x40/0xc0
>                                 sp=e0000741aaac7b80 bsp=e0000741aaac1438
>  [<a00000010003ded0>] timer_interrupt+0x1b0/0x4a0
>                                 sp=e0000741aaac7b80 bsp=e0000741aaac13d0
>  [<a0000001000fc480>] handle_IRQ_event+0x80/0x120
>                                 sp=e0000741aaac7b80 bsp=e0000741aaac1398
>  [<a0000001000fc660>] __do_IRQ+0x140/0x440
>                                 sp=e0000741aaac7b80 bsp=e0000741aaac1338
>  [<a0000001000136d0>] ia64_handle_irq+0x3f0/0x420
>                                 sp=e0000741aaac7b80 bsp=e0000741aaac12c0
>  [<a00000010000c120>] ia64_native_leave_kernel+0x0/0x270
>                                 sp=e0000741aaac7b80 bsp=e0000741aaac12c0
>  [<a00000010000a1a0>] ia64_spinlock_contention+0x20/0x60
>                                 sp=e0000741aaac7d50 bsp=e0000741aaac12c0
>  [<a0000001006f0350>] _spin_lock_irqsave+0x50/0x60
>                                 sp=e0000741aaac7d50 bsp=e0000741aaac12b8
> 
> Probably zone lru_lock in __page_cache_release().
> 
>  [<a0000001001264a0>] put_page+0x100/0x300
>                                 sp=e0000741aaac7d50 bsp=e0000741aaac1280
>  [<a000000100157170>] free_page_and_swap_cache+0x70/0xe0
>                                 sp=e0000741aaac7d50 bsp=e0000741aaac1260
>  [<a000000100145a10>] exit_mmap+0x3b0/0x580
>                                 sp=e0000741aaac7d50 bsp=e0000741aaac1210
>  [<a00000010008b420>] mmput+0x80/0x1c0
>                                 sp=e0000741aaac7e10 bsp=e0000741aaac11d8
> 
> NOTE:  all cpus show similar stack traces above here.  Some, however, get
> here from do_exit()/exit_mm(), rather than via execve().
> 
>  [<a00000010019c2c0>] flush_old_exec+0x5a0/0x1520
>                                 sp=e0000741aaac7e10 bsp=e0000741aaac10f0
>  [<a000000100213080>] load_elf_binary+0x7e0/0x2600
>                                 sp=e0000741aaac7e20 bsp=e0000741aaac0fb8
>  [<a00000010019b7a0>] search_binary_handler+0x1a0/0x520
>                                 sp=e0000741aaac7e20 bsp=e0000741aaac0f30
>  [<a00000010019e4e0>] do_execve+0x320/0x3e0
>                                 sp=e0000741aaac7e20 bsp=e0000741aaac0ed0
>  [<a000000100014d00>] sys_execve+0x60/0xc0
>                                 sp=e0000741aaac7e30 bsp=e0000741aaac0e98
>  [<a00000010000b690>] ia64_execve+0x30/0x140
>                                 sp=e0000741aaac7e30 bsp=e0000741aaac0e48
>  [<a00000010000bfa0>] ia64_ret_from_syscall+0x0/0x20
>                                 sp=e0000741aaac7e30 bsp=e0000741aaac0e48
>  [<a000000000010720>] __start_ivt_text+0xffffffff00010720/0x400
>                                 sp=e0000741aaac8000 bsp=e0000741aaac0e48
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-19 14:45             ` Lee Schermerhorn
  (?)
@ 2008-06-20  1:13               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-20  1:13 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

Lee-san, this is an additional one..
Not tested yet, just from review.

This fixes the bad behavior of the lock_page() <-> zone->lru_lock nesting.

Before:
      lock_page()  (TestSetPageLocked())
      spin_lock(zone->lru_lock)
      unlock_page()
      spin_unlock(zone->lru_lock)
After:
      spin_lock(zone->lru_lock)
      spin_unlock(zone->lru_lock)

A nit-pick fix is included as well. (I'll ask Kosaki-san to merge this
into his 5/5.)

Hmm...

---
 mm/vmscan.c |   25 +++++--------------------
 1 file changed, 5 insertions(+), 20 deletions(-)

Index: test-2.6.26-rc5-mm3/mm/vmscan.c
===================================================================
--- test-2.6.26-rc5-mm3.orig/mm/vmscan.c
+++ test-2.6.26-rc5-mm3/mm/vmscan.c
@@ -1106,7 +1106,7 @@ static unsigned long shrink_inactive_lis
 		if (nr_taken == 0)
 			goto done;
 
-		spin_lock(&zone->lru_lock);
+		spin_lock_irq(&zone->lru_lock);
 		/*
 		 * Put back any unfreeable pages.
 		 */
@@ -1136,9 +1136,8 @@ static unsigned long shrink_inactive_lis
 			}
 		}
   	} while (nr_scanned < max_scan);
-	spin_unlock(&zone->lru_lock);
+	spin_unlock_irq(&zone->lru_lock);
 done:
-	local_irq_enable();
 	pagevec_release(&pvec);
 	return nr_reclaimed;
 }
@@ -2438,7 +2437,7 @@ static void show_page_path(struct page *
  */
 static void check_move_unevictable_page(struct page *page, struct zone *zone)
 {
-
+retry:
 	ClearPageUnevictable(page); /* for page_evictable() */
 	if (page_evictable(page, NULL)) {
 		enum lru_list l = LRU_INACTIVE_ANON + page_is_file_cache(page);
@@ -2455,6 +2454,8 @@ static void check_move_unevictable_page(
 		 */
 		SetPageUnevictable(page);
 		list_move(&page->lru, &zone->lru[LRU_UNEVICTABLE].list);
+		if (page_evictable(page, NULL))
+			goto retry;
 	}
 }
 
@@ -2494,16 +2495,6 @@ void scan_mapping_unevictable_pages(stru
 				next = page_index;
 			next++;
 
-			if (TestSetPageLocked(page)) {
-				/*
-				 * OK, let's do it the hard way...
-				 */
-				if (zone)
-					spin_unlock_irq(&zone->lru_lock);
-				zone = NULL;
-				lock_page(page);
-			}
-
 			if (pagezone != zone) {
 				if (zone)
 					spin_unlock_irq(&zone->lru_lock);
@@ -2514,8 +2505,6 @@ void scan_mapping_unevictable_pages(stru
 			if (PageLRU(page) && PageUnevictable(page))
 				check_move_unevictable_page(page, zone);
 
-			unlock_page(page);
-
 		}
 		if (zone)
 			spin_unlock_irq(&zone->lru_lock);
@@ -2551,15 +2540,11 @@ void scan_zone_unevictable_pages(struct 
 		for (scan = 0;  scan < batch_size; scan++) {
 			struct page *page = lru_to_page(l_unevictable);
 
-			if (TestSetPageLocked(page))
-				continue;
-
 			prefetchw_prev_lru_page(page, l_unevictable, flags);
 
 			if (likely(PageLRU(page) && PageUnevictable(page)))
 				check_move_unevictable_page(page, zone);
 
-			unlock_page(page);
 		}
 		spin_unlock_irq(&zone->lru_lock);
 


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: 2.6.26-rc5-mm3: BUG large value for HugePages_Rsvd
  2008-06-19 17:16     ` Andy Whitcroft
  (?)
@ 2008-06-20  3:18       ` Jon Tollefson
  -1 siblings, 0 replies; 290+ messages in thread
From: Jon Tollefson @ 2008-06-20  3:18 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Jon Tollefson, Andrew Morton, linux-kernel, kernel-testers,
	linux-mm, Nick Piggin, Nishanth Aravamudan, Adam Litke

Andy Whitcroft wrote:
> On Thu, Jun 19, 2008 at 11:27:47AM -0500, Jon Tollefson wrote:
>   
>> After running some of the libhugetlbfs tests the value for
>> /proc/meminfo/HugePages_Rsvd becomes really large.  It looks like it has
>> wrapped backwards from zero.
>> Below is the sequence I used to run one of the tests that causes this;
>> the test passes for what it is intended to test but leaves a large
>> value for reserved pages and that seemed strange to me.
>> test run on ppc64 with 16M huge pages
>>     
>
> Yes, Adam reported that here yesterday; he found it in his hugetlbfs testing.
> I have done some investigation on it and it is being triggered by a bug in
> the private reservation tracking patches, specifically by the hugetlb
> test which causes some complex vma splits to occur on a private mapping.
>   
Sorry, I missed that.

> I believe I have the underlying problem nailed and have some nearly
> complete patches for this; they should be in a postable state by
> tomorrow.
>   
Cool.
> -apw
>   
Jon


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG][PATCH -mm] avoid BUG() in __stop_machine_run()
  2008-06-19 15:51       ` Jeremy Fitzhardinge
  (?)
@ 2008-06-20 13:21         ` Ingo Molnar
  -1 siblings, 0 replies; 290+ messages in thread
From: Ingo Molnar @ 2008-06-20 13:21 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Rusty Russell, Hidehiro Kawai, Andrew Morton, linux-kernel,
	kernel-testers, linux-mm, sugita, Satoshi OSHIMA


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

>> This simply introduces a flag to allow us to disable the capability 
>> checks for internal callers (this is simpler than splitting the 
>> sched_setscheduler() function, since it loops checking permissions).
>>   
> What about?
>
> int sched_setscheduler(struct task_struct *p, int policy,
> 		       struct sched_param *param)
> {
> 	return __sched_setscheduler(p, policy, param, true);
> }
>
>
> int sched_setscheduler_nocheck(struct task_struct *p, int policy,
> 		               struct sched_param *param)
> {
> 	return __sched_setscheduler(p, policy, param, false);
> }
>
>
> (With the appropriate transformation of sched_setscheduler -> __)
>
> Better than scattering stray true/falses around the code.

Agreed - it would also be less intrusive on the API change side.

I've created a new tip/sched/new-API-sched_setscheduler topic to track
this, but it would be nice to have a v2 of this patch that introduces
the new API the way Jeremy suggested.  (Hence the new topic is
auto-merged into tip/master but not into linux-next yet.)  Thanks,

	Ingo
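
A minimal sketch of what the split buys an in-kernel caller (the call
site below is hypothetical; only the two entry points come from
Jeremy's proposal above):

	/* p is the task (e.g. a stop_machine kthread) to be boosted */
	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };

	/* kernel-internal caller: no capability/rlimit checks wanted */
	sched_setscheduler_nocheck(p, SCHED_FIFO, &param);

	/* userspace-driven path: keeps the full permission checks */
	sched_setscheduler(p, SCHED_FIFO, &param);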

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-19 15:32             ` kamezawa.hiroyu
  (?)
@ 2008-06-20 16:24               ` Lee Schermerhorn
  -1 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-20 16:24 UTC (permalink / raw)
  To: kamezawa.hiroyu
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Fri, 2008-06-20 at 00:32 +0900, kamezawa.hiroyu@jp.fujitsu.com wrote:
> ----- Original Message -----
> >Subject: Re: [Experimental][PATCH] putback_lru_page rework
> >From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
> 
> >On Thu, 2008-06-19 at 09:22 +0900, KAMEZAWA Hiroyuki wrote:
> >> On Wed, 18 Jun 2008 14:21:06 -0400
> >> Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> >> 
> >> > On Wed, 2008-06-18 at 18:40 +0900, KAMEZAWA Hiroyuki wrote:
> >> > > Lee-san, how about this?
> >> > > Tested on x86-64 and tried Nisimura-san's test et al.; works well now.
> >> > 
> >> > I have been testing with my work load on both ia64 and x86_64 and it
> >> > seems to be working well.  I'll let them run for a day or so.
> >> > 
> >> thank you.
> >> <snip>
> >
> >Update:
> >
> >On x86_64 [32GB, 4xdual-core Opteron], my work load has run for ~20:40
> >hours.  Still running.
> >
> >On ia64 [32G, 16cpu, 4 node], the system started going into softlockup
> >after ~7 hours.  Stack trace [below] indicates zone-lru lock in
> >__page_cache_release() called from put_page().  Either heavy contention
> >or failure to unlock.  Note that previous run, with patches to
> >putback_lru_page() and unmap_and_move(), the same load ran for ~18 hours
> >before I shut it down to try these patches.
> >
> Thanks; then there are more troubles that should be shot down.
> 
> 
> >I'm going to try again with the collected patches posted by Kosaki-san
> >[for which, Thanks!].  If it occurs again, I'll deconfig the unevictable
> >lru feature and see if I can reproduce it there.  It may be unrelated to
> >the unevictable lru patches.
> >
> I hope so...Hmm..I'll dig tomorrow. 

Another update--with the collected patches:

Again, the x86_64 ran for > 22 hours w/o error before I shut it down.

And, again, the ia64 went into soft lockup--same stack traces.  This
time after > 17 hours of running.  It is possible that a BUG started
this, but it had long scrolled out of my terminal buffer by the time I
saw the system.

I'm now trying the ia64 platform with 26-rc5-mm3 + collected patches
with UNEVICTABLE_LRU de-configured.  I'll start that up today and let it
run over the weekend [with panic_on_oops set] if it hasn't hit the
problem before I leave.

Regards,
Lee


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-20  1:13               ` KAMEZAWA Hiroyuki
  (?)
@ 2008-06-20 17:10                 ` Lee Schermerhorn
  -1 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-20 17:10 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Fri, 2008-06-20 at 10:13 +0900, KAMEZAWA Hiroyuki wrote:
> Lee-san, this is an additional one.
> Not tested yet -- just from review.

OK, I'll test this on my x86_64 platform, which doesn't seem to hit the
soft lockups.

> 
> Fixing page_lock() <-> zone->lock nesting of bad-behavior.
> 
> Before:
>       lock_page()(TestSetPageLocked())
>       spin_lock(zone->lock)
>       unlock_page()
>       spin_unlock(zone->lock)  

A couple of comments:
* I believe that the locks are acquired in the right order--at least as
documented in the comments in mm/rmap.c.
* The unlocking appears out of order because this function attempts to
hold the zone lock across a few pages in the pagevec, but it must switch
to a different zone lru lock when it finds a page on a different zone
from the zone whose lock it is holding--as in the pagevec draining
functions, although they don't need to lock the page.

> After:
>       spin_lock(zone->lock)
>       spin_unlock(zone->lock)

Right.  With your reworked check_move_unevictable_page() [with retry],
we don't need to lock the page here anymore.  That means we can revert
all of the changes to pass the mapping back to sys_shmctl() and move the
call to scan_mapping_unevictable_pages() back to shmem_lock() after
clearing the address_space's unevictable flag.  We only did that to
avoid sleeping while holding the shmem_inode_info lock and the
shmid_kernel's ipc_perm spinlock.  

Shall I handle that, after we've tested this patch?
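
Roughly, the revert might look like this (a sketch only, untested; it
assumes the mapping_clear_unevictable() helper from the unevictable-lru
series):

	int shmem_lock(struct file *file, int lock, struct user_struct *user)
	{
		struct address_space *mapping = file->f_mapping;

		/* ... existing lock/unlock bookkeeping ... */
		if (!lock) {
			mapping_clear_unevictable(mapping);
			scan_mapping_unevictable_pages(mapping);
		}
		/* ... */
	}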

> 
> Including nit-pick fix. (I'll ask Kosaki-san to merge this to his 5/5)
> 
> Hmm...
> 
> ---
>  mm/vmscan.c |   25 +++++--------------------
>  1 file changed, 5 insertions(+), 20 deletions(-)
> 
> Index: test-2.6.26-rc5-mm3/mm/vmscan.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/vmscan.c
> +++ test-2.6.26-rc5-mm3/mm/vmscan.c
> @@ -1106,7 +1106,7 @@ static unsigned long shrink_inactive_lis
>  		if (nr_taken == 0)
>  			goto done;
>  
> -		spin_lock(&zone->lru_lock);
> +		spin_lock_irq(&zone->lru_lock);

1) It appears that the spin_lock() [no '_irq'] was there because irqs
are disabled a few lines above so that we could use non-atomic
__count[_zone]_vm_events().  
2) I think this predates the split lru or unevictable lru patches, so
these changes are unrelated.
>  		/*
>  		 * Put back any unfreeable pages.
>  		 */
> @@ -1136,9 +1136,8 @@ static unsigned long shrink_inactive_lis
>  			}
>  		}
>    	} while (nr_scanned < max_scan);
> -	spin_unlock(&zone->lru_lock);
> +	spin_unlock_irq(&zone->lru_lock);
>  done:
> -	local_irq_enable();
>  	pagevec_release(&pvec);
>  	return nr_reclaimed;
>  }
> @@ -2438,7 +2437,7 @@ static void show_page_path(struct page *
>   */
>  static void check_move_unevictable_page(struct page *page, struct zone *zone)
>  {
> -
> +retry:
>  	ClearPageUnevictable(page); /* for page_evictable() */
We can remove this comment            ^^^^^^^^^^^^^^^^^^^^^^^^^^
page_evictable() no longer asserts !PageUnevictable(), right?

>  	if (page_evictable(page, NULL)) {
>  		enum lru_list l = LRU_INACTIVE_ANON + page_is_file_cache(page);
> @@ -2455,6 +2454,8 @@ static void check_move_unevictable_page(
>  		 */
>  		SetPageUnevictable(page);
>  		list_move(&page->lru, &zone->lru[LRU_UNEVICTABLE].list);
> +		if (page_evictable(page, NULL))
> +			goto retry;
>  	}
>  }
>  
> @@ -2494,16 +2495,6 @@ void scan_mapping_unevictable_pages(stru
>  				next = page_index;
>  			next++;
>  
> -			if (TestSetPageLocked(page)) {
> -				/*
> -				 * OK, let's do it the hard way...
> -				 */
> -				if (zone)
> -					spin_unlock_irq(&zone->lru_lock);
> -				zone = NULL;
> -				lock_page(page);
> -			}
> -
>  			if (pagezone != zone) {
>  				if (zone)
>  					spin_unlock_irq(&zone->lru_lock);
> @@ -2514,8 +2505,6 @@ void scan_mapping_unevictable_pages(stru
>  			if (PageLRU(page) && PageUnevictable(page))
>  				check_move_unevictable_page(page, zone);
>  
> -			unlock_page(page);
> -
>  		}
>  		if (zone)
>  			spin_unlock_irq(&zone->lru_lock);
> @@ -2551,15 +2540,11 @@ void scan_zone_unevictable_pages(struct 
>  		for (scan = 0;  scan < batch_size; scan++) {
>  			struct page *page = lru_to_page(l_unevictable);
>  
> -			if (TestSetPageLocked(page))
> -				continue;
> -
>  			prefetchw_prev_lru_page(page, l_unevictable, flags);
>  
>  			if (likely(PageLRU(page) && PageUnevictable(page)))
>  				check_move_unevictable_page(page, zone);
>  
> -			unlock_page(page);
>  		}
>  		spin_unlock_irq(&zone->lru_lock);
>  
> 

I'll let you know how it goes.

Later,
Lee
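
For reference, the reworked helper with the retry applied would look
roughly like this (reconstructed from the hunks quoted above; the
elided middle is only summarised in a comment):

	static void check_move_unevictable_page(struct page *page,
						struct zone *zone)
	{
	retry:
		ClearPageUnevictable(page);
		if (page_evictable(page, NULL)) {
			enum lru_list l = LRU_INACTIVE_ANON +
						page_is_file_cache(page);
			/* ... move page onto lru list l, fix stats ... */
		} else {
			/* raced, e.g. with mlock(): keep it unevictable */
			SetPageUnevictable(page);
			list_move(&page->lru, &zone->lru[LRU_UNEVICTABLE].list);
			/* it may have become evictable while we moved it */
			if (page_evictable(page, NULL))
				goto retry;
		}
	}

As noted above, with the retry the page lock is no longer required on
this path; the caller holds zone->lru_lock throughout.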


^ permalink raw reply	[flat|nested] 290+ messages in thread

* [RFC] hugetlb reservations -- MAP_PRIVATE fixes for split vmas
  2008-06-19 16:27   ` Jon Tollefson
  (?)
@ 2008-06-20 19:17     ` Andy Whitcroft
  -1 siblings, 0 replies; 290+ messages in thread
From: Andy Whitcroft @ 2008-06-20 19:17 UTC (permalink / raw)
  To: Jon Tollefson
  Cc: Andrew Morton, Nick Piggin, Nishanth Aravamudan, Adam Litke,
	linux-kernel, kernel-testers, linux-mm, Mel Gorman,
	Andy Whitcroft

As reported by Adam Litke and Jon Tollefson, one of the libhugetlbfs
regression tests triggers a negative overall reservation count.  When
this occurs and no dynamic pool is enabled, the tests will fail.

Following this email are two patches to fix this issue:

hugetlb reservations: move region tracking earlier -- simply moves the
  region tracking code earlier so we do not have to supply prototypes, and

hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma
  splits -- which moves us to tracking the consumed reservation so that
  we can correctly calculate the remaining reservations at vma close time.

This stack is against the top of v2.6.26-rc5-mm3.  Should this solution
prove acceptable, it would probably need porting below Nick's multiple
hugepage size patches, and those updated; if so I would be happy to do
that too.

Jon, could you test this and see whether it works out for you?

-apw

^ permalink raw reply	[flat|nested] 290+ messages in thread

* [PATCH 1/2] hugetlb reservations: move region tracking earlier
  2008-06-20 19:17     ` Andy Whitcroft
@ 2008-06-20 19:17       ` Andy Whitcroft
  -1 siblings, 0 replies; 290+ messages in thread
From: Andy Whitcroft @ 2008-06-20 19:17 UTC (permalink / raw)
  To: Jon Tollefson
  Cc: Andrew Morton, Nick Piggin, Nishanth Aravamudan, Adam Litke,
	linux-kernel, kernel-testers, linux-mm, Mel Gorman,
	Andy Whitcroft

Move the region tracking code much earlier so we can use it for page
presence tracking later on.  No code is changed, just its location.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 mm/hugetlb.c |  246 +++++++++++++++++++++++++++++----------------------------
 1 files changed, 125 insertions(+), 121 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0f76ed1..d701e39 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -47,6 +47,131 @@ static unsigned long __initdata default_hstate_size;
 static DEFINE_SPINLOCK(hugetlb_lock);
 
 /*
+ * Region tracking -- allows tracking of reservations and instantiated pages
+ *                    across the pages in a mapping.
+ */
+struct file_region {
+	struct list_head link;
+	long from;
+	long to;
+};
+
+static long region_add(struct list_head *head, long f, long t)
+{
+	struct file_region *rg, *nrg, *trg;
+
+	/* Locate the region we are either in or before. */
+	list_for_each_entry(rg, head, link)
+		if (f <= rg->to)
+			break;
+
+	/* Round our left edge to the current segment if it encloses us. */
+	if (f > rg->from)
+		f = rg->from;
+
+	/* Check for and consume any regions we now overlap with. */
+	nrg = rg;
+	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
+		if (&rg->link == head)
+			break;
+		if (rg->from > t)
+			break;
+
+		/* If this area reaches higher then extend our area to
+		 * include it completely.  If this is not the first area
+		 * which we intend to reuse, free it. */
+		if (rg->to > t)
+			t = rg->to;
+		if (rg != nrg) {
+			list_del(&rg->link);
+			kfree(rg);
+		}
+	}
+	nrg->from = f;
+	nrg->to = t;
+	return 0;
+}
+
+static long region_chg(struct list_head *head, long f, long t)
+{
+	struct file_region *rg, *nrg;
+	long chg = 0;
+
+	/* Locate the region we are before or in. */
+	list_for_each_entry(rg, head, link)
+		if (f <= rg->to)
+			break;
+
+	/* If we are below the current region then a new region is required.
+	 * Subtle, allocate a new region at the position but make it zero
+	 * size such that we can guarantee to record the reservation. */
+	if (&rg->link == head || t < rg->from) {
+		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
+		if (!nrg)
+			return -ENOMEM;
+		nrg->from = f;
+		nrg->to   = f;
+		INIT_LIST_HEAD(&nrg->link);
+		list_add(&nrg->link, rg->link.prev);
+
+		return t - f;
+	}
+
+	/* Round our left edge to the current segment if it encloses us. */
+	if (f > rg->from)
+		f = rg->from;
+	chg = t - f;
+
+	/* Check for and consume any regions we now overlap with. */
+	list_for_each_entry(rg, rg->link.prev, link) {
+		if (&rg->link == head)
+			break;
+		if (rg->from > t)
+			return chg;
+
+		/* We overlap with this area, if it extends futher than
+		 * us then we must extend ourselves.  Account for its
+		 * existing reservation. */
+		if (rg->to > t) {
+			chg += rg->to - t;
+			t = rg->to;
+		}
+		chg -= rg->to - rg->from;
+	}
+	return chg;
+}
+
+static long region_truncate(struct list_head *head, long end)
+{
+	struct file_region *rg, *trg;
+	long chg = 0;
+
+	/* Locate the region we are either in or before. */
+	list_for_each_entry(rg, head, link)
+		if (end <= rg->to)
+			break;
+	if (&rg->link == head)
+		return 0;
+
+	/* If we are in the middle of a region then adjust it. */
+	if (end > rg->from) {
+		chg = rg->to - end;
+		rg->to = end;
+		rg = list_entry(rg->link.next, typeof(*rg), link);
+	}
+
+	/* Drop any remaining regions. */
+	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
+		if (&rg->link == head)
+			break;
+		chg += rg->to - rg->from;
+		list_del(&rg->link);
+		kfree(rg);
+	}
+	return chg;
+}
+
+/*
  * Convert the address within this vma to the page offset within
  * the mapping, in base page units.
  */
@@ -649,127 +774,6 @@ static void return_unused_surplus_pages(struct hstate *h,
 	}
 }
 
-struct file_region {
-	struct list_head link;
-	long from;
-	long to;
-};
-
-static long region_add(struct list_head *head, long f, long t)
-{
-	struct file_region *rg, *nrg, *trg;
-
-	/* Locate the region we are either in or before. */
-	list_for_each_entry(rg, head, link)
-		if (f <= rg->to)
-			break;
-
-	/* Round our left edge to the current segment if it encloses us. */
-	if (f > rg->from)
-		f = rg->from;
-
-	/* Check for and consume any regions we now overlap with. */
-	nrg = rg;
-	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
-		if (&rg->link == head)
-			break;
-		if (rg->from > t)
-			break;
-
-		/* If this area reaches higher then extend our area to
-		 * include it completely.  If this is not the first area
-		 * which we intend to reuse, free it. */
-		if (rg->to > t)
-			t = rg->to;
-		if (rg != nrg) {
-			list_del(&rg->link);
-			kfree(rg);
-		}
-	}
-	nrg->from = f;
-	nrg->to = t;
-	return 0;
-}
-
-static long region_chg(struct list_head *head, long f, long t)
-{
-	struct file_region *rg, *nrg;
-	long chg = 0;
-
-	/* Locate the region we are before or in. */
-	list_for_each_entry(rg, head, link)
-		if (f <= rg->to)
-			break;
-
-	/* If we are below the current region then a new region is required.
-	 * Subtle, allocate a new region at the position but make it zero
-	 * size such that we can guarantee to record the reservation. */
-	if (&rg->link == head || t < rg->from) {
-		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
-		if (!nrg)
-			return -ENOMEM;
-		nrg->from = f;
-		nrg->to   = f;
-		INIT_LIST_HEAD(&nrg->link);
-		list_add(&nrg->link, rg->link.prev);
-
-		return t - f;
-	}
-
-	/* Round our left edge to the current segment if it encloses us. */
-	if (f > rg->from)
-		f = rg->from;
-	chg = t - f;
-
-	/* Check for and consume any regions we now overlap with. */
-	list_for_each_entry(rg, rg->link.prev, link) {
-		if (&rg->link == head)
-			break;
-		if (rg->from > t)
-			return chg;
-
-		/* We overlap with this area, if it extends futher than
-		 * us then we must extend ourselves.  Account for its
-		 * existing reservation. */
-		if (rg->to > t) {
-			chg += rg->to - t;
-			t = rg->to;
-		}
-		chg -= rg->to - rg->from;
-	}
-	return chg;
-}
-
-static long region_truncate(struct list_head *head, long end)
-{
-	struct file_region *rg, *trg;
-	long chg = 0;
-
-	/* Locate the region we are either in or before. */
-	list_for_each_entry(rg, head, link)
-		if (end <= rg->to)
-			break;
-	if (&rg->link == head)
-		return 0;
-
-	/* If we are in the middle of a region then adjust it. */
-	if (end > rg->from) {
-		chg = rg->to - end;
-		rg->to = end;
-		rg = list_entry(rg->link.next, typeof(*rg), link);
-	}
-
-	/* Drop any remaining regions. */
-	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
-		if (&rg->link == head)
-			break;
-		chg += rg->to - rg->from;
-		list_del(&rg->link);
-		kfree(rg);
-	}
-	return chg;
-}
-
 /*
  * Determine if the huge page at addr within the vma has an associated
  * reservation.  Where it does not we will need to logically increase
-- 
1.5.6.205.g7ca3a
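
As an aside, the region_* calls above track disjoint [from, to)
intervals on a simple list; a small illustration of their semantics
(hypothetical values, kernel context assumed):

	LIST_HEAD(regions);
	long chg;

	chg = region_chg(&regions, 0, 4);   /* prepare [0,4): returns 4 */
	region_add(&regions, 0, 4);         /* commit: map holds [0,4) */

	chg = region_chg(&regions, 2, 6);   /* only [4,6) is new: returns 2 */
	region_add(&regions, 2, 6);         /* map coalesces to [0,6) */

	chg = region_truncate(&regions, 3); /* drops [3,6): returns 3 */

region_chg() reports (and may pre-allocate for) the coverage a range
would add, region_add() commits it, and region_truncate() returns the
amount of coverage removed.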


^ permalink raw reply related	[flat|nested] 290+ messages in thread

* [PATCH 2/2] hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits
  2008-06-20 19:17     ` Andy Whitcroft
  (?)
@ 2008-06-20 19:17       ` Andy Whitcroft
  -1 siblings, 0 replies; 290+ messages in thread
From: Andy Whitcroft @ 2008-06-20 19:17 UTC (permalink / raw)
  To: Jon Tollefson
  Cc: Andrew Morton, Nick Piggin, Nishanth Aravamudan, Adam Litke,
	linux-kernel, kernel-testers, linux-mm, Mel Gorman,
	Andy Whitcroft

When a hugetlb mapping with a reservation is split, a new VMA is cloned
from the original.  This new VMA is a direct copy of the original,
including the reservation count.  When this pair of VMAs is unmapped
we will incorrectly double-account the unused reservation, and the
overall reservation count will be wrong; in extreme cases it will wrap.

The problem occurs when we split an existing VMA, say to unmap a page in
the middle.  split_vma() will create a new VMA, copying all fields from
the original.  As we are storing our reservation count in vm_private_data
this is also copied, endowing the new VMA with a duplicate of the
original VMA's reservation.  Neither of the new VMAs can exhaust these
reservations as they are too small, but when we unmap and close these
VMAs we will incorrectly credit the remainder twice and resv_huge_pages
will become out of sync.  This can lead to allocation failures on
mappings with reservations, and even to resv_huge_pages wrapping, which
prevents all subsequent hugepage allocations.

The simple fix would be to correctly apportion the remaining reservation
count when the split is made.  However, the only hook we have,
vm_ops->open, only sees the new VMA; we do not know the identity of the
preceding VMA.  And even if we did have that VMA to hand, we would not
know how much of the reservation was consumed on each side of the split.

This patch therefore takes a different tack.  We know that any private
mapping which has a reservation has that reservation over its whole
size.  Any present pages represent consumed reservation.  Therefore, if
we track the instantiated pages we can calculate the remaining
reservation.
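
For example (hypothetical numbers): a private mapping of 8 huge pages
takes a reservation of 8 at mmap() time.  If 3 pages have been faulted
in by the time its VMAs are closed, region_count() over the shared
region map returns 3, so 8 - 3 = 5 unused reservations are returned --
and this holds however the mapping was split, because every VMA
fragment consults the same map.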

This patch reuses the existing regions code to track the regions for
which we have consumed reservation (i.e. the instantiated pages); as
each page is faulted in we record the consumption of reservation for the
new page.  When we need to return unused reservations at unmap time we
simply count the consumed reservation regions, subtracting that from the
whole of the map.  During a VMA split the newly opened VMA will point to
the same region map; as this map is offset-oriented it remains valid for
both of the split VMAs.  This map is reference counted so that it is
removed once all VMAs which are part of the mmap are gone.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 mm/hugetlb.c |  151 ++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 126 insertions(+), 25 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d701e39..ecff986 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -171,6 +171,30 @@ static long region_truncate(struct list_head *head, long end)
 	return chg;
 }
 
+static long region_count(struct list_head *head, long f, long t)
+{
+	struct file_region *rg;
+	long chg = 0;
+
+	/* Locate each segment we overlap with, and count that overlap. */
+	list_for_each_entry(rg, head, link) {
+		int seg_from;
+		int seg_to;
+
+		if (rg->to <= f)
+			continue;
+		if (rg->from >= t)
+			break;
+
+		seg_from = max(rg->from, f);
+		seg_to = min(rg->to, t);
+
+		chg += seg_to - seg_from;
+	}
+
+	return chg;
+}
+
 /*
  * Convert the address within this vma to the page offset within
  * the mapping, in base page units.
@@ -193,9 +217,14 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
 			(vma->vm_pgoff >> huge_page_order(h));
 }
 
-#define HPAGE_RESV_OWNER    (1UL << (BITS_PER_LONG - 1))
-#define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2))
+/*
+ * Flags for MAP_PRIVATE reservations.  These are stored in the bottom
+ * bits of the reservation map pointer.
+ */
+#define HPAGE_RESV_OWNER    (1UL << 0)
+#define HPAGE_RESV_UNMAPPED (1UL << 1)
 #define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED)
+
 /*
  * These helpers are used to track how many pages are reserved for
  * faults in a MAP_PRIVATE mapping. Only the process that called mmap()
@@ -205,6 +234,15 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
  * the reserve counters are updated with the hugetlb_lock held. It is safe
  * to reset the VMA at fork() time as it is not in use yet and there is no
  * chance of the global counters getting corrupted as a result of the values.
+ *
+ * The private mapping reservation is represented in a subtly different
+ * manner to a shared mapping.  A shared mapping has a region map associated
+ * with the underlying file, this region map represents the backing file
+ * pages which have had a reservation taken and this persists even after
+ * the page is instantiated.  A private mapping has a region map associated
+ * with the original mmap which is attached to all VMAs which reference it,
+ * this region map represents those offsets which have consumed reservation
+ * ie. where pages have been instantiated.
  */
 static unsigned long get_vma_private_data(struct vm_area_struct *vma)
 {
@@ -217,22 +255,44 @@ static void set_vma_private_data(struct vm_area_struct *vma,
 	vma->vm_private_data = (void *)value;
 }
 
-static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma)
+struct resv_map {
+	struct kref refs;
+	struct list_head regions;
+};
+
+struct resv_map *resv_map_alloc(void)
+{
+	struct resv_map *resv_map = kmalloc(sizeof(*resv_map), GFP_KERNEL);
+	if (!resv_map)
+		return NULL;
+
+	kref_init(&resv_map->refs);
+	INIT_LIST_HEAD(&resv_map->regions);
+
+	return resv_map;
+}
+
+void resv_map_release(struct kref *ref)
+{
+        struct resv_map *resv_map = container_of(ref, struct resv_map, refs);
+
+	region_truncate(&resv_map->regions, 0);
+	kfree(resv_map);
+}
+
+static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
 {
 	VM_BUG_ON(!is_vm_hugetlb_page(vma));
 	if (!(vma->vm_flags & VM_SHARED))
-		return get_vma_private_data(vma) & ~HPAGE_RESV_MASK;
+		return (struct resv_map *)(get_vma_private_data(vma) &
+							~HPAGE_RESV_MASK);
 	return 0;
 }
 
-static void set_vma_resv_huge_pages(struct vm_area_struct *vma,
-							unsigned long reserve)
+static void set_vma_resv_map(struct vm_area_struct *vma, struct resv_map *map)
 {
-	VM_BUG_ON(!is_vm_hugetlb_page(vma));
-	VM_BUG_ON(vma->vm_flags & VM_SHARED);
-
-	set_vma_private_data(vma,
-		(get_vma_private_data(vma) & HPAGE_RESV_MASK) | reserve);
+	set_vma_private_data(vma, (get_vma_private_data(vma) &
+				HPAGE_RESV_MASK) | (unsigned long)map);
 }
 
 static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
@@ -251,11 +311,11 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
 }
 
 /* Decrement the reserved pages in the hugepage pool by one */
-static void decrement_hugepage_resv_vma(struct hstate *h,
-			struct vm_area_struct *vma)
+static int decrement_hugepage_resv_vma(struct hstate *h,
+			struct vm_area_struct *vma, unsigned long address)
 {
 	if (vma->vm_flags & VM_NORESERVE)
-		return;
+		return 0;
 
 	if (vma->vm_flags & VM_SHARED) {
 		/* Shared mappings always use reserves */
@@ -266,14 +326,19 @@ static void decrement_hugepage_resv_vma(struct hstate *h,
 		 * private mappings.
 		 */
 		if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
-			unsigned long flags, reserve;
+			unsigned long idx = vma_pagecache_offset(h,
+							vma, address);
+			struct resv_map *reservations = vma_resv_map(vma);
+
 			h->resv_huge_pages--;
-			flags = (unsigned long)vma->vm_private_data &
-							HPAGE_RESV_MASK;
-			reserve = (unsigned long)vma->vm_private_data - 1;
-			vma->vm_private_data = (void *)(reserve | flags);
+
+			/* Mark this page used in the map. */
+			if (region_chg(&reservations->regions, idx, idx + 1) < 0)
+				return -1;
+			region_add(&reservations->regions, idx, idx + 1);
 		}
 	}
+	return 0;
 }
 
 /* Reset counters to 0 and clear all HPAGE_RESV_* flags */
@@ -289,7 +354,7 @@ static int vma_has_private_reserves(struct vm_area_struct *vma)
 {
 	if (vma->vm_flags & VM_SHARED)
 		return 0;
-	if (!vma_resv_huge_pages(vma))
+	if (!is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		return 0;
 	return 1;
 }
@@ -376,15 +441,16 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 		nid = zone_to_nid(zone);
 		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask) &&
 		    !list_empty(&h->hugepage_freelists[nid])) {
+			if (!avoid_reserve &&
+			    decrement_hugepage_resv_vma(h, vma, address) < 0)
+				return NULL;
+
 			page = list_entry(h->hugepage_freelists[nid].next,
 					  struct page, lru);
 			list_del(&page->lru);
 			h->free_huge_pages--;
 			h->free_huge_pages_node[nid]--;
 
-			if (!avoid_reserve)
-				decrement_hugepage_resv_vma(h, vma);
-
 			break;
 		}
 	}
@@ -1456,10 +1522,39 @@ out:
 	return ret;
 }
 
+static void hugetlb_vm_op_open(struct vm_area_struct *vma)
+{
+	struct resv_map *reservations = vma_resv_map(vma);
+
+	/*
+	 * This new VMA will share its sibling's reservation map.  The open
+	 * vm_op is only called for newly created VMAs which have been made
+	 * from another, still existing VMA.  As that VMA has a reference to
+	 * this reservation map the reservation map cannot disappear until
+	 * after this open completes.  It is therefore safe to take a new
+	 * reference here without additional locking.
+	 */
+	if (reservations)
+		kref_get(&reservations->refs);
+}
+
 static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 {
 	struct hstate *h = hstate_vma(vma);
-	unsigned long reserve = vma_resv_huge_pages(vma);
+	struct resv_map *reservations = vma_resv_map(vma);
+	unsigned long reserve = 0;
+	unsigned long start;
+	unsigned long end;
+
+	if (reservations) {
+		start = vma_pagecache_offset(h, vma, vma->vm_start);
+		end = vma_pagecache_offset(h, vma, vma->vm_end);
+
+		reserve = (end - start) -
+			region_count(&reservations->regions, start, end);
+
+		kref_put(&reservations->refs, resv_map_release);
+	}
 
 	if (reserve)
 		hugetlb_acct_memory(h, -reserve);
@@ -1479,6 +1574,7 @@ static int hugetlb_vm_op_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 struct vm_operations_struct hugetlb_vm_ops = {
 	.fault = hugetlb_vm_op_fault,
+	.open = hugetlb_vm_op_open,
 	.close = hugetlb_vm_op_close,
 };
 
@@ -2037,8 +2133,13 @@ int hugetlb_reserve_pages(struct inode *inode,
 	if (!vma || vma->vm_flags & VM_SHARED)
 		chg = region_chg(&inode->i_mapping->private_list, from, to);
 	else {
+		struct resv_map *resv_map = resv_map_alloc();
+		if (!resv_map)
+			return -ENOMEM;
+
 		chg = to - from;
-		set_vma_resv_huge_pages(vma, chg);
+
+		set_vma_resv_map(vma, resv_map);
 		set_vma_resv_flags(vma, HPAGE_RESV_OWNER);
 	}
 
-- 
1.5.6.205.g7ca3a
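
(Aside: the hugetlb_vm_op_open()/hugetlb_vm_op_close() pairing above is a
textbook kref lifetime.  A minimal userspace sketch, with an atomic counter
standing in for struct kref -- all names here are illustrative, not kernel
API:)

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* stands in for struct resv_map; the counter plays the kref role */
struct resv_map_sketch {
	atomic_int refs;
};

static void map_get(struct resv_map_sketch *m)	/* ~ hugetlb_vm_op_open() */
{
	atomic_fetch_add(&m->refs, 1);
}

static void map_put(struct resv_map_sketch *m)	/* ~ hugetlb_vm_op_close() */
{
	if (atomic_fetch_sub(&m->refs, 1) == 1) {
		printf("last VMA gone, releasing map\n");
		free(m);
	}
}

int main(void)
{
	struct resv_map_sketch *m = malloc(sizeof(*m));

	if (!m)
		return 1;
	atomic_init(&m->refs, 1);	/* the mmap()ing VMA holds the first ref */
	map_get(m);			/* split_vma() clones a VMA -> ->open     */
	map_put(m);			/* one half unmapped        -> ->close    */
	map_put(m);			/* other half unmapped      -> map freed  */
	return 0;
}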


^ permalink raw reply related	[flat|nested] 290+ messages in thread

* [PATCH 2/2] hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits
@ 2008-06-20 19:17       ` Andy Whitcroft
  0 siblings, 0 replies; 290+ messages in thread
From: Andy Whitcroft @ 2008-06-20 19:17 UTC (permalink / raw)
  To: Jon Tollefson
  Cc: Andrew Morton, Nick Piggin, Nishanth Aravamudan, Adam Litke,
	linux-kernel, kernel-testers, linux-mm, Mel Gorman,
	Andy Whitcroft

When a hugetlb mapping with a reservation is split, a new VMA is cloned
from the original.  This new VMA is a direct copy of the original,
including the reservation count.  When this pair of VMAs is unmapped,
we incorrectly double-account the unused reservation and the overall
reservation count becomes wrong; in extreme cases it will wrap.

The problem occurs when we split an existing VMA, say to unmap a page in
the middle.  split_vma() will create a new VMA copying all fields from
the original.  As we are storing our reservation count in vm_private_data,
it is also copied, endowing the new VMA with a duplicate of the original
VMA's reservation.  Neither of the new VMAs can exhaust these reservations
as they are too small, but when we unmap and close these VMAs we
incorrectly credit the remainder twice and resv_huge_pages becomes
out of sync.  This can lead to allocation failures on mappings with
reservations and even to resv_huge_pages wrapping, which prevents all
subsequent hugepage allocations.

The simple fix would be to correctly apportion the remaining reservation
count when the split is made.  However, the only hook we have, vm_ops->open,
receives only the new VMA; we do not know the identity of the preceding VMA.
And even if we did have that VMA to hand, we would not know how much of the
reservation was consumed on each side of the split.

This patch therefore takes a different tack.  We know that any private
mapping which has a reservation has one covering its whole size.  Any
present pages represent consumed reservation.  Therefore, if we track
the instantiated pages we can calculate the remaining reservation.

This patch reuses the existing regions code to track the regions for which
we have consumed reservation (i.e. the instantiated pages); as each page
is faulted in we record the consumption of reservation for the new page.
When we need to return unused reservations at unmap time we simply count
the consumed reservation regions, subtracting that from the whole of the map.
During a VMA split the newly opened VMA will point to the same region map;
as this map is offset oriented it remains valid for both of the split VMAs.
This map is reference counted so that it is removed when all VMAs which
are part of the mmap are gone.
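
To illustrate, here is a minimal userspace sketch of the overlap counting
that region_count() performs below; a plain array stands in for the
kernel's sorted list_head, and all names are illustrative:

#include <stdio.h>

struct region { long from, to; };	/* [from, to) in huge page offsets */

/* Count how many offsets in [f, t) are covered by the region set. */
static long region_count_sketch(const struct region *rg, int nr, long f, long t)
{
	long chg = 0;
	int i;

	for (i = 0; i < nr; i++) {
		long seg_from, seg_to;

		if (rg[i].to <= f)	/* region ends before the range  */
			continue;
		if (rg[i].from >= t)	/* region starts after the range */
			continue;	/* (the sorted kernel list can 'break') */

		seg_from = rg[i].from > f ? rg[i].from : f;
		seg_to = rg[i].to < t ? rg[i].to : t;
		chg += seg_to - seg_from;
	}
	return chg;
}

int main(void)
{
	/* pages instantiated at offsets [0, 2) and [5, 8) */
	struct region map[2] = { { 0, 2 }, { 5, 8 } };
	/* a VMA spanning [1, 7) has consumed 1 + 2 = 3 reservations,
	 * so (7 - 1) - 3 = 3 unused reservations go back on unmap */
	long used = region_count_sketch(map, 2, 1, 7);

	printf("consumed %ld, unused %ld\n", used, (7L - 1) - used);
	return 0;
}

hugetlb_vm_op_close() in the patch performs exactly this subtraction
against the VMA's span to decide how much reservation to hand back.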

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 mm/hugetlb.c |  151 ++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 126 insertions(+), 25 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d701e39..ecff986 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -171,6 +171,30 @@ static long region_truncate(struct list_head *head, long end)
 	return chg;
 }
 
+static long region_count(struct list_head *head, long f, long t)
+{
+	struct file_region *rg;
+	long chg = 0;
+
+	/* Locate each segment we overlap with, and count that overlap. */
+	list_for_each_entry(rg, head, link) {
+		long seg_from;
+		long seg_to;
+
+		if (rg->to <= f)
+			continue;
+		if (rg->from >= t)
+			break;
+
+		seg_from = max(rg->from, f);
+		seg_to = min(rg->to, t);
+
+		chg += seg_to - seg_from;
+	}
+
+	return chg;
+}
+
 /*
  * Convert the address within this vma to the page offset within
  * the mapping, in base page units.
@@ -193,9 +217,14 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
 			(vma->vm_pgoff >> huge_page_order(h));
 }
 
-#define HPAGE_RESV_OWNER    (1UL << (BITS_PER_LONG - 1))
-#define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2))
+/*
+ * Flags for MAP_PRIVATE reservations.  These are stored in the bottom
+ * bits of the reservation map pointer.
+ */
+#define HPAGE_RESV_OWNER    (1UL << 0)
+#define HPAGE_RESV_UNMAPPED (1UL << 1)
 #define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED)
+
 /*
  * These helpers are used to track how many pages are reserved for
  * faults in a MAP_PRIVATE mapping. Only the process that called mmap()
@@ -205,6 +234,15 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
  * the reserve counters are updated with the hugetlb_lock held. It is safe
  * to reset the VMA at fork() time as it is not in use yet and there is no
  * chance of the global counters getting corrupted as a result of the values.
+ *
+ * The private mapping reservation is represented in a subtly different
+ * manner to a shared mapping.  A shared mapping has a region map associated
+ * with the underlying file, this region map represents the backing file
+ * pages which have had a reservation taken and this persists even after
+ * the page is instantiated.  A private mapping has a region map associated
+ * with the original mmap, which is attached to all VMAs which reference it;
+ * this region map represents the offsets which have consumed reservation,
+ * i.e. where pages have been instantiated.
  */
 static unsigned long get_vma_private_data(struct vm_area_struct *vma)
 {
@@ -217,22 +255,44 @@ static void set_vma_private_data(struct vm_area_struct *vma,
 	vma->vm_private_data = (void *)value;
 }
 
-static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma)
+struct resv_map {
+	struct kref refs;
+	struct list_head regions;
+};
+
+struct resv_map *resv_map_alloc(void)
+{
+	struct resv_map *resv_map = kmalloc(sizeof(*resv_map), GFP_KERNEL);
+	if (!resv_map)
+		return NULL;
+
+	kref_init(&resv_map->refs);
+	INIT_LIST_HEAD(&resv_map->regions);
+
+	return resv_map;
+}
+
+void resv_map_release(struct kref *ref)
+{
+	struct resv_map *resv_map = container_of(ref, struct resv_map, refs);
+
+	region_truncate(&resv_map->regions, 0);
+	kfree(resv_map);
+}
+
+static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
 {
 	VM_BUG_ON(!is_vm_hugetlb_page(vma));
 	if (!(vma->vm_flags & VM_SHARED))
-		return get_vma_private_data(vma) & ~HPAGE_RESV_MASK;
+		return (struct resv_map *)(get_vma_private_data(vma) &
+							~HPAGE_RESV_MASK);
 	return 0;
 }
 
-static void set_vma_resv_huge_pages(struct vm_area_struct *vma,
-							unsigned long reserve)
+static void set_vma_resv_map(struct vm_area_struct *vma, struct resv_map *map)
 {
-	VM_BUG_ON(!is_vm_hugetlb_page(vma));
-	VM_BUG_ON(vma->vm_flags & VM_SHARED);
-
-	set_vma_private_data(vma,
-		(get_vma_private_data(vma) & HPAGE_RESV_MASK) | reserve);
+	set_vma_private_data(vma, (get_vma_private_data(vma) &
+				HPAGE_RESV_MASK) | (unsigned long)map);
 }
 
 static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
@@ -251,11 +311,11 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
 }
 
 /* Decrement the reserved pages in the hugepage pool by one */
-static void decrement_hugepage_resv_vma(struct hstate *h,
-			struct vm_area_struct *vma)
+static int decrement_hugepage_resv_vma(struct hstate *h,
+			struct vm_area_struct *vma, unsigned long address)
 {
 	if (vma->vm_flags & VM_NORESERVE)
-		return;
+		return 0;
 
 	if (vma->vm_flags & VM_SHARED) {
 		/* Shared mappings always use reserves */
@@ -266,14 +326,19 @@ static void decrement_hugepage_resv_vma(struct hstate *h,
 		 * private mappings.
 		 */
 		if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
-			unsigned long flags, reserve;
+			unsigned long idx = vma_pagecache_offset(h,
+							vma, address);
+			struct resv_map *reservations = vma_resv_map(vma);
+
 			h->resv_huge_pages--;
-			flags = (unsigned long)vma->vm_private_data &
-							HPAGE_RESV_MASK;
-			reserve = (unsigned long)vma->vm_private_data - 1;
-			vma->vm_private_data = (void *)(reserve | flags);
+
+			/* Mark this page used in the map. */
+			if (region_chg(&reservations->regions, idx, idx + 1) < 0)
+				return -1;
+			region_add(&reservations->regions, idx, idx + 1);
 		}
 	}
+	return 0;
 }
 
 /* Reset counters to 0 and clear all HPAGE_RESV_* flags */
@@ -289,7 +354,7 @@ static int vma_has_private_reserves(struct vm_area_struct *vma)
 {
 	if (vma->vm_flags & VM_SHARED)
 		return 0;
-	if (!vma_resv_huge_pages(vma))
+	if (!is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		return 0;
 	return 1;
 }
@@ -376,15 +441,16 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 		nid = zone_to_nid(zone);
 		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask) &&
 		    !list_empty(&h->hugepage_freelists[nid])) {
+			if (!avoid_reserve &&
+			    decrement_hugepage_resv_vma(h, vma, address) < 0)
+				return NULL;
+
 			page = list_entry(h->hugepage_freelists[nid].next,
 					  struct page, lru);
 			list_del(&page->lru);
 			h->free_huge_pages--;
 			h->free_huge_pages_node[nid]--;
 
-			if (!avoid_reserve)
-				decrement_hugepage_resv_vma(h, vma);
-
 			break;
 		}
 	}
@@ -1456,10 +1522,39 @@ out:
 	return ret;
 }
 
+static void hugetlb_vm_op_open(struct vm_area_struct *vma)
+{
+	struct resv_map *reservations = vma_resv_map(vma);
+
+	/*
+	 * This new VMA will share its sibling's reservation map.  The open
+	 * vm_op is only called for newly created VMAs which have been made
+	 * from another, still existing VMA.  As that VMA has a reference to
+	 * this reservation map the reservation map cannot disappear until
+	 * after this open completes.  It is therefore safe to take a new
+	 * reference here without additional locking.
+	 */
+	if (reservations)
+		kref_get(&reservations->refs);
+}
+
 static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 {
 	struct hstate *h = hstate_vma(vma);
-	unsigned long reserve = vma_resv_huge_pages(vma);
+	struct resv_map *reservations = vma_resv_map(vma);
+	unsigned long reserve = 0;
+	unsigned long start;
+	unsigned long end;
+
+	if (reservations) {
+		start = vma_pagecache_offset(h, vma, vma->vm_start);
+		end = vma_pagecache_offset(h, vma, vma->vm_end);
+
+		reserve = (end - start) -
+			region_count(&reservations->regions, start, end);
+
+		kref_put(&reservations->refs, resv_map_release);
+	}
 
 	if (reserve)
 		hugetlb_acct_memory(h, -reserve);
@@ -1479,6 +1574,7 @@ static int hugetlb_vm_op_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 struct vm_operations_struct hugetlb_vm_ops = {
 	.fault = hugetlb_vm_op_fault,
+	.open = hugetlb_vm_op_open,
 	.close = hugetlb_vm_op_close,
 };
 
@@ -2037,8 +2133,13 @@ int hugetlb_reserve_pages(struct inode *inode,
 	if (!vma || vma->vm_flags & VM_SHARED)
 		chg = region_chg(&inode->i_mapping->private_list, from, to);
 	else {
+		struct resv_map *resv_map = resv_map_alloc();
+		if (!resv_map)
+			return -ENOMEM;
+
 		chg = to - from;
-		set_vma_resv_huge_pages(vma, chg);
+
+		set_vma_resv_map(vma, resv_map);
 		set_vma_resv_flags(vma, HPAGE_RESV_OWNER);
 	}
 
-- 
1.5.6.205.g7ca3a
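
(Aside: the HPAGE_RESV_* flags above depend on kmalloc() returning at
least word-aligned memory, which leaves the low two bits of the resv_map
pointer free.  A minimal userspace sketch of the tagging trick -- names
are illustrative, not kernel API:)

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define RESV_OWNER	(1UL << 0)
#define RESV_UNMAPPED	(1UL << 1)
#define RESV_MASK	(RESV_OWNER | RESV_UNMAPPED)

int main(void)
{
	long *map = malloc(sizeof(*map));	/* stands in for resv_map_alloc() */
	uintptr_t word;

	if (!map)
		return 1;
	/* malloc() memory is suitably aligned for any object, so the low
	 * two bits of the pointer are zero and can carry the flags */
	assert(((uintptr_t)map & RESV_MASK) == 0);

	word = (uintptr_t)map | RESV_OWNER;	/* store pointer plus flag */
	printf("owner flag set: %lu\n", (unsigned long)(word & RESV_OWNER));
	printf("recovered pointer: %p\n", (void *)(word & ~RESV_MASK));

	free((void *)(word & ~RESV_MASK));
	return 0;
}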


^ permalink raw reply related	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-20 17:10                 ` Lee Schermerhorn
@ 2008-06-20 20:41                   ` Lee Schermerhorn
  -1 siblings, 0 replies; 290+ messages in thread
From: Lee Schermerhorn @ 2008-06-20 20:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, Rik van Riel, Kosaki Motohiro,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Fri, 2008-06-20 at 13:10 -0400, Lee Schermerhorn wrote:
> On Fri, 2008-06-20 at 10:13 +0900, KAMEZAWA Hiroyuki wrote:
> > Lee-san, this is an additional one..
> > Not-tested-yet, just by review.
> 
> OK, I'll test this on my x86_64 platform, which doesn't seem to hit the
> soft lockups.
> 

Quick update:  

With this patch applied, at ~ 1.5 hours into the test, my system panic'd
[panic_on_oops set] with a BUG in __find_get_block() -- looks like the
BUG_ON() in check_irqs_on() called from bh_lru_install() inlined by
__find_get_block().  Before the panic occurred, I saw warnings from
native_smp_call_function_mask() [arch/x86/kernel/smp.c]--also because
irqs_disabled().

I'll back out the changes [spin_[un]lock() => spin_[un]lock_irq()] to
shrink_inactive_list() and try again.  Just a hunch.

Lee


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-20  1:13               ` KAMEZAWA Hiroyuki
@ 2008-06-21  8:39                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-21  8:39 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, Lee Schermerhorn, Daisuke Nishimura,
	Andrew Morton, Rik van Riel, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers

Hi

> Lee-san, this is an additional one..
> Not-tested-yet, just by review.
> 
> Fixing page_lock() <-> zone->lock nesting of bad-behavior.
> 
> Before:
>       lock_page()(TestSetPageLocked())
>       spin_lock(zone->lock)
>       unlock_page()
>       spin_unlock(zone->lock)  
> After:
>       spin_lock(zone->lock)
>       spin_unlock(zone->lock)


Good catch!

> 
> Including nit-pick fix. (I'll ask Kosaki-san to merge this to his 5/5)
> 
> Hmm...
> 
> ---
>  mm/vmscan.c |   25 +++++--------------------
>  1 file changed, 5 insertions(+), 20 deletions(-)
> 
> Index: test-2.6.26-rc5-mm3/mm/vmscan.c
> ===================================================================
> --- test-2.6.26-rc5-mm3.orig/mm/vmscan.c
> +++ test-2.6.26-rc5-mm3/mm/vmscan.c
> @@ -1106,7 +1106,7 @@ static unsigned long shrink_inactive_lis
>  		if (nr_taken == 0)
>  			goto done;
>  
> -		spin_lock(&zone->lru_lock);
> +		spin_lock_irq(&zone->lru_lock);
>  		/*
>  		 * Put back any unfreeable pages.
>  		 */
> @@ -1136,9 +1136,8 @@ static unsigned long shrink_inactive_lis
>  			}
>  		}
>    	} while (nr_scanned < max_scan);
> -	spin_unlock(&zone->lru_lock);
> +	spin_unlock_irq(&zone->lru_lock);
>  done:
> -	local_irq_enable();
>  	pagevec_release(&pvec);
>  	return nr_reclaimed;
>  }

No.

shrink_inactive_list()'s lock usage is:

	local_irq_disable()
	spin_lock(&zone->lru_lock);
	while(){
		if (!pagevec_add(&pvec, page)) {
			spin_unlock_irq(&zone->lru_lock);
			__pagevec_release(&pvec);
			spin_lock_irq(&zone->lru_lock);
		}
	}
	spin_unlock(&zone->lru_lock);
	local_irq_enable();

this keeps the lock rule below:
	- if zone->lru_lock is held, interrupts are always disabled.
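
Schematically, either form keeps that invariant (illustrative only, not a
literal excerpt):

	spin_lock_irq(&zone->lru_lock);	  /* == local_irq_disable() + spin_lock() */
	/* ... all LRU manipulation runs with IRQs off ... */
	spin_unlock_irq(&zone->lru_lock); /* == spin_unlock() + local_irq_enable() */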




> @@ -2438,7 +2437,7 @@ static void show_page_path(struct page *
>   */
>  static void check_move_unevictable_page(struct page *page, struct zone *zone)
>  {
> -
> +retry:
>  	ClearPageUnevictable(page); /* for page_evictable() */
>  	if (page_evictable(page, NULL)) {
>  		enum lru_list l = LRU_INACTIVE_ANON + page_is_file_cache(page);
> @@ -2455,6 +2454,8 @@ static void check_move_unevictable_page(
>  		 */
>  		SetPageUnevictable(page);
>  		list_move(&page->lru, &zone->lru[LRU_UNEVICTABLE].list);
> +		if (page_evictable(page, NULL))
> +			goto retry;
>  	}
>  }

Right, Thanks.


>  
> @@ -2494,16 +2495,6 @@ void scan_mapping_unevictable_pages(stru
>  				next = page_index;
>  			next++;
>  
> -			if (TestSetPageLocked(page)) {
> -				/*
> -				 * OK, let's do it the hard way...
> -				 */
> -				if (zone)
> -					spin_unlock_irq(&zone->lru_lock);
> -				zone = NULL;
> -				lock_page(page);
> -			}
> -
>  			if (pagezone != zone) {
>  				if (zone)
>  					spin_unlock_irq(&zone->lru_lock);
> @@ -2514,8 +2505,6 @@ void scan_mapping_unevictable_pages(stru
>  			if (PageLRU(page) && PageUnevictable(page))
>  				check_move_unevictable_page(page, zone);
>  
> -			unlock_page(page);
> -
>  		}
>  		if (zone)
>  			spin_unlock_irq(&zone->lru_lock);

Right.


> @@ -2551,15 +2540,11 @@ void scan_zone_unevictable_pages(struct 
>  		for (scan = 0;  scan < batch_size; scan++) {
>  			struct page *page = lru_to_page(l_unevictable);
>  
> -			if (TestSetPageLocked(page))
> -				continue;
> -
>  			prefetchw_prev_lru_page(page, l_unevictable, flags);
>  
>  			if (likely(PageLRU(page) && PageUnevictable(page)))
>  				check_move_unevictable_page(page, zone);
>  
> -			unlock_page(page);
>  		}
>  		spin_unlock_irq(&zone->lru_lock);

Right.



^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-20 17:10                 ` Lee Schermerhorn
@ 2008-06-21  8:41                   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-21  8:41 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: kosaki.motohiro, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Andrew Morton, Rik van Riel, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers

> > Before:
> >       lock_page()(TestSetPageLocked())
> >       spin_lock(zone->lock)
> >       unlock_page()
> >       spin_unlock(zone->lock)  
> 
> Couple of comments:
> * I believe that the locks are acquired in the right order--at least as
> documented in the comments in mm/rmap.c.  
> * The unlocking appears out of order because this function attempts to
> hold the zone lock across a few pages in the pagevec, but must switch to
> a different zone lru lock when it finds a page on a different zone from
> the zone whose lock it is holding--like in the pagevec draining
> functions, altho' they don't need to lock the page.
> 
> > After:
> >       spin_lock(zone->lock)
> >       spin_unlock(zone->lock)
> 
> Right.  With your reworked check_move_unevictable_page() [with retry],
> we don't need to lock the page here, any more.  That means we can revert
> all of the changes to pass the mapping back to sys_shmctl() and move the
> call to scan_mapping_unevictable_pages() back to shmem_lock() after
> clearing the address_space's unevictable flag.  We only did that to
> avoid sleeping while holding the shmem_inode_info lock and the
> shmid_kernel's ipc_perm spinlock.  
> 
> Shall I handle that, after we've tested this patch?

Yeah, I'll do it :)


> > @@ -2438,7 +2437,7 @@ static void show_page_path(struct page *
> >   */
> >  static void check_move_unevictable_page(struct page *page, struct zone *zone)
> >  {
> > -
> > +retry:
> >  	ClearPageUnevictable(page); /* for page_evictable() */
> We can remove this comment            ^^^^^^^^^^^^^^^^^^^^^^^^^^
> page_evictable() no longer asserts !PageUnevictable(), right?

Yes.
I'll remove it.




^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-20 20:41                   ` Lee Schermerhorn
@ 2008-06-21  8:56                     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 290+ messages in thread
From: KOSAKI Motohiro @ 2008-06-21  8:56 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: kosaki.motohiro, KAMEZAWA Hiroyuki, Daisuke Nishimura,
	Andrew Morton, Rik van Riel, Nick Piggin, linux-mm, linux-kernel,
	kernel-testers

> Quick update:  
> 
> With this patch applied, at ~ 1.5 hours into the test, my system panic'd
> [panic_on_oops set] with a BUG in __find_get_block() -- looks like the
> BUG_ON() in check_irqs_on() called from bh_lru_install() inlined by
> __find_get_block().  Before the panic occurred, I saw warnings from
> native_smp_call_function_mask() [arch/x86/kernel/smp.c]--also because
> irqs_disabled().
> 
> I'll back out the changes [spin_[un]lock() => spin_[un]lock_irq()] to
> shrink_inactive_list() and try again.  Just a hunch.

Yup.
Kamezawa-san's patch removes local_irq_enable() but doesn't remove
local_irq_disable();
thus, IRQs are never re-enabled on the goto-done path.

> -	spin_unlock(&zone->lru_lock);
> +	spin_unlock_irq(&zone->lru_lock);
>  done:
> -	local_irq_enable();
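
Schematically, the failing path after the patch is (simplified, not a
literal excerpt):

	local_irq_disable();			/* still executed */
	if (nr_taken == 0)
		goto done;			/* skips the unlock below */
	spin_lock_irq(&zone->lru_lock);
	/* ... putback ... */
	spin_unlock_irq(&zone->lru_lock);	/* only place IRQs come back on */
done:
	/* local_irq_enable() was removed here, so the goto path returns
	 * with IRQs still disabled -- matching Lee's check_irqs_on() BUG */
	pagevec_release(&pvec);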




^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [Experimental][PATCH] putback_lru_page rework
  2008-06-21  8:56                     ` KOSAKI Motohiro
@ 2008-06-23  0:30                       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 290+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-23  0:30 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Lee Schermerhorn, Daisuke Nishimura, Andrew Morton, Rik van Riel,
	Nick Piggin, linux-mm, linux-kernel, kernel-testers

On Sat, 21 Jun 2008 17:56:17 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > Quick update:  
> > 
> > With this patch applied, at ~ 1.5 hours into the test, my system panic'd
> > [panic_on_oops set] with a BUG in __find_get_block() -- looks like the
> > BUG_ON() in check_irqs_on() called from bh_lru_install() inlined by
> > __find_get_block().  Before the panic occurred, I saw warnings from
> > native_smp_call_function_mask() [arch/x86/kernel/smp.c]--also because
> > irqs_disabled().
> > 
> > I'll back out the changes [spin_[un]lock() => spin_[un]lock_irq()] to
> > shrink_inactive_list() and try again.  Just a hunch.
> 
> Yup.
> Kamezawa-san's patch remove local_irq_enable(), but don't remove
> local_irq_disable().
> thus, irq is never enabled.
> 

Sorry,
-Kame


> > -	spin_unlock(&zone->lru_lock);
> > +	spin_unlock_irq(&zone->lru_lock);
> >  done:
> > -	local_irq_enable();
> 
> 
> 


^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [BUG][PATCH -mm] avoid BUG() in __stop_machine_run()
  2008-06-20 13:21         ` Ingo Molnar
  (?)
@ 2008-06-23  3:55           ` Rusty Russell
  -1 siblings, 0 replies; 290+ messages in thread
From: Rusty Russell @ 2008-06-23  3:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, Hidehiro Kawai, Andrew Morton, linux-kernel,
	kernel-testers, linux-mm, sugita, Satoshi OSHIMA

On Friday 20 June 2008 23:21:10 Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> >> This simply introduces a flag to allow us to disable the capability
> >> checks for internal callers (this is simpler than splitting the
> >> sched_setscheduler() function, since it loops checking permissions).
> >
> > What about?
> >
> > int sched_setscheduler(struct task_struct *p, int policy,
> > 		       struct sched_param *param)
> > {
> > 	return __sched_setscheduler(p, policy, param, true);
> > }
> >
> >
> > int sched_setscheduler_nocheck(struct task_struct *p, int policy,
> > 		               struct sched_param *param)
> > {
> > 	return __sched_setscheduler(p, policy, param, false);
> > }
> >
> >
> > (With the appropriate transformation of sched_setscheduler -> __)
> >
> > Better than scattering stray true/falses around the code.
>
> agreed - it would also be less intrusive on the API change side.

Yes, here's the patch.  I've put it in my tree for testing, too.

sched_setscheduler_nocheck: add a flag to control access checks

Hidehiro Kawai noticed that sched_setscheduler() can fail in
stop_machine: it calls sched_setscheduler() from insmod, which can
have CAP_SYS_MODULE without CAP_SYS_NICE.

Two cases could have failed, so they are changed to sched_setscheduler_nocheck:
  kernel/softirq.c:cpu_callback()
	- CPU hotplug callback
  kernel/stop_machine.c:__stop_machine_run()
	- Called from various places, including modprobe()

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r 91c45b8d7775 include/linux/sched.h
--- a/include/linux/sched.h	Mon Jun 23 13:49:26 2008 +1000
+++ b/include/linux/sched.h	Mon Jun 23 13:54:55 2008 +1000
@@ -1655,6 +1655,8 @@ extern int task_curr(const struct task_s
 extern int task_curr(const struct task_struct *p);
 extern int idle_cpu(int cpu);
 extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
+extern int sched_setscheduler_nocheck(struct task_struct *, int,
+				      struct sched_param *);
 extern struct task_struct *idle_task(int cpu);
 extern struct task_struct *curr_task(int cpu);
 extern void set_curr_task(int cpu, struct task_struct *p);
diff -r 91c45b8d7775 kernel/sched.c
--- a/kernel/sched.c	Mon Jun 23 13:49:26 2008 +1000
+++ b/kernel/sched.c	Mon Jun 23 13:54:55 2008 +1000
@@ -4744,16 +4744,8 @@ __setscheduler(struct rq *rq, struct tas
 	set_load_weight(p);
 }
 
-/**
- * sched_setscheduler - change the scheduling policy and/or RT priority of a thread.
- * @p: the task in question.
- * @policy: new policy.
- * @param: structure containing the new RT priority.
- *
- * NOTE that the task may be already dead.
- */
-int sched_setscheduler(struct task_struct *p, int policy,
-		       struct sched_param *param)
+static int __sched_setscheduler(struct task_struct *p, int policy,
+				struct sched_param *param, bool user)
 {
 	int retval, oldprio, oldpolicy = -1, on_rq, running;
 	unsigned long flags;
@@ -4785,7 +4777,7 @@ recheck:
 	/*
 	 * Allow unprivileged RT tasks to decrease priority:
 	 */
-	if (!capable(CAP_SYS_NICE)) {
+	if (user && !capable(CAP_SYS_NICE)) {
 		if (rt_policy(policy)) {
 			unsigned long rlim_rtprio;
 
@@ -4821,7 +4813,8 @@ recheck:
 	 * Do not allow realtime tasks into groups that have no runtime
 	 * assigned.
 	 */
-	if (rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0)
+	if (user
+	    && rt_policy(policy) && task_group(p)->rt_bandwidth.rt_runtime == 0)
 		return -EPERM;
 #endif
 
@@ -4870,7 +4863,38 @@ recheck:
 
 	return 0;
 }
+
+/**
+ * sched_setscheduler - change the scheduling policy and/or RT priority of a thread.
+ * @p: the task in question.
+ * @policy: new policy.
+ * @param: structure containing the new RT priority.
+ *
+ * NOTE that the task may be already dead.
+ */
+int sched_setscheduler(struct task_struct *p, int policy,
+		       struct sched_param *param)
+{
+	return __sched_setscheduler(p, policy, param, true);
+}
 EXPORT_SYMBOL_GPL(sched_setscheduler);
+
+/**
+ * sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
+ * @p: the task in question.
+ * @policy: new policy.
+ * @param: structure containing the new RT priority.
+ *
+ * Just like sched_setscheduler, only don't bother checking if the
+ * current context has permission.  For example, this is needed in
+ * stop_machine(): we create temporary high priority worker threads,
+ * but our caller might not have that capability.
+ */
+int sched_setscheduler_nocheck(struct task_struct *p, int policy,
+			       struct sched_param *param)
+{
+	return __sched_setscheduler(p, policy, param, false);
+}
 
 static int
 do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
diff -r 91c45b8d7775 kernel/softirq.c
--- a/kernel/softirq.c	Mon Jun 23 13:49:26 2008 +1000
+++ b/kernel/softirq.c	Mon Jun 23 13:54:55 2008 +1000
@@ -645,7 +645,7 @@ static int __cpuinit cpu_callback(struct
 
 		p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
-		sched_setscheduler(p, SCHED_FIFO, &param);
+		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
 		kthread_stop(p);
 		takeover_tasklets(hotcpu);
 		break;
diff -r 91c45b8d7775 kernel/stop_machine.c
--- a/kernel/stop_machine.c	Mon Jun 23 13:49:26 2008 +1000
+++ b/kernel/stop_machine.c	Mon Jun 23 13:54:55 2008 +1000
@@ -187,7 +187,7 @@ struct task_struct *__stop_machine_run(i
 		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 
 		/* One high-prio thread per cpu.  We'll do this one. */
-		sched_setscheduler(p, SCHED_FIFO, &param);
+		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
 		kthread_bind(p, cpu);
 		wake_up_process(p);
 		wait_for_completion(&smdata.done);
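
As a usage illustration (a hypothetical caller, not part of this patch),
kernel code that wants an RT worker thread without assuming its caller
has CAP_SYS_NICE could follow the same pattern:

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>

/* Hypothetical helper: create a kernel thread and raise it to
 * SCHED_FIFO regardless of the current context's capabilities. */
static struct task_struct *start_rt_worker(int (*fn)(void *), void *data)
{
	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
	struct task_struct *p;

	p = kthread_create(fn, data, "rt_worker");
	if (IS_ERR(p))
		return p;
	/* Kernel-internal policy change: skip the capability checks. */
	sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
	wake_up_process(p);
	return p;
}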

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH 2/2] hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits
  2008-06-20 19:17       ` Andy Whitcroft
  (?)
@ 2008-06-23  7:33         ` Mel Gorman
  -1 siblings, 0 replies; 290+ messages in thread
From: Mel Gorman @ 2008-06-23  7:33 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Jon Tollefson, Andrew Morton, Nick Piggin, Nishanth Aravamudan,
	Adam Litke, linux-kernel, kernel-testers, linux-mm

On (20/06/08 20:17), Andy Whitcroft didst pronounce:
> When a hugetlb mapping with a reservation is split, a new VMA is cloned
> from the original.  This new VMA is a direct copy of the original
> including the reservation count.  When this pair of VMAs is unmapped
> we will incorrectly double-account the unused reservation and the overall
> reservation count will be wrong; in extreme cases it will wrap.
> 

D'oh. It's not even that extreme; as it turns out, it's fairly
straightforward to trigger, as this crappy application shows:
http://www.csn.ul.ie/~mel/postings/apw-20080622/hugetlbfs-unmap-private-test.c
It runs on x86 and can wrap the rsvd counters. I believe the other tests
I was running had already used the reserves and so missed this test case.

> The problem occurs when we split an existing VMA say to unmap a page in
> the middle.  split_vma() will create a new VMA copying all fields from
> the original.  As we are storing our reservation count in vm_private_data
> this is also copied, endowing the new VMA with a duplicate of the original
> VMA's reservation.  Neither of the new VMAs can exhaust these reservations
> as they are too small, but when we unmap and close these VMAs we will
> incorrectly credit the remainder twice and resv_huge_pages will become
> out of sync.  This can lead to allocation failures on mappings with
> reservations and even to resv_huge_pages wrapping which prevents all
> subsequent hugepage allocations.
> 

Yeah, that does sound as if it would occur all right and running the
test program confirms it.

> The simple fix would be to correctly apportion the remaining reservation
> count when the split is made.  However, the only hook we have, vm_ops->open,
> only sees the new VMA; we do not know the identity of the preceding VMA.
> Also, even if we did have that VMA to hand, we do not know how much of the
> reservation was consumed on each side of the split.
> 
> This patch therefore takes a different tack.  We know that the whole of any
> private mapping (which has a reservation) has a reservation over its whole
> size.  Any present pages represent consumed reservation.  Therefore if
> we track the instantiated pages we can calculate the remaining reservation.
> 
> This patch reuses the existing regions code to track the regions for which
> we have consumed reservation (ie. the instantiated pages), as each page
> is faulted in we record the consumption of reservation for the new page.

Clever. The additional nice thing is that it makes private mappings less of
a special case in comparison to shared mappings. My impression right now is
that with the patch, shared mappings track reservations based on the underlying
file while private mappings are generally tracked per-mapping and only share
due to unmap-related splits or forks().  That seems a bit more consistent.

> When we need to return unused reservations at unmap time we simply count
> the consumed reservation region, subtracting that from the whole of the map.
> During a VMA split the newly opened VMA will point to the same region map;
> as this map is offset-oriented it remains valid for both of the split VMAs.
> This map is reference counted so that it is removed when all VMAs which
> are part of the mmap are gone.
> 

This looks sensible; applying the patches and running the test program
shows that the reserve counter does not wrap when the program exits, which
is very nice.  I also tested a parent-child scenario where the pool is of
insufficient size and the child gets killed as expected. Thanks a million
for cleaning this up.

Some comments below but they are relatively minor.

> Signed-off-by: Andy Whitcroft <apw@shadowen.org>
> ---
>  mm/hugetlb.c |  151 ++++++++++++++++++++++++++++++++++++++++++++++++----------
>  1 files changed, 126 insertions(+), 25 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d701e39..ecff986 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -171,6 +171,30 @@ static long region_truncate(struct list_head *head, long end)
>  	return chg;
>  }
>  
> +static long region_count(struct list_head *head, long f, long t)
> +{
> +	struct file_region *rg;
> +	long chg = 0;
> +
> +	/* Locate each segment we overlap with, and count that overlap. */
> +	list_for_each_entry(rg, head, link) {
> +		int seg_from;
> +		int seg_to;
> +
> +		if (rg->to <= f)
> +			continue;
> +		if (rg->from >= t)
> +			break;
> +
> +		seg_from = max(rg->from, f);
> +		seg_to = min(rg->to, t);
> +
> +		chg += seg_to - seg_from;
> +	}
> +
> +	return chg;
> +}

Ok, seems straightforward. The tuples track pages that already exist, so
by counting the overlaps in a given range, you know how many hugepages
have been faulted. The size of the VMA minus the overlap is the
required reservation.
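
To make the arithmetic concrete, here is a user-space sketch of the
same overlap counting (hypothetical and array-based; the kernel code
walks a list_head):

#include <stdio.h>

struct region { long from, to; };

/* Mirror of region_count(): sum the overlap of [f, t) with each
 * already-instantiated region. */
static long region_count(const struct region *rg, int n, long f, long t)
{
	long chg = 0;
	int i;

	for (i = 0; i < n; i++) {
		long from = rg[i].from > f ? rg[i].from : f;
		long to = rg[i].to < t ? rg[i].to : t;

		if (to > from)
			chg += to - from;
	}
	return chg;
}

int main(void)
{
	/* Faulted (instantiated) ranges within an 8-page private mapping. */
	struct region faulted[] = { { 1, 3 }, { 5, 6 } };
	long used = region_count(faulted, 2, 0, 8);

	/* 3 pages faulted, so 8 - 3 = 5 reserved pages remain unused. */
	printf("used=%ld unused_reserve=%ld\n", used, 8 - used);
	return 0;
}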

> +
>  /*
>   * Convert the address within this vma to the page offset within
>   * the mapping, in base page units.
> @@ -193,9 +217,14 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
>  			(vma->vm_pgoff >> huge_page_order(h));
>  }
>  
> -#define HPAGE_RESV_OWNER    (1UL << (BITS_PER_LONG - 1))
> -#define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2))
> +/*
> + * Flags for MAP_PRIVATE reservations.  These are stored in the bottom
> + * bits of the reservation map pointer.
> + */
> +#define HPAGE_RESV_OWNER    (1UL << 0)
> +#define HPAGE_RESV_UNMAPPED (1UL << 1)
>  #define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED)
> +

The bits move here but for good reason. private_data is now a pointer and
we pack flags into bits that are available due to alignment.  Right?
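
If so, the trick reduces to standard low-bit tagging of an aligned
pointer, along these lines (a sketch with hypothetical helper names;
the real code goes through get/set_vma_private_data()):

/* kmalloc()ed objects are at least word-aligned, so the two low bits
 * of a struct resv_map pointer are guaranteed zero and can carry the
 * HPAGE_RESV_* flags. */
static void *pack_resv(struct resv_map *map, unsigned long flags)
{
	return (void *)((unsigned long)map | (flags & HPAGE_RESV_MASK));
}

static struct resv_map *unpack_resv(void *vm_private_data)
{
	return (struct resv_map *)((unsigned long)vm_private_data &
				   ~HPAGE_RESV_MASK);
}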

>  /*
>   * These helpers are used to track how many pages are reserved for
>   * faults in a MAP_PRIVATE mapping. Only the process that called mmap()
> @@ -205,6 +234,15 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
>   * the reserve counters are updated with the hugetlb_lock held. It is safe
>   * to reset the VMA at fork() time as it is not in use yet and there is no
>   * chance of the global counters getting corrupted as a result of the values.
> + *
> + * The private mapping reservation is represented in a subtly different
> + * manner to a shared mapping.  A shared mapping has a region map associated
> + * with the underlying file, this region map represents the backing file
> + * pages which have had a reservation taken and this persists even after
> + * the page is instantiated.  A private mapping has a region map associated
> + * with the original mmap which is attached to all VMAs which reference it,
> + * this region map represents those offsets which have consumed reservation
> + * ie. where pages have been instantiated.
>   */
>  static unsigned long get_vma_private_data(struct vm_area_struct *vma)
>  {
> @@ -217,22 +255,44 @@ static void set_vma_private_data(struct vm_area_struct *vma,
>  	vma->vm_private_data = (void *)value;
>  }
>  
> -static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma)
> +struct resv_map {
> +	struct kref refs;
> +	struct list_head regions;
> +};
> +
> +struct resv_map *resv_map_alloc(void)
> +{
> +	struct resv_map *resv_map = kmalloc(sizeof(*resv_map), GFP_KERNEL);
> +	if (!resv_map)
> +		return NULL;
> +
> +	kref_init(&resv_map->refs);
> +	INIT_LIST_HEAD(&resv_map->regions);
> +
> +	return resv_map;
> +}
> +
> +void resv_map_release(struct kref *ref)
> +{
> +        struct resv_map *resv_map = container_of(ref, struct resv_map, refs);
> +

tabs vs space problem here.

> +	region_truncate(&resv_map->regions, 0);
> +	kfree(resv_map);
> +}

Otherwise, looks right. The region_truncate() looked a bit odd, but you
have to call it or memory would leak, so well thought out there.

> +
> +static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
>  {
>  	VM_BUG_ON(!is_vm_hugetlb_page(vma));
>  	if (!(vma->vm_flags & VM_SHARED))
> -		return get_vma_private_data(vma) & ~HPAGE_RESV_MASK;
> +		return (struct resv_map *)(get_vma_private_data(vma) &
> +							~HPAGE_RESV_MASK);
>  	return 0;
>  }
>  
> -static void set_vma_resv_huge_pages(struct vm_area_struct *vma,
> -							unsigned long reserve)
> +static void set_vma_resv_map(struct vm_area_struct *vma, struct resv_map *map)
>  {
> -	VM_BUG_ON(!is_vm_hugetlb_page(vma));
> -	VM_BUG_ON(vma->vm_flags & VM_SHARED);
> -
> -	set_vma_private_data(vma,
> -		(get_vma_private_data(vma) & HPAGE_RESV_MASK) | reserve);
> +	set_vma_private_data(vma, (get_vma_private_data(vma) &
> +				HPAGE_RESV_MASK) | (unsigned long)map);
>  }

The VM_BUG_ON checks are removed here. Is that intentional? They still
seem valid but maybe I am missing something.

>  
>  static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
> @@ -251,11 +311,11 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
>  }
>  
>  /* Decrement the reserved pages in the hugepage pool by one */
> -static void decrement_hugepage_resv_vma(struct hstate *h,
> -			struct vm_area_struct *vma)
> +static int decrement_hugepage_resv_vma(struct hstate *h,
> +			struct vm_area_struct *vma, unsigned long address)
>  {

The comment needs an update here to explain what the return value means.
I believe the reason is below.

>  	if (vma->vm_flags & VM_NORESERVE)
> -		return;
> +		return 0;
>  
>  	if (vma->vm_flags & VM_SHARED) {
>  		/* Shared mappings always use reserves */
> @@ -266,14 +326,19 @@ static void decrement_hugepage_resv_vma(struct hstate *h,
>  		 * private mappings.
>  		 */
>  		if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
> -			unsigned long flags, reserve;
> +			unsigned long idx = vma_pagecache_offset(h,
> +							vma, address);
> +			struct resv_map *reservations = vma_resv_map(vma);
> +
>  			h->resv_huge_pages--;
> -			flags = (unsigned long)vma->vm_private_data &
> -							HPAGE_RESV_MASK;
> -			reserve = (unsigned long)vma->vm_private_data - 1;
> -			vma->vm_private_data = (void *)(reserve | flags);
> +
> +			/* Mark this page used in the map. */
> +			if (region_chg(&reservations->regions, idx, idx + 1) < 0)
> +				return -1;
> +			region_add(&reservations->regions, idx, idx + 1);

There is an incredibly remote possibility that a fault would fail for a
mapping that had reserved huge pages because the kmalloc() in region_chg
failed. The system would have to be in terrible shape though. Should a
KERN_WARNING be printed here if this failure path is entered?  Otherwise it
will just manifest as a SIGKILLed application.
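
Perhaps something like this (an untested sketch of that error leg, not
a proposed hunk):

	/* Mark this page used in the map. */
	if (region_chg(&reservations->regions, idx, idx + 1) < 0) {
		printk(KERN_WARNING "hugetlb: kmalloc failure tracking a "
			"reserved page; fault fails despite reservation\n");
		return -1;
	}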

>  		}
>  	}
> +	return 0;
>  }
>  
>  /* Reset counters to 0 and clear all HPAGE_RESV_* flags */
> @@ -289,7 +354,7 @@ static int vma_has_private_reserves(struct vm_area_struct *vma)
>  {
>  	if (vma->vm_flags & VM_SHARED)
>  		return 0;
> -	if (!vma_resv_huge_pages(vma))
> +	if (!is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>  		return 0;
>  	return 1;
>  }
> @@ -376,15 +441,16 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>  		nid = zone_to_nid(zone);
>  		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask) &&
>  		    !list_empty(&h->hugepage_freelists[nid])) {
> +			if (!avoid_reserve &&
> +			    decrement_hugepage_resv_vma(h, vma, address) < 0)
> +				return NULL;
> +
>  			page = list_entry(h->hugepage_freelists[nid].next,
>  					  struct page, lru);
>  			list_del(&page->lru);
>  			h->free_huge_pages--;
>  			h->free_huge_pages_node[nid]--;
>  
> -			if (!avoid_reserve)
> -				decrement_hugepage_resv_vma(h, vma);
> -
>  			break;
>  		}
>  	}
> @@ -1456,10 +1522,39 @@ out:
>  	return ret;
>  }
>  
> +static void hugetlb_vm_op_open(struct vm_area_struct *vma)
> +{
> + 	struct resv_map *reservations = vma_resv_map(vma);
> +
> +	/*
> +	 * This new VMA will share its siblings reservation map.  The open
> +	 * vm_op is only called for newly created VMAs which have been made
> +	 * from another, still existing VMA.  As that VMA has a reference to
> +	 * this reservation map the reservation map cannot disappear until
> +	 * after this open completes.  It is therefore safe to take a new
> +	 * reference here without additional locking.
> +	 */
> +	if (reservations)
> +		kref_get(&reservations->refs);
> +}

This comment is a tad misleading. The open call is also called at fork()
time. However, in the case of fork, the private_data will be cleared.
Maybe something like:

====
The open vm_op is called when new VMAs are created but only VMAs which
have been made from another, still existing VMA will have a
reservation....
====

?

> +
>  static void hugetlb_vm_op_close(struct vm_area_struct *vma)
>  {
>  	struct hstate *h = hstate_vma(vma);
> -	unsigned long reserve = vma_resv_huge_pages(vma);
> + 	struct resv_map *reservations = vma_resv_map(vma);
> +	unsigned long reserve = 0;
> +	unsigned long start;
> +	unsigned long end;
> +
> +	if (reservations) {
> +		start = vma_pagecache_offset(h, vma, vma->vm_start);
> +		end = vma_pagecache_offset(h, vma, vma->vm_end);
> +
> +		reserve = (end - start) -
> +			region_count(&reservations->regions, start, end);
> +
> +		kref_put(&reservations->refs, resv_map_release);
> +	}
>  

Clever. So, a split VMA will hold a reference to the region map covering
portions of the mapping outside its own range, but region_count() ensures
that we decrement by the correct amount.
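
For example (invented numbers): split a private mapping covering
offsets [0, 10) at offset 4, with pages 1, 2 and 6 already
instantiated. Both halves share the one region map {[1, 3), [6, 7)}.
Closing the first half returns (4 - 0) - region_count(map, 0, 4) =
4 - 2 = 2 pages, closing the second returns (10 - 4) -
region_count(map, 4, 10) = 6 - 1 = 5 pages, and nothing is credited
twice.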

>  	if (reserve)
>  		hugetlb_acct_memory(h, -reserve);
> @@ -1479,6 +1574,7 @@ static int hugetlb_vm_op_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  
>  struct vm_operations_struct hugetlb_vm_ops = {
>  	.fault = hugetlb_vm_op_fault,
> +	.open = hugetlb_vm_op_open,
>  	.close = hugetlb_vm_op_close,
>  };
>  
> @@ -2037,8 +2133,13 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	if (!vma || vma->vm_flags & VM_SHARED)
>  		chg = region_chg(&inode->i_mapping->private_list, from, to);
>  	else {
> +		struct resv_map *resv_map = resv_map_alloc();
> +		if (!resv_map)
> +			return -ENOMEM;
> +
>  		chg = to - from;
> -		set_vma_resv_huge_pages(vma, chg);
> +
> +		set_vma_resv_map(vma, resv_map);
>  		set_vma_resv_flags(vma, HPAGE_RESV_OWNER);
>  	}

Overall, this is a really clever idea and I like that it brings private
mappings closer to shared mappings in a number of respects. 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH 2/2] hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits
@ 2008-06-23  7:33         ` Mel Gorman
  0 siblings, 0 replies; 290+ messages in thread
From: Mel Gorman @ 2008-06-23  7:33 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Jon Tollefson, Andrew Morton, Nick Piggin, Nishanth Aravamudan,
	Adam Litke, linux-kernel, kernel-testers, linux-mm

On (20/06/08 20:17), Andy Whitcroft didst pronounce:
> When a hugetlb mapping with a reservation is split, a new VMA is cloned
> from the original.  This new VMA is a direct copy of the original
> including the reservation count.  When this pair of VMAs are unmapped
> we will incorrect double account the unused reservation and the overall
> reservation count will be incorrect, in extreme cases it will wrap.
> 

D'oh. It's not even that extreme, it's fairly straight-forward
to trigger as it turns out as this crappy application shows
http://www.csn.ul.ie/~mel/postings/apw-20080622/hugetlbfs-unmap-private-test.c
. This runs on x86 and can wrap the rsvd counters. I believe the other tests
I was running had already used the reserves and missed this test case.

> The problem occurs when we split an existing VMA say to unmap a page in
> the middle.  split_vma() will create a new VMA copying all fields from
> the original.  As we are storing our reservation count in vm_private_data
> this is also copies, endowing the new VMA with a duplicate of the original
> VMA's reservation.  Neither of the new VMAs can exhaust these reservations
> as they are too small, but when we unmap and close these VMAs we will
> incorrect credit the remainder twice and resv_huge_pages will become
> out of sync.  This can lead to allocation failures on mappings with
> reservations and even to resv_huge_pages wrapping which prevents all
> subsequent hugepage allocations.
> 

Yeah, that does sound as if it would occur all right and running the
test program confirms it.

> The simple fix would be to correctly apportion the remaining reservation
> count when the split is made.  However the only hook we have vm_ops->open
> only has the new VMA we do not know the identity of the preceeding VMA.
> Also even if we did have that VMA to hand we do not know how much of the
> reservation was consumed each side of the split.
> 
> This patch therefore takes a different tack.  We know that the whole of any
> private mapping (which has a reservation) has a reservation over its whole
> size.  Any present pages represent consumed reservation.  Therefore if
> we track the instantiated pages we can calculate the remaining reservation.
> 
> This patch reuses the existing regions code to track the regions for which
> we have consumed reservation (ie. the instantiated pages), as each page
> is faulted in we record the consumption of reservation for the new page.

Clever. The additional nice thing is that it makes private mappings less of
a special case in comparison to shared mappings. My impression right now is
that with the path, shared mappings track reservations based on the underlying
file and the private mappings are generally tracked per-mapping and only share
due to unmap-related-splits or forks().  That seems a bit more consistent.

> When we need to return unused reservations at unmap time we simply count
> the consumed reservation region subtracting that from the whole of the map.
> During a VMA split the newly opened VMA will point to the same region map,
> as this map is offset oriented it remains valid for both of the split VMAs.
> This map is referenced counted so that it is removed when all VMAs which
> are part of the mmap are gone.
> 

This looks sensible and applying the patches and running the test program
means that the reserve counter does not wrap when the program exists which
is very nice.  I also tested a parent-child scenario where the pool is of
insufficient size and the child gets killed as expected. Thanks a million
for cleaning this up.

Some comments below but they are relatively minor.

> Signed-off-by: Andy Whitcroft <apw@shadowen.org>
> ---
>  mm/hugetlb.c |  151 ++++++++++++++++++++++++++++++++++++++++++++++++----------
>  1 files changed, 126 insertions(+), 25 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d701e39..ecff986 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -171,6 +171,30 @@ static long region_truncate(struct list_head *head, long end)
>  	return chg;
>  }
>  
> +static long region_count(struct list_head *head, long f, long t)
> +{
> +	struct file_region *rg;
> +	long chg = 0;
> +
> +	/* Locate each segment we overlap with, and count that overlap. */
> +	list_for_each_entry(rg, head, link) {
> +		int seg_from;
> +		int seg_to;
> +
> +		if (rg->to <= f)
> +			continue;
> +		if (rg->from >= t)
> +			break;
> +
> +		seg_from = max(rg->from, f);
> +		seg_to = min(rg->to, t);
> +
> +		chg += seg_to - seg_from;
> +	}
> +
> +	return chg;
> +}

Ok, seems straight forward. The tuples track pages that already exist so
by counting the overlaps in a given range, you know how many hugepages
have been faulted. The size of the VMA minus the overlap is the
required reservation.

> +
>  /*
>   * Convert the address within this vma to the page offset within
>   * the mapping, in base page units.
> @@ -193,9 +217,14 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
>  			(vma->vm_pgoff >> huge_page_order(h));
>  }
>  
> -#define HPAGE_RESV_OWNER    (1UL << (BITS_PER_LONG - 1))
> -#define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2))
> +/*
> + * Flags for MAP_PRIVATE reservations.  These are stored in the bottom
> + * bits of the reservation map pointer.
> + */
> +#define HPAGE_RESV_OWNER    (1UL << 0)
> +#define HPAGE_RESV_UNMAPPED (1UL << 1)
>  #define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED)
> +

The bits move here but for good reason. private_data is now a pointer and
we pack flags into bits that are available due to alignment.  Right?

>  /*
>   * These helpers are used to track how many pages are reserved for
>   * faults in a MAP_PRIVATE mapping. Only the process that called mmap()
> @@ -205,6 +234,15 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
>   * the reserve counters are updated with the hugetlb_lock held. It is safe
>   * to reset the VMA at fork() time as it is not in use yet and there is no
>   * chance of the global counters getting corrupted as a result of the values.
> + *
> + * The private mapping reservation is represented in a subtly different
> + * manner to a shared mapping.  A shared mapping has a region map associated
> + * with the underlying file, this region map represents the backing file
> + * pages which have had a reservation taken and this persists even after
> + * the page is instantiated.  A private mapping has a region map associated
> + * with the original mmap which is attached to all VMAs which reference it,
> + * this region map represents those offsets which have consumed reservation
> + * ie. where pages have been instantiated.
>   */
>  static unsigned long get_vma_private_data(struct vm_area_struct *vma)
>  {
> @@ -217,22 +255,44 @@ static void set_vma_private_data(struct vm_area_struct *vma,
>  	vma->vm_private_data = (void *)value;
>  }
>  
> -static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma)
> +struct resv_map {
> +	struct kref refs;
> +	struct list_head regions;
> +};
> +
> +struct resv_map *resv_map_alloc(void)
> +{
> +	struct resv_map *resv_map = kmalloc(sizeof(*resv_map), GFP_KERNEL);
> +	if (!resv_map)
> +		return NULL;
> +
> +	kref_init(&resv_map->refs);
> +	INIT_LIST_HEAD(&resv_map->regions);
> +
> +	return resv_map;
> +}
> +
> +void resv_map_release(struct kref *ref)
> +{
> +        struct resv_map *resv_map = container_of(ref, struct resv_map, refs);
> +

tabs vs space problem here.

> +	region_truncate(&resv_map->regions, 0);
> +	kfree(resv_map);
> +}

Otherwise, looks right. The region_truncate() looked a bit odd but you
have call it or memory would leak so well thought out there.

> +
> +static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
>  {
>  	VM_BUG_ON(!is_vm_hugetlb_page(vma));
>  	if (!(vma->vm_flags & VM_SHARED))
> -		return get_vma_private_data(vma) & ~HPAGE_RESV_MASK;
> +		return (struct resv_map *)(get_vma_private_data(vma) &
> +							~HPAGE_RESV_MASK);
>  	return 0;
>  }
>  
> -static void set_vma_resv_huge_pages(struct vm_area_struct *vma,
> -							unsigned long reserve)
> +static void set_vma_resv_map(struct vm_area_struct *vma, struct resv_map *map)
>  {
> -	VM_BUG_ON(!is_vm_hugetlb_page(vma));
> -	VM_BUG_ON(vma->vm_flags & VM_SHARED);
> -
> -	set_vma_private_data(vma,
> -		(get_vma_private_data(vma) & HPAGE_RESV_MASK) | reserve);
> +	set_vma_private_data(vma, (get_vma_private_data(vma) &
> +				HPAGE_RESV_MASK) | (unsigned long)map);
>  }

The VM_BUG_ON checks are removed here. Is that intentional? They still
seem valid but maybe I am missing something.

>  
>  static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
> @@ -251,11 +311,11 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
>  }
>  
>  /* Decrement the reserved pages in the hugepage pool by one */
> -static void decrement_hugepage_resv_vma(struct hstate *h,
> -			struct vm_area_struct *vma)
> +static int decrement_hugepage_resv_vma(struct hstate *h,
> +			struct vm_area_struct *vma, unsigned long address)
>  {

The comment needs an update here to explain what the return value means.
I believe the reason is below.

>  	if (vma->vm_flags & VM_NORESERVE)
> -		return;
> +		return 0;
>  
>  	if (vma->vm_flags & VM_SHARED) {
>  		/* Shared mappings always use reserves */
> @@ -266,14 +326,19 @@ static void decrement_hugepage_resv_vma(struct hstate *h,
>  		 * private mappings.
>  		 */
>  		if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
> -			unsigned long flags, reserve;
> +			unsigned long idx = vma_pagecache_offset(h,
> +							vma, address);
> +			struct resv_map *reservations = vma_resv_map(vma);
> +
>  			h->resv_huge_pages--;
> -			flags = (unsigned long)vma->vm_private_data &
> -							HPAGE_RESV_MASK;
> -			reserve = (unsigned long)vma->vm_private_data - 1;
> -			vma->vm_private_data = (void *)(reserve | flags);
> +
> +			/* Mark this page used in the map. */
> +			if (region_chg(&reservations->regions, idx, idx + 1) < 0)
> +				return -1;
> +			region_add(&reservations->regions, idx, idx + 1);

There is an incredibly remote possibility that a fault would fail for a
mapping that had reserved huge pages because the kmalloc() in region_chg
failed. The system would have to be in terrible shape though. Should a
KERN_WARNING be printed here if this failure path is entered?  Otherwise it
will just mainfest as a SIGKILLd application.

>  		}
>  	}
> +	return 0;
>  }
>  
>  /* Reset counters to 0 and clear all HPAGE_RESV_* flags */
> @@ -289,7 +354,7 @@ static int vma_has_private_reserves(struct vm_area_struct *vma)
>  {
>  	if (vma->vm_flags & VM_SHARED)
>  		return 0;
> -	if (!vma_resv_huge_pages(vma))
> +	if (!is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>  		return 0;
>  	return 1;
>  }
> @@ -376,15 +441,16 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>  		nid = zone_to_nid(zone);
>  		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask) &&
>  		    !list_empty(&h->hugepage_freelists[nid])) {
> +			if (!avoid_reserve &&
> +			    decrement_hugepage_resv_vma(h, vma, address) < 0)
> +				return NULL;
> +
>  			page = list_entry(h->hugepage_freelists[nid].next,
>  					  struct page, lru);
>  			list_del(&page->lru);
>  			h->free_huge_pages--;
>  			h->free_huge_pages_node[nid]--;
>  
> -			if (!avoid_reserve)
> -				decrement_hugepage_resv_vma(h, vma);
> -
>  			break;
>  		}
>  	}
> @@ -1456,10 +1522,39 @@ out:
>  	return ret;
>  }
>  
> +static void hugetlb_vm_op_open(struct vm_area_struct *vma)
> +{
> + 	struct resv_map *reservations = vma_resv_map(vma);
> +
> +	/*
> +	 * This new VMA will share its siblings reservation map.  The open
> +	 * vm_op is only called for newly created VMAs which have been made
> +	 * from another, still existing VMA.  As that VMA has a reference to
> +	 * this reservation map the reservation map cannot disappear until
> +	 * after this open completes.  It is therefore safe to take a new
> +	 * reference here without additional locking.
> +	 */
> +	if (reservations)
> +		kref_get(&reservations->refs);
> +}

This comment is a tad misleading. The open call is also called at fork()
time. However, in the case of fork, the private_data will be cleared.
Maybe something like;

====
The open vm_op is called when new VMAs are created but only VMAs which
have been made from another, still existing VMA will have a
reservation....
====

?

> +
>  static void hugetlb_vm_op_close(struct vm_area_struct *vma)
>  {
>  	struct hstate *h = hstate_vma(vma);
> -	unsigned long reserve = vma_resv_huge_pages(vma);
> + 	struct resv_map *reservations = vma_resv_map(vma);
> +	unsigned long reserve = 0;
> +	unsigned long start;
> +	unsigned long end;
> +
> +	if (reservations) {
> +		start = vma_pagecache_offset(h, vma, vma->vm_start);
> +		end = vma_pagecache_offset(h, vma, vma->vm_end);
> +
> +		reserve = (end - start) -
> +			region_count(&reservations->regions, start, end);
> +
> +		kref_put(&reservations->refs, resv_map_release);
> +	}
>  

Clever. So, a split VMA will have the reference map for portions of the
mapping outside its range but region_count() ensures that we decrement
by the correct amount.

>  	if (reserve)
>  		hugetlb_acct_memory(h, -reserve);
> @@ -1479,6 +1574,7 @@ static int hugetlb_vm_op_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  
>  struct vm_operations_struct hugetlb_vm_ops = {
>  	.fault = hugetlb_vm_op_fault,
> +	.open = hugetlb_vm_op_open,
>  	.close = hugetlb_vm_op_close,
>  };
>  
> @@ -2037,8 +2133,13 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	if (!vma || vma->vm_flags & VM_SHARED)
>  		chg = region_chg(&inode->i_mapping->private_list, from, to);
>  	else {
> +		struct resv_map *resv_map = resv_map_alloc();
> +		if (!resv_map)
> +			return -ENOMEM;
> +
>  		chg = to - from;
> -		set_vma_resv_huge_pages(vma, chg);
> +
> +		set_vma_resv_map(vma, resv_map);
>  		set_vma_resv_flags(vma, HPAGE_RESV_OWNER);
>  	}

Overall, this is a really clever idea and I like that it brings private
mappings closer to shared mappings in a number of respects. 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH 2/2] hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits
@ 2008-06-23  7:33         ` Mel Gorman
  0 siblings, 0 replies; 290+ messages in thread
From: Mel Gorman @ 2008-06-23  7:33 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Jon Tollefson, Andrew Morton, Nick Piggin, Nishanth Aravamudan,
	Adam Litke, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On (20/06/08 20:17), Andy Whitcroft didst pronounce:
> When a hugetlb mapping with a reservation is split, a new VMA is cloned
> from the original.  This new VMA is a direct copy of the original
> including the reservation count.  When this pair of VMAs are unmapped
> we will incorrect double account the unused reservation and the overall
> reservation count will be incorrect, in extreme cases it will wrap.
> 

D'oh. It's not even that extreme, it's fairly straight-forward
to trigger as it turns out as this crappy application shows
http://www.csn.ul.ie/~mel/postings/apw-20080622/hugetlbfs-unmap-private-test.c
. This runs on x86 and can wrap the rsvd counters. I believe the other tests
I was running had already used the reserves and missed this test case.

> The problem occurs when we split an existing VMA say to unmap a page in
> the middle.  split_vma() will create a new VMA copying all fields from
> the original.  As we are storing our reservation count in vm_private_data
> this is also copies, endowing the new VMA with a duplicate of the original
> VMA's reservation.  Neither of the new VMAs can exhaust these reservations
> as they are too small, but when we unmap and close these VMAs we will
> incorrect credit the remainder twice and resv_huge_pages will become
> out of sync.  This can lead to allocation failures on mappings with
> reservations and even to resv_huge_pages wrapping which prevents all
> subsequent hugepage allocations.
> 

Yeah, that does sound as if it would occur all right and running the
test program confirms it.

> The simple fix would be to correctly apportion the remaining reservation
> count when the split is made.  However the only hook we have vm_ops->open
> only has the new VMA we do not know the identity of the preceeding VMA.
> Also even if we did have that VMA to hand we do not know how much of the
> reservation was consumed each side of the split.
> 
> This patch therefore takes a different tack.  We know that the whole of any
> private mapping (which has a reservation) has a reservation over its whole
> size.  Any present pages represent consumed reservation.  Therefore if
> we track the instantiated pages we can calculate the remaining reservation.
> 
> This patch reuses the existing regions code to track the regions for which
> we have consumed reservation (ie. the instantiated pages), as each page
> is faulted in we record the consumption of reservation for the new page.

Clever. The additional nice thing is that it makes private mappings less of
a special case in comparison to shared mappings. My impression right now is
that with the path, shared mappings track reservations based on the underlying
file and the private mappings are generally tracked per-mapping and only share
due to unmap-related-splits or forks().  That seems a bit more consistent.

> When we need to return unused reservations at unmap time we simply count
> the consumed reservation region subtracting that from the whole of the map.
> During a VMA split the newly opened VMA will point to the same region map,
> as this map is offset oriented it remains valid for both of the split VMAs.
> This map is referenced counted so that it is removed when all VMAs which
> are part of the mmap are gone.
> 

This looks sensible and applying the patches and running the test program
means that the reserve counter does not wrap when the program exists which
is very nice.  I also tested a parent-child scenario where the pool is of
insufficient size and the child gets killed as expected. Thanks a million
for cleaning this up.

Some comments below but they are relatively minor.

> Signed-off-by: Andy Whitcroft <apw-26w3C0LaAnFg9hUCZPvPmw@public.gmane.org>
> ---
>  mm/hugetlb.c |  151 ++++++++++++++++++++++++++++++++++++++++++++++++----------
>  1 files changed, 126 insertions(+), 25 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d701e39..ecff986 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -171,6 +171,30 @@ static long region_truncate(struct list_head *head, long end)
>  	return chg;
>  }
>  
> +static long region_count(struct list_head *head, long f, long t)
> +{
> +	struct file_region *rg;
> +	long chg = 0;
> +
> +	/* Locate each segment we overlap with, and count that overlap. */
> +	list_for_each_entry(rg, head, link) {
> +		int seg_from;
> +		int seg_to;
> +
> +		if (rg->to <= f)
> +			continue;
> +		if (rg->from >= t)
> +			break;
> +
> +		seg_from = max(rg->from, f);
> +		seg_to = min(rg->to, t);
> +
> +		chg += seg_to - seg_from;
> +	}
> +
> +	return chg;
> +}

Ok, seems straight forward. The tuples track pages that already exist so
by counting the overlaps in a given range, you know how many hugepages
have been faulted. The size of the VMA minus the overlap is the
required reservation.

> +
>  /*
>   * Convert the address within this vma to the page offset within
>   * the mapping, in base page units.
> @@ -193,9 +217,14 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
>  			(vma->vm_pgoff >> huge_page_order(h));
>  }
>  
> -#define HPAGE_RESV_OWNER    (1UL << (BITS_PER_LONG - 1))
> -#define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2))
> +/*
> + * Flags for MAP_PRIVATE reservations.  These are stored in the bottom
> + * bits of the reservation map pointer.
> + */
> +#define HPAGE_RESV_OWNER    (1UL << 0)
> +#define HPAGE_RESV_UNMAPPED (1UL << 1)
>  #define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED)
> +

The bits move here but for good reason. private_data is now a pointer and
we pack flags into bits that are available due to alignment.  Right?

>  /*
>   * These helpers are used to track how many pages are reserved for
>   * faults in a MAP_PRIVATE mapping. Only the process that called mmap()
> @@ -205,6 +234,15 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
>   * the reserve counters are updated with the hugetlb_lock held. It is safe
>   * to reset the VMA at fork() time as it is not in use yet and there is no
>   * chance of the global counters getting corrupted as a result of the values.
> + *
> + * The private mapping reservation is represented in a subtly different
> + * manner to a shared mapping.  A shared mapping has a region map associated
> + * with the underlying file, this region map represents the backing file
> + * pages which have had a reservation taken and this persists even after
> + * the page is instantiated.  A private mapping has a region map associated
> + * with the original mmap which is attached to all VMAs which reference it,
> + * this region map represents those offsets which have consumed reservation
> + * ie. where pages have been instantiated.
>   */
>  static unsigned long get_vma_private_data(struct vm_area_struct *vma)
>  {
> @@ -217,22 +255,44 @@ static void set_vma_private_data(struct vm_area_struct *vma,
>  	vma->vm_private_data = (void *)value;
>  }
>  
> -static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma)
> +struct resv_map {
> +	struct kref refs;
> +	struct list_head regions;
> +};
> +
> +struct resv_map *resv_map_alloc(void)
> +{
> +	struct resv_map *resv_map = kmalloc(sizeof(*resv_map), GFP_KERNEL);
> +	if (!resv_map)
> +		return NULL;
> +
> +	kref_init(&resv_map->refs);
> +	INIT_LIST_HEAD(&resv_map->regions);
> +
> +	return resv_map;
> +}
> +
> +void resv_map_release(struct kref *ref)
> +{
> +        struct resv_map *resv_map = container_of(ref, struct resv_map, refs);
> +

tabs vs space problem here.

> +	region_truncate(&resv_map->regions, 0);
> +	kfree(resv_map);
> +}

Otherwise, looks right. The region_truncate() looked a bit odd but you
have call it or memory would leak so well thought out there.

> +
> +static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
>  {
>  	VM_BUG_ON(!is_vm_hugetlb_page(vma));
>  	if (!(vma->vm_flags & VM_SHARED))
> -		return get_vma_private_data(vma) & ~HPAGE_RESV_MASK;
> +		return (struct resv_map *)(get_vma_private_data(vma) &
> +							~HPAGE_RESV_MASK);
>  	return 0;
>  }
>  
> -static void set_vma_resv_huge_pages(struct vm_area_struct *vma,
> -							unsigned long reserve)
> +static void set_vma_resv_map(struct vm_area_struct *vma, struct resv_map *map)
>  {
> -	VM_BUG_ON(!is_vm_hugetlb_page(vma));
> -	VM_BUG_ON(vma->vm_flags & VM_SHARED);
> -
> -	set_vma_private_data(vma,
> -		(get_vma_private_data(vma) & HPAGE_RESV_MASK) | reserve);
> +	set_vma_private_data(vma, (get_vma_private_data(vma) &
> +				HPAGE_RESV_MASK) | (unsigned long)map);
>  }

The VM_BUG_ON checks are removed here. Is that intentional? They still
seem valid, but maybe I am missing something.

>  
>  static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
> @@ -251,11 +311,11 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
>  }
>  
>  /* Decrement the reserved pages in the hugepage pool by one */
> -static void decrement_hugepage_resv_vma(struct hstate *h,
> -			struct vm_area_struct *vma)
> +static int decrement_hugepage_resv_vma(struct hstate *h,
> +			struct vm_area_struct *vma, unsigned long address)
>  {

The comment needs an update here to explain what the return value means.
I believe the reason is below.

>  	if (vma->vm_flags & VM_NORESERVE)
> -		return;
> +		return 0;
>  
>  	if (vma->vm_flags & VM_SHARED) {
>  		/* Shared mappings always use reserves */
> @@ -266,14 +326,19 @@ static void decrement_hugepage_resv_vma(struct hstate *h,
>  		 * private mappings.
>  		 */
>  		if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
> -			unsigned long flags, reserve;
> +			unsigned long idx = vma_pagecache_offset(h,
> +							vma, address);
> +			struct resv_map *reservations = vma_resv_map(vma);
> +
>  			h->resv_huge_pages--;
> -			flags = (unsigned long)vma->vm_private_data &
> -							HPAGE_RESV_MASK;
> -			reserve = (unsigned long)vma->vm_private_data - 1;
> -			vma->vm_private_data = (void *)(reserve | flags);
> +
> +			/* Mark this page used in the map. */
> +			if (region_chg(&reservations->regions, idx, idx + 1) < 0)
> +				return -1;
> +			region_add(&reservations->regions, idx, idx + 1);

There is an incredibly remote possibility that a fault would fail for a
mapping that had reserved huge pages because the kmalloc() in region_chg()
failed. The system would have to be in terrible shape, though. Should a
KERN_WARNING be printed here if this failure path is entered?  Otherwise it
will just manifest as a SIGKILLed application.

>  		}
>  	}
> +	return 0;
>  }
>  
>  /* Reset counters to 0 and clear all HPAGE_RESV_* flags */
> @@ -289,7 +354,7 @@ static int vma_has_private_reserves(struct vm_area_struct *vma)
>  {
>  	if (vma->vm_flags & VM_SHARED)
>  		return 0;
> -	if (!vma_resv_huge_pages(vma))
> +	if (!is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>  		return 0;
>  	return 1;
>  }
> @@ -376,15 +441,16 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>  		nid = zone_to_nid(zone);
>  		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask) &&
>  		    !list_empty(&h->hugepage_freelists[nid])) {
> +			if (!avoid_reserve &&
> +			    decrement_hugepage_resv_vma(h, vma, address) < 0)
> +				return NULL;
> +
>  			page = list_entry(h->hugepage_freelists[nid].next,
>  					  struct page, lru);
>  			list_del(&page->lru);
>  			h->free_huge_pages--;
>  			h->free_huge_pages_node[nid]--;
>  
> -			if (!avoid_reserve)
> -				decrement_hugepage_resv_vma(h, vma);
> -
>  			break;
>  		}
>  	}
> @@ -1456,10 +1522,39 @@ out:
>  	return ret;
>  }
>  
> +static void hugetlb_vm_op_open(struct vm_area_struct *vma)
> +{
> + 	struct resv_map *reservations = vma_resv_map(vma);
> +
> +	/*
> +	 * This new VMA will share its siblings reservation map.  The open
> +	 * vm_op is only called for newly created VMAs which have been made
> +	 * from another, still existing VMA.  As that VMA has a reference to
> +	 * this reservation map the reservation map cannot disappear until
> +	 * after this open completes.  It is therefore safe to take a new
> +	 * reference here without additional locking.
> +	 */
> +	if (reservations)
> +		kref_get(&reservations->refs);
> +}

This comment is a tad misleading. The open call is also made at fork()
time; however, in the fork case the private_data will have been cleared.
Maybe something like:

====
The open vm_op is called when new VMAs are created but only VMAs which
have been made from another, still existing VMA will have a
reservation....
====

?

> +
>  static void hugetlb_vm_op_close(struct vm_area_struct *vma)
>  {
>  	struct hstate *h = hstate_vma(vma);
> -	unsigned long reserve = vma_resv_huge_pages(vma);
> + 	struct resv_map *reservations = vma_resv_map(vma);
> +	unsigned long reserve = 0;
> +	unsigned long start;
> +	unsigned long end;
> +
> +	if (reservations) {
> +		start = vma_pagecache_offset(h, vma, vma->vm_start);
> +		end = vma_pagecache_offset(h, vma, vma->vm_end);
> +
> +		reserve = (end - start) -
> +			region_count(&reservations->regions, start, end);
> +
> +		kref_put(&reservations->refs, resv_map_release);
> +	}
>  

Clever. So, a split VMA will see the reservation map covering portions of
the mapping outside its own range, but region_count() ensures that we
decrement by the correct amount.
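
A worked example with made-up numbers: take a 10-page private mapping
which faulted pages 2-3 and 7 and was then split at page 5. Each half
counts only the overlap inside its own range, so the two closes together
return exactly the unused reservation:

#include <stdio.h>

int main(void)
{
	/* Hypothetical values region_count() would return against the
	 * shared map { [2,4), [7,8) } for each half of the split. */
	long used_lo = 2;	/* region_count(map, 0, 5):  pages 2,3 */
	long used_hi = 1;	/* region_count(map, 5, 10): page 7    */

	long unused_lo = (5 - 0) - used_lo;	/* 3 */
	long unused_hi = (10 - 5) - used_hi;	/* 4 */

	/* 3 + 4 == 7 == the 10-page reservation minus 3 faulted pages:
	 * no double accounting across the split. */
	printf("%ld + %ld = %ld\n", unused_lo, unused_hi,
	       unused_lo + unused_hi);
	return 0;
}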

>  	if (reserve)
>  		hugetlb_acct_memory(h, -reserve);
> @@ -1479,6 +1574,7 @@ static int hugetlb_vm_op_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  
>  struct vm_operations_struct hugetlb_vm_ops = {
>  	.fault = hugetlb_vm_op_fault,
> +	.open = hugetlb_vm_op_open,
>  	.close = hugetlb_vm_op_close,
>  };
>  
> @@ -2037,8 +2133,13 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	if (!vma || vma->vm_flags & VM_SHARED)
>  		chg = region_chg(&inode->i_mapping->private_list, from, to);
>  	else {
> +		struct resv_map *resv_map = resv_map_alloc();
> +		if (!resv_map)
> +			return -ENOMEM;
> +
>  		chg = to - from;
> -		set_vma_resv_huge_pages(vma, chg);
> +
> +		set_vma_resv_map(vma, resv_map);
>  		set_vma_resv_flags(vma, HPAGE_RESV_OWNER);
>  	}

Overall, this is a really clever idea and I like that it brings private
mappings closer to shared mappings in a number of respects. 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH 2/2] hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits
  2008-06-20 19:17       ` Andy Whitcroft
@ 2008-06-23  8:00         ` Mel Gorman
  0 siblings, 0 replies; 290+ messages in thread
From: Mel Gorman @ 2008-06-23  8:00 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Jon Tollefson, Andrew Morton, Nick Piggin, Nishanth Aravamudan,
	Adam Litke, linux-kernel, kernel-testers, linux-mm

Typical. I spotted this after I pushed send.....

> <SNIP>

> @@ -266,14 +326,19 @@ static void decrement_hugepage_resv_vma(struct hstate *h,
>  		 * private mappings.
>  		 */
>  		if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
> -			unsigned long flags, reserve;
> +			unsigned long idx = vma_pagecache_offset(h,
> +							vma, address);
> +			struct resv_map *reservations = vma_resv_map(vma);
> +
>  			h->resv_huge_pages--;
> -			flags = (unsigned long)vma->vm_private_data &
> -							HPAGE_RESV_MASK;
> -			reserve = (unsigned long)vma->vm_private_data - 1;
> -			vma->vm_private_data = (void *)(reserve | flags);
> +
> +			/* Mark this page used in the map. */
> +			if (region_chg(&reservations->regions, idx, idx + 1) < 0)
> +				return -1;
> +			region_add(&reservations->regions, idx, idx + 1);
>  		}

decrement_hugepage_resv_vma() is called with hugetlb_lock held, and
region_chg() calls kmalloc(GFP_KERNEL).  Hence it's possible we would sleep
with that spinlock held, which is a bit uncool. The allocation needs to
happen outside the lock. Right?

> <SNIP>

-- 
Mel Gorman

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH 2/2] hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits
  2008-06-23  8:00         ` Mel Gorman
@ 2008-06-23  9:53           ` Andy Whitcroft
  0 siblings, 0 replies; 290+ messages in thread
From: Andy Whitcroft @ 2008-06-23  9:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Jon Tollefson, Andrew Morton, Nick Piggin, Nishanth Aravamudan,
	Adam Litke, linux-kernel, kernel-testers, linux-mm

On Mon, Jun 23, 2008 at 09:00:48AM +0100, Mel Gorman wrote:
> Typical. I spotted this after I pushed send.....
> 
> > <SNIP>
> 
> > @@ -266,14 +326,19 @@ static void decrement_hugepage_resv_vma(struct hstate *h,
> >  		 * private mappings.
> >  		 */
> >  		if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
> > -			unsigned long flags, reserve;
> > +			unsigned long idx = vma_pagecache_offset(h,
> > +							vma, address);
> > +			struct resv_map *reservations = vma_resv_map(vma);
> > +
> >  			h->resv_huge_pages--;
> > -			flags = (unsigned long)vma->vm_private_data &
> > -							HPAGE_RESV_MASK;
> > -			reserve = (unsigned long)vma->vm_private_data - 1;
> > -			vma->vm_private_data = (void *)(reserve | flags);
> > +
> > +			/* Mark this page used in the map. */
> > +			if (region_chg(&reservations->regions, idx, idx + 1) < 0)
> > +				return -1;
> > +			region_add(&reservations->regions, idx, idx + 1);
> >  		}
> 
> decrement_hugepage_resv_vma() is called with hugetlb_lock held, and
> region_chg() calls kmalloc(GFP_KERNEL).  Hence it's possible we would sleep
> with that spinlock held, which is a bit uncool. The allocation needs to
> happen outside the lock. Right?

Yes, good spot.  Luckily this pair of calls can be separated, as the
first is a prepare and the second a commit.  So I can trivially pull
the allocation outside the lock.

Had a quick go at this and it looks like I can move both out of the lock
to a much more logical spot and clean the patch up significantly.  Will
fold in your other comments and post up a V2 once it has been tested.
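
For reference, the general shape of that prepare/commit split, as a
simplified userspace sketch (a pthread mutex stands in for hugetlb_lock,
and malloc() for the kmalloc() in region_chg()):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct node { int val; };

/* Prepare: may allocate, and hence may sleep; runs outside the lock. */
static struct node *prepare(int val)
{
	struct node *n = malloc(sizeof(*n));

	if (n)
		n->val = val;
	return n;
}

/* Commit: only uses memory allocated in prepare, so it cannot fail
 * and is safe inside the critical section. */
static void commit(struct node *n)
{
	printf("committed %d\n", n->val);
}

int main(void)
{
	struct node *n = prepare(42);	/* allocation outside the lock */

	if (!n)
		return 1;

	pthread_mutex_lock(&lock);	/* short, non-sleeping section */
	commit(n);
	pthread_mutex_unlock(&lock);

	free(n);
	return 0;
}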

Thanks.

-apw

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [RFC] hugetlb reservations -- MAP_PRIVATE fixes for split vmas
  2008-06-20 19:17     ` Andy Whitcroft
@ 2008-06-23 16:04       ` Jon Tollefson
  0 siblings, 0 replies; 290+ messages in thread
From: Jon Tollefson @ 2008-06-23 16:04 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Andrew Morton, Nick Piggin, Nishanth Aravamudan, Adam Litke,
	linux-kernel, kernel-testers, linux-mm, Mel Gorman

Andy Whitcroft wrote:
> As reported by Adam Litke and Jon Tollefson, one of the libhugetlbfs
> regression tests triggers a negative overall reservation count.  When
> this occurs and there is no dynamic pool enabled, tests will fail.
>
> Following this email are two patches to fix this issue:
>
> hugetlb reservations: move region tracking earlier -- simply moves the
>   region tracking code earlier so we do not have to supply prototypes, and
>
> hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma
>   splits -- which moves us to tracking the consumed reservation so that
>   we can correctly calculate the remaining reservations at vma close time.
>
> This stack is against the top of v2.6.25-rc6-mm3; should this solution
> prove acceptable it would probably need porting below Nick's multiple
> hugepage size patches and those updated; if so I would be happy to do
> that too.
>
> Jon could you have a test on this and see if it works out for you.
>
> -apw
>   
Looking good so far.
I am not seeing any of the tests push the reservation number negative
with this patch set applied.

Jon



^ permalink raw reply	[flat|nested] 290+ messages in thread

* [RFC] hugetlb reservations -- MAP_PRIVATE fixes for split vmas V2
  2008-06-19 16:27   ` Jon Tollefson
@ 2008-06-23 17:35     ` Andy Whitcroft
  0 siblings, 0 replies; 290+ messages in thread
From: Andy Whitcroft @ 2008-06-23 17:35 UTC (permalink / raw)
  To: Jon Tollefson
  Cc: Andrew Morton, Nick Piggin, Nishanth Aravamudan, Adam Litke,
	linux-kernel, kernel-testers, linux-mm, Mel Gorman,
	Andy Whitcroft

As reported by Adam Litke and Jon Tollefson, one of the libhugetlbfs
regression tests triggers a negative overall reservation count.  When
this occurs and there is no dynamic pool enabled, tests will fail.

Following this email are two patches to address this issue:

hugetlb reservations: move region tracking earlier -- simply moves the
  region tracking code earlier so we do not have to supply prototypes, and

hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma
  splits -- which moves us to tracking the consumed reservation so that
  we can correctly calculate the remaining reservations at vma close time.

This stack is against the top of v2.6.25-rc6-mm3; should this solution
prove acceptable it would need slipping underneath Nick's multiple hugepage
size patches and those updated.  I have a modified stack prepared for that.

This version incorporates Mel's feedback (both cosmetic, and an allocation
under spinlock issue) and has an improved layout.

Changes in V2:
 - commentary updates
 - pull allocations out from under hugetlb_lock
 - refactor to match shared code layout
 - reinstate BUG_ONs

Jon, could you test this and see if it works out for you?

-apw

^ permalink raw reply	[flat|nested] 290+ messages in thread

* [PATCH 1/2] hugetlb reservations: move region tracking earlier
  2008-06-23 17:35     ` Andy Whitcroft
@ 2008-06-23 17:35       ` Andy Whitcroft
  0 siblings, 0 replies; 290+ messages in thread
From: Andy Whitcroft @ 2008-06-23 17:35 UTC (permalink / raw)
  To: Jon Tollefson
  Cc: Andrew Morton, Nick Piggin, Nishanth Aravamudan, Adam Litke,
	linux-kernel, kernel-testers, linux-mm, Mel Gorman,
	Andy Whitcroft

Move the region tracking code much earlier so we can use it for page
presence tracking later on.  No code is changed, just its location.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 mm/hugetlb.c |  246 +++++++++++++++++++++++++++++----------------------------
 1 files changed, 125 insertions(+), 121 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0f76ed1..d701e39 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -47,6 +47,131 @@ static unsigned long __initdata default_hstate_size;
 static DEFINE_SPINLOCK(hugetlb_lock);
 
 /*
+ * Region tracking -- allows tracking of reservations and instantiated pages
+ *                    across the pages in a mapping.
+ */
+struct file_region {
+	struct list_head link;
+	long from;
+	long to;
+};
+
+static long region_add(struct list_head *head, long f, long t)
+{
+	struct file_region *rg, *nrg, *trg;
+
+	/* Locate the region we are either in or before. */
+	list_for_each_entry(rg, head, link)
+		if (f <= rg->to)
+			break;
+
+	/* Round our left edge to the current segment if it encloses us. */
+	if (f > rg->from)
+		f = rg->from;
+
+	/* Check for and consume any regions we now overlap with. */
+	nrg = rg;
+	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
+		if (&rg->link == head)
+			break;
+		if (rg->from > t)
+			break;
+
+		/* If this area reaches higher then extend our area to
+		 * include it completely.  If this is not the first area
+		 * which we intend to reuse, free it. */
+		if (rg->to > t)
+			t = rg->to;
+		if (rg != nrg) {
+			list_del(&rg->link);
+			kfree(rg);
+		}
+	}
+	nrg->from = f;
+	nrg->to = t;
+	return 0;
+}
+
+static long region_chg(struct list_head *head, long f, long t)
+{
+	struct file_region *rg, *nrg;
+	long chg = 0;
+
+	/* Locate the region we are before or in. */
+	list_for_each_entry(rg, head, link)
+		if (f <= rg->to)
+			break;
+
+	/* If we are below the current region then a new region is required.
+	 * Subtle, allocate a new region at the position but make it zero
+	 * size such that we can guarantee to record the reservation. */
+	if (&rg->link == head || t < rg->from) {
+		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
+		if (!nrg)
+			return -ENOMEM;
+		nrg->from = f;
+		nrg->to   = f;
+		INIT_LIST_HEAD(&nrg->link);
+		list_add(&nrg->link, rg->link.prev);
+
+		return t - f;
+	}
+
+	/* Round our left edge to the current segment if it encloses us. */
+	if (f > rg->from)
+		f = rg->from;
+	chg = t - f;
+
+	/* Check for and consume any regions we now overlap with. */
+	list_for_each_entry(rg, rg->link.prev, link) {
+		if (&rg->link == head)
+			break;
+		if (rg->from > t)
+			return chg;
+
+		/* We overlap with this area, if it extends futher than
+		 * us then we must extend ourselves.  Account for its
+		 * existing reservation. */
+		if (rg->to > t) {
+			chg += rg->to - t;
+			t = rg->to;
+		}
+		chg -= rg->to - rg->from;
+	}
+	return chg;
+}
+
+static long region_truncate(struct list_head *head, long end)
+{
+	struct file_region *rg, *trg;
+	long chg = 0;
+
+	/* Locate the region we are either in or before. */
+	list_for_each_entry(rg, head, link)
+		if (end <= rg->to)
+			break;
+	if (&rg->link == head)
+		return 0;
+
+	/* If we are in the middle of a region then adjust it. */
+	if (end > rg->from) {
+		chg = rg->to - end;
+		rg->to = end;
+		rg = list_entry(rg->link.next, typeof(*rg), link);
+	}
+
+	/* Drop any remaining regions. */
+	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
+		if (&rg->link == head)
+			break;
+		chg += rg->to - rg->from;
+		list_del(&rg->link);
+		kfree(rg);
+	}
+	return chg;
+}
+
+/*
  * Convert the address within this vma to the page offset within
  * the mapping, in base page units.
  */
@@ -649,127 +774,6 @@ static void return_unused_surplus_pages(struct hstate *h,
 	}
 }
 
-struct file_region {
-	struct list_head link;
-	long from;
-	long to;
-};
-
-static long region_add(struct list_head *head, long f, long t)
-{
-	struct file_region *rg, *nrg, *trg;
-
-	/* Locate the region we are either in or before. */
-	list_for_each_entry(rg, head, link)
-		if (f <= rg->to)
-			break;
-
-	/* Round our left edge to the current segment if it encloses us. */
-	if (f > rg->from)
-		f = rg->from;
-
-	/* Check for and consume any regions we now overlap with. */
-	nrg = rg;
-	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
-		if (&rg->link == head)
-			break;
-		if (rg->from > t)
-			break;
-
-		/* If this area reaches higher then extend our area to
-		 * include it completely.  If this is not the first area
-		 * which we intend to reuse, free it. */
-		if (rg->to > t)
-			t = rg->to;
-		if (rg != nrg) {
-			list_del(&rg->link);
-			kfree(rg);
-		}
-	}
-	nrg->from = f;
-	nrg->to = t;
-	return 0;
-}
-
-static long region_chg(struct list_head *head, long f, long t)
-{
-	struct file_region *rg, *nrg;
-	long chg = 0;
-
-	/* Locate the region we are before or in. */
-	list_for_each_entry(rg, head, link)
-		if (f <= rg->to)
-			break;
-
-	/* If we are below the current region then a new region is required.
-	 * Subtle, allocate a new region at the position but make it zero
-	 * size such that we can guarantee to record the reservation. */
-	if (&rg->link == head || t < rg->from) {
-		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
-		if (!nrg)
-			return -ENOMEM;
-		nrg->from = f;
-		nrg->to   = f;
-		INIT_LIST_HEAD(&nrg->link);
-		list_add(&nrg->link, rg->link.prev);
-
-		return t - f;
-	}
-
-	/* Round our left edge to the current segment if it encloses us. */
-	if (f > rg->from)
-		f = rg->from;
-	chg = t - f;
-
-	/* Check for and consume any regions we now overlap with. */
-	list_for_each_entry(rg, rg->link.prev, link) {
-		if (&rg->link == head)
-			break;
-		if (rg->from > t)
-			return chg;
-
-		/* We overlap with this area, if it extends futher than
-		 * us then we must extend ourselves.  Account for its
-		 * existing reservation. */
-		if (rg->to > t) {
-			chg += rg->to - t;
-			t = rg->to;
-		}
-		chg -= rg->to - rg->from;
-	}
-	return chg;
-}
-
-static long region_truncate(struct list_head *head, long end)
-{
-	struct file_region *rg, *trg;
-	long chg = 0;
-
-	/* Locate the region we are either in or before. */
-	list_for_each_entry(rg, head, link)
-		if (end <= rg->to)
-			break;
-	if (&rg->link == head)
-		return 0;
-
-	/* If we are in the middle of a region then adjust it. */
-	if (end > rg->from) {
-		chg = rg->to - end;
-		rg->to = end;
-		rg = list_entry(rg->link.next, typeof(*rg), link);
-	}
-
-	/* Drop any remaining regions. */
-	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
-		if (&rg->link == head)
-			break;
-		chg += rg->to - rg->from;
-		list_del(&rg->link);
-		kfree(rg);
-	}
-	return chg;
-}
-
 /*
  * Determine if the huge page at addr within the vma has an associated
  * reservation.  Where it does not we will need to logically increase
-- 
1.5.6.205.g7ca3a


^ permalink raw reply related	[flat|nested] 290+ messages in thread

* [PATCH 2/2] hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits V2
  2008-06-23 17:35     ` Andy Whitcroft
  (?)
@ 2008-06-23 17:35       ` Andy Whitcroft
  -1 siblings, 0 replies; 290+ messages in thread
From: Andy Whitcroft @ 2008-06-23 17:35 UTC (permalink / raw)
  To: Jon Tollefson
  Cc: Andrew Morton, Nick Piggin, Nishanth Aravamudan, Adam Litke,
	linux-kernel, kernel-testers, linux-mm, Mel Gorman,
	Andy Whitcroft

When a hugetlb mapping with a reservation is split, a new VMA is cloned
from the original.  This new VMA is a direct copy of the original,
including the reservation count.  When this pair of VMAs are unmapped we
will incorrectly double account the unused reservation and the overall
reservation count will be wrong; in extreme cases it will wrap.

The problem occurs when we split an existing VMA, say to unmap a page in
the middle.  split_vma() will create a new VMA copying all fields from
the original.  As we are storing our reservation count in vm_private_data
this is also copied, endowing the new VMA with a duplicate of the original
VMA's reservation.  Neither of the new VMAs can exhaust these reservations
as they are too small, but when we unmap and close these VMAs we will
incorrectly credit the remainder twice and resv_huge_pages will become
out of sync.  This can lead to allocation failures on mappings with
reservations and even to resv_huge_pages wrapping, which prevents all
subsequent hugepage allocations.

The simple fix would be to correctly apportion the remaining reservation
count when the split is made.  However, the only hook we have, vm_ops->open,
only sees the new VMA; we do not know the identity of the preceding VMA.
Also, even if we did have that VMA to hand, we would not know how much of
the reservation was consumed on each side of the split.

This patch therefore takes a different tack.  We know that any private
mapping which has a reservation has that reservation over its whole size.
Any present pages represent consumed reservation.  Therefore, if we track
the instantiated pages we can calculate the remaining reservation.

This patch reuses the existing regions code to track the regions for which
we have consumed reservation (i.e. the instantiated pages); as each page
is faulted in we record the consumption of reservation for the new page.
When we need to return unused reservations at unmap time we simply count
the consumed reservation regions and subtract that from the whole of the
map.  During a VMA split the newly opened VMA will point to the same
region map; as this map is offset oriented it remains valid for both of
the split VMAs.  This map is reference counted so that it is removed when
all VMAs which are part of the mmap are gone.
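
As an illustration of those lifetime rules, a small userspace sketch with
simplified stand-in types (the kernel code uses struct kref and carries
the region list in the map; the refcounting shape is the same):

#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the kref-counted resv_map. */
struct resv_map {
	int refs;
	/* the region list lives here in the real structure */
};

static struct resv_map *resv_map_alloc(void)
{
	struct resv_map *map = malloc(sizeof(*map));

	if (map)
		map->refs = 1;		/* the mmap()ing VMA's reference */
	return map;
}

static void resv_map_get(struct resv_map *map)
{
	map->refs++;
}

static void resv_map_put(struct resv_map *map)
{
	if (--map->refs == 0)
		free(map);	/* real code truncates the regions first */
}

int main(void)
{
	struct resv_map *map = resv_map_alloc();

	assert(map);
	resv_map_get(map);  /* vm_op->open: a split creates a sibling VMA */

	resv_map_put(map);  /* close of one half: map survives */
	resv_map_put(map);  /* close of the other: last reference, freed */
	return 0;
}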

Thanks to Adam Litke and Mel Gorman for their review feedback.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 mm/hugetlb.c |  171 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 144 insertions(+), 27 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d701e39..7ba6d4d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -49,6 +49,16 @@ static DEFINE_SPINLOCK(hugetlb_lock);
 /*
  * Region tracking -- allows tracking of reservations and instantiated pages
  *                    across the pages in a mapping.
+ *
+ * The region data structures are protected by a combination of the mmap_sem
+ * and the hugetlb_instantion_mutex.  To access or modify a region the caller
+ * must either hold the mmap_sem for write, or the mmap_sem for read and
+ * the hugetlb_instantiation mutex:
+ *
+ * 	down_write(&mm->mmap_sem);
+ * or
+ * 	down_read(&mm->mmap_sem);
+ * 	mutex_lock(&hugetlb_instantiation_mutex);
  */
 struct file_region {
 	struct list_head link;
@@ -171,6 +181,30 @@ static long region_truncate(struct list_head *head, long end)
 	return chg;
 }
 
+static long region_count(struct list_head *head, long f, long t)
+{
+	struct file_region *rg;
+	long chg = 0;
+
+	/* Locate each segment we overlap with, and count that overlap. */
+	list_for_each_entry(rg, head, link) {
+		int seg_from;
+		int seg_to;
+
+		if (rg->to <= f)
+			continue;
+		if (rg->from >= t)
+			break;
+
+		seg_from = max(rg->from, f);
+		seg_to = min(rg->to, t);
+
+		chg += seg_to - seg_from;
+	}
+
+	return chg;
+}
+
 /*
  * Convert the address within this vma to the page offset within
  * the mapping, in base page units.
@@ -193,9 +227,15 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
 			(vma->vm_pgoff >> huge_page_order(h));
 }
 
-#define HPAGE_RESV_OWNER    (1UL << (BITS_PER_LONG - 1))
-#define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2))
+/*
+ * Flags for MAP_PRIVATE reservations.  These are stored in the bottom
+ * bits of the reservation map pointer, which are always clear due to
+ * alignment.
+ */
+#define HPAGE_RESV_OWNER    (1UL << 0)
+#define HPAGE_RESV_UNMAPPED (1UL << 1)
 #define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED)
+
 /*
  * These helpers are used to track how many pages are reserved for
  * faults in a MAP_PRIVATE mapping. Only the process that called mmap()
@@ -205,6 +245,15 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
  * the reserve counters are updated with the hugetlb_lock held. It is safe
  * to reset the VMA at fork() time as it is not in use yet and there is no
  * chance of the global counters getting corrupted as a result of the values.
+ *
+ * The private mapping reservation is represented in a subtly different
+ * manner to a shared mapping.  A shared mapping has a region map associated
+ * with the underlying file, this region map represents the backing file
+ * pages which have ever had a reservation assigned which this persists even
+ * after the page is instantiated.  A private mapping has a region map
+ * associated with the original mmap which is attached to all VMAs which
+ * reference it, this region map represents those offsets which have consumed
+ * reservation ie. where pages have been instantiated.
  */
 static unsigned long get_vma_private_data(struct vm_area_struct *vma)
 {
@@ -217,22 +266,48 @@ static void set_vma_private_data(struct vm_area_struct *vma,
 	vma->vm_private_data = (void *)value;
 }
 
-static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma)
+struct resv_map {
+	struct kref refs;
+	struct list_head regions;
+};
+
+struct resv_map *resv_map_alloc(void)
+{
+	struct resv_map *resv_map = kmalloc(sizeof(*resv_map), GFP_KERNEL);
+	if (!resv_map)
+		return NULL;
+
+	kref_init(&resv_map->refs);
+	INIT_LIST_HEAD(&resv_map->regions);
+
+	return resv_map;
+}
+
+void resv_map_release(struct kref *ref)
+{
+	struct resv_map *resv_map = container_of(ref, struct resv_map, refs);
+
+	/* Clear out any active regions before we release the map. */
+	region_truncate(&resv_map->regions, 0);
+	kfree(resv_map);
+}
+
+static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
 {
 	VM_BUG_ON(!is_vm_hugetlb_page(vma));
 	if (!(vma->vm_flags & VM_SHARED))
-		return get_vma_private_data(vma) & ~HPAGE_RESV_MASK;
+		return (struct resv_map *)(get_vma_private_data(vma) &
+							~HPAGE_RESV_MASK);
 	return 0;
 }
 
-static void set_vma_resv_huge_pages(struct vm_area_struct *vma,
-							unsigned long reserve)
+static void set_vma_resv_map(struct vm_area_struct *vma, struct resv_map *map)
 {
 	VM_BUG_ON(!is_vm_hugetlb_page(vma));
 	VM_BUG_ON(vma->vm_flags & VM_SHARED);
 
-	set_vma_private_data(vma,
-		(get_vma_private_data(vma) & HPAGE_RESV_MASK) | reserve);
+	set_vma_private_data(vma, (get_vma_private_data(vma) &
+				HPAGE_RESV_MASK) | (unsigned long)map);
 }
 
 static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
@@ -260,19 +335,12 @@ static void decrement_hugepage_resv_vma(struct hstate *h,
 	if (vma->vm_flags & VM_SHARED) {
 		/* Shared mappings always use reserves */
 		h->resv_huge_pages--;
-	} else {
+	} else if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
 		/*
 		 * Only the process that called mmap() has reserves for
 		 * private mappings.
 		 */
-		if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
-			unsigned long flags, reserve;
-			h->resv_huge_pages--;
-			flags = (unsigned long)vma->vm_private_data &
-							HPAGE_RESV_MASK;
-			reserve = (unsigned long)vma->vm_private_data - 1;
-			vma->vm_private_data = (void *)(reserve | flags);
-		}
+		h->resv_huge_pages--;
 	}
 }
 
@@ -289,7 +357,7 @@ static int vma_has_private_reserves(struct vm_area_struct *vma)
 {
 	if (vma->vm_flags & VM_SHARED)
 		return 0;
-	if (!vma_resv_huge_pages(vma))
+	if (!is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		return 0;
 	return 1;
 }
@@ -794,12 +862,19 @@ static int vma_needs_reservation(struct hstate *h,
 		return region_chg(&inode->i_mapping->private_list,
 							idx, idx + 1);
 
-	} else {
-		if (!is_vma_resv_set(vma, HPAGE_RESV_OWNER))
-			return 1;
-	}
+	} else if (!is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
+		return 1;
 
-	return 0;
+	} else {
+		int err;
+		pgoff_t idx = vma_pagecache_offset(h, vma, addr);
+		struct resv_map *reservations = vma_resv_map(vma);
+
+		err = region_chg(&reservations->regions, idx, idx + 1);
+		if (err < 0)
+			return err;
+		return 0;
+	}
 }
 static void vma_commit_reservation(struct hstate *h,
 			struct vm_area_struct *vma, unsigned long addr)
@@ -810,6 +885,13 @@ static void vma_commit_reservation(struct hstate *h,
 	if (vma->vm_flags & VM_SHARED) {
 		pgoff_t idx = vma_pagecache_offset(h, vma, addr);
 		region_add(&inode->i_mapping->private_list, idx, idx + 1);
+
+	} else if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
+		pgoff_t idx = vma_pagecache_offset(h, vma, addr);
+		struct resv_map *reservations = vma_resv_map(vma);
+
+		/* Mark this page used in the map. */
+		region_add(&reservations->regions, idx, idx + 1);
 	}
 }
 
@@ -1456,13 +1538,42 @@ out:
 	return ret;
 }
 
+static void hugetlb_vm_op_open(struct vm_area_struct *vma)
+{
+	struct resv_map *reservations = vma_resv_map(vma);
+
+	/*
+	 * This new VMA should share its sibling's reservation map if present.
+	 * The VMA will only ever have a valid reservation map pointer where
+	 * it is being copied for another still existing VMA.  As that VMA
+	 * has a reference to the reservation map, it cannot disappear until
+	 * after this open call completes.  It is therefore safe to take a
+	 * new reference here without additional locking.
+	 */
+	if (reservations)
+		kref_get(&reservations->refs);
+}
+
 static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 {
 	struct hstate *h = hstate_vma(vma);
-	unsigned long reserve = vma_resv_huge_pages(vma);
+	struct resv_map *reservations = vma_resv_map(vma);
+	unsigned long reserve;
+	unsigned long start;
+	unsigned long end;
 
-	if (reserve)
-		hugetlb_acct_memory(h, -reserve);
+	if (reservations) {
+		start = vma_pagecache_offset(h, vma, vma->vm_start);
+		end = vma_pagecache_offset(h, vma, vma->vm_end);
+
+		reserve = (end - start) -
+			region_count(&reservations->regions, start, end);
+
+		kref_put(&reservations->refs, resv_map_release);
+
+		if (reserve)
+			hugetlb_acct_memory(h, -reserve);
+	}
 }
 
 /*
@@ -1479,6 +1590,7 @@ static int hugetlb_vm_op_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 struct vm_operations_struct hugetlb_vm_ops = {
 	.fault = hugetlb_vm_op_fault,
+	.open = hugetlb_vm_op_open,
 	.close = hugetlb_vm_op_close,
 };
 
@@ -2037,8 +2149,13 @@ int hugetlb_reserve_pages(struct inode *inode,
 	if (!vma || vma->vm_flags & VM_SHARED)
 		chg = region_chg(&inode->i_mapping->private_list, from, to);
 	else {
+		struct resv_map *resv_map = resv_map_alloc();
+		if (!resv_map)
+			return -ENOMEM;
+
 		chg = to - from;
-		set_vma_resv_huge_pages(vma, chg);
+
+		set_vma_resv_map(vma, resv_map);
 		set_vma_resv_flags(vma, HPAGE_RESV_OWNER);
 	}
 
-- 
1.5.6.205.g7ca3a


^ permalink raw reply related	[flat|nested] 290+ messages in thread
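
A note on the flag relocation in the hunk above: moving HPAGE_RESV_OWNER
and HPAGE_RESV_UNMAPPED into the bottom two bits of the pointer relies on
a common idiom: an allocator-returned pointer is word aligned, so its low
bits are guaranteed clear and can carry flags.  A minimal userspace sketch
of that packing follows; the names (toy_map, slot, set_map) are invented
for illustration and this is not the kernel code itself.

	#include <assert.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>

	/* Stand-in for struct resv_map; malloc()ed memory is aligned
	 * strictly enough that the bottom two bits of its address are 0. */
	struct toy_map { int dummy; };

	#define OWNER     (1UL << 0)
	#define UNMAPPED  (1UL << 1)
	#define FLAG_MASK (OWNER | UNMAPPED)

	static uintptr_t slot;	/* plays the role of vma->vm_private_data */

	static void set_map(struct toy_map *map)
	{
		/* Keep the flag bits, replace the pointer bits. */
		slot = (slot & FLAG_MASK) | (uintptr_t)map;
	}

	static struct toy_map *get_map(void)
	{
		/* Mask the flags off to recover the original pointer. */
		return (struct toy_map *)(slot & ~FLAG_MASK);
	}

	int main(void)
	{
		struct toy_map *map = malloc(sizeof(*map));

		assert(((uintptr_t)map & FLAG_MASK) == 0);
		set_map(map);
		slot |= OWNER;		/* set_vma_resv_flags() analogue */
		printf("pointer intact: %d, owner flag: %lu\n",
		       get_map() == map, (unsigned long)(slot & OWNER));
		free(map);
		return 0;
	}

This is why vma_resv_map() masks with ~HPAGE_RESV_MASK and
set_vma_resv_map() preserves HPAGE_RESV_MASK when it stores the pointer.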

* Re: [BUG][PATCH -mm] avoid BUG() in __stop_machine_run()
  2008-06-23  3:55           ` Rusty Russell
@ 2008-06-23 21:01             ` Ingo Molnar
  -1 siblings, 0 replies; 290+ messages in thread
From: Ingo Molnar @ 2008-06-23 21:01 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Jeremy Fitzhardinge, Hidehiro Kawai, Andrew Morton, linux-kernel,
	kernel-testers, linux-mm, sugita, Satoshi OSHIMA


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Friday 20 June 2008 23:21:10 Ingo Molnar wrote:
> > * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
[...]
> > > (With the appropriate transformation of sched_setscheduler -> __)
> > >
> > > Better than scattering stray true/falses around the code.
> >
> > agreed - it would also be less intrusive on the API change side.
> 
> Yes, here's the patch.  I've put it in my tree for testing, too.
> 
> sched_setscheduler_nocheck: add a flag to control access checks

applied to tip/sched/new-API-sched_setscheduler, thanks Rusty. Also 
added it to auto-sched-next so that it shows up in linux-next.

btw., had to merge this bit manually:

> +/**
> + * sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread 
> from kernelspace.
> + * @p: the task in question.

as it suffered from line-wrap damage.

	Ingo

^ permalink raw reply	[flat|nested] 290+ messages in thread
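
The API shape being settled on above is the usual checked/unchecked pair:
a single internal function carries the permission-check flag, and two thin
public wrappers hide it, so callers never pass bare true/false.  A hedged
sketch of that pattern with invented names (set_policy, may_change_policy);
it only mirrors the structure being discussed, not the scheduler code:

	#include <stdbool.h>
	#include <stdio.h>

	struct task { int policy; };	/* illustrative stand-in */

	static bool may_change_policy(struct task *p)
	{
		(void)p;
		return false;	/* pretend the caller lacks permission */
	}

	/* One internal helper carries the flag... */
	static int __set_policy(struct task *p, int policy, bool check)
	{
		if (check && !may_change_policy(p))
			return -1;	/* would be -EPERM in the kernel */
		p->policy = policy;
		return 0;
	}

	/* ...and the entry points keep it out of call sites. */
	static int set_policy(struct task *p, int policy)
	{
		return __set_policy(p, policy, true);
	}

	static int set_policy_nocheck(struct task *p, int policy)
	{
		return __set_policy(p, policy, false);
	}

	int main(void)
	{
		struct task t = { 0 };
		printf("checked: %d, nocheck: %d\n",
		       set_policy(&t, 1), set_policy_nocheck(&t, 2));
		return 0;
	}

Keeping the flag private to __set_policy() is what makes the change less
intrusive on the API, as noted in the thread.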

* Re: [PATCH 1/2] hugetlb reservations: move region tracking earlier
  2008-06-23 17:35       ` Andy Whitcroft
@ 2008-06-23 23:05         ` Mel Gorman
  -1 siblings, 0 replies; 290+ messages in thread
From: Mel Gorman @ 2008-06-23 23:05 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Jon Tollefson, Andrew Morton, Nick Piggin, Nishanth Aravamudan,
	Adam Litke, linux-kernel, kernel-testers, linux-mm

On (23/06/08 18:35), Andy Whitcroft didst pronounce:
> Move the region tracking code much earlier so we can use it for page
> presence tracking later on.  No code is changed, just its location.
> 
> Signed-off-by: Andy Whitcroft <apw@shadowen.org>

Straight-forward code-move.

Acked-by: Mel Gorman <mel@csn.ul.ie>

> ---
>  mm/hugetlb.c |  246 +++++++++++++++++++++++++++++----------------------------
>  1 files changed, 125 insertions(+), 121 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 0f76ed1..d701e39 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -47,6 +47,131 @@ static unsigned long __initdata default_hstate_size;
>  static DEFINE_SPINLOCK(hugetlb_lock);
>  
>  /*
> + * Region tracking -- allows tracking of reservations and instantiated pages
> + *                    across the pages in a mapping.
> + */
> +struct file_region {
> +	struct list_head link;
> +	long from;
> +	long to;
> +};
> +
> +static long region_add(struct list_head *head, long f, long t)
> +{
> +	struct file_region *rg, *nrg, *trg;
> +
> +	/* Locate the region we are either in or before. */
> +	list_for_each_entry(rg, head, link)
> +		if (f <= rg->to)
> +			break;
> +
> +	/* Round our left edge to the current segment if it encloses us. */
> +	if (f > rg->from)
> +		f = rg->from;
> +
> +	/* Check for and consume any regions we now overlap with. */
> +	nrg = rg;
> +	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> +		if (&rg->link == head)
> +			break;
> +		if (rg->from > t)
> +			break;
> +
> +		/* If this area reaches higher then extend our area to
> +		 * include it completely.  If this is not the first area
> +		 * which we intend to reuse, free it. */
> +		if (rg->to > t)
> +			t = rg->to;
> +		if (rg != nrg) {
> +			list_del(&rg->link);
> +			kfree(rg);
> +		}
> +	}
> +	nrg->from = f;
> +	nrg->to = t;
> +	return 0;
> +}
> +
> +static long region_chg(struct list_head *head, long f, long t)
> +{
> +	struct file_region *rg, *nrg;
> +	long chg = 0;
> +
> +	/* Locate the region we are before or in. */
> +	list_for_each_entry(rg, head, link)
> +		if (f <= rg->to)
> +			break;
> +
> +	/* If we are below the current region then a new region is required.
> +	 * Subtle, allocate a new region at the position but make it zero
> +	 * size such that we can guarantee to record the reservation. */
> +	if (&rg->link == head || t < rg->from) {
> +		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> +		if (!nrg)
> +			return -ENOMEM;
> +		nrg->from = f;
> +		nrg->to   = f;
> +		INIT_LIST_HEAD(&nrg->link);
> +		list_add(&nrg->link, rg->link.prev);
> +
> +		return t - f;
> +	}
> +
> +	/* Round our left edge to the current segment if it encloses us. */
> +	if (f > rg->from)
> +		f = rg->from;
> +	chg = t - f;
> +
> +	/* Check for and consume any regions we now overlap with. */
> +	list_for_each_entry(rg, rg->link.prev, link) {
> +		if (&rg->link == head)
> +			break;
> +		if (rg->from > t)
> +			return chg;
> +
> +		/* We overlap with this area, if it extends further than
> +		 * us then we must extend ourselves.  Account for its
> +		 * existing reservation. */
> +		if (rg->to > t) {
> +			chg += rg->to - t;
> +			t = rg->to;
> +		}
> +		chg -= rg->to - rg->from;
> +	}
> +	return chg;
> +}
> +
> +static long region_truncate(struct list_head *head, long end)
> +{
> +	struct file_region *rg, *trg;
> +	long chg = 0;
> +
> +	/* Locate the region we are either in or before. */
> +	list_for_each_entry(rg, head, link)
> +		if (end <= rg->to)
> +			break;
> +	if (&rg->link == head)
> +		return 0;
> +
> +	/* If we are in the middle of a region then adjust it. */
> +	if (end > rg->from) {
> +		chg = rg->to - end;
> +		rg->to = end;
> +		rg = list_entry(rg->link.next, typeof(*rg), link);
> +	}
> +
> +	/* Drop any remaining regions. */
> +	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> +		if (&rg->link == head)
> +			break;
> +		chg += rg->to - rg->from;
> +		list_del(&rg->link);
> +		kfree(rg);
> +	}
> +	return chg;
> +}
> +
> +/*
>   * Convert the address within this vma to the page offset within
>   * the mapping, in base page units.
>   */
> @@ -649,127 +774,6 @@ static void return_unused_surplus_pages(struct hstate *h,
>  	}
>  }
>  
> -struct file_region {
> -	struct list_head link;
> -	long from;
> -	long to;
> -};
> -
> -static long region_add(struct list_head *head, long f, long t)
> -{
> -	struct file_region *rg, *nrg, *trg;
> -
> -	/* Locate the region we are either in or before. */
> -	list_for_each_entry(rg, head, link)
> -		if (f <= rg->to)
> -			break;
> -
> -	/* Round our left edge to the current segment if it encloses us. */
> -	if (f > rg->from)
> -		f = rg->from;
> -
> -	/* Check for and consume any regions we now overlap with. */
> -	nrg = rg;
> -	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> -		if (&rg->link == head)
> -			break;
> -		if (rg->from > t)
> -			break;
> -
> -		/* If this area reaches higher then extend our area to
> -		 * include it completely.  If this is not the first area
> -		 * which we intend to reuse, free it. */
> -		if (rg->to > t)
> -			t = rg->to;
> -		if (rg != nrg) {
> -			list_del(&rg->link);
> -			kfree(rg);
> -		}
> -	}
> -	nrg->from = f;
> -	nrg->to = t;
> -	return 0;
> -}
> -
> -static long region_chg(struct list_head *head, long f, long t)
> -{
> -	struct file_region *rg, *nrg;
> -	long chg = 0;
> -
> -	/* Locate the region we are before or in. */
> -	list_for_each_entry(rg, head, link)
> -		if (f <= rg->to)
> -			break;
> -
> -	/* If we are below the current region then a new region is required.
> -	 * Subtle, allocate a new region at the position but make it zero
> -	 * size such that we can guarantee to record the reservation. */
> -	if (&rg->link == head || t < rg->from) {
> -		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> -		if (!nrg)
> -			return -ENOMEM;
> -		nrg->from = f;
> -		nrg->to   = f;
> -		INIT_LIST_HEAD(&nrg->link);
> -		list_add(&nrg->link, rg->link.prev);
> -
> -		return t - f;
> -	}
> -
> -	/* Round our left edge to the current segment if it encloses us. */
> -	if (f > rg->from)
> -		f = rg->from;
> -	chg = t - f;
> -
> -	/* Check for and consume any regions we now overlap with. */
> -	list_for_each_entry(rg, rg->link.prev, link) {
> -		if (&rg->link == head)
> -			break;
> -		if (rg->from > t)
> -			return chg;
> -
> -		/* We overlap with this area, if it extends further than
> -		 * us then we must extend ourselves.  Account for its
> -		 * existing reservation. */
> -		if (rg->to > t) {
> -			chg += rg->to - t;
> -			t = rg->to;
> -		}
> -		chg -= rg->to - rg->from;
> -	}
> -	return chg;
> -}
> -
> -static long region_truncate(struct list_head *head, long end)
> -{
> -	struct file_region *rg, *trg;
> -	long chg = 0;
> -
> -	/* Locate the region we are either in or before. */
> -	list_for_each_entry(rg, head, link)
> -		if (end <= rg->to)
> -			break;
> -	if (&rg->link == head)
> -		return 0;
> -
> -	/* If we are in the middle of a region then adjust it. */
> -	if (end > rg->from) {
> -		chg = rg->to - end;
> -		rg->to = end;
> -		rg = list_entry(rg->link.next, typeof(*rg), link);
> -	}
> -
> -	/* Drop any remaining regions. */
> -	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> -		if (&rg->link == head)
> -			break;
> -		chg += rg->to - rg->from;
> -		list_del(&rg->link);
> -		kfree(rg);
> -	}
> -	return chg;
> -}
> -
>  /*
>   * Determine if the huge page at addr within the vma has an associated
>   * reservation.  Where it does not we will need to logically increase
> -- 
> 1.5.6.205.g7ca3a
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 290+ messages in thread
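
The region calls quoted above follow a two-phase protocol: region_chg()
reports how much of [f, t) is not yet covered (allocating any record it
may need up front, where it can still fail safely), and region_add() later
commits the range; patch 2/2 adds region_count() to measure coverage so
that close() can return the unused remainder.  The sketch below swaps the
kernel's merged linked list for a small bitmap purely to show that
contract; it is illustrative, not the kernel implementation:

	#include <stdio.h>

	#define NPAGES 16
	static char covered[NPAGES];	/* bitmap stand-in for the region list */

	/* Phase one: how many offsets in [f, t) would newly need reservation? */
	static long region_chg(long f, long t)
	{
		long n = 0;
		for (long i = f; i < t; i++)
			n += !covered[i];
		return n;
	}

	/* Phase two: commit, marking [f, t) as instantiated. */
	static void region_add(long f, long t)
	{
		for (long i = f; i < t; i++)
			covered[i] = 1;
	}

	/* How much of [f, t) is already instantiated? */
	static long region_count(long f, long t)
	{
		long n = 0;
		for (long i = f; i < t; i++)
			n += covered[i];
		return n;
	}

	int main(void)
	{
		/* Fault in pages 2 and 3: check first, then commit. */
		if (region_chg(2, 3) > 0)
			region_add(2, 3);
		if (region_chg(3, 4) > 0)
			region_add(3, 4);

		/* At unmap time the unused reservation over [0, 8) is the
		 * span minus what was instantiated, as hugetlb_vm_op_close()
		 * computes in patch 2/2.  Prints 6. */
		printf("unused: %ld\n", (8 - 0) - region_count(0, 8));
		return 0;
	}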

* Re: [PATCH 1/2] hugetlb reservations: move region tracking earlier
@ 2008-06-23 23:05         ` Mel Gorman
  0 siblings, 0 replies; 290+ messages in thread
From: Mel Gorman @ 2008-06-23 23:05 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Jon Tollefson, Andrew Morton, Nick Piggin, Nishanth Aravamudan,
	Adam Litke, linux-kernel, kernel-testers, linux-mm

On (23/06/08 18:35), Andy Whitcroft didst pronounce:
> Move the region tracking code much earlier so we can use it for page
> presence tracking later on.  No code is changed, just its location.
> 
> Signed-off-by: Andy Whitcroft <apw@shadowen.org>

Straight-forward code-move.

Acked-by: Mel Gorman <mel@csn.ul.ie>

> ---
>  mm/hugetlb.c |  246 +++++++++++++++++++++++++++++----------------------------
>  1 files changed, 125 insertions(+), 121 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 0f76ed1..d701e39 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -47,6 +47,131 @@ static unsigned long __initdata default_hstate_size;
>  static DEFINE_SPINLOCK(hugetlb_lock);
>  
>  /*
> + * Region tracking -- allows tracking of reservations and instantiated pages
> + *                    across the pages in a mapping.
> + */
> +struct file_region {
> +	struct list_head link;
> +	long from;
> +	long to;
> +};
> +
> +static long region_add(struct list_head *head, long f, long t)
> +{
> +	struct file_region *rg, *nrg, *trg;
> +
> +	/* Locate the region we are either in or before. */
> +	list_for_each_entry(rg, head, link)
> +		if (f <= rg->to)
> +			break;
> +
> +	/* Round our left edge to the current segment if it encloses us. */
> +	if (f > rg->from)
> +		f = rg->from;
> +
> +	/* Check for and consume any regions we now overlap with. */
> +	nrg = rg;
> +	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> +		if (&rg->link == head)
> +			break;
> +		if (rg->from > t)
> +			break;
> +
> +		/* If this area reaches higher then extend our area to
> +		 * include it completely.  If this is not the first area
> +		 * which we intend to reuse, free it. */
> +		if (rg->to > t)
> +			t = rg->to;
> +		if (rg != nrg) {
> +			list_del(&rg->link);
> +			kfree(rg);
> +		}
> +	}
> +	nrg->from = f;
> +	nrg->to = t;
> +	return 0;
> +}
> +
> +static long region_chg(struct list_head *head, long f, long t)
> +{
> +	struct file_region *rg, *nrg;
> +	long chg = 0;
> +
> +	/* Locate the region we are before or in. */
> +	list_for_each_entry(rg, head, link)
> +		if (f <= rg->to)
> +			break;
> +
> +	/* If we are below the current region then a new region is required.
> +	 * Subtle, allocate a new region at the position but make it zero
> +	 * size such that we can guarantee to record the reservation. */
> +	if (&rg->link == head || t < rg->from) {
> +		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> +		if (!nrg)
> +			return -ENOMEM;
> +		nrg->from = f;
> +		nrg->to   = f;
> +		INIT_LIST_HEAD(&nrg->link);
> +		list_add(&nrg->link, rg->link.prev);
> +
> +		return t - f;
> +	}
> +
> +	/* Round our left edge to the current segment if it encloses us. */
> +	if (f > rg->from)
> +		f = rg->from;
> +	chg = t - f;
> +
> +	/* Check for and consume any regions we now overlap with. */
> +	list_for_each_entry(rg, rg->link.prev, link) {
> +		if (&rg->link == head)
> +			break;
> +		if (rg->from > t)
> +			return chg;
> +
> +		/* We overlap with this area, if it extends futher than
> +		 * us then we must extend ourselves.  Account for its
> +		 * existing reservation. */
> +		if (rg->to > t) {
> +			chg += rg->to - t;
> +			t = rg->to;
> +		}
> +		chg -= rg->to - rg->from;
> +	}
> +	return chg;
> +}
> +
> +static long region_truncate(struct list_head *head, long end)
> +{
> +	struct file_region *rg, *trg;
> +	long chg = 0;
> +
> +	/* Locate the region we are either in or before. */
> +	list_for_each_entry(rg, head, link)
> +		if (end <= rg->to)
> +			break;
> +	if (&rg->link == head)
> +		return 0;
> +
> +	/* If we are in the middle of a region then adjust it. */
> +	if (end > rg->from) {
> +		chg = rg->to - end;
> +		rg->to = end;
> +		rg = list_entry(rg->link.next, typeof(*rg), link);
> +	}
> +
> +	/* Drop any remaining regions. */
> +	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> +		if (&rg->link == head)
> +			break;
> +		chg += rg->to - rg->from;
> +		list_del(&rg->link);
> +		kfree(rg);
> +	}
> +	return chg;
> +}
> +
> +/*
>   * Convert the address within this vma to the page offset within
>   * the mapping, in base page units.
>   */
> @@ -649,127 +774,6 @@ static void return_unused_surplus_pages(struct hstate *h,
>  	}
>  }
>  
> -struct file_region {
> -	struct list_head link;
> -	long from;
> -	long to;
> -};
> -
> -static long region_add(struct list_head *head, long f, long t)
> -{
> -	struct file_region *rg, *nrg, *trg;
> -
> -	/* Locate the region we are either in or before. */
> -	list_for_each_entry(rg, head, link)
> -		if (f <= rg->to)
> -			break;
> -
> -	/* Round our left edge to the current segment if it encloses us. */
> -	if (f > rg->from)
> -		f = rg->from;
> -
> -	/* Check for and consume any regions we now overlap with. */
> -	nrg = rg;
> -	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> -		if (&rg->link == head)
> -			break;
> -		if (rg->from > t)
> -			break;
> -
> -		/* If this area reaches higher then extend our area to
> -		 * include it completely.  If this is not the first area
> -		 * which we intend to reuse, free it. */
> -		if (rg->to > t)
> -			t = rg->to;
> -		if (rg != nrg) {
> -			list_del(&rg->link);
> -			kfree(rg);
> -		}
> -	}
> -	nrg->from = f;
> -	nrg->to = t;
> -	return 0;
> -}
> -
> -static long region_chg(struct list_head *head, long f, long t)
> -{
> -	struct file_region *rg, *nrg;
> -	long chg = 0;
> -
> -	/* Locate the region we are before or in. */
> -	list_for_each_entry(rg, head, link)
> -		if (f <= rg->to)
> -			break;
> -
> -	/* If we are below the current region then a new region is required.
> -	 * Subtle, allocate a new region at the position but make it zero
> -	 * size such that we can guarantee to record the reservation. */
> -	if (&rg->link == head || t < rg->from) {
> -		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> -		if (!nrg)
> -			return -ENOMEM;
> -		nrg->from = f;
> -		nrg->to   = f;
> -		INIT_LIST_HEAD(&nrg->link);
> -		list_add(&nrg->link, rg->link.prev);
> -
> -		return t - f;
> -	}
> -
> -	/* Round our left edge to the current segment if it encloses us. */
> -	if (f > rg->from)
> -		f = rg->from;
> -	chg = t - f;
> -
> -	/* Check for and consume any regions we now overlap with. */
> -	list_for_each_entry(rg, rg->link.prev, link) {
> -		if (&rg->link == head)
> -			break;
> -		if (rg->from > t)
> -			return chg;
> -
> -		/* We overlap with this area, if it extends futher than
> -		 * us then we must extend ourselves.  Account for its
> -		 * existing reservation. */
> -		if (rg->to > t) {
> -			chg += rg->to - t;
> -			t = rg->to;
> -		}
> -		chg -= rg->to - rg->from;
> -	}
> -	return chg;
> -}
> -
> -static long region_truncate(struct list_head *head, long end)
> -{
> -	struct file_region *rg, *trg;
> -	long chg = 0;
> -
> -	/* Locate the region we are either in or before. */
> -	list_for_each_entry(rg, head, link)
> -		if (end <= rg->to)
> -			break;
> -	if (&rg->link == head)
> -		return 0;
> -
> -	/* If we are in the middle of a region then adjust it. */
> -	if (end > rg->from) {
> -		chg = rg->to - end;
> -		rg->to = end;
> -		rg = list_entry(rg->link.next, typeof(*rg), link);
> -	}
> -
> -	/* Drop any remaining regions. */
> -	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> -		if (&rg->link == head)
> -			break;
> -		chg += rg->to - rg->from;
> -		list_del(&rg->link);
> -		kfree(rg);
> -	}
> -	return chg;
> -}
> -
>  /*
>   * Determine if the huge page at addr within the vma has an associated
>   * reservation.  Where it does not we will need to logically increase
> -- 
> 1.5.6.205.g7ca3a
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 290+ messages in thread

* Re: [PATCH 1/2] hugetlb reservations: move region tracking earlier
@ 2008-06-23 23:05         ` Mel Gorman
  0 siblings, 0 replies; 290+ messages in thread
From: Mel Gorman @ 2008-06-23 23:05 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Jon Tollefson, Andrew Morton, Nick Piggin, Nishanth Aravamudan,
	Adam Litke, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg

On (23/06/08 18:35), Andy Whitcroft didst pronounce:
> Move the region tracking code much earlier so we can use it for page
> presence tracking later on.  No code is changed, just its location.
> 
> Signed-off-by: Andy Whitcroft <apw-26w3C0LaAnFg9hUCZPvPmw@public.gmane.org>

Straight-forward code-move.

Acked-by: Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>

> ---
>  mm/hugetlb.c |  246 +++++++++++++++++++++++++++++----------------------------
>  1 files changed, 125 insertions(+), 121 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 0f76ed1..d701e39 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -47,6 +47,131 @@ static unsigned long __initdata default_hstate_size;
>  static DEFINE_SPINLOCK(hugetlb_lock);
>  
>  /*
> + * Region tracking -- allows tracking of reservations and instantiated pages
> + *                    across the pages in a mapping.
> + */
> +struct file_region {
> +	struct list_head link;
> +	long from;
> +	long to;
> +};
> +
> +static long region_add(struct list_head *head, long f, long t)
> +{
> +	struct file_region *rg, *nrg, *trg;
> +
> +	/* Locate the region we are either in or before. */
> +	list_for_each_entry(rg, head, link)
> +		if (f <= rg->to)
> +			break;
> +
> +	/* Round our left edge to the current segment if it encloses us. */
> +	if (f > rg->from)
> +		f = rg->from;
> +
> +	/* Check for and consume any regions we now overlap with. */
> +	nrg = rg;
> +	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> +		if (&rg->link == head)
> +			break;
> +		if (rg->from > t)
> +			break;
> +
> +		/* If this area reaches higher then extend our area to
> +		 * include it completely.  If this is not the first area
> +		 * which we intend to reuse, free it. */
> +		if (rg->to > t)
> +			t = rg->to;
> +		if (rg != nrg) {
> +			list_del(&rg->link);
> +			kfree(rg);
> +		}
> +	}
> +	nrg->from = f;
> +	nrg->to = t;
> +	return 0;
> +}
> +
> +static long region_chg(struct list_head *head, long f, long t)
> +{
> +	struct file_region *rg, *nrg;
> +	long chg = 0;
> +
> +	/* Locate the region we are before or in. */
> +	list_for_each_entry(rg, head, link)
> +		if (f <= rg->to)
> +			break;
> +
> +	/* If we are below the current region then a new region is required.
> +	 * Subtle, allocate a new region at the position but make it zero
> +	 * size such that we can guarantee to record the reservation. */
> +	if (&rg->link == head || t < rg->from) {
> +		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> +		if (!nrg)
> +			return -ENOMEM;
> +		nrg->from = f;
> +		nrg->to   = f;
> +		INIT_LIST_HEAD(&nrg->link);
> +		list_add(&nrg->link, rg->link.prev);
> +
> +		return t - f;
> +	}
> +
> +	/* Round our left edge to the current segment if it encloses us. */
> +	if (f > rg->from)
> +		f = rg->from;
> +	chg = t - f;
> +
> +	/* Check for and consume any regions we now overlap with. */
> +	list_for_each_entry(rg, rg->link.prev, link) {
> +		if (&rg->link == head)
> +			break;
> +		if (rg->from > t)
> +			return chg;
> +
> +		/* We overlap with this area, if it extends futher than
> +		 * us then we must extend ourselves.  Account for its
> +		 * existing reservation. */
> +		if (rg->to > t) {
> +			chg += rg->to - t;
> +			t = rg->to;
> +		}
> +		chg -= rg->to - rg->from;
> +	}
> +	return chg;
> +}
> +
> +static long region_truncate(struct list_head *head, long end)
> +{
> +	struct file_region *rg, *trg;
> +	long chg = 0;
> +
> +	/* Locate the region we are either in or before. */
> +	list_for_each_entry(rg, head, link)
> +		if (end <= rg->to)
> +			break;
> +	if (&rg->link == head)
> +		return 0;
> +
> +	/* If we are in the middle of a region then adjust it. */
> +	if (end > rg->from) {
> +		chg = rg->to - end;
> +		rg->to = end;
> +		rg = list_entry(rg->link.next, typeof(*rg), link);
> +	}
> +
> +	/* Drop any remaining regions. */
> +	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> +		if (&rg->link == head)
> +			break;
> +		chg += rg->to - rg->from;
> +		list_del(&rg->link);
> +		kfree(rg);
> +	}
> +	return chg;
> +}
> +
> +/*
>   * Convert the address within this vma to the page offset within
>   * the mapping, in base page units.
>   */
> @@ -649,127 +774,6 @@ static void return_unused_surplus_pages(struct hstate *h,
>  	}
>  }
>  
> -struct file_region {
> -	struct list_head link;
> -	long from;
> -	long to;
> -};
> -
> -static long region_add(struct list_head *head, long f, long t)
> -{
> -	struct file_region *rg, *nrg, *trg;
> -
> -	/* Locate the region we are either in or before. */
> -	list_for_each_entry(rg, head, link)
> -		if (f <= rg->to)
> -			break;
> -
> -	/* Round our left edge to the current segment if it encloses us. */
> -	if (f > rg->from)
> -		f = rg->from;
> -
> -	/* Check for and consume any regions we now overlap with. */
> -	nrg = rg;
> -	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> -		if (&rg->link == head)
> -			break;
> -		if (rg->from > t)
> -			break;
> -
> -		/* If this area reaches higher then extend our area to
> -		 * include it completely.  If this is not the first area
> -		 * which we intend to reuse, free it. */
> -		if (rg->to > t)
> -			t = rg->to;
> -		if (rg != nrg) {
> -			list_del(&rg->link);
> -			kfree(rg);
> -		}
> -	}
> -	nrg->from = f;
> -	nrg->to = t;
> -	return 0;
> -}
> -
> -static long region_chg(struct list_head *head, long f, long t)
> -{
> -	struct file_region *rg, *nrg;
> -	long chg = 0;
> -
> -	/* Locate the region we are before or in. */
> -	list_for_each_entry(rg, head, link)
> -		if (f <= rg->to)
> -			break;
> -
> -	/* If we are below the current region then a new region is required.
> -	 * Subtle, allocate a new region at the position but make it zero
> -	 * size such that we can guarantee to record the reservation. */
> -	if (&rg->link == head || t < rg->from) {
> -		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> -		if (!nrg)
> -			return -ENOMEM;
> -		nrg->from = f;
> -		nrg->to   = f;
> -		INIT_LIST_HEAD(&nrg->link);
> -		list_add(&nrg->link, rg->link.prev);
> -
> -		return t - f;
> -	}
> -
> -	/* Round our left edge to the current segment if it encloses us. */
> -	if (f > rg->from)
> -		f = rg->from;
> -	chg = t - f;
> -
> -	/* Check for and consume any regions we now overlap with. */
> -	list_for_each_entry(rg, rg->link.prev, link) {
> -		if (&rg->link == head)
> -			break;
> -		if (rg->from > t)
> -			return chg;
> -
> -		/* We overlap with this area, if it extends futher than
> -		 * us then we must extend ourselves.  Account for its
> -		 * existing reservation. */
> -		if (rg->to > t) {
> -			chg += rg->to - t;
> -			t = rg->to;
> -		}
> -		chg -= rg->to - rg->from;
> -	}
> -	return chg;
> -}
> -
> -static long region_truncate(struct list_head *head, long end)
> -{
> -	struct file_region *rg, *trg;
> -	long chg = 0;
> -
> -	/* Locate the region we are either in or before. */
> -	list_for_each_entry(rg, head, link)
> -		if (end <= rg->to)
> -			break;
> -	if (&rg->link == head)
> -		return 0;
> -
> -	/* If we are in the middle of a region then adjust it. */
> -	if (end > rg->from) {
> -		chg = rg->to - end;
> -		rg->to = end;
> -		rg = list_entry(rg->link.next, typeof(*rg), link);
> -	}
> -
> -	/* Drop any remaining regions. */
> -	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> -		if (&rg->link == head)
> -			break;
> -		chg += rg->to - rg->from;
> -		list_del(&rg->link);
> -		kfree(rg);
> -	}
> -	return chg;
> -}
> -
>  /*
>   * Determine if the huge page at addr within the vma has an associated
>   * reservation.  Where it does not we will need to logically increase
> -- 
> 1.5.6.205.g7ca3a
> 

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 290+ messages in thread
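
The region code moved above stores half-open page-offset intervals
[from, to) in an ordered list.  As a rough user-space sketch of the
merge step that region_add() performs -- an array standing in for the
kernel's list_head, all names invented for the sketch -- consider the
following; like the kernel code it assumes region_chg() has already
inserted a placeholder when the new interval falls in a gap below the
existing regions:

#include <stdio.h>

struct region { long from, to; };

/* Insert [f, t), coalescing any regions it touches; returns new count. */
static int add_region(struct region *r, int n, long f, long t)
{
	int i = 0, j, k;

	while (i < n && r[i].to < f)	/* skip regions wholly below us */
		i++;
	for (j = i; j < n && r[j].from <= t; j++) {
		if (r[j].from < f)	/* absorb any overlap */
			f = r[j].from;
		if (r[j].to > t)
			t = r[j].to;
	}
	r[i].from = f;
	r[i].to = t;
	for (k = j; k < n; k++)		/* close the gap we consumed */
		r[i + 1 + k - j] = r[k];
	return n - (j - i) + 1;
}

int main(void)
{
	struct region r[16] = { { 0, 2 }, { 4, 6 }, { 9, 10 } };
	int n = add_region(r, 3, 1, 5);	/* bridges [0,2) and [4,6) */
	int i;

	for (i = 0; i < n; i++)
		printf("[%ld,%ld) ", r[i].from, r[i].to);
	printf("\n");			/* prints: [0,6) [9,10) */
	return 0;
}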

* Re: [PATCH 2/2] hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits V2
  2008-06-23 17:35       ` Andy Whitcroft
@ 2008-06-23 23:08         ` Mel Gorman
  0 siblings, 0 replies; 290+ messages in thread
From: Mel Gorman @ 2008-06-23 23:08 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Jon Tollefson, Andrew Morton, Nick Piggin, Nishanth Aravamudan,
	Adam Litke, linux-kernel, kernel-testers, linux-mm

On (23/06/08 18:35), Andy Whitcroft didst pronounce:
> When a hugetlb mapping with a reservation is split, a new VMA is cloned
> from the original.  This new VMA is a direct copy of the original
> including the reservation count.  When this pair of VMAs is unmapped
> we will incorrectly double-account the unused reservation and the
> overall reservation count will be incorrect; in extreme cases it will wrap.
> 
> The problem occurs when we split an existing VMA, say to unmap a page in
> the middle.  split_vma() will create a new VMA copying all fields from
> the original.  As we are storing our reservation count in vm_private_data
> this is also copied, endowing the new VMA with a duplicate of the original
> VMA's reservation.  Neither of the new VMAs can exhaust these reservations
> as they are too small, but when we unmap and close these VMAs we will
> incorrectly credit the remainder twice and resv_huge_pages will become
> out of sync.  This can lead to allocation failures on mappings with
> reservations and even to resv_huge_pages wrapping which prevents all
> subsequent hugepage allocations.
> 
> The simple fix would be to correctly apportion the remaining reservation
> count when the split is made.  However, the only hook we have, vm_ops->open,
> only sees the new VMA; we do not know the identity of the preceding VMA.
> Also, even if we did have that VMA to hand, we do not know how much of the
> reservation was consumed on each side of the split.
> 
> This patch therefore takes a different tack.  We know that any private
> mapping which has a reservation is reserved over its whole size.  Any
> present pages represent consumed reservation.  Therefore if
> we track the instantiated pages we can calculate the remaining reservation.
> 
> This patch reuses the existing regions code to track the regions for which
> we have consumed reservation (i.e. the instantiated pages); as each page
> is faulted in we record the consumption of reservation for the new page.
> When we need to return unused reservations at unmap time we simply count
> the consumed reservation regions, subtracting that from the whole of the map.
> During a VMA split the newly opened VMA will point to the same region map;
> as this map is offset-oriented it remains valid for both of the split VMAs.
> This map is reference counted so that it is removed when all VMAs which
> are part of the mmap are gone.
> 
> Thanks to Adam Litke and Mel Gorman for their review feedback.
> 
> Signed-off-by: Andy Whitcroft <apw@shadowen.org>

Nice explanation. Testing on i386 with qemu, this patch allows some
small tests to pass without corruption of the rsvd counters.
libhugetlbfs tests also passed. I do not see anything new to complain
about in the code. Thanks.

Acked-by: Mel Gorman <mel@csn.ul.ie>
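
The patch below moves the HPAGE_RESV_* flags into the bottom two bits
of vm_private_data so that the remaining bits can carry the resv_map
pointer.  A self-contained user-space sketch of that pointer-tagging
trick (the names here are invented; in the kernel it is kmalloc()'s
alignment that guarantees the low bits of the pointer are clear):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define FLAG_OWNER    (1UL << 0)	/* stand-ins for HPAGE_RESV_* */
#define FLAG_UNMAPPED (1UL << 1)
#define FLAG_MASK     (FLAG_OWNER | FLAG_UNMAPPED)

struct map { long dummy; };

/* Pack flag bits into the low bits of an aligned pointer. */
static uintptr_t pack(struct map *m, uintptr_t flags)
{
	uintptr_t p = (uintptr_t)m;

	assert((p & FLAG_MASK) == 0);	/* allocator alignment clears these */
	return p | flags;
}

static struct map *unpack_ptr(uintptr_t v)
{
	return (struct map *)(v & ~(uintptr_t)FLAG_MASK);
}

int main(void)
{
	struct map *m = malloc(sizeof(*m));
	uintptr_t v = pack(m, FLAG_OWNER);

	printf("owner flag set: %d\n", (v & FLAG_OWNER) != 0);	/* 1 */
	printf("pointer intact: %d\n", unpack_ptr(v) == m);	/* 1 */
	free(m);
	return 0;
}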

> ---
>  mm/hugetlb.c |  171 ++++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 files changed, 144 insertions(+), 27 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d701e39..7ba6d4d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -49,6 +49,16 @@ static DEFINE_SPINLOCK(hugetlb_lock);
>  /*
>   * Region tracking -- allows tracking of reservations and instantiated pages
>   *                    across the pages in a mapping.
> + *
> + * The region data structures are protected by a combination of the mmap_sem
> + * and the hugetlb_instantiation_mutex.  To access or modify a region the caller
> + * must either hold the mmap_sem for write, or the mmap_sem for read and
> + * the hugetlb_instantiation mutex:
> + *
> + * 	down_write(&mm->mmap_sem);
> + * or
> + * 	down_read(&mm->mmap_sem);
> + * 	mutex_lock(&hugetlb_instantiation_mutex);
>   */
>  struct file_region {
>  	struct list_head link;
> @@ -171,6 +181,30 @@ static long region_truncate(struct list_head *head, long end)
>  	return chg;
>  }
>  
> +static long region_count(struct list_head *head, long f, long t)
> +{
> +	struct file_region *rg;
> +	long chg = 0;
> +
> +	/* Locate each segment we overlap with, and count that overlap. */
> +	list_for_each_entry(rg, head, link) {
> +		long seg_from;
> +		long seg_to;
> +
> +		if (rg->to <= f)
> +			continue;
> +		if (rg->from >= t)
> +			break;
> +
> +		seg_from = max(rg->from, f);
> +		seg_to = min(rg->to, t);
> +
> +		chg += seg_to - seg_from;
> +	}
> +
> +	return chg;
> +}
> +
>  /*
>   * Convert the address within this vma to the page offset within
>   * the mapping, in base page units.
> @@ -193,9 +227,15 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
>  			(vma->vm_pgoff >> huge_page_order(h));
>  }
>  
> -#define HPAGE_RESV_OWNER    (1UL << (BITS_PER_LONG - 1))
> -#define HPAGE_RESV_UNMAPPED (1UL << (BITS_PER_LONG - 2))
> +/*
> + * Flags for MAP_PRIVATE reservations.  These are stored in the bottom
> + * bits of the reservation map pointer, which are always clear due to
> + * alignment.
> + */
> +#define HPAGE_RESV_OWNER    (1UL << 0)
> +#define HPAGE_RESV_UNMAPPED (1UL << 1)
>  #define HPAGE_RESV_MASK (HPAGE_RESV_OWNER | HPAGE_RESV_UNMAPPED)
> +
>  /*
>   * These helpers are used to track how many pages are reserved for
>   * faults in a MAP_PRIVATE mapping. Only the process that called mmap()
> @@ -205,6 +245,15 @@ static pgoff_t vma_pagecache_offset(struct hstate *h,
>   * the reserve counters are updated with the hugetlb_lock held. It is safe
>   * to reset the VMA at fork() time as it is not in use yet and there is no
>   * chance of the global counters getting corrupted as a result of the values.
> + *
> + * The private mapping reservation is represented in a subtly different
> + * manner to a shared mapping.  A shared mapping has a region map associated
> + * with the underlying file; this region map represents the backing file
> + * pages which have ever had a reservation assigned, and it persists even
> + * after the page is instantiated.  A private mapping has a region map
> + * associated with the original mmap which is attached to all VMAs which
> + * reference it; this region map represents those offsets which have consumed
> + * reservation, i.e. where pages have been instantiated.
>   */
>  static unsigned long get_vma_private_data(struct vm_area_struct *vma)
>  {
> @@ -217,22 +266,48 @@ static void set_vma_private_data(struct vm_area_struct *vma,
>  	vma->vm_private_data = (void *)value;
>  }
>  
> -static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma)
> +struct resv_map {
> +	struct kref refs;
> +	struct list_head regions;
> +};
> +
> +struct resv_map *resv_map_alloc(void)
> +{
> +	struct resv_map *resv_map = kmalloc(sizeof(*resv_map), GFP_KERNEL);
> +	if (!resv_map)
> +		return NULL;
> +
> +	kref_init(&resv_map->refs);
> +	INIT_LIST_HEAD(&resv_map->regions);
> +
> +	return resv_map;
> +}
> +
> +void resv_map_release(struct kref *ref)
> +{
> +	struct resv_map *resv_map = container_of(ref, struct resv_map, refs);
> +
> +	/* Clear out any active regions before we release the map. */
> +	region_truncate(&resv_map->regions, 0);
> +	kfree(resv_map);
> +}
> +
> +static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
>  {
>  	VM_BUG_ON(!is_vm_hugetlb_page(vma));
>  	if (!(vma->vm_flags & VM_SHARED))
> -		return get_vma_private_data(vma) & ~HPAGE_RESV_MASK;
> +		return (struct resv_map *)(get_vma_private_data(vma) &
> +							~HPAGE_RESV_MASK);
>  	return 0;
>  }
>  
> -static void set_vma_resv_huge_pages(struct vm_area_struct *vma,
> -							unsigned long reserve)
> +static void set_vma_resv_map(struct vm_area_struct *vma, struct resv_map *map)
>  {
>  	VM_BUG_ON(!is_vm_hugetlb_page(vma));
>  	VM_BUG_ON(vma->vm_flags & VM_SHARED);
>  
> -	set_vma_private_data(vma,
> -		(get_vma_private_data(vma) & HPAGE_RESV_MASK) | reserve);
> +	set_vma_private_data(vma, (get_vma_private_data(vma) &
> +				HPAGE_RESV_MASK) | (unsigned long)map);
>  }
>  
>  static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
> @@ -260,19 +335,12 @@ static void decrement_hugepage_resv_vma(struct hstate *h,
>  	if (vma->vm_flags & VM_SHARED) {
>  		/* Shared mappings always use reserves */
>  		h->resv_huge_pages--;
> -	} else {
> +	} else if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
>  		/*
>  		 * Only the process that called mmap() has reserves for
>  		 * private mappings.
>  		 */
> -		if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
> -			unsigned long flags, reserve;
> -			h->resv_huge_pages--;
> -			flags = (unsigned long)vma->vm_private_data &
> -							HPAGE_RESV_MASK;
> -			reserve = (unsigned long)vma->vm_private_data - 1;
> -			vma->vm_private_data = (void *)(reserve | flags);
> -		}
> +		h->resv_huge_pages--;
>  	}
>  }
>  
> @@ -289,7 +357,7 @@ static int vma_has_private_reserves(struct vm_area_struct *vma)
>  {
>  	if (vma->vm_flags & VM_SHARED)
>  		return 0;
> -	if (!vma_resv_huge_pages(vma))
> +	if (!is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>  		return 0;
>  	return 1;
>  }
> @@ -794,12 +862,19 @@ static int vma_needs_reservation(struct hstate *h,
>  		return region_chg(&inode->i_mapping->private_list,
>  							idx, idx + 1);
>  
> -	} else {
> -		if (!is_vma_resv_set(vma, HPAGE_RESV_OWNER))
> -			return 1;
> -	}
> +	} else if (!is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
> +		return 1;
>  
> -	return 0;
> +	} else {
> +		int err;
> +		pgoff_t idx = vma_pagecache_offset(h, vma, addr);
> +		struct resv_map *reservations = vma_resv_map(vma);
> +
> +		err = region_chg(&reservations->regions, idx, idx + 1);
> +		if (err < 0)
> +			return err;
> +		return 0;
> +	}
>  }
>  static void vma_commit_reservation(struct hstate *h,
>  			struct vm_area_struct *vma, unsigned long addr)
> @@ -810,6 +885,13 @@ static void vma_commit_reservation(struct hstate *h,
>  	if (vma->vm_flags & VM_SHARED) {
>  		pgoff_t idx = vma_pagecache_offset(h, vma, addr);
>  		region_add(&inode->i_mapping->private_list, idx, idx + 1);
> +
> +	} else if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
> +		pgoff_t idx = vma_pagecache_offset(h, vma, addr);
> +		struct resv_map *reservations = vma_resv_map(vma);
> +
> +		/* Mark this page used in the map. */
> +		region_add(&reservations->regions, idx, idx + 1);
>  	}
>  }
>  
> @@ -1456,13 +1538,42 @@ out:
>  	return ret;
>  }
>  
> +static void hugetlb_vm_op_open(struct vm_area_struct *vma)
> +{
> +	struct resv_map *reservations = vma_resv_map(vma);
> +
> +	/*
> +	 * This new VMA should share its sibling's reservation map if present.
> +	 * The VMA will only ever have a valid reservation map pointer where
> +	 * it is being copied for another still existing VMA.  As that VMA
> +	 * has a reference to the reservation map it cannot disappear until
> +	 * after this open call completes.  It is therefore safe to take a
> +	 * new reference here without additional locking.
> +	 */
> +	if (reservations)
> +		kref_get(&reservations->refs);
> +}
> +
>  static void hugetlb_vm_op_close(struct vm_area_struct *vma)
>  {
>  	struct hstate *h = hstate_vma(vma);
> -	unsigned long reserve = vma_resv_huge_pages(vma);
> +	struct resv_map *reservations = vma_resv_map(vma);
> +	unsigned long reserve;
> +	unsigned long start;
> +	unsigned long end;
>  
> -	if (reserve)
> -		hugetlb_acct_memory(h, -reserve);
> +	if (reservations) {
> +		start = vma_pagecache_offset(h, vma, vma->vm_start);
> +		end = vma_pagecache_offset(h, vma, vma->vm_end);
> +
> +		reserve = (end - start) -
> +			region_count(&reservations->regions, start, end);
> +
> +		kref_put(&reservations->refs, resv_map_release);
> +
> +		if (reserve)
> +			hugetlb_acct_memory(h, -reserve);
> +	}
>  }
>  
>  /*
> @@ -1479,6 +1590,7 @@ static int hugetlb_vm_op_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  
>  struct vm_operations_struct hugetlb_vm_ops = {
>  	.fault = hugetlb_vm_op_fault,
> +	.open = hugetlb_vm_op_open,
>  	.close = hugetlb_vm_op_close,
>  };
>  
> @@ -2037,8 +2149,13 @@ int hugetlb_reserve_pages(struct inode *inode,
>  	if (!vma || vma->vm_flags & VM_SHARED)
>  		chg = region_chg(&inode->i_mapping->private_list, from, to);
>  	else {
> +		struct resv_map *resv_map = resv_map_alloc();
> +		if (!resv_map)
> +			return -ENOMEM;
> +
>  		chg = to - from;
> -		set_vma_resv_huge_pages(vma, chg);
> +
> +		set_vma_resv_map(vma, resv_map);
>  		set_vma_resv_flags(vma, HPAGE_RESV_OWNER);
>  	}
>  
> -- 
> 1.5.6.205.g7ca3a
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 290+ messages in thread
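
The close-time accounting above only works because the two halves of a
split share one reservation map rather than each holding a copy; the
kref ensures exactly one final teardown.  A toy user-space model of
that lifetime (all names invented, a plain counter standing in for
struct kref):

#include <stdio.h>
#include <stdlib.h>

struct toy_map { int refs; };

static struct toy_map *map_alloc(void)
{
	struct toy_map *m = malloc(sizeof(*m));

	if (m)
		m->refs = 1;	/* reference held by the original VMA */
	return m;
}

static void map_get(struct toy_map *m) { m->refs++; }

static void map_put(struct toy_map *m)
{
	if (--m->refs == 0) {
		printf("last reference dropped, freeing map\n");
		free(m);
	}
}

int main(void)
{
	struct toy_map *m = map_alloc();

	if (!m)
		return 1;
	map_get(m);	/* vm_ops->open on the VMA created by split_vma() */
	map_put(m);	/* first half closes: map survives */
	map_put(m);	/* second half closes: map finally freed */
	return 0;
}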

* Re: [RFC] hugetlb reservations -- MAP_PRIVATE fixes for split vmas V2
  2008-06-23 17:35     ` Andy Whitcroft
@ 2008-06-25 21:22       ` Jon Tollefson
  -1 siblings, 0 replies; 290+ messages in thread
From: Jon Tollefson @ 2008-06-25 21:22 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Andrew Morton, Nick Piggin, Nishanth Aravamudan, Adam Litke,
	linux-kernel, kernel-testers, linux-mm, Mel Gorman

Andy Whitcroft wrote:
> As reported by Adam Litke and Jon Tollefson one of the libhugetlbfs
> regression tests triggers a negative overall reservation count.  When
> this occurs and there is no dynamic pool enabled, tests will fail.
>
> Following this email are two patches to address this issue:
>
> hugetlb reservations: move region tracking earlier -- simply moves the
>   region tracking code earlier so we do not have to supply prototypes, and
>
> hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma
>   splits -- which moves us to tracking the consumed reservation so that
>   we can correctly calculate the remaining reservations at vma close time.
>
> This stack is against the top of v2.6.26-rc5-mm3; should this solution
> prove acceptable it would need slipping underneath Nick's multiple hugepage
> size patches and those updated.  I have a modified stack prepared for that.
>
> This version incorporates Mel's feedback (both cosmetic fixes and an
> allocation-under-spinlock issue) and has an improved layout.
>
> Changes in V2:
>  - commentary updates
>  - pull allocations out from under hugetlb_lock
>  - refactor to match shared code layout
>  - reinstate BUG_ON's
>
> Jon, could you have a test on this and see if it works out for you?
>
> -apw
>   
Version two works for me too.  I am not seeing the reserve value become
negative when running the libhuge tests.

Jon


^ permalink raw reply	[flat|nested] 290+ messages in thread
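
The invariant these runs exercise -- that the reserve returned at close
time, (end - start) - region_count(start, end), never goes negative
once instantiated pages are tracked -- is easy to check with a small
user-space model (an array standing in for the kernel list, names
invented for the sketch):

#include <stdio.h>

struct region { long from, to; };

/* Count offsets in [f, t) covered by a sorted, non-overlapping set,
 * mirroring the overlap arithmetic of region_count(). */
static long count_overlap(const struct region *r, int n, long f, long t)
{
	long covered = 0;
	int i;

	for (i = 0; i < n; i++) {
		long lo, hi;

		if (r[i].to <= f)	/* wholly below the window */
			continue;
		if (r[i].from >= t)	/* sorted: nothing later overlaps */
			break;
		lo = r[i].from > f ? r[i].from : f;
		hi = r[i].to < t ? r[i].to : t;
		covered += hi - lo;
	}
	return covered;
}

int main(void)
{
	/* pages 0-2 and 5-7 of an 8-page private mapping instantiated */
	struct region used[] = { { 0, 3 }, { 5, 8 } };
	long reserve = 8 - count_overlap(used, 2, 0, 8);

	printf("unused reservation: %ld pages\n", reserve);	/* 2 */
	return 0;
}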

end of thread, other threads:[~2008-06-25 21:22 UTC | newest]

Thread overview: 290+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-12  5:59 2.6.26-rc5-mm3 Andrew Morton
2008-06-12  5:59 ` 2.6.26-rc5-mm3 Andrew Morton
2008-06-12  5:59 ` 2.6.26-rc5-mm3 Andrew Morton
2008-06-12  7:58 ` 2.6.26-rc5-mm3: kernel BUG at mm/vmscan.c:510 Alexey Dobriyan
2008-06-12  7:58   ` Alexey Dobriyan
2008-06-12  7:58   ` Alexey Dobriyan
2008-06-12  8:22   ` Andrew Morton
2008-06-12  8:22     ` Andrew Morton
2008-06-12  8:22     ` Andrew Morton
2008-06-12  8:23     ` Alexey Dobriyan
2008-06-12  8:23       ` Alexey Dobriyan
2008-06-12  8:23       ` Alexey Dobriyan
2008-06-12  8:44 ` [BUG] 2.6.26-rc5-mm3 kernel BUG at mm/filemap.c:575! Kamalesh Babulal
2008-06-12  8:44   ` Kamalesh Babulal
2008-06-12  8:44   ` Kamalesh Babulal
2008-06-12  8:57   ` Andrew Morton
2008-06-12  8:57     ` Andrew Morton
2008-06-12  8:57     ` Andrew Morton
2008-06-12 11:20     ` KAMEZAWA Hiroyuki
2008-06-12 11:20       ` KAMEZAWA Hiroyuki
2008-06-12 11:20       ` KAMEZAWA Hiroyuki
2008-06-13  1:44       ` [PATCH] fix double unlock_page() in " KAMEZAWA Hiroyuki
2008-06-13  1:44         ` KAMEZAWA Hiroyuki
2008-06-13  1:44         ` KAMEZAWA Hiroyuki
2008-06-13  2:13         ` Andrew Morton
2008-06-13  2:13           ` Andrew Morton
2008-06-13  2:13           ` Andrew Morton
2008-06-13 15:30           ` Lee Schermerhorn
2008-06-13 15:30             ` Lee Schermerhorn
2008-06-13 15:30             ` Lee Schermerhorn
2008-06-15  3:59             ` Kamalesh Babulal
2008-06-15  3:59               ` Kamalesh Babulal
2008-06-15  3:59               ` Kamalesh Babulal
2008-06-16 14:49             ` Lee Schermerhorn
2008-06-16 14:49               ` Lee Schermerhorn
2008-06-16 14:49               ` Lee Schermerhorn
2008-06-17  2:32             ` KAMEZAWA Hiroyuki
2008-06-17  2:32               ` KAMEZAWA Hiroyuki
2008-06-17  2:32               ` KAMEZAWA Hiroyuki
2008-06-17 15:26               ` Lee Schermerhorn
2008-06-17 15:26                 ` Lee Schermerhorn
2008-06-17 15:26                 ` Lee Schermerhorn
2008-06-13  4:34         ` Valdis.Kletnieks
2008-06-13  4:34           ` Valdis.Kletnieks-PjAqaU27lzQ
2008-06-14 13:32         ` Kamalesh Babulal
2008-06-14 13:32           ` Kamalesh Babulal
2008-06-14 13:32           ` Kamalesh Babulal
2008-06-12 11:38     ` [BUG] " Nick Piggin
2008-06-12 11:38       ` Nick Piggin
2008-06-12 11:38       ` Nick Piggin
2008-06-13  0:25       ` KAMEZAWA Hiroyuki
2008-06-13  0:25         ` KAMEZAWA Hiroyuki
2008-06-13  4:18   ` Valdis.Kletnieks
2008-06-13  4:18     ` Valdis.Kletnieks-PjAqaU27lzQ
2008-06-13  7:16     ` Andrew Morton
2008-06-13  7:16       ` Andrew Morton
2008-06-13  7:16       ` Andrew Morton
2008-06-12 23:32 ` 2.6.26-rc5-mm3 Byron Bradley
2008-06-12 23:32   ` 2.6.26-rc5-mm3 Byron Bradley
2008-06-12 23:32   ` 2.6.26-rc5-mm3 Byron Bradley
2008-06-12 23:55   ` 2.6.26-rc5-mm3 Daniel Walker
2008-06-12 23:55     ` 2.6.26-rc5-mm3 Daniel Walker
2008-06-12 23:55     ` 2.6.26-rc5-mm3 Daniel Walker
2008-06-13  0:04     ` 2.6.26-rc5-mm3 Byron Bradley
2008-06-13  0:04       ` 2.6.26-rc5-mm3 Byron Bradley
2008-06-18 17:55   ` 2.6.26-rc5-mm3 Daniel Walker
2008-06-18 17:55     ` 2.6.26-rc5-mm3 Daniel Walker
2008-06-18 17:55     ` 2.6.26-rc5-mm3 Daniel Walker
2008-06-19  9:13     ` 2.6.26-rc5-mm3 Ingo Molnar
2008-06-19  9:13       ` 2.6.26-rc5-mm3 Ingo Molnar
2008-06-19  9:13       ` 2.6.26-rc5-mm3 Ingo Molnar
2008-06-19 14:39       ` 2.6.26-rc5-mm3 Daniel Walker
2008-06-19 14:39         ` 2.6.26-rc5-mm3 Daniel Walker
2008-06-19 14:39         ` 2.6.26-rc5-mm3 Daniel Walker
2008-06-17  7:35 ` [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3 Daisuke Nishimura
2008-06-17  7:35   ` Daisuke Nishimura
2008-06-17  7:35   ` Daisuke Nishimura
2008-06-17  7:47   ` [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3) Daisuke Nishimura
2008-06-17  7:47     ` Daisuke Nishimura
2008-06-17  7:47     ` Daisuke Nishimura
2008-06-17  9:03     ` KAMEZAWA Hiroyuki
2008-06-17  9:03       ` KAMEZAWA Hiroyuki
2008-06-17  9:03       ` KAMEZAWA Hiroyuki
2008-06-17  9:14       ` KOSAKI Motohiro
2008-06-17  9:14         ` KOSAKI Motohiro
2008-06-17  9:14         ` KOSAKI Motohiro
2008-06-17  9:15       ` Daisuke Nishimura
2008-06-17  9:15         ` Daisuke Nishimura
2008-06-17 18:29         ` Lee Schermerhorn
2008-06-17 18:29           ` Lee Schermerhorn
2008-06-17 18:29           ` Lee Schermerhorn
2008-06-17 20:00           ` [PATCH] unevictable mlocked pages: initialize mm member of munlock mm_walk structure Lee Schermerhorn
2008-06-17 20:00             ` Lee Schermerhorn
2008-06-17 20:00             ` Lee Schermerhorn
2008-06-18  3:33             ` KOSAKI Motohiro
2008-06-18  3:33               ` KOSAKI Motohiro
2008-06-18  3:33               ` KOSAKI Motohiro
2008-06-18  2:40           ` [Bad page] trying to free locked page? (Re: [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3) Daisuke Nishimura
2008-06-18  2:40             ` Daisuke Nishimura
2008-06-18  2:40             ` Daisuke Nishimura
2008-06-17 15:34     ` KOSAKI Motohiro
2008-06-17 15:34       ` KOSAKI Motohiro
2008-06-17 15:34       ` KOSAKI Motohiro
2008-06-18  2:32       ` Daisuke Nishimura
2008-06-18  2:32         ` Daisuke Nishimura
2008-06-18 10:20         ` KOSAKI Motohiro
2008-06-18 10:20           ` KOSAKI Motohiro
2008-06-18 10:20           ` KOSAKI Motohiro
2008-06-18  9:40     ` [Experimental][PATCH] putback_lru_page rework KAMEZAWA Hiroyuki
2008-06-18  9:40       ` KAMEZAWA Hiroyuki
2008-06-18  9:40       ` KAMEZAWA Hiroyuki
2008-06-18 11:36       ` KOSAKI Motohiro
2008-06-18 11:36         ` KOSAKI Motohiro
2008-06-18 11:36         ` KOSAKI Motohiro
2008-06-18 11:55         ` KAMEZAWA Hiroyuki
2008-06-18 11:55           ` KAMEZAWA Hiroyuki
2008-06-18 11:55           ` KAMEZAWA Hiroyuki
2008-06-19  8:00           ` Daisuke Nishimura
2008-06-19  8:00             ` Daisuke Nishimura
2008-06-19  8:00             ` Daisuke Nishimura
2008-06-19  8:24             ` KAMEZAWA Hiroyuki
2008-06-19  8:24               ` KAMEZAWA Hiroyuki
2008-06-19  8:24               ` KAMEZAWA Hiroyuki
2008-06-18 14:50       ` Daisuke Nishimura
2008-06-18 14:50         ` Daisuke Nishimura
2008-06-18 18:21       ` Lee Schermerhorn
2008-06-18 18:21         ` Lee Schermerhorn
2008-06-18 18:21         ` Lee Schermerhorn
2008-06-19  0:22         ` KAMEZAWA Hiroyuki
2008-06-19  0:22           ` KAMEZAWA Hiroyuki
2008-06-19  0:22           ` KAMEZAWA Hiroyuki
2008-06-19 14:45           ` Lee Schermerhorn
2008-06-19 14:45             ` Lee Schermerhorn
2008-06-19 14:45             ` Lee Schermerhorn
2008-06-20  0:47             ` KAMEZAWA Hiroyuki
2008-06-20  0:47               ` KAMEZAWA Hiroyuki
2008-06-20  0:47               ` KAMEZAWA Hiroyuki
2008-06-20  1:13             ` KAMEZAWA Hiroyuki
2008-06-20  1:13               ` KAMEZAWA Hiroyuki
2008-06-20  1:13               ` KAMEZAWA Hiroyuki
2008-06-20 17:10               ` Lee Schermerhorn
2008-06-20 17:10                 ` Lee Schermerhorn
2008-06-20 17:10                 ` Lee Schermerhorn
2008-06-20 20:41                 ` Lee Schermerhorn
2008-06-20 20:41                   ` Lee Schermerhorn
2008-06-20 20:41                   ` Lee Schermerhorn
2008-06-21  8:56                   ` KOSAKI Motohiro
2008-06-21  8:56                     ` KOSAKI Motohiro
2008-06-21  8:56                     ` KOSAKI Motohiro
2008-06-23  0:30                     ` KAMEZAWA Hiroyuki
2008-06-23  0:30                       ` KAMEZAWA Hiroyuki
2008-06-23  0:30                       ` KAMEZAWA Hiroyuki
2008-06-21  8:41                 ` KOSAKI Motohiro
2008-06-21  8:41                   ` KOSAKI Motohiro
2008-06-21  8:41                   ` KOSAKI Motohiro
2008-06-21  8:39               ` KOSAKI Motohiro
2008-06-21  8:39                 ` KOSAKI Motohiro
2008-06-21  8:39                 ` KOSAKI Motohiro
2008-06-19 15:32           ` kamezawa.hiroyu
2008-06-19 15:32             ` kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A
2008-06-19 15:32             ` kamezawa.hiroyu
2008-06-20 16:24             ` Lee Schermerhorn
2008-06-20 16:24               ` Lee Schermerhorn
2008-06-20 16:24               ` Lee Schermerhorn
2008-06-17 15:33   ` [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3 KOSAKI Motohiro
2008-06-17 15:33     ` KOSAKI Motohiro
2008-06-17 15:33     ` KOSAKI Motohiro
2008-06-18  1:54     ` Daisuke Nishimura
2008-06-18  1:54       ` Daisuke Nishimura
2008-06-18  1:54       ` Daisuke Nishimura
2008-06-18  4:41       ` Daisuke Nishimura
2008-06-18  4:41         ` Daisuke Nishimura
2008-06-18  4:41         ` Daisuke Nishimura
2008-06-18  4:59         ` KAMEZAWA Hiroyuki
2008-06-18  4:59           ` KAMEZAWA Hiroyuki
2008-06-18  4:59           ` KAMEZAWA Hiroyuki
2008-06-18  7:54         ` [PATCH][-mm] remove redundant page->mapping check KOSAKI Motohiro
2008-06-18  7:54           ` KOSAKI Motohiro
2008-06-18  7:54           ` KOSAKI Motohiro
2008-06-17 17:46   ` [PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3 Lee Schermerhorn
2008-06-17 17:46     ` Lee Schermerhorn
2008-06-17 17:46     ` Lee Schermerhorn
2008-06-17 18:33     ` Hugh Dickins
2008-06-17 18:33       ` Hugh Dickins
2008-06-17 18:33       ` Hugh Dickins
2008-06-17 19:28       ` Lee Schermerhorn
2008-06-17 19:28         ` Lee Schermerhorn
2008-06-17 19:28         ` Lee Schermerhorn
2008-06-18  5:19         ` Nick Piggin
2008-06-18  5:19           ` Nick Piggin
2008-06-18  5:19           ` Nick Piggin
2008-06-18  2:59     ` Daisuke Nishimura
2008-06-18  2:59       ` Daisuke Nishimura
2008-06-18  2:59       ` Daisuke Nishimura
2008-06-18  1:13   ` KAMEZAWA Hiroyuki
2008-06-18  1:13     ` KAMEZAWA Hiroyuki
2008-06-18  1:13     ` KAMEZAWA Hiroyuki
2008-06-18  1:26     ` Daisuke Nishimura
2008-06-18  1:26       ` Daisuke Nishimura
2008-06-18  1:26       ` Daisuke Nishimura
2008-06-18  1:54     ` [PATCH] migration_entry_wait fix KAMEZAWA Hiroyuki
2008-06-18  1:54       ` KAMEZAWA Hiroyuki
2008-06-18  1:54       ` KAMEZAWA Hiroyuki
2008-06-18  5:26       ` KOSAKI Motohiro
2008-06-18  5:26         ` KOSAKI Motohiro
2008-06-18  5:26         ` KOSAKI Motohiro
2008-06-18  5:35       ` Nick Piggin
2008-06-18  5:35         ` Nick Piggin
2008-06-18  5:35         ` Nick Piggin
2008-06-18  6:04         ` KAMEZAWA Hiroyuki
2008-06-18  6:04           ` KAMEZAWA Hiroyuki
2008-06-18  6:04           ` KAMEZAWA Hiroyuki
2008-06-18  6:42           ` Nick Piggin
2008-06-18  6:42             ` Nick Piggin
2008-06-18  6:42             ` Nick Piggin
2008-06-18  6:52             ` KAMEZAWA Hiroyuki
2008-06-18  6:52               ` KAMEZAWA Hiroyuki
2008-06-18  6:52               ` KAMEZAWA Hiroyuki
2008-06-18  7:29               ` [PATCH -mm][BUGFIX] migration_entry_wait fix. v2 KAMEZAWA Hiroyuki
2008-06-18  7:29                 ` KAMEZAWA Hiroyuki
2008-06-18  7:29                 ` KAMEZAWA Hiroyuki
2008-06-18  7:26                 ` KOSAKI Motohiro
2008-06-18  7:26                   ` KOSAKI Motohiro
2008-06-18  7:26                   ` KOSAKI Motohiro
2008-06-18  7:40                 ` Nick Piggin
2008-06-18  7:40                   ` Nick Piggin
2008-06-18  7:40                   ` Nick Piggin
2008-06-19  6:59 ` [BUG][PATCH -mm] avoid BUG() in __stop_machine_run() Hidehiro Kawai
2008-06-19  6:59   ` Hidehiro Kawai
2008-06-19 10:12   ` Rusty Russell
2008-06-19 10:12     ` Rusty Russell
2008-06-19 10:12     ` Rusty Russell
2008-06-19 15:51     ` Jeremy Fitzhardinge
2008-06-19 15:51       ` Jeremy Fitzhardinge
2008-06-19 15:51       ` Jeremy Fitzhardinge
2008-06-20 13:21       ` Ingo Molnar
2008-06-20 13:21         ` Ingo Molnar
2008-06-20 13:21         ` Ingo Molnar
2008-06-23  3:55         ` Rusty Russell
2008-06-23  3:55           ` Rusty Russell
2008-06-23  3:55           ` Rusty Russell
2008-06-23 21:01           ` Ingo Molnar
2008-06-23 21:01             ` Ingo Molnar
2008-06-23 21:01             ` Ingo Molnar
2008-06-19 16:27 ` 2.6.26-rc5-mm3: BUG large value for HugePages_Rsvd Jon Tollefson
2008-06-19 16:27   ` Jon Tollefson
2008-06-19 16:27   ` Jon Tollefson
2008-06-19 17:16   ` Andy Whitcroft
2008-06-19 17:16     ` Andy Whitcroft
2008-06-19 17:16     ` Andy Whitcroft
2008-06-20  3:18     ` Jon Tollefson
2008-06-20  3:18       ` Jon Tollefson
2008-06-20  3:18       ` Jon Tollefson
2008-06-20 19:17   ` [RFC] hugetlb reservations -- MAP_PRIVATE fixes for split vmas Andy Whitcroft
2008-06-20 19:17     ` Andy Whitcroft
2008-06-20 19:17     ` Andy Whitcroft
2008-06-20 19:17     ` [PATCH 1/2] hugetlb reservations: move region tracking earlier Andy Whitcroft
2008-06-20 19:17       ` Andy Whitcroft
2008-06-20 19:17     ` [PATCH 2/2] hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits Andy Whitcroft
2008-06-20 19:17       ` Andy Whitcroft
2008-06-20 19:17       ` Andy Whitcroft
2008-06-23  7:33       ` Mel Gorman
2008-06-23  7:33         ` Mel Gorman
2008-06-23  7:33         ` Mel Gorman
2008-06-23  8:00       ` Mel Gorman
2008-06-23  8:00         ` Mel Gorman
2008-06-23  8:00         ` Mel Gorman
2008-06-23  9:53         ` Andy Whitcroft
2008-06-23  9:53           ` Andy Whitcroft
2008-06-23  9:53           ` Andy Whitcroft
2008-06-23 16:04     ` [RFC] hugetlb reservations -- MAP_PRIVATE fixes for split vmas Jon Tollefson
2008-06-23 16:04       ` Jon Tollefson
2008-06-23 16:04       ` Jon Tollefson
2008-06-23 17:35   ` [RFC] hugetlb reservations -- MAP_PRIVATE fixes for split vmas V2 Andy Whitcroft
2008-06-23 17:35     ` Andy Whitcroft
2008-06-23 17:35     ` Andy Whitcroft
2008-06-23 17:35     ` [PATCH 1/2] hugetlb reservations: move region tracking earlier Andy Whitcroft
2008-06-23 17:35       ` Andy Whitcroft
2008-06-23 17:35       ` Andy Whitcroft
2008-06-23 23:05       ` Mel Gorman
2008-06-23 23:05         ` Mel Gorman
2008-06-23 23:05         ` Mel Gorman
2008-06-23 17:35     ` [PATCH 2/2] hugetlb reservations: fix hugetlb MAP_PRIVATE reservations across vma splits V2 Andy Whitcroft
2008-06-23 17:35       ` Andy Whitcroft
2008-06-23 17:35       ` Andy Whitcroft
2008-06-23 23:08       ` Mel Gorman
2008-06-23 23:08         ` Mel Gorman
2008-06-23 23:08         ` Mel Gorman
2008-06-25 21:22     ` [RFC] hugetlb reservations -- MAP_PRIVATE fixes for split vmas V2 Jon Tollefson
2008-06-25 21:22       ` Jon Tollefson
