All of lore.kernel.org
 help / color / mirror / Atom feed
* mmotm 2009-09-25-14-35 uploaded
@ 2009-09-25 21:35 ` akpm
  0 siblings, 0 replies; 22+ messages in thread
From: akpm @ 2009-09-25 21:35 UTC (permalink / raw)
  To: mm-commits, linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 8820 bytes --]

The mm-of-the-moment snapshot 2009-09-25-14-35 has been uploaded to

   http://userweb.kernel.org/~akpm/mmotm/

and will soon be available at

   git://git.zen-sources.org/zen/mmotm.git

It contains the following patches against 2.6.31:

origin.patch
module-fix-up-config_kallsyms=n-build.patch
linux-next.patch
linux-next-git-rejects.patch
next-remove-localversion.patch
i-need-old-gcc.patch
kernel-core-add-smp_call_function_any.patch
arch-x86-kernel-cpu-cpufreq-acpi-cpufreqc-avoid-cross-cpu-interrupts-by-using-smp_call_function_any.patch
dell_laptop-when-the-hardware-switch-is-disabled-dont-actually-allow-changing-the-softblock-status.patch
acpi-reintroduce-acpi_device_ops-shutdown-method.patch
cs5535-gpio-add-amd-cs5535-cs5536-gpio-driver-support.patch
cs5535-gpio-request-function-mask-names-added.patch
alsa-cs5535audio-free-olpc-quirks-from-reliance-on-mgeode_lx-cpu-optimization.patch
agp-correct-missing-cleanup-on-error-in-agp_add_bridge.patch
avr32-convert-to-asm-generic-hardirqh.patch
powerpc-sky-cpu-redundant-or-incorrect-tests-on-unsigned.patch
hpilo-add-locking-comment.patch
drm-via-add-pci-id-for-via-vx800-chipset.patch
genirq-switch-proc-irq-spurious-to-seq_file.patch
timer-stats-fix-del_timer_sync-and-try_to_del_timer_sync.patch
ia64-use-printk_once.patch
input-drivers-input-xpadc-improve-xbox-360-wireless-support-and-add-sysfs-interface.patch
input-documentation-input-xpadtxt-update-for-new-driver-functionality.patch
kbuild-generate-modulesbuiltin.patch
kbuild-rebuild-fix-for-makefilemodbuiltin.patch
kconfig-cross_compile-option.patch
ide-use-printk_once.patch
mips-decrease-size-of-au1xxx_dbdma_pm_regs.patch
msp71xx-request_irq-failure-ignored-in-msp_pcibios_config_access.patch
mtdpart-memory-accessor-interface-for-mtd-layer.patch
isdn-hisax-fix-lock-imbalance.patch
hfc_usb-fix-read-buffer-overflow.patch
isdn-fix-netjet-build-errors.patch
misdn-fix-reversed-if-in-st_own_ctrl.patch
isdn-eicon-use-offsetof.patch
isdn-eicon-return-on-error.patch
3x59x-fix-pci-resource-management.patch
bluetooth-fix-for-acer-bluetooth-optical-rechargeable-mouse.patch
sunrpc-use-formatting-of-module-name-in-sunrpc.patch
serial_txx9-use-container_of-instead-of-direct-cast.patch
icom-converting-space-to-tabs.patch
cyclades-read-buffer-overflow.patch
serial167-fix-read-buffer-overflow.patch
serial-add-parameter-to-force-skipping-the-test-for-the-txen-bug.patch
drivers-md-introduce-missing-kfree.patch
scsi-add-__init-__exit-macros-to-ibmvstgtc.patch
drivers-scsi-fnic-fnic_scsic-clean-up.patch
ibmmca-buffer-overflow.patch
scsi-eata-fix-buffer-overflow.patch
drivers-scsi-gdthc-fix-buffer-overflow.patch
drivers-scsi-u14-34fc-fix-uffer-overflow.patch
drivers-scsi-lpfc-lpfc_vportc-fix-read-buffer-overflow.patch
osst-fix-read-buffer-overflow.patch
scsi-fix-func-names-in-kernel-doc.patch
gdth-unmap-ccb_phys-when-scsi_add_host-fails-in-gdth_eisa_probe_one.patch
zfcp-test-kmalloc-failure-in-scsi_get_vpd_page.patch
st-fix-test-of-value-range-in-st_set_options.patch
sparc32-convert-to-asm-generic-hardirqh.patch
synaptics-touchscreen-for-htc-dream-check-that-smbus-is-available.patch
drivers-staging-octeon-ethernet-rgmiic-dont-ignore-request_irq-return-code.patch
revert-staging-android-lowmemorykillerc-fix-it-for-oom-move-oom_adj-value-from-task_struct-to-mm_struct.patch
oom-move-oom_adj-value-from-task_struct-to-signal_struct-fix.patch
vfs-fix-vfs_rename_dir-for-fs_rename_does_d_move-filesystems.patch
raw-fix-rawctl-compat-ioctls-breakage-on-amd64-and-itanic.patch
vfs-improve-comment-describing-fget_light.patch
ecryptfs-another-lockdep-issue.patch
fs-remove-unneeded-dcache_unhashed-tricks.patch
fs-improve-remountro-vs-buffercache-coherency.patch
vfs-implement-posix-o_sync-and-o_dsync-semantics.patch
vfs-implement-posix-o_sync-and-o_dsync-semantics-checkpatch-fixes.patch
xtensa-use-generic-sys_pipe.patch
xtensa-convert-to-asm-generic-hardirqh.patch
percpu-avoid-calling-__pcpu_ptr_to_addrnull.patch
mm.patch
ksm-more-on-default-values.patch
mm-includecheck-fix-vmallocc.patch
fs-includecheck-fix-proc-kcorec.patch
video-includecheck-fix-msm-mddic.patch
video-includecheck-fix-da8xx-fbc.patch
cgroups-update-documentation-of-cgroups-tasks-and-procs-files.patch
drivers-input-inputc-fix-config_pm=n-warning.patch
x86-_end-symbol-missing-from-symbolmap.patch
sched-fix-htmldocs-schedc.patch
scsi_libc-avoid-calling-scsi_device_put-from-under-host_lock.patch
hrtimers-remove-the-timer_stats_active-check-when-setting-the-start-info.patch
drivers-net-wireless-ath-ar9170-phyc-fix-uninitialised-variable.patch
readahead-add-blk_run_backing_dev.patch
mm-warn-once-when-a-page-is-freed-with-pg_mlocked-set.patch
mm-vsmcan-check-shrink_active_list-sc-isolate_pages-return-value.patch
dev-mem-remove-redundant-test-on-len.patch
dev-mem-introduce-size_inside_page.patch
dev-mem-cleanup-unxlate_dev_mem_ptr-calls.patch
dev-mem-cleanup-unxlate_dev_mem_ptr-calls-fix.patch
dev-mem-cleanup-unxlate_dev_mem_ptr-calls-fix-fix.patch
dev-mem-make-size_inside_page-logic-straight.patch
dev-mem-remove-the-written-variable-in-write_kmem.patch
dev-mem-remove-the-written-variable-in-write_kmem-fix.patch
dev-mem-remove-redundant-parameter-from-do_write_kmem.patch
pcmcia-fix-controller-printk-warnings.patch
frv-duplicate-output_buffer-of-e03.patch
frv-duplicate-output_buffer-of-e03-checkpatch-fixes.patch
cris-convert-to-use-arch_gettimeoffset.patch
generic-ipi-cleanup-for-generic_smp_call_function_interrupt.patch
maintainers-update-generic-uio-for-pci-devices.patch
maintainers-update-tracing-section.patch
maintainers-update-omap-tony-lindgren-email-name.patch
maintainers-change-atm-mailing-list-to-moderated.patch
maintainers-use-tab-not-spaces-after-field-types.patch
maintainers-update-kernel-janitors-after-mismerge.patch
maintainers-update-score-architecture-name-style-and-add-file-pattern.patch
maintainers-simple-firmware-interface-update-email-style.patch
maintainers-winbond-cir-integrate-p-m-lines-fixup-david-härdemans-name.patch
maintainers-fix-up-peripheral-spelling.patch
maintainers-update-wolfson-microelectronics.patch
mmc-make-the-configuration-memory-resource-optional.patch
tmio_mmc-optionally-support-using-platform-clock.patch
sh-switch-migo-r-to-use-the-tmio-mmc-driver-instead-of-spi.patch
sdio-recognize-io-card-without-powercycle.patch
sdio-pass-whitelisted-cis-funce-tuples-to-sdio-drivers.patch
hwmon-driver-for-texas-instruments-amc6821-chip.patch
rtc-set-wakeup-capability-for-i2c-and-spi-rtc-drivers.patch
gpiolib-add-names-file-in-gpio-chip-sysfs.patch
fbdev-bfin-lq035q1-fb-new-blackfin-landscape-lcd-ez-extender-driver.patch
fbdev-bfin-t350mcqb-fb-handle-all-resources-in-suspend-resume.patch
fbdev-bfin-t350mcqb-fb-fix-lcd-dimensions.patch
intelfb-fix-setting-of-active-pipe-with-lvds-displays.patch
hfsplus-identify-journal-info-block-in-volume-header.patch
hfsplus-fix-journal-detection.patch
reiserfs-remove-proc-fs-reiserfs-version.patch
reiserfs-dont-compile-procfso-at-all-if-no-support.patch
reiserfs-truncate-blocks-not-used-by-a-write-v2.patch
fatfs-use-common-time_to_tm-in-fat_time_unix2fat.patch
do_wait-optimization-do-not-place-sub-threads-on-task_struct-children-list.patch
utrace-core.patch
cyclades-allow-overriding-isa-defaults-also-when-the-driver-is-built-in.patch
sound-core-pcm_timerc-use-lib-gcdc.patch
net-netfilter-ipvs-ip_vs_wrrc-use-lib-gcdc.patch
net-netfilter-ipvs-ip_vs_wrrc-use-lib-gcdc-fix.patch
vfs-take-2add-set_page_dirty_notag.patch
reiser4-vfs-add-super_operationssync_inodes-2.patch
reiser4-export-remove_from_page_cache.patch
reiser4-export-remove_from_page_cache-fix.patch
reiser4-export-find_get_pages.patch
reiser4.patch
reiser4-adjust-to-the-new-aops.patch
reiser4-adjust-to-the-new-aops-fixup.patch
reiser4-remove-simple_prepare_write-usage.patch
reiser4-remove-simple_prepare_write-usage-checkpatch-fixes.patch
fs-symlink-write_begin-allocation-context-fix-reiser4-fix.patch
reiser4-handling-error-returned-by-d_obtain_alias-fixup.patch
reiser4-update-names-of-quota-methods.patch
reiser4-use-set_page_dirty_notag.patch
fs-reiser4-add-parenths-around-x-y.patch
fs-reiser4-contextc-current_is_pdflush-got-removed.patch
reiser4-fix.patch
reiser4-disable.patch
make-sure-nobodys-leaking-resources.patch
journal_add_journal_head-debug.patch
releasing-resources-with-children.patch
make-frame_pointer-default=y.patch
mutex-subsystem-synchro-test-module.patch
slab-leaks3-default-y.patch
put_bh-debug.patch
add-debugging-aid-for-memory-initialisation-problems.patch
keep-track-of-network-interface-renaming.patch
workaround-for-a-pci-restoring-bug.patch
prio_tree-debugging-patch.patch
single_open-seq_release-leak-diagnostics.patch
add-a-refcount-check-in-dput.patch
getblk-handle-2tb-devices.patch
getblk-handle-2tb-devices-fix.patch
undeprecate-pci_find_device.patch
notify_change-callers-must-hold-i_mutex.patch

^ permalink raw reply	[flat|nested] 22+ messages in thread

* mmotm 2009-09-25-14-35 uploaded
@ 2009-09-25 21:35 ` akpm
  0 siblings, 0 replies; 22+ messages in thread
From: akpm @ 2009-09-25 21:35 UTC (permalink / raw)
  To: mm-commits, linux-kernel

The mm-of-the-moment snapshot 2009-09-25-14-35 has been uploaded to

   http://userweb.kernel.org/~akpm/mmotm/

and will soon be available at

   git://git.zen-sources.org/zen/mmotm.git

It contains the following patches against 2.6.31:

origin.patch
module-fix-up-config_kallsyms=n-build.patch
linux-next.patch
linux-next-git-rejects.patch
next-remove-localversion.patch
i-need-old-gcc.patch
kernel-core-add-smp_call_function_any.patch
arch-x86-kernel-cpu-cpufreq-acpi-cpufreqc-avoid-cross-cpu-interrupts-by-using-smp_call_function_any.patch
dell_laptop-when-the-hardware-switch-is-disabled-dont-actually-allow-changing-the-softblock-status.patch
acpi-reintroduce-acpi_device_ops-shutdown-method.patch
cs5535-gpio-add-amd-cs5535-cs5536-gpio-driver-support.patch
cs5535-gpio-request-function-mask-names-added.patch
alsa-cs5535audio-free-olpc-quirks-from-reliance-on-mgeode_lx-cpu-optimization.patch
agp-correct-missing-cleanup-on-error-in-agp_add_bridge.patch
avr32-convert-to-asm-generic-hardirqh.patch
powerpc-sky-cpu-redundant-or-incorrect-tests-on-unsigned.patch
hpilo-add-locking-comment.patch
drm-via-add-pci-id-for-via-vx800-chipset.patch
genirq-switch-proc-irq-spurious-to-seq_file.patch
timer-stats-fix-del_timer_sync-and-try_to_del_timer_sync.patch
ia64-use-printk_once.patch
input-drivers-input-xpadc-improve-xbox-360-wireless-support-and-add-sysfs-interface.patch
input-documentation-input-xpadtxt-update-for-new-driver-functionality.patch
kbuild-generate-modulesbuiltin.patch
kbuild-rebuild-fix-for-makefilemodbuiltin.patch
kconfig-cross_compile-option.patch
ide-use-printk_once.patch
mips-decrease-size-of-au1xxx_dbdma_pm_regs.patch
msp71xx-request_irq-failure-ignored-in-msp_pcibios_config_access.patch
mtdpart-memory-accessor-interface-for-mtd-layer.patch
isdn-hisax-fix-lock-imbalance.patch
hfc_usb-fix-read-buffer-overflow.patch
isdn-fix-netjet-build-errors.patch
misdn-fix-reversed-if-in-st_own_ctrl.patch
isdn-eicon-use-offsetof.patch
isdn-eicon-return-on-error.patch
3x59x-fix-pci-resource-management.patch
bluetooth-fix-for-acer-bluetooth-optical-rechargeable-mouse.patch
sunrpc-use-formatting-of-module-name-in-sunrpc.patch
serial_txx9-use-container_of-instead-of-direct-cast.patch
icom-converting-space-to-tabs.patch
cyclades-read-buffer-overflow.patch
serial167-fix-read-buffer-overflow.patch
serial-add-parameter-to-force-skipping-the-test-for-the-txen-bug.patch
drivers-md-introduce-missing-kfree.patch
scsi-add-__init-__exit-macros-to-ibmvstgtc.patch
drivers-scsi-fnic-fnic_scsic-clean-up.patch
ibmmca-buffer-overflow.patch
scsi-eata-fix-buffer-overflow.patch
drivers-scsi-gdthc-fix-buffer-overflow.patch
drivers-scsi-u14-34fc-fix-uffer-overflow.patch
drivers-scsi-lpfc-lpfc_vportc-fix-read-buffer-overflow.patch
osst-fix-read-buffer-overflow.patch
scsi-fix-func-names-in-kernel-doc.patch
gdth-unmap-ccb_phys-when-scsi_add_host-fails-in-gdth_eisa_probe_one.patch
zfcp-test-kmalloc-failure-in-scsi_get_vpd_page.patch
st-fix-test-of-value-range-in-st_set_options.patch
sparc32-convert-to-asm-generic-hardirqh.patch
synaptics-touchscreen-for-htc-dream-check-that-smbus-is-available.patch
drivers-staging-octeon-ethernet-rgmiic-dont-ignore-request_irq-return-code.patch
revert-staging-android-lowmemorykillerc-fix-it-for-oom-move-oom_adj-value-from-task_struct-to-mm_struct.patch
oom-move-oom_adj-value-from-task_struct-to-signal_struct-fix.patch
vfs-fix-vfs_rename_dir-for-fs_rename_does_d_move-filesystems.patch
raw-fix-rawctl-compat-ioctls-breakage-on-amd64-and-itanic.patch
vfs-improve-comment-describing-fget_light.patch
ecryptfs-another-lockdep-issue.patch
fs-remove-unneeded-dcache_unhashed-tricks.patch
fs-improve-remountro-vs-buffercache-coherency.patch
vfs-implement-posix-o_sync-and-o_dsync-semantics.patch
vfs-implement-posix-o_sync-and-o_dsync-semantics-checkpatch-fixes.patch
xtensa-use-generic-sys_pipe.patch
xtensa-convert-to-asm-generic-hardirqh.patch
percpu-avoid-calling-__pcpu_ptr_to_addrnull.patch
mm.patch
ksm-more-on-default-values.patch
mm-includecheck-fix-vmallocc.patch
fs-includecheck-fix-proc-kcorec.patch
video-includecheck-fix-msm-mddic.patch
video-includecheck-fix-da8xx-fbc.patch
cgroups-update-documentation-of-cgroups-tasks-and-procs-files.patch
drivers-input-inputc-fix-config_pm=n-warning.patch
x86-_end-symbol-missing-from-symbolmap.patch
sched-fix-htmldocs-schedc.patch
scsi_libc-avoid-calling-scsi_device_put-from-under-host_lock.patch
hrtimers-remove-the-timer_stats_active-check-when-setting-the-start-info.patch
drivers-net-wireless-ath-ar9170-phyc-fix-uninitialised-variable.patch
readahead-add-blk_run_backing_dev.patch
mm-warn-once-when-a-page-is-freed-with-pg_mlocked-set.patch
mm-vsmcan-check-shrink_active_list-sc-isolate_pages-return-value.patch
dev-mem-remove-redundant-test-on-len.patch
dev-mem-introduce-size_inside_page.patch
dev-mem-cleanup-unxlate_dev_mem_ptr-calls.patch
dev-mem-cleanup-unxlate_dev_mem_ptr-calls-fix.patch
dev-mem-cleanup-unxlate_dev_mem_ptr-calls-fix-fix.patch
dev-mem-make-size_inside_page-logic-straight.patch
dev-mem-remove-the-written-variable-in-write_kmem.patch
dev-mem-remove-the-written-variable-in-write_kmem-fix.patch
dev-mem-remove-redundant-parameter-from-do_write_kmem.patch
pcmcia-fix-controller-printk-warnings.patch
frv-duplicate-output_buffer-of-e03.patch
frv-duplicate-output_buffer-of-e03-checkpatch-fixes.patch
cris-convert-to-use-arch_gettimeoffset.patch
generic-ipi-cleanup-for-generic_smp_call_function_interrupt.patch
maintainers-update-generic-uio-for-pci-devices.patch
maintainers-update-tracing-section.patch
maintainers-update-omap-tony-lindgren-email-name.patch
maintainers-change-atm-mailing-list-to-moderated.patch
maintainers-use-tab-not-spaces-after-field-types.patch
maintainers-update-kernel-janitors-after-mismerge.patch
maintainers-update-score-architecture-name-style-and-add-file-pattern.patch
maintainers-simple-firmware-interface-update-email-style.patch
maintainers-winbond-cir-integrate-p-m-lines-fixup-david-härdemans-name.patch
maintainers-fix-up-peripheral-spelling.patch
maintainers-update-wolfson-microelectronics.patch
mmc-make-the-configuration-memory-resource-optional.patch
tmio_mmc-optionally-support-using-platform-clock.patch
sh-switch-migo-r-to-use-the-tmio-mmc-driver-instead-of-spi.patch
sdio-recognize-io-card-without-powercycle.patch
sdio-pass-whitelisted-cis-funce-tuples-to-sdio-drivers.patch
hwmon-driver-for-texas-instruments-amc6821-chip.patch
rtc-set-wakeup-capability-for-i2c-and-spi-rtc-drivers.patch
gpiolib-add-names-file-in-gpio-chip-sysfs.patch
fbdev-bfin-lq035q1-fb-new-blackfin-landscape-lcd-ez-extender-driver.patch
fbdev-bfin-t350mcqb-fb-handle-all-resources-in-suspend-resume.patch
fbdev-bfin-t350mcqb-fb-fix-lcd-dimensions.patch
intelfb-fix-setting-of-active-pipe-with-lvds-displays.patch
hfsplus-identify-journal-info-block-in-volume-header.patch
hfsplus-fix-journal-detection.patch
reiserfs-remove-proc-fs-reiserfs-version.patch
reiserfs-dont-compile-procfso-at-all-if-no-support.patch
reiserfs-truncate-blocks-not-used-by-a-write-v2.patch
fatfs-use-common-time_to_tm-in-fat_time_unix2fat.patch
do_wait-optimization-do-not-place-sub-threads-on-task_struct-children-list.patch
utrace-core.patch
cyclades-allow-overriding-isa-defaults-also-when-the-driver-is-built-in.patch
sound-core-pcm_timerc-use-lib-gcdc.patch
net-netfilter-ipvs-ip_vs_wrrc-use-lib-gcdc.patch
net-netfilter-ipvs-ip_vs_wrrc-use-lib-gcdc-fix.patch
vfs-take-2add-set_page_dirty_notag.patch
reiser4-vfs-add-super_operationssync_inodes-2.patch
reiser4-export-remove_from_page_cache.patch
reiser4-export-remove_from_page_cache-fix.patch
reiser4-export-find_get_pages.patch
reiser4.patch
reiser4-adjust-to-the-new-aops.patch
reiser4-adjust-to-the-new-aops-fixup.patch
reiser4-remove-simple_prepare_write-usage.patch
reiser4-remove-simple_prepare_write-usage-checkpatch-fixes.patch
fs-symlink-write_begin-allocation-context-fix-reiser4-fix.patch
reiser4-handling-error-returned-by-d_obtain_alias-fixup.patch
reiser4-update-names-of-quota-methods.patch
reiser4-use-set_page_dirty_notag.patch
fs-reiser4-add-parenths-around-x-y.patch
fs-reiser4-contextc-current_is_pdflush-got-removed.patch
reiser4-fix.patch
reiser4-disable.patch
make-sure-nobodys-leaking-resources.patch
journal_add_journal_head-debug.patch
releasing-resources-with-children.patch
make-frame_pointer-default=y.patch
mutex-subsystem-synchro-test-module.patch
slab-leaks3-default-y.patch
put_bh-debug.patch
add-debugging-aid-for-memory-initialisation-problems.patch
keep-track-of-network-interface-renaming.patch
workaround-for-a-pci-restoring-bug.patch
prio_tree-debugging-patch.patch
single_open-seq_release-leak-diagnostics.patch
add-a-refcount-check-in-dput.patch
getblk-handle-2tb-devices.patch
getblk-handle-2tb-devices-fix.patch
undeprecate-pci_find_device.patch
notify_change-callers-must-hold-i_mutex.patch
--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Question: RCU stall detected in memcg  (Re: mmotm 2009-09-25-14-35 uploaded
  2009-09-25 21:35 ` akpm
  (?)
@ 2009-09-28  6:42 ` KAMEZAWA Hiroyuki
  2009-09-28  8:33   ` KAMEZAWA Hiroyuki
                     ` (3 more replies)
  -1 siblings, 4 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-28  6:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, mingo, balbir, nishimura

On Fri, 25 Sep 2009 14:35:46 -0700
akpm@linux-foundation.org wrote:

> The mm-of-the-moment snapshot 2009-09-25-14-35 has been uploaded to
> 
>    http://userweb.kernel.org/~akpm/mmotm/
> 
> and will soon be available at
> 
>    git://git.zen-sources.org/zen/mmotm.git
> 
> It contains the following patches against 2.6.31:
> 

At testing my (small) patch, with high memory pressure to
memcg+hierarchy+softlimit, following is shown.
==
INFO: RCU detected CPU 0 stall (t=10000 jiffies)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU 0:
Modules linked in: sco bridge stp bnep l2cap crc16 bluetooth rfkill iptabl
e_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log d
m_multipath dm_mod uinput ppdev i2c_i801 pcspkr i2c_core bnx2 sg e1000e parport_pc parport button shpchp megaraid_sas sd_mo
d scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
 Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2 PRIMERGY
 RIP: 0010:[<ffffffff810878fe>]  [<ffffffff810878fe>] trace_hardirqs_off_ca
ller+0x3e/0xb RSP: 0018:ffff88004fa03d98  EFLAGS: 00000006
 RAX: 0000000000000046 RBX: 0000000000000c00 RCX: 000000000000e501
 RDX: ffff8806133564f0 RSI: 0000000000000002 RDI: ffffffff8102a940
 RBP: ffff88004fa03d98 R08: 0000000000000001 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
 R13: 0000000000000046 R14: 00000000000000ff R15: ffff88004fa03f48
 FS:  00007fdeca0856f0(0000) GS:ffff88004fa00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fdeca09e000 CR3: 0000000619fc6000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Call Trace:
  <#DB[1]>  <<EOE>> Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2
 Call Trace:
  <NMI>  [<ffffffff8100af79>] ? show_regs+0x49/0x50
  [<ffffffff81429385>] nmi_watchdog_tick+0x1e5/0x210
  [<ffffffff81428891>] do_nmi+0x1b1/0x2e0
  [<ffffffff8142808a>] nmi+0x1a/0x2c
  [<ffffffff8102a940>] ? flat_send_IPI_mask+0x90/0xb0
  [<ffffffff810878fe>] ? trace_hardirqs_off_caller+0x3e/0xb0
  <<EOE>>  <IRQ>  [<ffffffff810884bd>] trace_hardirqs_off+0xd/0x10
  [<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
  [<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
  [<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
  [<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
  [<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
  [<ffffffff81063a26>] update_process_times+0x46/0x70
  [<ffffffff81085930>] tick_sched_timer+0x60/0x160
  [<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
  [<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
  [<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
  [<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
  [<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
  [<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
  <EOI>  [<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
  [<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
  [<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
  [<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
  [<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
  [<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60
  [<ffffffff810a5b90>] ? cgroup_map_add+0x0/0x30
  [<ffffffff8115de03>] ? seq_read+0xf3/0x420
  [<ffffffff811d9926>] ? security_file_permission+0x16/0x20
  [<ffffffff8113b7ec>] ? vfs_read+0xcc/0x190
  [<ffffffff8113b9b5>] ? sys_read+0x55/0x90
  [<ffffffff8100bf9b>] ? system_call_fastpath+0x16/0x1b
.....
==
mem_cgroup_walk_tree() is not very young function and I've been doing the
same kind of tests but this is 1st messeage for me. So, I'm a bit confused.
and not sure how I start debug from...

Does this mean mem_cgroup_walk_tree() blocks RCU's progress
over 10000 jiffies ? What should I doubt ?

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Question: RCU stall detected in memcg  (Re: mmotm 2009-09-25-14-35 uploaded
  2009-09-28  6:42 ` Question: RCU stall detected in memcg (Re: " KAMEZAWA Hiroyuki
@ 2009-09-28  8:33   ` KAMEZAWA Hiroyuki
  2009-09-28  9:06   ` [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus KAMEZAWA Hiroyuki
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-28  8:33 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, akpm, mingo, balbir, nishimura

On Mon, 28 Sep 2009 15:42:13 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Fri, 25 Sep 2009 14:35:46 -0700
> akpm@linux-foundation.org wrote:
> 
> > The mm-of-the-moment snapshot 2009-09-25-14-35 has been uploaded to
> > 
> >    http://userweb.kernel.org/~akpm/mmotm/
> > 
> > and will soon be available at
> > 
> >    git://git.zen-sources.org/zen/mmotm.git
> > 
> > It contains the following patches against 2.6.31:
> > 
> 
> At testing my (small) patch, with high memory pressure to
> memcg+hierarchy+softlimit, following is shown.

I'm sorry that I think I found something wrong in memcg. 
I'll report it. sorry for noise.

Thanks,
-Kame

> ==
> INFO: RCU detected CPU 0 stall (t=10000 jiffies)
> sending NMI to all CPUs:
> NMI backtrace for cpu 0
> CPU 0:
> Modules linked in: sco bridge stp bnep l2cap crc16 bluetooth rfkill iptabl
> e_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log d
> m_multipath dm_mod uinput ppdev i2c_i801 pcspkr i2c_core bnx2 sg e1000e parport_pc parport button shpchp megaraid_sas sd_mo
> d scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
>  Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2 PRIMERGY
>  RIP: 0010:[<ffffffff810878fe>]  [<ffffffff810878fe>] trace_hardirqs_off_ca
> ller+0x3e/0xb RSP: 0018:ffff88004fa03d98  EFLAGS: 00000006
>  RAX: 0000000000000046 RBX: 0000000000000c00 RCX: 000000000000e501
>  RDX: ffff8806133564f0 RSI: 0000000000000002 RDI: ffffffff8102a940
>  RBP: ffff88004fa03d98 R08: 0000000000000001 R09: 0000000000000000
>  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
>  R13: 0000000000000046 R14: 00000000000000ff R15: ffff88004fa03f48
>  FS:  00007fdeca0856f0(0000) GS:ffff88004fa00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007fdeca09e000 CR3: 0000000619fc6000 CR4: 00000000000006f0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>  Call Trace:
>   <#DB[1]>  <<EOE>> Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2
>  Call Trace:
>   <NMI>  [<ffffffff8100af79>] ? show_regs+0x49/0x50
>   [<ffffffff81429385>] nmi_watchdog_tick+0x1e5/0x210
>   [<ffffffff81428891>] do_nmi+0x1b1/0x2e0
>   [<ffffffff8142808a>] nmi+0x1a/0x2c
>   [<ffffffff8102a940>] ? flat_send_IPI_mask+0x90/0xb0
>   [<ffffffff810878fe>] ? trace_hardirqs_off_caller+0x3e/0xb0
>   <<EOE>>  <IRQ>  [<ffffffff810884bd>] trace_hardirqs_off+0xd/0x10
>   [<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
>   [<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
>   [<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
>   [<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
>   [<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
>   [<ffffffff81063a26>] update_process_times+0x46/0x70
>   [<ffffffff81085930>] tick_sched_timer+0x60/0x160
>   [<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
>   [<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
>   [<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
>   [<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>   [<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
>   [<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
>   <EOI>  [<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
>   [<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
>   [<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
>   [<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
>   [<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
>   [<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60
>   [<ffffffff810a5b90>] ? cgroup_map_add+0x0/0x30
>   [<ffffffff8115de03>] ? seq_read+0xf3/0x420
>   [<ffffffff811d9926>] ? security_file_permission+0x16/0x20
>   [<ffffffff8113b7ec>] ? vfs_read+0xcc/0x190
>   [<ffffffff8113b9b5>] ? sys_read+0x55/0x90
>   [<ffffffff8100bf9b>] ? system_call_fastpath+0x16/0x1b
> .....
> ==
> mem_cgroup_walk_tree() is not very young function and I've been doing the
> same kind of tests but this is 1st messeage for me. So, I'm a bit confused.
> and not sure how I start debug from...
> 
> Does this mean mem_cgroup_walk_tree() blocks RCU's progress
> over 10000 jiffies ? What should I doubt ?
> 
> Thanks,
> -Kame
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus
  2009-09-28  6:42 ` Question: RCU stall detected in memcg (Re: " KAMEZAWA Hiroyuki
  2009-09-28  8:33   ` KAMEZAWA Hiroyuki
@ 2009-09-28  9:06   ` KAMEZAWA Hiroyuki
  2009-09-28  9:13     ` [PATCH][rc1] cgroup: catch bad css refcnt at css_put KAMEZAWA Hiroyuki
  2009-09-28  9:32     ` [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus Balbir Singh
  2009-09-28  9:22   ` Question: RCU stall detected in memcg (Re: mmotm 2009-09-25-14-35 uploaded Balbir Singh
  2009-09-28  9:34   ` Paul E. McKenney
  3 siblings, 2 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-28  9:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, akpm, mingo, balbir, nishimura


> At testing my (small) patch, with high memory pressure to
> memcg+hierarchy+softlimit, following is shown.
> ==
> INFO: RCU detected CPU 0 stall (t=10000 jiffies)
> sending NMI to all CPUs:
> NMI backtrace for cpu 0
> CPU 0:
> Modules linked in: sco bridge stp bnep l2cap crc16 bluetooth rfkill iptabl
> e_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log d
> m_multipath dm_mod uinput ppdev i2c_i801 pcspkr i2c_core bnx2 sg e1000e parport_pc parport button shpchp megaraid_sas sd_mo
> d scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
>  Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2 PRIMERGY
>  RIP: 0010:[<ffffffff810878fe>]  [<ffffffff810878fe>] trace_hardirqs_off_ca
> ller+0x3e/0xb RSP: 0018:ffff88004fa03d98  EFLAGS: 00000006
>  RAX: 0000000000000046 RBX: 0000000000000c00 RCX: 000000000000e501
>  RDX: ffff8806133564f0 RSI: 0000000000000002 RDI: ffffffff8102a940
>  RBP: ffff88004fa03d98 R08: 0000000000000001 R09: 0000000000000000
>  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
>  R13: 0000000000000046 R14: 00000000000000ff R15: ffff88004fa03f48
>  FS:  00007fdeca0856f0(0000) GS:ffff88004fa00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007fdeca09e000 CR3: 0000000619fc6000 CR4: 00000000000006f0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>  Call Trace:
>   <#DB[1]>  <<EOE>> Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2
>  Call Trace:
>   <NMI>  [<ffffffff8100af79>] ? show_regs+0x49/0x50
>   [<ffffffff81429385>] nmi_watchdog_tick+0x1e5/0x210
>   [<ffffffff81428891>] do_nmi+0x1b1/0x2e0
>   [<ffffffff8142808a>] nmi+0x1a/0x2c
>   [<ffffffff8102a940>] ? flat_send_IPI_mask+0x90/0xb0
>   [<ffffffff810878fe>] ? trace_hardirqs_off_caller+0x3e/0xb0
>   <<EOE>>  <IRQ>  [<ffffffff810884bd>] trace_hardirqs_off+0xd/0x10
>   [<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
>   [<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
>   [<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
>   [<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
>   [<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
>   [<ffffffff81063a26>] update_process_times+0x46/0x70
>   [<ffffffff81085930>] tick_sched_timer+0x60/0x160
>   [<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
>   [<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
>   [<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
>   [<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>   [<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
>   [<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
>   <EOI>  [<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
>   [<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
>   [<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
>   [<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
>   [<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
>   [<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60
>   [<ffffffff810a5b90>] ? cgroup_map_add+0x0/0x30
>   [<ffffffff8115de03>] ? seq_read+0xf3/0x420
>   [<ffffffff811d9926>] ? security_file_permission+0x16/0x20
>   [<ffffffff8113b7ec>] ? vfs_read+0xcc/0x190
>   [<ffffffff8113b9b5>] ? sys_read+0x55/0x90
>   [<ffffffff8100bf9b>] ? system_call_fastpath+0x16/0x1b
> .....
> ==

This is a patch for 2.6.31-rc1 (maybe no hunk with -mm)
==
__mem_cgroup_largest_soft_limit_node() returns a mem_cgroup_per_zone "mz"
with incremnted mz->mem->css's refcnt.
Then, the caller of this function has to call css_put(mz->mem->css).

But, mz can be !NULL even if "not found" i.e. without css_get().
By this, css->refcnt will go down to minus.

This may cause various things...one of results will be
initite-loop in css_tryget()  as this.
 
INFO: RCU detected CPU 0 stall (t=10000 jiffies)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU 0:
<snip>

 <<EOE>>  <IRQ>  [<ffffffff810884bd>] trace_hardirqs_off+0xd/0x10
  [<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
  [<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
  [<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
  [<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
  [<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
  [<ffffffff81063a26>] update_process_times+0x46/0x70
  [<ffffffff81085930>] tick_sched_timer+0x60/0x160
  [<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
  [<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
  [<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
  [<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
  [<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
  [<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
  <EOI>  [<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
  [<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
  [<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
  [<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
  [<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
  [<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60

Above shows CPU0 caught in css_tryget()'s inifinite loop because
of bad refcnt.

This is a fix to set mz=NULL at the top of retry path.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/memcontrol.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.32-rc1/mm/memcontrol.c
===================================================================
--- linux-2.6.32-rc1.orig/mm/memcontrol.c
+++ linux-2.6.32-rc1/mm/memcontrol.c
@@ -447,9 +447,10 @@ static struct mem_cgroup_per_zone *
 __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
 {
 	struct rb_node *rightmost = NULL;
-	struct mem_cgroup_per_zone *mz = NULL;
+	struct mem_cgroup_per_zone *mz;
 
 retry:
+	mz = NULL;
 	rightmost = rb_last(&mctz->rb_root);
 	if (!rightmost)
 		goto done;		/* Nothing to reclaim from */




^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH][rc1] cgroup: catch bad css refcnt at css_put
  2009-09-28  9:06   ` [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus KAMEZAWA Hiroyuki
@ 2009-09-28  9:13     ` KAMEZAWA Hiroyuki
  2009-09-28 14:20       ` Paul Menage
  2009-09-29  0:50       ` Li Zefan
  2009-09-28  9:32     ` [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus Balbir Singh
  1 sibling, 2 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-28  9:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-kernel, akpm, mingo, balbir, nishimura, menage, lizf

This is a patch for checking css->refcnt's sanity at css_put().

BTW, I noticed that...css->refcnt may overflow if used with memcg...
Now, refcnt is incremented per a page. Paul, do you have any idea ?
(Ah, yes. "don't use css->refcnt per page" is maybe reasonable but
 it will be big change..)

==
__css_put() doesn't check a bug as refcnt goes to minus.
I think it should be caught. This patch adds a check for it.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 kernel/cgroup.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6.32-rc1/kernel/cgroup.c
===================================================================
--- linux-2.6.32-rc1.orig/kernel/cgroup.c
+++ linux-2.6.32-rc1/kernel/cgroup.c
@@ -3708,8 +3708,10 @@ static void check_for_release(struct cgr
 void __css_put(struct cgroup_subsys_state *css)
 {
 	struct cgroup *cgrp = css->cgroup;
+	int val;
 	rcu_read_lock();
-	if (atomic_dec_return(&css->refcnt) == 1) {
+	val = atomic_dec_return(&css->refcnt);
+	if (val == 1) {
 		if (notify_on_release(cgrp)) {
 			set_bit(CGRP_RELEASABLE, &cgrp->flags);
 			check_for_release(cgrp);
@@ -3717,6 +3719,7 @@ void __css_put(struct cgroup_subsys_stat
 		cgroup_wakeup_rmdir_waiter(cgrp);
 	}
 	rcu_read_unlock();
+	WARN_ON(val < 1);
 }
 
 /*


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Question: RCU stall detected in memcg (Re: mmotm 2009-09-25-14-35  uploaded
  2009-09-28  6:42 ` Question: RCU stall detected in memcg (Re: " KAMEZAWA Hiroyuki
  2009-09-28  8:33   ` KAMEZAWA Hiroyuki
  2009-09-28  9:06   ` [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus KAMEZAWA Hiroyuki
@ 2009-09-28  9:22   ` Balbir Singh
  2009-09-28 11:26     ` KAMEZAWA Hiroyuki
  2009-09-28  9:34   ` Paul E. McKenney
  3 siblings, 1 reply; 22+ messages in thread
From: Balbir Singh @ 2009-09-28  9:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, akpm, mingo, nishimura

> akpm@linux-foundation.org wrote:
>
>> The mm-of-the-moment snapshot 2009-09-25-14-35 has been uploaded to
>>
>>    http://userweb.kernel.org/~akpm/mmotm/
>>
>> and will soon be available at
>>
>>    git://git.zen-sources.org/zen/mmotm.git
>>
>> It contains the following patches against 2.6.31:
>>
>
> At testing my (small) patch, with high memory pressure to
> memcg+hierarchy+softlimit, following is shown.
> ==
> INFO: RCU detected CPU 0 stall (t=10000 jiffies)
> sending NMI to all CPUs:
> NMI backtrace for cpu 0
> CPU 0:
> Modules linked in: sco bridge stp bnep l2cap crc16 bluetooth rfkill iptabl
> e_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log d
> m_multipath dm_mod uinput ppdev i2c_i801 pcspkr i2c_core bnx2 sg e1000e parport_pc parport button shpchp megaraid_sas sd_mo
> d scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
>  Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2 PRIMERGY
>  RIP: 0010:[<ffffffff810878fe>]  [<ffffffff810878fe>] trace_hardirqs_off_ca
> ller+0x3e/0xb RSP: 0018:ffff88004fa03d98  EFLAGS: 00000006
>  RAX: 0000000000000046 RBX: 0000000000000c00 RCX: 000000000000e501
>  RDX: ffff8806133564f0 RSI: 0000000000000002 RDI: ffffffff8102a940
>  RBP: ffff88004fa03d98 R08: 0000000000000001 R09: 0000000000000000
>  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
>  R13: 0000000000000046 R14: 00000000000000ff R15: ffff88004fa03f48
>  FS:  00007fdeca0856f0(0000) GS:ffff88004fa00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007fdeca09e000 CR3: 0000000619fc6000 CR4: 00000000000006f0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>  Call Trace:
>  <#DB[1]>  <<EOE>> Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2
>  Call Trace:
>  <NMI>  [<ffffffff8100af79>] ? show_regs+0x49/0x50
>  [<ffffffff81429385>] nmi_watchdog_tick+0x1e5/0x210
>  [<ffffffff81428891>] do_nmi+0x1b1/0x2e0
>  [<ffffffff8142808a>] nmi+0x1a/0x2c
>  [<ffffffff8102a940>] ? flat_send_IPI_mask+0x90/0xb0
>  [<ffffffff810878fe>] ? trace_hardirqs_off_caller+0x3e/0xb0
>  <<EOE>>  <IRQ>  [<ffffffff810884bd>] trace_hardirqs_off+0xd/0x10
>  [<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
>  [<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
>  [<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
>  [<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
>  [<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
>  [<ffffffff81063a26>] update_process_times+0x46/0x70
>  [<ffffffff81085930>] tick_sched_timer+0x60/0x160
>  [<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
>  [<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
>  [<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
>  [<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>  [<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
>  [<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
>  <EOI>  [<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
>  [<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
>  [<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
>  [<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
>  [<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
>  [<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60
>  [<ffffffff810a5b90>] ? cgroup_map_add+0x0/0x30
>  [<ffffffff8115de03>] ? seq_read+0xf3/0x420
>  [<ffffffff811d9926>] ? security_file_permission+0x16/0x20
>  [<ffffffff8113b7ec>] ? vfs_read+0xcc/0x190
>  [<ffffffff8113b9b5>] ? sys_read+0x55/0x90
>  [<ffffffff8100bf9b>] ? system_call_fastpath+0x16/0x1b
> .....
> ==
> mem_cgroup_walk_tree() is not very young function and I've been doing the
> same kind of tests but this is 1st messeage for me. So, I'm a bit confused.
> and not sure how I start debug from...
>
> Does this mean mem_cgroup_walk_tree() blocks RCU's progress
> over 10000 jiffies ? What should I doubt ?

Could you share your test case so that I can reproduce the problem? My
tests seem to work fine so far

Balbir Singh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus
  2009-09-28  9:06   ` [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus KAMEZAWA Hiroyuki
  2009-09-28  9:13     ` [PATCH][rc1] cgroup: catch bad css refcnt at css_put KAMEZAWA Hiroyuki
@ 2009-09-28  9:32     ` Balbir Singh
  2009-09-28 11:22       ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 22+ messages in thread
From: Balbir Singh @ 2009-09-28  9:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, akpm, mingo, nishimura

> __mem_cgroup_largest_soft_limit_node() returns a mem_cgroup_per_zone "mz"
> with incremnted mz->mem->css's refcnt.
> Then, the caller of this function has to call css_put(mz->mem->css).
>
> But, mz can be !NULL even if "not found" i.e. without css_get().
> By this, css->refcnt will go down to minus.
>
> This may cause various things...one of results will be
> initite-loop in css_tryget()  as this.
>
> INFO: RCU detected CPU 0 stall (t=10000 jiffies)
> sending NMI to all CPUs:
> NMI backtrace for cpu 0
> CPU 0:
> <snip>
>
>  <<EOE>>  <IRQ>  [<ffffffff810884bd>] trace_hardirqs_off+0xd/0x10
>  [<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
>  [<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
>  [<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
>  [<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
>  [<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
>  [<ffffffff81063a26>] update_process_times+0x46/0x70
>  [<ffffffff81085930>] tick_sched_timer+0x60/0x160
>  [<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
>  [<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
>  [<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
>  [<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>  [<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
>  [<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
>  <EOI>  [<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
>  [<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
>  [<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
>  [<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
>  [<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
>  [<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60
>
> Above shows CPU0 caught in css_tryget()'s inifinite loop because
> of bad refcnt.
>
> This is a fix to set mz=NULL at the top of retry path.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> ---
>  mm/memcontrol.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.32-rc1/mm/memcontrol.c
> ===================================================================
> --- linux-2.6.32-rc1.orig/mm/memcontrol.c
> +++ linux-2.6.32-rc1/mm/memcontrol.c
> @@ -447,9 +447,10 @@ static struct mem_cgroup_per_zone *
>  __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
>  {
>        struct rb_node *rightmost = NULL;
> -       struct mem_cgroup_per_zone *mz = NULL;
> +       struct mem_cgroup_per_zone *mz;
>
>  retry:
> +       mz = NULL;
>        rightmost = rb_last(&mctz->rb_root);
>        if (!rightmost)
>                goto done;              /* Nothing to reclaim from */
>

Good catch! So we fail at css_tryget() once, but mz is valid, we
return a non NULL mz and we do a css_put() causing ref count to go
bad. The next iteration that uses this mz will hang? I've not been
able to hit, may be I should test in a really small cgroup under
stress.

Balbir Singh.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Question: RCU stall detected in memcg  (Re: mmotm 2009-09-25-14-35 uploaded
  2009-09-28  6:42 ` Question: RCU stall detected in memcg (Re: " KAMEZAWA Hiroyuki
                     ` (2 preceding siblings ...)
  2009-09-28  9:22   ` Question: RCU stall detected in memcg (Re: mmotm 2009-09-25-14-35 uploaded Balbir Singh
@ 2009-09-28  9:34   ` Paul E. McKenney
  2009-09-28 11:30     ` KAMEZAWA Hiroyuki
  3 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2009-09-28  9:34 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, akpm, mingo, balbir, nishimura

On Mon, Sep 28, 2009 at 03:42:13PM +0900, KAMEZAWA Hiroyuki wrote:
> On Fri, 25 Sep 2009 14:35:46 -0700
> akpm@linux-foundation.org wrote:
> 
> > The mm-of-the-moment snapshot 2009-09-25-14-35 has been uploaded to
> > 
> >    http://userweb.kernel.org/~akpm/mmotm/
> > 
> > and will soon be available at
> > 
> >    git://git.zen-sources.org/zen/mmotm.git
> > 
> > It contains the following patches against 2.6.31:
> > 
> 
> At testing my (small) patch, with high memory pressure to
> memcg+hierarchy+softlimit, following is shown.
> ==
> INFO: RCU detected CPU 0 stall (t=10000 jiffies)
> sending NMI to all CPUs:
> NMI backtrace for cpu 0
> CPU 0:
> Modules linked in: sco bridge stp bnep l2cap crc16 bluetooth rfkill iptabl
> e_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log d
> m_multipath dm_mod uinput ppdev i2c_i801 pcspkr i2c_core bnx2 sg e1000e parport_pc parport button shpchp megaraid_sas sd_mo
> d scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
>  Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2 PRIMERGY
>  RIP: 0010:[<ffffffff810878fe>]  [<ffffffff810878fe>] trace_hardirqs_off_ca
> ller+0x3e/0xb RSP: 0018:ffff88004fa03d98  EFLAGS: 00000006
>  RAX: 0000000000000046 RBX: 0000000000000c00 RCX: 000000000000e501
>  RDX: ffff8806133564f0 RSI: 0000000000000002 RDI: ffffffff8102a940
>  RBP: ffff88004fa03d98 R08: 0000000000000001 R09: 0000000000000000
>  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
>  R13: 0000000000000046 R14: 00000000000000ff R15: ffff88004fa03f48
>  FS:  00007fdeca0856f0(0000) GS:ffff88004fa00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007fdeca09e000 CR3: 0000000619fc6000 CR4: 00000000000006f0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>  Call Trace:
>   <#DB[1]>  <<EOE>> Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2
>  Call Trace:
>   <NMI>  [<ffffffff8100af79>] ? show_regs+0x49/0x50
>   [<ffffffff81429385>] nmi_watchdog_tick+0x1e5/0x210
>   [<ffffffff81428891>] do_nmi+0x1b1/0x2e0
>   [<ffffffff8142808a>] nmi+0x1a/0x2c
>   [<ffffffff8102a940>] ? flat_send_IPI_mask+0x90/0xb0
>   [<ffffffff810878fe>] ? trace_hardirqs_off_caller+0x3e/0xb0
>   <<EOE>>  <IRQ>  [<ffffffff810884bd>] trace_hardirqs_off+0xd/0x10
>   [<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
>   [<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
>   [<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
>   [<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
>   [<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
>   [<ffffffff81063a26>] update_process_times+0x46/0x70
>   [<ffffffff81085930>] tick_sched_timer+0x60/0x160
>   [<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
>   [<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
>   [<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
>   [<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>   [<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
>   [<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
>   <EOI>  [<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
>   [<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
>   [<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180

mem_cgroup_walk_tree() does have a loop.  Not sure why it is showing
up on the stack twice.  But it might well be the culprit.  Also,
it repeatedly calls css_get_next(), which acquires ss->id_lock.
If some other CPU is holding this lock too long, or failed to release
it, then we could see a stall spinning on the lock.  (Though these
sorts of problems are often caught by other diagnostics.)

>   [<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110

mem_cgroup_get_local_stat() is straight-line code, so should not be
the problem.

>   [<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330

mem_control_stat_show() has a pair of fixed-iteration loops, so
should not be the problem.

>   [<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60

Straight-line code, should not be a problem.

>   [<ffffffff810a5b90>] ? cgroup_map_add+0x0/0x30

Straight-line code, should not be a problem.

>   [<ffffffff8115de03>] ? seq_read+0xf3/0x420

I suppose that seq_read() could be repeatedly getting -EAGAIN out of
traverse(), but unless I am missing something, we would be seeing a
different stack trace in that case.

>   [<ffffffff811d9926>] ? security_file_permission+0x16/0x20

... and so on ...

>   [<ffffffff8113b7ec>] ? vfs_read+0xcc/0x190
>   [<ffffffff8113b9b5>] ? sys_read+0x55/0x90
>   [<ffffffff8100bf9b>] ? system_call_fastpath+0x16/0x1b
> .....
> ==
> mem_cgroup_walk_tree() is not very young function and I've been doing the
> same kind of tests but this is 1st messeage for me. So, I'm a bit confused.
> and not sure how I start debug from...
> 
> Does this mean mem_cgroup_walk_tree() blocks RCU's progress
> over 10000 jiffies ? What should I doubt ?

It means that this CPU has indeed been blocking RCU's progress for more
than 10 seconds (10,000 jiffies if HZ==1000).  One thing would be to
wait for another 30 seconds and see if you get another stall warning.
If not, then you could try decreasing RCU_SECONDS_TILL_STALL_RECHECK to
(say) 5*HZ.

Either way, if you get two stack traces during the same CPU-stall event,
it can give you a much better idea where in the callstack the stall is
actually occurring.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus
  2009-09-28  9:32     ` [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus Balbir Singh
@ 2009-09-28 11:22       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-28 11:22 UTC (permalink / raw)
  To: Balbir Singh; +Cc: KAMEZAWA Hiroyuki, linux-kernel, akpm, mingo, nishimura

Balbir Singh wrote:
>> __mem_cgroup_largest_soft_limit_node() returns a mem_cgroup_per_zone
>> "mz"
>> with incremnted mz->mem->css's refcnt.
>> Then, the caller of this function has to call css_put(mz->mem->css).
>>
>> But, mz can be !NULL even if "not found" i.e. without css_get().
>> By this, css->refcnt will go down to minus.
>>
>> This may cause various things...one of results will be
>> initite-loop in css_tryget() &#160;as this.
>>
>> INFO: RCU detected CPU 0 stall (t=10000 jiffies)
>> sending NMI to all CPUs:
>> NMI backtrace for cpu 0
>> CPU 0:
>> <snip>
>>
>> &#160;<<EOE>> &#160;<IRQ> &#160;[<ffffffff810884bd>]
trace_hardirqs_off+0xd/0x10
>> &#160;[<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
>> &#160;[<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
>> &#160;[<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
>> &#160;[<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
>> &#160;[<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
>> &#160;[<ffffffff81063a26>] update_process_times+0x46/0x70
>> &#160;[<ffffffff81085930>] tick_sched_timer+0x60/0x160
>> &#160;[<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
>> &#160;[<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
>> &#160;[<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
>> &#160;[<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>> &#160;[<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
>> &#160;[<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
>> &#160;<EOI> &#160;[<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
>> &#160;[<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
>> &#160;[<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
>> &#160;[<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
>> &#160;[<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
>> &#160;[<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60
>>
>> Above shows CPU0 caught in css_tryget()'s inifinite loop because
>> of bad refcnt.
>>
>> This is a fix to set mz=NULL at the top of retry path.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>
>> ---
>> &#160;mm/memcontrol.c | &#160; &#160;3 ++-
>> &#160;1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> Index: linux-2.6.32-rc1/mm/memcontrol.c
>> ===================================================================
>> --- linux-2.6.32-rc1.orig/mm/memcontrol.c
>> +++ linux-2.6.32-rc1/mm/memcontrol.c
>> @@ -447,9 +447,10 @@ static struct mem_cgroup_per_zone *
>> &#160;__mem_cgroup_largest_soft_limit_node(struct
mem_cgroup_tree_per_zone *mctz)
>> &#160;{
>> &#160; &#160; &#160; &#160;struct rb_node *rightmost = NULL;
>> - &#160; &#160; &#160; struct mem_cgroup_per_zone *mz = NULL;
>> + &#160; &#160; &#160; struct mem_cgroup_per_zone *mz;
>>
>> &#160;retry:
>> + &#160; &#160; &#160; mz = NULL;
>> &#160; &#160; &#160; &#160;rightmost = rb_last(&mctz->rb_root);
>> &#160; &#160; &#160; &#160;if (!rightmost)
>> &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;goto done;
&#160; &#160; &#160; &#160; &#160; &#160; &#160;/* Nothing to reclaim
from */
>>
>
> Good catch! So we fail at css_tryget() once, but mz is valid, we
> return a non NULL mz and we do a css_put() causing ref count to go
> bad.
yes.
> The next iteration that uses this mz will hang?
Maybe no. next css_tryget() hangs if css->refcnt is 0.

> I've not been
> able to hit, may be I should test in a really small cgroup under
> stress.
>
Maybe my small patch (not posted yet) affected and I couldn't reproduce it.
But the bug itself should be fixed.
(So, I want to add a sanity check to css_put()).

Thanks,
-Kame


> Balbir Singh.
>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Question: RCU stall detected in memcg (Re: mmotm 2009-09-25-14-35  uploaded
  2009-09-28  9:22   ` Question: RCU stall detected in memcg (Re: mmotm 2009-09-25-14-35 uploaded Balbir Singh
@ 2009-09-28 11:26     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-28 11:26 UTC (permalink / raw)
  To: Balbir Singh; +Cc: KAMEZAWA Hiroyuki, linux-kernel, akpm, mingo, nishimura

Balbir Singh wrote:
>> akpm@linux-foundation.org wrote:
>>
>>> The mm-of-the-moment snapshot 2009-09-25-14-35 has been uploaded to
>>>
>>> &#160; &#160;http://userweb.kernel.org/~akpm/mmotm/
>>>
>>> and will soon be available at
>>>
>>> &#160; &#160;git://git.zen-sources.org/zen/mmotm.git
>>>
>>> It contains the following patches against 2.6.31:
>>>
>>
>> At testing my (small) patch, with high memory pressure to
>> memcg+hierarchy+softlimit, following is shown.
>> ==
>> INFO: RCU detected CPU 0 stall (t=10000 jiffies)
>> sending NMI to all CPUs:
>> NMI backtrace for cpu 0
>> CPU 0:
>> Modules linked in: sco bridge stp bnep l2cap crc16 bluetooth rfkill
>> iptabl
>> e_filter ip_tables ip6table_filter ip6_tables x_tables ipv6
>> cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log d
>> m_multipath dm_mod uinput ppdev i2c_i801 pcspkr i2c_core bnx2 sg e1000e
>> parport_pc parport button shpchp megaraid_sas sd_mo
>> d scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded:
>> microcode]
>> &#160;Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2 PRIMERGY
>> &#160;RIP: 0010:[<ffffffff810878fe>] &#160;[<ffffffff810878fe>]
trace_hardirqs_off_ca
>> ller+0x3e/0xb RSP: 0018:ffff88004fa03d98 &#160;EFLAGS: 00000006
>> &#160;RAX: 0000000000000046 RBX: 0000000000000c00 RCX: 000000000000e501
>> &#160;RDX: ffff8806133564f0 RSI: 0000000000000002 RDI: ffffffff8102a940
>> &#160;RBP: ffff88004fa03d98 R08: 0000000000000001 R09: 0000000000000000
>> &#160;R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
>> &#160;R13: 0000000000000046 R14: 00000000000000ff R15: ffff88004fa03f48
>> &#160;FS: &#160;00007fdeca0856f0(0000) GS:ffff88004fa00000(0000)
knlGS:0000000000000000
>> &#160;CS: &#160;0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> &#160;CR2: 00007fdeca09e000 CR3: 0000000619fc6000 CR4: 00000000000006f0
>> &#160;DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> &#160;DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> &#160;Call Trace:
>> &#160;<#DB[1]> &#160;<<EOE>> Pid: 2886, comm: ruby Not tainted
2.6.31-mm1 #2
>> &#160;Call Trace:
>> &#160;<NMI> &#160;[<ffffffff8100af79>] ? show_regs+0x49/0x50
>> &#160;[<ffffffff81429385>] nmi_watchdog_tick+0x1e5/0x210
>> &#160;[<ffffffff81428891>] do_nmi+0x1b1/0x2e0
>> &#160;[<ffffffff8142808a>] nmi+0x1a/0x2c
>> &#160;[<ffffffff8102a940>] ? flat_send_IPI_mask+0x90/0xb0
>> &#160;[<ffffffff810878fe>] ? trace_hardirqs_off_caller+0x3e/0xb0
>> &#160;<<EOE>> &#160;<IRQ> &#160;[<ffffffff810884bd>]
trace_hardirqs_off+0xd/0x10
>> &#160;[<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
>> &#160;[<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
>> &#160;[<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
>> &#160;[<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
>> &#160;[<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
>> &#160;[<ffffffff81063a26>] update_process_times+0x46/0x70
>> &#160;[<ffffffff81085930>] tick_sched_timer+0x60/0x160
>> &#160;[<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
>> &#160;[<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
>> &#160;[<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
>> &#160;[<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>> &#160;[<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
>> &#160;[<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
>> &#160;<EOI> &#160;[<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
>> &#160;[<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
>> &#160;[<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
>> &#160;[<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
>> &#160;[<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
>> &#160;[<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60
>> &#160;[<ffffffff810a5b90>] ? cgroup_map_add+0x0/0x30
>> &#160;[<ffffffff8115de03>] ? seq_read+0xf3/0x420
>> &#160;[<ffffffff811d9926>] ? security_file_permission+0x16/0x20
>> &#160;[<ffffffff8113b7ec>] ? vfs_read+0xcc/0x190
>> &#160;[<ffffffff8113b9b5>] ? sys_read+0x55/0x90
>> &#160;[<ffffffff8100bf9b>] ? system_call_fastpath+0x16/0x1b
>> .....
>> ==
>> mem_cgroup_walk_tree() is not very young function and I've been doing
>> the
>> same kind of tests but this is 1st messeage for me. So, I'm a bit
>> confused.
>> and not sure how I start debug from...
>>
>> Does this mean mem_cgroup_walk_tree() blocks RCU's progress
>> over 10000 jiffies ? What should I doubt ?
>
> Could you share your test case so that I can reproduce the problem? My
> tests seem to work fine so far
>
Create following hierarchy

  /cgroup/A     .... use_hierarchy=1, softlimit=300M, no task
            /01  .... add task which use much memory
            /02  .... no task
          B     .... add task which use much memory

Then, kswapd will reclaim memory from A/01.

I couldn't reproduce this. I doubt ,y own patch may affect.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Question: RCU stall detected in memcg  (Re: mmotm 2009-09-25-14-35 uploaded
  2009-09-28  9:34   ` Paul E. McKenney
@ 2009-09-28 11:30     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-28 11:30 UTC (permalink / raw)
  To: paulmck; +Cc: KAMEZAWA Hiroyuki, linux-kernel, akpm, mingo, balbir, nishimura

Paul E. McKenney wrote:
> On Mon, Sep 28, 2009 at 03:42:13PM +0900, KAMEZAWA Hiroyuki wrote:
>> On Fri, 25 Sep 2009 14:35:46 -0700
>> akpm@linux-foundation.org wrote:
>>
>> > The mm-of-the-moment snapshot 2009-09-25-14-35 has been uploaded to
>> >
>> >    http://userweb.kernel.org/~akpm/mmotm/
>> >
>> > and will soon be available at
>> >
>> >    git://git.zen-sources.org/zen/mmotm.git
>> >
>> > It contains the following patches against 2.6.31:
>> >
>>
>> At testing my (small) patch, with high memory pressure to
>> memcg+hierarchy+softlimit, following is shown.
>> ==
>> INFO: RCU detected CPU 0 stall (t=10000 jiffies)
>> sending NMI to all CPUs:
>> NMI backtrace for cpu 0
>> CPU 0:
>> Modules linked in: sco bridge stp bnep l2cap crc16 bluetooth rfkill
>> iptabl
>> e_filter ip_tables ip6table_filter ip6_tables x_tables ipv6
>> cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log d
>> m_multipath dm_mod uinput ppdev i2c_i801 pcspkr i2c_core bnx2 sg e1000e
>> parport_pc parport button shpchp megaraid_sas sd_mo
>> d scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded:
>> microcode]
>>  Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2 PRIMERGY
>>  RIP: 0010:[<ffffffff810878fe>]  [<ffffffff810878fe>]
>> trace_hardirqs_off_ca
>> ller+0x3e/0xb RSP: 0018:ffff88004fa03d98  EFLAGS: 00000006
>>  RAX: 0000000000000046 RBX: 0000000000000c00 RCX: 000000000000e501
>>  RDX: ffff8806133564f0 RSI: 0000000000000002 RDI: ffffffff8102a940
>>  RBP: ffff88004fa03d98 R08: 0000000000000001 R09: 0000000000000000
>>  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
>>  R13: 0000000000000046 R14: 00000000000000ff R15: ffff88004fa03f48
>>  FS:  00007fdeca0856f0(0000) GS:ffff88004fa00000(0000)
>> knlGS:0000000000000000
>>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>  CR2: 00007fdeca09e000 CR3: 0000000619fc6000 CR4: 00000000000006f0
>>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>  Call Trace:
>>   <#DB[1]>  <<EOE>> Pid: 2886, comm: ruby Not tainted 2.6.31-mm1 #2
>>  Call Trace:
>>   <NMI>  [<ffffffff8100af79>] ? show_regs+0x49/0x50
>>   [<ffffffff81429385>] nmi_watchdog_tick+0x1e5/0x210
>>   [<ffffffff81428891>] do_nmi+0x1b1/0x2e0
>>   [<ffffffff8142808a>] nmi+0x1a/0x2c
>>   [<ffffffff8102a940>] ? flat_send_IPI_mask+0x90/0xb0
>>   [<ffffffff810878fe>] ? trace_hardirqs_off_caller+0x3e/0xb0
>>   <<EOE>>  <IRQ>  [<ffffffff810884bd>] trace_hardirqs_off+0xd/0x10
>>   [<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
>>   [<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
>>   [<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
>>   [<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
>>   [<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
>>   [<ffffffff81063a26>] update_process_times+0x46/0x70
>>   [<ffffffff81085930>] tick_sched_timer+0x60/0x160
>>   [<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
>>   [<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
>>   [<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
>>   [<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>>   [<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
>>   [<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
>>   <EOI>  [<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
>>   [<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
>>   [<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
>
> mem_cgroup_walk_tree() does have a loop.  Not sure why it is showing
> up on the stack twice.  But it might well be the culprit.  Also,
> it repeatedly calls css_get_next(), which acquires ss->id_lock.
> If some other CPU is holding this lock too long, or failed to release
> it, then we could see a stall spinning on the lock.  (Though these
> sorts of problems are often caught by other diagnostics.)
>
Thanks. I found above stack trace is in css_tryget() which has a loop.
It loops forever if css->refcnt is 0. Then, I doubt refcnt-miss in css.



>>   [<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
>
> mem_cgroup_get_local_stat() is straight-line code, so should not be
> the problem.
>
>>   [<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
>
> mem_control_stat_show() has a pair of fixed-iteration loops, so
> should not be the problem.
>
>>   [<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60
>
> Straight-line code, should not be a problem.
>
>>   [<ffffffff810a5b90>] ? cgroup_map_add+0x0/0x30
>
> Straight-line code, should not be a problem.
>
>>   [<ffffffff8115de03>] ? seq_read+0xf3/0x420
>
> I suppose that seq_read() could be repeatedly getting -EAGAIN out of
> traverse(), but unless I am missing something, we would be seeing a
> different stack trace in that case.
>
>>   [<ffffffff811d9926>] ? security_file_permission+0x16/0x20
>
> ... and so on ...
>
>>   [<ffffffff8113b7ec>] ? vfs_read+0xcc/0x190
>>   [<ffffffff8113b9b5>] ? sys_read+0x55/0x90
>>   [<ffffffff8100bf9b>] ? system_call_fastpath+0x16/0x1b
>> .....
>> ==
>> mem_cgroup_walk_tree() is not very young function and I've been doing
>> the
>> same kind of tests but this is 1st messeage for me. So, I'm a bit
>> confused.
>> and not sure how I start debug from...
>>
>> Does this mean mem_cgroup_walk_tree() blocks RCU's progress
>> over 10000 jiffies ? What should I doubt ?
>
> It means that this CPU has indeed been blocking RCU's progress for more
> than 10 seconds (10,000 jiffies if HZ==1000).  One thing would be to
> wait for another 30 seconds and see if you get another stall warning.
> If not, then you could try decreasing RCU_SECONDS_TILL_STALL_RECHECK to
> (say) 5*HZ.
>
I saw twice in the same place. Thanks. I now doubt memcg's code.

> Either way, if you get two stack traces during the same CPU-stall event,
> it can give you a much better idea where in the callstack the stall is
> actually occurring.
>

Thank you for your advices :)
I'll review and add bug-check codes in memcg more.

Regards,
-Kame



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][rc1] cgroup: catch bad css refcnt at css_put
  2009-09-28  9:13     ` [PATCH][rc1] cgroup: catch bad css refcnt at css_put KAMEZAWA Hiroyuki
@ 2009-09-28 14:20       ` Paul Menage
  2009-09-29  3:06         ` KAMEZAWA Hiroyuki
  2009-09-29  0:50       ` Li Zefan
  1 sibling, 1 reply; 22+ messages in thread
From: Paul Menage @ 2009-09-28 14:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, akpm, mingo, balbir, nishimura, lizf

On Mon, Sep 28, 2009 at 2:13 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> This is a patch for checking css->refcnt's sanity at css_put().
>
> BTW, I noticed that...css->refcnt may overflow if used with memcg...
> Now, refcnt is incremented per a page. Paul, do you have any idea ?
> (Ah, yes. "don't use css->refcnt per page" is maybe reasonable but
>  it will be big change..)
>
> ==
> __css_put() doesn't check a bug as refcnt goes to minus.
> I think it should be caught. This patch adds a check for it.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: Paul Menage <menage@google.com>

Looks reasonable, although there's no guarantee that it will warn on a
buggy release rather than a correct release that occur after a buggy
release.

> ---
>  kernel/cgroup.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.32-rc1/kernel/cgroup.c
> ===================================================================
> --- linux-2.6.32-rc1.orig/kernel/cgroup.c
> +++ linux-2.6.32-rc1/kernel/cgroup.c
> @@ -3708,8 +3708,10 @@ static void check_for_release(struct cgr
>  void __css_put(struct cgroup_subsys_state *css)
>  {
>        struct cgroup *cgrp = css->cgroup;
> +       int val;
>        rcu_read_lock();
> -       if (atomic_dec_return(&css->refcnt) == 1) {
> +       val = atomic_dec_return(&css->refcnt);
> +       if (val == 1) {
>                if (notify_on_release(cgrp)) {
>                        set_bit(CGRP_RELEASABLE, &cgrp->flags);
>                        check_for_release(cgrp);
> @@ -3717,6 +3719,7 @@ void __css_put(struct cgroup_subsys_stat
>                cgroup_wakeup_rmdir_waiter(cgrp);
>        }
>        rcu_read_unlock();
> +       WARN_ON(val < 1);
>  }
>
>  /*
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH -mmotm] ecryptfs: depends on CRYPTO
  2009-09-25 21:35 ` akpm
  (?)
  (?)
@ 2009-09-28 20:34 ` Randy Dunlap
  2009-09-29  0:10   ` Tyler Hicks
  -1 siblings, 1 reply; 22+ messages in thread
From: Randy Dunlap @ 2009-09-28 20:34 UTC (permalink / raw)
  To: akpm, Tyler Hicks; +Cc: linux-kernel, ecryptfs-devel

From: Randy Dunlap <randy.dunlap@oracle.com>

ecryptfs uses crypto APIs so it should depend on CRYPTO.
Otherwise many build errors occur. [63 lines not pasted]

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
---
 fs/ecryptfs/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- mmotm-2009-0925-1435.orig/fs/ecryptfs/Kconfig
+++ mmotm-2009-0925-1435/fs/ecryptfs/Kconfig
@@ -1,6 +1,6 @@
 config ECRYPT_FS
 	tristate "eCrypt filesystem layer support (EXPERIMENTAL)"
-	depends on EXPERIMENTAL && KEYS && NET
+	depends on EXPERIMENTAL && KEYS && NET && CRYPTO
 	select CRYPTO_ECB
 	select CRYPTO_CBC
 	help

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH -mmotm] ecryptfs: depends on CRYPTO
  2009-09-28 20:34 ` [PATCH -mmotm] ecryptfs: depends on CRYPTO Randy Dunlap
@ 2009-09-29  0:10   ` Tyler Hicks
  2009-09-29  0:20     ` Randy Dunlap
  0 siblings, 1 reply; 22+ messages in thread
From: Tyler Hicks @ 2009-09-29  0:10 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: akpm, linux-kernel, ecryptfs-devel, Dave Hansen

On 09/28/2009 03:34 PM, Randy Dunlap wrote:
> From: Randy Dunlap <randy.dunlap@oracle.com>
> 
> ecryptfs uses crypto APIs so it should depend on CRYPTO.
> Otherwise many build errors occur. [63 lines not pasted]
> 
> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
> ---
>  fs/ecryptfs/Kconfig |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- mmotm-2009-0925-1435.orig/fs/ecryptfs/Kconfig
> +++ mmotm-2009-0925-1435/fs/ecryptfs/Kconfig
> @@ -1,6 +1,6 @@
>  config ECRYPT_FS
>  	tristate "eCrypt filesystem layer support (EXPERIMENTAL)"
> -	depends on EXPERIMENTAL && KEYS && NET
> +	depends on EXPERIMENTAL && KEYS && NET && CRYPTO
>  	select CRYPTO_ECB
>  	select CRYPTO_CBC
>  	help

Hi Randy - Thanks for the patch!  Unfortunately, I think it defeats what
Dave Hansen was wanting to do with commit
382684984e93039a3bbd83b04d341b0ceb831519.

When I pulled that patch in, I was under the assumption that the select
would also select all necessary dependencies.  According to
Documentation/kbuild/kconfig-language.txt, that's not the case:

	select should be used with care. select will force
	a symbol to a value without visiting the dependencies.
        By abusing select you are able to select a symbol FOO even
        if FOO depends on BAR that is not set.

Maybe we should do it how other folks are tackling this problem and
select CRYPTO, along with CRYPTO_ECB and CRYPTO_CBC.  While we're at it,
we should probably throw in CRYPTO_AES (aes-128 is the default cipher,
but the cipher is configurable at mount so it might be too obtrusive for
us to select it) and CRYPTO_MD5 (our default hash alg, not currently
configurable).  Also, we don't depend on NET anymore because our netlink
interface is no longer around.  It may not hurt to select KEYS, rather
than depend on it.  Does all of this sound sane to you?

Tyler

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH -mmotm] ecryptfs: depends on CRYPTO
  2009-09-29  0:10   ` Tyler Hicks
@ 2009-09-29  0:20     ` Randy Dunlap
  2009-09-29 17:08       ` Tyler Hicks
  0 siblings, 1 reply; 22+ messages in thread
From: Randy Dunlap @ 2009-09-29  0:20 UTC (permalink / raw)
  To: Tyler Hicks; +Cc: akpm, linux-kernel, ecryptfs-devel, Dave Hansen

On Mon, 28 Sep 2009 19:10:00 -0500 Tyler Hicks wrote:

> On 09/28/2009 03:34 PM, Randy Dunlap wrote:
> > From: Randy Dunlap <randy.dunlap@oracle.com>
> > 
> > ecryptfs uses crypto APIs so it should depend on CRYPTO.
> > Otherwise many build errors occur. [63 lines not pasted]
> > 
> > Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
> > ---
> >  fs/ecryptfs/Kconfig |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > --- mmotm-2009-0925-1435.orig/fs/ecryptfs/Kconfig
> > +++ mmotm-2009-0925-1435/fs/ecryptfs/Kconfig
> > @@ -1,6 +1,6 @@
> >  config ECRYPT_FS
> >  	tristate "eCrypt filesystem layer support (EXPERIMENTAL)"
> > -	depends on EXPERIMENTAL && KEYS && NET
> > +	depends on EXPERIMENTAL && KEYS && NET && CRYPTO
> >  	select CRYPTO_ECB
> >  	select CRYPTO_CBC
> >  	help
> 
> Hi Randy - Thanks for the patch!  Unfortunately, I think it defeats what
> Dave Hansen was wanting to do with commit
> 382684984e93039a3bbd83b04d341b0ceb831519.
> 
> When I pulled that patch in, I was under the assumption that the select
> would also select all necessary dependencies.  According to
> Documentation/kbuild/kconfig-language.txt, that's not the case:
> 
> 	select should be used with care. select will force
> 	a symbol to a value without visiting the dependencies.
>         By abusing select you are able to select a symbol FOO even
>         if FOO depends on BAR that is not set.
> 
> Maybe we should do it how other folks are tackling this problem and
> select CRYPTO, along with CRYPTO_ECB and CRYPTO_CBC.  While we're at it,
> we should probably throw in CRYPTO_AES (aes-128 is the default cipher,
> but the cipher is configurable at mount so it might be too obtrusive for
> us to select it) and CRYPTO_MD5 (our default hash alg, not currently
> configurable).  Also, we don't depend on NET anymore because our netlink
> interface is no longer around.  It may not hurt to select KEYS, rather
> than depend on it.  Does all of this sound sane to you?

It selects too much stuff.  "select" should not be used to enable
a full subsystem (that's my general rule, not in kconfig-language.txt).
What kconfig-language.txt says that applies here is just after your
quoted text:

	In general use select only for non-visible symbols
	(no prompts anywhere) and for symbols with no dependencies.
	That will limit the usefulness but on the other hand avoid
	the illegal configurations all over.

CRYPTO does not fit that.

One of the big problems with selecting kconfig symbols (like subsystem
ones) is that it makes it difficult to disable that symbol, which some
of us often want to do.


---
~Randy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][rc1] cgroup: catch bad css refcnt at css_put
  2009-09-28  9:13     ` [PATCH][rc1] cgroup: catch bad css refcnt at css_put KAMEZAWA Hiroyuki
  2009-09-28 14:20       ` Paul Menage
@ 2009-09-29  0:50       ` Li Zefan
  2009-09-29  2:55         ` KAMEZAWA Hiroyuki
  2009-09-29  3:09         ` [PATCH][rc1] cgroup: catch bad css refcnt at css_put v2 KAMEZAWA Hiroyuki
  1 sibling, 2 replies; 22+ messages in thread
From: Li Zefan @ 2009-09-29  0:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, akpm, mingo, balbir, nishimura, menage

KAMEZAWA Hiroyuki wrote:
> This is a patch for checking css->refcnt's sanity at css_put().
> 
> BTW, I noticed that...css->refcnt may overflow if used with memcg...
> Now, refcnt is incremented per a page. Paul, do you have any idea ?

atomic64_t ?

But for 4K pagesize, it won't overflow until when the referenced
memory is > 8T?

> (Ah, yes. "don't use css->refcnt per page" is maybe reasonable but
>  it will be big change..)
> 
> ==
> __css_put() doesn't check a bug as refcnt goes to minus.
> I think it should be caught. This patch adds a check for it.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: Li Zefan <lizf@cn.fujitsu.com>

> ---
>  kernel/cgroup.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.32-rc1/kernel/cgroup.c
> ===================================================================
> --- linux-2.6.32-rc1.orig/kernel/cgroup.c
> +++ linux-2.6.32-rc1/kernel/cgroup.c
> @@ -3708,8 +3708,10 @@ static void check_for_release(struct cgr
>  void __css_put(struct cgroup_subsys_state *css)
>  {
>  	struct cgroup *cgrp = css->cgroup;
> +	int val;
>  	rcu_read_lock();
> -	if (atomic_dec_return(&css->refcnt) == 1) {
> +	val = atomic_dec_return(&css->refcnt);
> +	if (val == 1) {
>  		if (notify_on_release(cgrp)) {
>  			set_bit(CGRP_RELEASABLE, &cgrp->flags);
>  			check_for_release(cgrp);
> @@ -3717,6 +3719,7 @@ void __css_put(struct cgroup_subsys_stat
>  		cgroup_wakeup_rmdir_waiter(cgrp);
>  	}
>  	rcu_read_unlock();
> +	WARN_ON(val < 1);

When we run into this, it'll probably fill up the syslog quickly,
so I think WARN_ON_ONCE() is a bit better.

>  }
>  

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][rc1] cgroup: catch bad css refcnt at css_put
  2009-09-29  0:50       ` Li Zefan
@ 2009-09-29  2:55         ` KAMEZAWA Hiroyuki
  2009-09-29  3:09         ` [PATCH][rc1] cgroup: catch bad css refcnt at css_put v2 KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-29  2:55 UTC (permalink / raw)
  To: Li Zefan; +Cc: linux-kernel, akpm, mingo, balbir, nishimura, menage

On Tue, 29 Sep 2009 08:50:33 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > This is a patch for checking css->refcnt's sanity at css_put().
> > 
> > BTW, I noticed that...css->refcnt may overflow if used with memcg...
> > Now, refcnt is incremented per a page. Paul, do you have any idea ?
> 
> atomic64_t ?
> 
maybe. atomic_long_t ?

> But for 4K pagesize, it won't overflow until when the referenced
> memory is > 8T?
> 
you're right. But there tends to be a few users who use unbelievable amounts
of memory in the world.
(Such user uses memcg or not is another problem ;)

> > (Ah, yes. "don't use css->refcnt per page" is maybe reasonable but
> >  it will be big change..)
> > 
> > ==
> > __css_put() doesn't check a bug as refcnt goes to minus.
> > I think it should be caught. This patch adds a check for it.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
> 
> > ---
> >  kernel/cgroup.c |    5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6.32-rc1/kernel/cgroup.c
> > ===================================================================
> > --- linux-2.6.32-rc1.orig/kernel/cgroup.c
> > +++ linux-2.6.32-rc1/kernel/cgroup.c
> > @@ -3708,8 +3708,10 @@ static void check_for_release(struct cgr
> >  void __css_put(struct cgroup_subsys_state *css)
> >  {
> >  	struct cgroup *cgrp = css->cgroup;
> > +	int val;
> >  	rcu_read_lock();
> > -	if (atomic_dec_return(&css->refcnt) == 1) {
> > +	val = atomic_dec_return(&css->refcnt);
> > +	if (val == 1) {
> >  		if (notify_on_release(cgrp)) {
> >  			set_bit(CGRP_RELEASABLE, &cgrp->flags);
> >  			check_for_release(cgrp);
> > @@ -3717,6 +3719,7 @@ void __css_put(struct cgroup_subsys_stat
> >  		cgroup_wakeup_rmdir_waiter(cgrp);
> >  	}
> >  	rcu_read_unlock();
> > +	WARN_ON(val < 1);
> 
> When we run into this, it'll probably fill up the syslog quickly,
> so I think WARN_ON_ONCE() is a bit better.
> 
Hmm, ok. I'll rewrite.

Thanks,
-Kame


> >  }
> >  
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH][rc1] cgroup: catch bad css refcnt at css_put
  2009-09-28 14:20       ` Paul Menage
@ 2009-09-29  3:06         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-29  3:06 UTC (permalink / raw)
  To: Paul Menage; +Cc: linux-kernel, akpm, mingo, balbir, nishimura, lizf

On Mon, 28 Sep 2009 07:20:08 -0700
Paul Menage <menage@google.com> wrote:

> On Mon, Sep 28, 2009 at 2:13 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > This is a patch for checking css->refcnt's sanity at css_put().
> >
> > BTW, I noticed that...css->refcnt may overflow if used with memcg...
> > Now, refcnt is incremented per a page. Paul, do you have any idea ?
> > (Ah, yes. "don't use css->refcnt per page" is maybe reasonable but
> >  it will be big change..)
> >
> > ==
> > __css_put() doesn't check a bug as refcnt goes to minus.
> > I think it should be caught. This patch adds a check for it.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Acked-by: Paul Menage <menage@google.com>
> 
Thanks.

> Looks reasonable, although there's no guarantee that it will warn on a
> buggy release rather than a correct release that occur after a buggy
> release.
> 
yes, it's a problem of refcnt.

Thanks,
-Kame


> > ---
> >  kernel/cgroup.c |    5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6.32-rc1/kernel/cgroup.c
> > ===================================================================
> > --- linux-2.6.32-rc1.orig/kernel/cgroup.c
> > +++ linux-2.6.32-rc1/kernel/cgroup.c
> > @@ -3708,8 +3708,10 @@ static void check_for_release(struct cgr
> >  void __css_put(struct cgroup_subsys_state *css)
> >  {
> >        struct cgroup *cgrp = css->cgroup;
> > +       int val;
> >        rcu_read_lock();
> > -       if (atomic_dec_return(&css->refcnt) == 1) {
> > +       val = atomic_dec_return(&css->refcnt);
> > +       if (val == 1) {
> >                if (notify_on_release(cgrp)) {
> >                        set_bit(CGRP_RELEASABLE, &cgrp->flags);
> >                        check_for_release(cgrp);
> > @@ -3717,6 +3719,7 @@ void __css_put(struct cgroup_subsys_stat
> >                cgroup_wakeup_rmdir_waiter(cgrp);
> >        }
> >        rcu_read_unlock();
> > +       WARN_ON(val < 1);
> >  }
> >
> >  /*
> >
> >
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH][rc1] cgroup: catch bad css refcnt at css_put v2
  2009-09-29  0:50       ` Li Zefan
  2009-09-29  2:55         ` KAMEZAWA Hiroyuki
@ 2009-09-29  3:09         ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-09-29  3:09 UTC (permalink / raw)
  To: Li Zefan; +Cc: linux-kernel, akpm, mingo, balbir, nishimura, menage

__css_put() doesn't check a buggy case as refcnt goes to minus.
This patch adds a check for it.

Changelog:
  - using WARN_ON_ONCE() instead of WARN_ON()

Acked-by: Paul Menage <menage@google.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 kernel/cgroup.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6.32-rc1/kernel/cgroup.c
===================================================================
--- linux-2.6.32-rc1.orig/kernel/cgroup.c
+++ linux-2.6.32-rc1/kernel/cgroup.c
@@ -3708,8 +3708,10 @@ static void check_for_release(struct cgr
 void __css_put(struct cgroup_subsys_state *css)
 {
 	struct cgroup *cgrp = css->cgroup;
+	int val;
 	rcu_read_lock();
-	if (atomic_dec_return(&css->refcnt) == 1) {
+	val = atomic_dec_return(&css->refcnt);
+	if (val == 1) {
 		if (notify_on_release(cgrp)) {
 			set_bit(CGRP_RELEASABLE, &cgrp->flags);
 			check_for_release(cgrp);
@@ -3717,6 +3719,7 @@ void __css_put(struct cgroup_subsys_stat
 		cgroup_wakeup_rmdir_waiter(cgrp);
 	}
 	rcu_read_unlock();
+	WARN_ON_ONCE(val < 1);
 }
 
 /*


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH -mmotm] ecryptfs: depends on CRYPTO
  2009-09-29  0:20     ` Randy Dunlap
@ 2009-09-29 17:08       ` Tyler Hicks
  2009-09-29 17:17         ` Randy Dunlap
  0 siblings, 1 reply; 22+ messages in thread
From: Tyler Hicks @ 2009-09-29 17:08 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: akpm, linux-kernel, ecryptfs-devel, Dave Hansen

On 09/28/2009 07:20 PM, Randy Dunlap wrote:
> On Mon, 28 Sep 2009 19:10:00 -0500 Tyler Hicks wrote:
> 
>> On 09/28/2009 03:34 PM, Randy Dunlap wrote:
>>> From: Randy Dunlap <randy.dunlap@oracle.com>
>>>
>>> ecryptfs uses crypto APIs so it should depend on CRYPTO.
>>> Otherwise many build errors occur. [63 lines not pasted]
>>>
>>> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
>>> ---
>>>  fs/ecryptfs/Kconfig |    2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> --- mmotm-2009-0925-1435.orig/fs/ecryptfs/Kconfig
>>> +++ mmotm-2009-0925-1435/fs/ecryptfs/Kconfig
>>> @@ -1,6 +1,6 @@
>>>  config ECRYPT_FS
>>>  	tristate "eCrypt filesystem layer support (EXPERIMENTAL)"
>>> -	depends on EXPERIMENTAL && KEYS && NET
>>> +	depends on EXPERIMENTAL && KEYS && NET && CRYPTO
>>>  	select CRYPTO_ECB
>>>  	select CRYPTO_CBC
>>>  	help
>>
>> Hi Randy - Thanks for the patch!  Unfortunately, I think it defeats what
>> Dave Hansen was wanting to do with commit
>> 382684984e93039a3bbd83b04d341b0ceb831519.
>>
>> When I pulled that patch in, I was under the assumption that the select
>> would also select all necessary dependencies.  According to
>> Documentation/kbuild/kconfig-language.txt, that's not the case:
>>
>> 	select should be used with care. select will force
>> 	a symbol to a value without visiting the dependencies.
>>         By abusing select you are able to select a symbol FOO even
>>         if FOO depends on BAR that is not set.
>>
>> Maybe we should do it how other folks are tackling this problem and
>> select CRYPTO, along with CRYPTO_ECB and CRYPTO_CBC.  While we're at it,
>> we should probably throw in CRYPTO_AES (aes-128 is the default cipher,
>> but the cipher is configurable at mount so it might be too obtrusive for
>> us to select it) and CRYPTO_MD5 (our default hash alg, not currently
>> configurable).  Also, we don't depend on NET anymore because our netlink
>> interface is no longer around.  It may not hurt to select KEYS, rather
>> than depend on it.  Does all of this sound sane to you?
> 
> It selects too much stuff.  "select" should not be used to enable
> a full subsystem (that's my general rule, not in kconfig-language.txt).
> What kconfig-language.txt says that applies here is just after your
> quoted text:
> 
> 	In general use select only for non-visible symbols
> 	(no prompts anywhere) and for symbols with no dependencies.
> 	That will limit the usefulness but on the other hand avoid
> 	the illegal configurations all over.
> 
> CRYPTO does not fit that.
> 
> One of the big problems with selecting kconfig symbols (like subsystem
> ones) is that it makes it difficult to disable that symbol, which some
> of us often want to do.
> 
> 
> ---
> ~Randy

eCryptfs wouldn't be the first to select CRYPTO:

$ grep -r "select CRYPTO$" --include=Kconfig . | wc -l
26

But after trying to deselect CRYPTO with one of my custom configs, I
realized that you are right. :)  Depending on CRYPTO and then selecting
the proper CRYPTO_* symbols is the way to go.

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6.git#next

Thanks again!

Tyler

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH -mmotm] ecryptfs: depends on CRYPTO
  2009-09-29 17:08       ` Tyler Hicks
@ 2009-09-29 17:17         ` Randy Dunlap
  0 siblings, 0 replies; 22+ messages in thread
From: Randy Dunlap @ 2009-09-29 17:17 UTC (permalink / raw)
  To: Tyler Hicks; +Cc: akpm, linux-kernel, ecryptfs-devel, Dave Hansen

On Tue, 29 Sep 2009 12:08:55 -0500 Tyler Hicks wrote:

> On 09/28/2009 07:20 PM, Randy Dunlap wrote:
> > On Mon, 28 Sep 2009 19:10:00 -0500 Tyler Hicks wrote:
> > 
> >> On 09/28/2009 03:34 PM, Randy Dunlap wrote:
> >>> From: Randy Dunlap <randy.dunlap@oracle.com>
> >>>
> >>> ecryptfs uses crypto APIs so it should depend on CRYPTO.
> >>> Otherwise many build errors occur. [63 lines not pasted]
> >>>
> >>> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
> >>> ---
> >>>  fs/ecryptfs/Kconfig |    2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> --- mmotm-2009-0925-1435.orig/fs/ecryptfs/Kconfig
> >>> +++ mmotm-2009-0925-1435/fs/ecryptfs/Kconfig
> >>> @@ -1,6 +1,6 @@
> >>>  config ECRYPT_FS
> >>>  	tristate "eCrypt filesystem layer support (EXPERIMENTAL)"
> >>> -	depends on EXPERIMENTAL && KEYS && NET
> >>> +	depends on EXPERIMENTAL && KEYS && NET && CRYPTO
> >>>  	select CRYPTO_ECB
> >>>  	select CRYPTO_CBC
> >>>  	help
> >>
> >> Hi Randy - Thanks for the patch!  Unfortunately, I think it defeats what
> >> Dave Hansen was wanting to do with commit
> >> 382684984e93039a3bbd83b04d341b0ceb831519.
> >>
> >> When I pulled that patch in, I was under the assumption that the select
> >> would also select all necessary dependencies.  According to
> >> Documentation/kbuild/kconfig-language.txt, that's not the case:
> >>
> >> 	select should be used with care. select will force
> >> 	a symbol to a value without visiting the dependencies.
> >>         By abusing select you are able to select a symbol FOO even
> >>         if FOO depends on BAR that is not set.
> >>
> >> Maybe we should do it how other folks are tackling this problem and
> >> select CRYPTO, along with CRYPTO_ECB and CRYPTO_CBC.  While we're at it,
> >> we should probably throw in CRYPTO_AES (aes-128 is the default cipher,
> >> but the cipher is configurable at mount so it might be too obtrusive for
> >> us to select it) and CRYPTO_MD5 (our default hash alg, not currently
> >> configurable).  Also, we don't depend on NET anymore because our netlink
> >> interface is no longer around.  It may not hurt to select KEYS, rather
> >> than depend on it.  Does all of this sound sane to you?
> > 
> > It selects too much stuff.  "select" should not be used to enable
> > a full subsystem (that's my general rule, not in kconfig-language.txt).
> > What kconfig-language.txt says that applies here is just after your
> > quoted text:
> > 
> > 	In general use select only for non-visible symbols
> > 	(no prompts anywhere) and for symbols with no dependencies.
> > 	That will limit the usefulness but on the other hand avoid
> > 	the illegal configurations all over.
> > 
> > CRYPTO does not fit that.
> > 
> > One of the big problems with selecting kconfig symbols (like subsystem
> > ones) is that it makes it difficult to disable that symbol, which some
> > of us often want to do.
> > 
> > 
> > ---
> > ~Randy
> 
> eCryptfs wouldn't be the first to select CRYPTO:
> 
> $ grep -r "select CRYPTO$" --include=Kconfig . | wc -l
> 26

Yes, I realize that.  and INPUT is selected to much also.  :(


> But after trying to deselect CRYPTO with one of my custom configs, I
> realized that you are right. :)  Depending on CRYPTO and then selecting
> the proper CRYPTO_* symbols is the way to go.
> 
> Applied to
> git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6.git#next
> 
> Thanks again!

Thanks, glad that you tried the experiment.

---
~Randy

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2009-09-29 17:18 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-25 21:35 mmotm 2009-09-25-14-35 uploaded akpm
2009-09-25 21:35 ` akpm
2009-09-28  6:42 ` Question: RCU stall detected in memcg (Re: " KAMEZAWA Hiroyuki
2009-09-28  8:33   ` KAMEZAWA Hiroyuki
2009-09-28  9:06   ` [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus KAMEZAWA Hiroyuki
2009-09-28  9:13     ` [PATCH][rc1] cgroup: catch bad css refcnt at css_put KAMEZAWA Hiroyuki
2009-09-28 14:20       ` Paul Menage
2009-09-29  3:06         ` KAMEZAWA Hiroyuki
2009-09-29  0:50       ` Li Zefan
2009-09-29  2:55         ` KAMEZAWA Hiroyuki
2009-09-29  3:09         ` [PATCH][rc1] cgroup: catch bad css refcnt at css_put v2 KAMEZAWA Hiroyuki
2009-09-28  9:32     ` [BUGFIX][PATCH][rc1] memcg: fix refcnt goes to minus Balbir Singh
2009-09-28 11:22       ` KAMEZAWA Hiroyuki
2009-09-28  9:22   ` Question: RCU stall detected in memcg (Re: mmotm 2009-09-25-14-35 uploaded Balbir Singh
2009-09-28 11:26     ` KAMEZAWA Hiroyuki
2009-09-28  9:34   ` Paul E. McKenney
2009-09-28 11:30     ` KAMEZAWA Hiroyuki
2009-09-28 20:34 ` [PATCH -mmotm] ecryptfs: depends on CRYPTO Randy Dunlap
2009-09-29  0:10   ` Tyler Hicks
2009-09-29  0:20     ` Randy Dunlap
2009-09-29 17:08       ` Tyler Hicks
2009-09-29 17:17         ` Randy Dunlap

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.