* v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
From: Linus Torvalds @ 2012-04-09  2:42 UTC
  To: Andrew Morton, David Rientjes, Rik van Riel, Hugh Dickins, werner
  Cc: linux-kernel

Guys, there's something wrong in the VM. Most likely suspects added to
the participants list.

Apparently things go south and the oom killer is invoked. X.org seems
to get killed.

Any hints? Werner traditionally finds problems by enabling every
single config option there is, so I assume this is another of those
kernels.

                    Linus

On Sun, Apr 8, 2012 at 5:21 PM, werner <w.landgraf@ru.ru> wrote:
> 3.4-rc1 hangs and crashes
>
> The problem persists with 3.4-rc2. The computer boots and runs
> normally for hours, then suddenly everything hangs, and in some
> cases the computer crashes. The computer wasn't working hard; it was
> almost idle.
>
> In the example below, it didn't crash completely in the end; only kde3 did.
> I then rebooted anyway (last line).
>
> In 3.3 this didn't happen, so it's a regression.
>
>
> Werner Landgraf
>
>
>
>
>
> Apr  8 13:55:10 werner kernel: Malformed early option 'acpi'
> Apr  8 13:55:10 werner kernel: ACPI: RSDP 000f6920 00014 (v00 GBT   )
> Apr  8 13:55:10 werner kernel: ACPI: RSDT bfee3000 00038 (v01 GBT
>  NVDAACPI 42302E31 NVDA 01010101)
> Apr  8 13:55:10 werner kernel: ACPI: FACP bfee3040 00074 (v01 GBT
>  NVDAACPI 42302E31 NVDA 01010101)
> Apr  8 13:55:10 werner kernel: ACPI: DSDT bfee30c0 0469E (v01 GBT
>  NVDAACPI 00001000 MSFT 03000000)
> Apr  8 13:55:10 werner kernel: ACPI: FACS bfee0000 00040
> Apr  8 13:55:10 werner kernel: ACPI: SSDT bfee7880 0028A (v01 PTLTD
>  POWERNOW 00000001  LTP 00000001)
> Apr  8 13:55:10 werner kernel: ACPI: HPET bfee7b40 00038 (v01 GBT
>  NVDAACPI 42302E31 NVDA 00000098)
> Apr  8 13:55:10 werner kernel: ACPI: MCFG bfee7b80 0003C (v01 GBT
>  NVDAACPI 42302E31 NVDA 01010101)
> Apr  8 13:55:10 werner kernel: ACPI: APIC bfee7780 000D0 (v01 GBT
>  NVDAACPI 42302E31 NVDA 01010101)
> Apr  8 13:55:10 werner kernel: Zone PFN ranges:
> Apr  8 13:55:10 werner kernel:   DMA      0x00000010 -> 0x00001000
> Apr  8 13:55:10 werner kernel:   Normal   0x00001000 -> 0x000377fe
> Apr  8 13:55:10 werner kernel:   HighMem  0x000377fe -> 0x000bfee0
> Apr  8 13:55:10 werner kernel: Movable zone start PFN for each node
> Apr  8 13:55:10 werner kernel: Early memory PFN ranges
> Apr  8 13:55:10 werner kernel:     0: 0x00000010 -> 0x0000009f
> Apr  8 13:55:10 werner kernel:     0: 0x00000100 -> 0x000bfee0
> Apr  8 13:55:13 werner udevd-event[9104]: run_program: exec of program
> '/etc/rc.d/rc.media-daemon' failed
> Apr  8 13:55:13 werner kernel: Built 1 zonelists in Zone order, mobility
> grouping on.  Total pages: 779889
> Apr  8 13:55:13 werner kernel: Fast TSC calibration using PIT
> Apr  8 13:55:13 werner kernel: Detected 2511.428 MHz processor.
> Apr  8 13:55:13 werner udevd-event[9099]: udev_rules_apply_format: unknown
> format variable '$modalias | grep -q eagle-usb || exit; while !
> /sbin/eaglectrl -p 2>/dev/null | /bin/grep -q Post-firmware; do sleep 2;
> done; /sbin/eaglectrl -d''
> Apr  8 13:55:13 werner kernel: ACPI: setting ELCR to 0200 (from 0c20)
> Apr  8 13:55:13 werner kernel: AMD PMU driver.
> Apr  8 13:55:13 werner kernel: raid6: int32x1    945 MB/s
> Apr  8 13:55:13 werner kernel: raid6: int32x2    933 MB/s
> Apr  8 13:55:13 werner kernel: raid6: int32x4    933 MB/s
> Apr  8 13:55:13 werner kernel: raid6: int32x8    613 MB/s
> Apr  8 13:55:13 werner kernel: raid6: mmxx1     2035 MB/s
> Apr  8 13:55:13 werner kernel: raid6: mmxx2     3582 MB/s
> Apr  8 13:55:13 werner kernel: raid6: sse1x1    2023 MB/s
> Apr  8 13:55:13 werner kernel: raid6: sse1x2    3496 MB/s
> Apr  8 13:55:13 werner kernel: raid6: sse2x1    3445 MB/s
> Apr  8 13:55:13 werner kernel: raid6: sse2x2    4582 MB/s
> Apr  8 13:55:13 werner kernel: raid6: using algorithm sse2x2 (4582 MB/s)
> Apr  8 13:55:13 werner kernel: Expanded resource reserved due to conflict
> with PCI Bus 0000:00
> Apr  8 13:55:13 werner udevd-event[9098]: udev_rules_apply_format: unknown
> format variable '$modalias | grep -q eagle-usb || exit; while !
> /sbin/eaglectrl -p 2>/dev/null | /bin/grep -q Post-firmware; do sleep 2;
> done; /sbin/eaglectrl -d''
> Apr  8 13:55:13 werner kernel: mdacon: MDA with 8K of memory detected.
> Apr  8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LUBA] enabled at
> IRQ 10
> Apr  8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LUB2] enabled at
> IRQ 11
> Apr  8 13:55:13 werner kernel: microcode: CPU0: family 15 not supported
> Apr  8 13:55:13 werner kernel: The force parameter has not been set to 1 so
> the Iris poweroff handler will not be installed.
> Apr  8 13:55:13 werner kernel: highmem bounce pool size: 64 pages
> Apr  8 13:55:13 werner kernel: Dquot-cache hash table entries: 1024 (order
> 0, 4096 bytes)
> Apr  8 13:55:13 werner kernel: DLM installed
> Apr  8 13:55:13 werner kernel: EFS: 1.0a - http://aeschi.ch.eu.org/efs/
> Apr  8 13:55:13 werner kernel: OCFS2 User DLM kernel interface loaded
> Apr  8 13:55:13 werner kernel: GFS2 installed
> Apr  8 13:55:13 werner kernel: acpiphp_ibm: ibm_acpiphp_init:
> acpi_walk_namespace failed
> Apr  8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LIGP] enabled at
> IRQ 10
> Apr  8 13:55:13 werner kernel: nvidiafb: CRTC0 analog found
> Apr  8 13:55:13 werner kernel: nvidiafb: CRTC1 analog not found
> Apr  8 13:55:13 werner kernel: i2c i2c-0: unable to read EDID block.
> Apr  8 13:55:13 werner last message repeated 2 times
> Apr  8 13:55:13 werner kernel: nvidiafb: EDID found from BUS2
> Apr  8 13:55:13 werner kernel: nvidiafb: CRTC 0 appears to have a CRT
> attached
> Apr  8 13:55:13 werner kernel: nvidiafb: Using CRT on CRTC 0
> Apr  8 13:55:13 werner kernel: Could not find Carillo Ranch MCH device.
> Apr  8 13:55:13 werner kernel: no IO addresses supplied
> Apr  8 13:55:13 werner kernel: hgafb: probe of hgafb.0 failed with error -22
> Apr  8 13:55:13 werner kernel: uvesafb: failed to execute /sbin/v86d
> Apr  8 13:55:13 werner kernel: uvesafb: make sure that the v86d helper is
> installed and executable
> Apr  8 13:55:13 werner kernel: uvesafb: Getting VBE info block failed
> (eax=0x4f00, err=-2)
> Apr  8 13:55:13 werner kernel: uvesafb: vbe_init() failed with -22
> Apr  8 13:55:13 werner kernel: uvesafb: probe of uvesafb.0 failed with error
> -22
> Apr  8 13:55:13 werner kernel: vesafb: cannot reserve video memory at
> 0xd0000000
> Apr  8 13:55:13 werner kernel: toshiba: not a supported Toshiba laptop
> Apr  8 13:55:13 werner kernel: [drm:i915_init] *ERROR* drm/i915 can't work
> without intel_agp module!
> Apr  8 13:55:13 werner kernel: Compaq SMART2 Driver (v 2.6.0)
> Apr  8 13:55:13 werner kernel: i2c-core: driver [isl29003] using legacy
> suspend method
> Apr  8 13:55:13 werner kernel: i2c-core: driver [isl29003] using legacy
> resume method
> Apr  8 13:55:13 werner kernel: amd74xx 0000:00:06.0: BIOS didn't set cable
> bits correctly. Enabling workaround.
> Apr  8 13:55:13 werner kernel: Loading Adaptec I2O RAID: Version 2.4 Build
> 5go
> Apr  8 13:55:13 werner kernel: scsi: <fdomain> Detection failed (no card)
> Apr  8 13:55:13 werner kernel: NCR53c406a: no available ports found
> Apr  8 13:55:13 werner kernel: qla2xxx [0000:00:00.0]-0005: : QLogic Fibre
> Channel HBA Driver: 8.03.07.13-k.
> Apr  8 13:55:13 werner kernel: Emulex LightPulse Fibre Channel SCSI driver
> 8.3.30
> Apr  8 13:55:13 werner kernel: Copyright(c) 2004-2009 Emulex.  All rights
> reserved.
> Apr  8 13:55:13 werner kernel: Failed initialization of WD-7000 SCSI card!
> Apr  8 13:55:13 werner kernel: GDT-HA: Storage RAID Controller Driver.
> Version: 3.05
> Apr  8 13:55:13 werner kernel: 3ware Storage Controller device driver for
> Linux v1.26.02.003.
> Apr  8 13:55:13 werner kernel: 3ware 9000 Storage Controller device driver
> for Linux v2.26.02.014.
> Apr  8 13:55:13 werner kernel: imm: Version 2.05 (for Linux 2.4.0)
> Apr  8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LSID] enabled at
> IRQ 11
> Apr  8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LFID] enabled at
> IRQ 10
> Apr  8 13:55:13 werner kernel: Error: Driver 'pata_platform' is already
> registered, aborting...
> Apr  8 13:55:13 werner kernel: physmap-flash.0: failed to claim resource 0
> Apr  8 13:55:13 werner kernel: Failed to ioremap_nocache
> Apr  8 13:55:13 werner last message repeated 2 times
> Apr  8 13:55:13 werner kernel: SNAPGEAR: failed to ioremap() BOOTCS
> Apr  8 13:55:13 werner kernel: Generic platform RAM MTD, (c) 2004 Simtec
> Electronics
> Apr  8 13:55:13 werner kernel: [nandsim] warning: read_byte: unexpected data
> output cycle, state is STATE_READY return 0x0
> Apr  8 13:55:13 werner last message repeated 5 times
> Apr  8 13:55:13 werner kernel: flash size: 8 MiB
> Apr  8 13:55:13 werner kernel: page size: 512 bytes
> Apr  8 13:55:13 werner kernel: OOB area size: 16 bytes
> Apr  8 13:55:13 werner kernel: sector size: 8 KiB
> Apr  8 13:55:13 werner kernel: pages number: 16384
> Apr  8 13:55:13 werner kernel: pages per sector: 16
> Apr  8 13:55:13 werner kernel: bus width: 8
> Apr  8 13:55:13 werner kernel: bits in sector size: 13
> Apr  8 13:55:13 werner kernel: bits in page size: 9
> Apr  8 13:55:13 werner kernel: bits in OOB size: 4
> Apr  8 13:55:13 werner kernel: flash size with OOB: 8448 KiB
> Apr  8 13:55:13 werner kernel: page address bytes: 3
> Apr  8 13:55:13 werner kernel: sector address bytes: 2
> Apr  8 13:55:13 werner kernel: options: 0x62
> Apr  8 13:55:13 werner kernel: onenand_wait: timeout! ctrl=0x0000
> intr=0x0000
> Apr  8 13:55:13 werner kernel: DE600: port 0x378 busy
> Apr  8 13:55:13 werner kernel: paride: aten registered as protocol 0
> Apr  8 13:55:13 werner kernel: paride: bpck registered as protocol 1
> Apr  8 13:55:13 werner kernel: paride: comm registered as protocol 2
> Apr  8 13:55:13 werner kernel: paride: dstr registered as protocol 3
> Apr  8 13:55:13 werner kernel: paride: k951 registered as protocol 4
> Apr  8 13:55:13 werner kernel: paride: k971 registered as protocol 5
> Apr  8 13:55:13 werner kernel: paride: epat registered as protocol 6
> Apr  8 13:55:13 werner kernel: paride: epia registered as protocol 7
> Apr  8 13:55:13 werner kernel: paride: frpw registered as protocol 8
> Apr  8 13:55:13 werner kernel: paride: friq registered as protocol 9
> Apr  8 13:55:13 werner kernel: paride: fit2 registered as protocol 10
> Apr  8 13:55:13 werner kernel: paride: fit3 registered as protocol 11
> Apr  8 13:55:13 werner kernel: paride: on20 registered as protocol 12
> Apr  8 13:55:13 werner kernel: paride: on26 registered as protocol 13
> Apr  8 13:55:13 werner kernel: paride: ktti registered as protocol 14
> Apr  8 13:55:13 werner kernel: paride: bpck6 registered as protocol 15
> Apr  8 13:55:13 werner kernel: pd: pd version 1.05, major 45, cluster 64,
> nice 0
> Apr  8 13:55:13 werner kernel: pda: Autoprobe failed
> Apr  8 13:55:13 werner kernel: pd: no valid drive found
> Apr  8 13:55:13 werner kernel: pcd: pcd version 1.07, major 46, nice 0
> Apr  8 13:55:13 werner kernel: pcd0: Autoprobe failed
> Apr  8 13:55:13 werner kernel: pcd: No CD-ROM drive found
> Apr  8 13:55:13 werner kernel: pf: pf version 1.04, major 47, cluster 64,
> nice 0
> Apr  8 13:55:13 werner kernel: pf: No ATAPI disk detected
> Apr  8 13:55:13 werner kernel: pt: pt version 1.04, major 96
> Apr  8 13:55:13 werner kernel: sr0: scsi3-mmc drive: 48x/48x writer dvd-ram
> cd/rw xa/form2 cdda tray
> Apr  8 13:55:13 werner kernel: pt0: Autoprobe failed
> Apr  8 13:55:13 werner kernel: pt: No ATAPI tape drive detected
> Apr  8 13:55:13 werner kernel: pg: pg version 1.02, major 97
> Apr  8 13:55:13 werner kernel: pga: Autoprobe failed
> Apr  8 13:55:13 werner kernel: pg: No ATAPI device detected
> Apr  8 13:55:13 werner kernel: mk712: device not present
> Apr  8 13:55:13 werner kernel: wistron_btns: System unknown
> Apr  8 13:55:13 werner kernel: EISA: Cannot allocate resource for mainboard
> Apr  8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 1
> Apr  8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 2
> Apr  8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 3
> Apr  8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 4
> Apr  8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 5
> Apr  8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 6
> Apr  8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 7
> Apr  8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 8
> Apr  8 13:55:13 werner kernel: asus_wmi: Management GUID not found
> Apr  8 13:55:13 werner kernel: asus_wmi: Management GUID not found
> Apr  8 13:55:13 werner kernel: compal_laptop: Motherboard not recognized
> (You could try the module's force-parameter)
> Apr  8 13:55:13 werner kernel: dell_wmi: No known WMI GUID found
> Apr  8 13:55:13 werner kernel: dell_wmi_aio: No known WMI GUID found
> Apr  8 13:55:13 werner kernel: acer_wmi: No or unsupported WMI interface,
> unable to load
> Apr  8 13:55:13 werner kernel: acerhdf: unknown (unsupported) BIOS version
> Gigabyte Technology Co., Ltd./M68M-S2P/FC, please report, aborting!
> Apr  8 13:55:13 werner kernel: hdaps: supported laptop not found!
> Apr  8 13:55:13 werner kernel: hdaps: driver init failed (ret=-19)!
> Apr  8 13:55:13 werner kernel: msi_wmi: This machine doesn't have
> MSI-hotkeys through WMI
> Apr  8 13:55:13 werner kernel: intel_oaktrail: Platform not recognized (You
> could try the module's force-parameter)
> Apr  8 13:55:13 werner kernel: OK
> Apr  8 13:55:13 werner kernel: OK
> Apr  8 13:55:13 werner kernel: register_blkdev: cannot get major 3 for hd
> Apr  8 13:55:13 werner kernel: drivers/rtc/hctosys.c: unable to open rtc
> device (rtc0)
> Apr  8 13:55:13 werner kernel: udevd (5372): /proc/5372/oom_adj is
> deprecated, please use /proc/5372/oom_score_adj instead.
> Apr  8 13:55:13 werner kernel: end_request: I/O error, dev fd0, sector 0
> Apr  8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LMAC] enabled at
> IRQ 11
> Apr  8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LAZA] enabled at
> IRQ 5
> Apr  8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LNK2] enabled at
> IRQ 5
> Apr  8 13:55:13 werner kernel: 2:3:1: cannot get freq at ep 0x84
> Apr  8 13:55:13 werner kernel: k8temp 0000:00:18.3: Temperature readouts
> might be wrong - check erratum #141
> Apr  8 13:55:13 werner kernel: EXT3-fs (sda1): warning: maximal mount count
> reached, running e2fsck is recommended
> Apr  8 13:55:22 werner apcupsd[9080]: apcupsd FATAL ERROR in smartsetup.c at
> line 171 PANIC! Cannot communicate with UPS via serial port. Please make
> sure the port specified on the DEVICE directive is correct, and that your
> cable specification on the UPSCABLE directive is correct.
> Apr  8 13:55:22 werner apcupsd[9080]: apcupsd error shutdown completed
> Apr  8 13:55:30 werner hpijs[9358]: prnt/hpijs/hpijs.cpp 614: unable to init
> hpijs server
> Apr  8 13:55:35 werner udevd[5372]: add_to_rules: unknown key 'MODALIAS' in
> /etc/udev/rules.d/80-eagle-usb.rules:1
> Apr  8 13:56:03 werner kdm_greet[9504]: Can't open default user face
> Apr  8 13:57:37 werner named[10001]: /etc/named.conf:3: option
> 'multiple-cnames' is obsolete
> Apr  8 20:29:10 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr  8 20:29:11 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr  8 20:29:11 werner kernel: Call Trace:
> Apr  8 20:29:11 werner kernel:  [<c10356ff>] ? printk+0x20/0x22
> Apr  8 20:29:11 werner kernel:  [<c10af32b>] dump_header+0x6f/0x95
> Apr  8 20:29:11 werner kernel:  [<c10af53f>] oom_kill_process+0x52/0x251
> Apr  8 20:29:11 werner kernel:  [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr  8 20:29:11 werner kernel:  [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr  8 20:29:11 werner kernel:  [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr  8 20:29:11 werner kernel:  [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr  8 20:29:11 werner kernel:  [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr  8 20:29:11 werner kernel:  [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr  8 20:29:11 werner kernel:  [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr  8 20:29:11 werner kernel:  [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr  8 20:29:11 werner kernel:  [<c208c05c>] syscall_call+0x7/0xb
> Apr  8 20:29:11 werner kernel: Mem-Info:
> Apr  8 20:29:11 werner kernel: DMA per-cpu:
> Apr  8 20:29:11 werner kernel: CPU    0: hi:    0, btch:  1 usd:   0
> Apr  8 20:29:11 werner kernel: Normal per-cpu:
> Apr  8 20:29:11 werner kernel: CPU    0: hi:  186, btch: 31 usd:  89
> Apr  8 20:29:11 werner kernel: HighMem per-cpu:
> Apr  8 20:29:11 werner kernel: CPU    0: hi:  186, btch: 31 usd:  87
> Apr  8 20:29:11 werner kernel: active_anon:80153 inactive_anon:71 isolated_anon:0
> Apr  8 20:29:11 werner kernel:  active_file:15189 inactive_file:21520 isolated_file:0
> Apr  8 20:29:11 werner kernel:  unevictable:0 dirty:1 writeback:0 unstable:0
> Apr  8 20:29:11 werner kernel:  free:445370 slab_reclaimable:3273 slab_unreclaimable:37492
> Apr  8 20:29:11 werner kernel:  mapped:19922 shmem:711 pagetables:597 bounce:0
> Apr  8 20:29:11 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr  8 20:29:11 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr  8 20:29:11 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1376
> all_unreclaimable? yes
> Apr  8 20:29:11 werner kdm[9483]: X server for display :0 terminated unexpectedly
> Apr  8 20:29:11 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr  8 20:29:11 werner kernel: HighMem free:1733236kB min:512kB low:28056kB
> high:55600kB active_anon:320612kB inactive_anon:284kB active_file:60624kB
> inactive_file:85940kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:79688kB
> shmem:2844kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:2388kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr  8 20:29:11 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr  8 20:29:11 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr  8 20:29:11 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr  8 20:29:11 werner kernel: HighMem: 6291*4kB 5495*8kB 3815*16kB
> 2528*32kB 1406*64kB 597*128kB 228*256kB 104*512kB 63*1024kB 18*2048kB
> 279*4096kB = 1733236kB
> Apr  8 20:29:11 werner kernel: 37420 total pagecache pages
> Apr  8 20:29:11 werner kernel: 0 pages in swap cache
> Apr  8 20:29:11 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr  8 20:29:11 werner kernel: Free swap  = 0kB
> Apr  8 20:29:11 werner kernel: Total swap = 0kB
> Apr  8 20:29:11 werner kernel: 786128 pages RAM
> Apr  8 20:29:11 werner kernel: 558818 pages HighMem
> Apr  8 20:29:11 werner kernel: 13582 pages reserved
> Apr  8 20:29:11 werner kernel: 109538 pages shared
> Apr  8 20:29:11 werner kernel: 223297 pages non-shared
> Apr  8 20:29:11 werner kernel: Out of memory: Kill process 9499 (X) score 29
> or sacrifice child
> Apr  8 20:29:11 werner kernel: Killed process 9499 (X) total-vm:451020kB,
> anon-rss:178420kB, file-rss:3668kB
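
The per-zone summaries in the first report above are easier to compare side by side once the mail wrapping is undone. The sketch below is only a reading aid, not a diagnosis: it copies a few fields from that report (free, the min/low/high watermarks, kernel_stack, present) and checks which zones sit below their min watermark and how large the kernel_stack figure is relative to the zone. The helper names `parse_zone` and `below_min` are made up for this illustration.

```python
import re

# Zone summaries copied from the first OOM report above, with the mail
# client's line wrapping undone and unrelated fields dropped.
ZONE_LINES = [
    "DMA free:4240kB min:784kB low:980kB high:1176kB "
    "kernel_stack:9392kB present:15804kB",
    "Normal free:44004kB min:44012kB low:55012kB high:66016kB "
    "kernel_stack:628952kB present:885944kB",
    "HighMem free:1733236kB min:512kB low:28056kB high:55600kB "
    "kernel_stack:0kB present:2217808kB",
]


def parse_zone(line):
    """Split 'Name key:123kB ...' into (name, {key: kB as int})."""
    name, rest = line.split(" ", 1)
    fields = {key: int(val) for key, val in re.findall(r"(\w+):(\d+)kB", rest)}
    return name, fields


def below_min(fields):
    """True when the zone's free memory is under its min watermark."""
    return fields["free"] < fields["min"]


zones = dict(parse_zone(line) for line in ZONE_LINES)

for name, f in zones.items():
    stack_share = 100.0 * f["kernel_stack"] / f["present"]
    print(
        f"{name:8s} free={f['free']}kB min={f['min']}kB "
        f"below_min={below_min(f)} kernel_stack={stack_share:.1f}% of present"
    )
```

With these numbers only the Normal zone is under its min watermark (44004kB against 44012kB), and the kernel_stack figure reported for that zone is roughly 70% of the zone's present memory. HighMem has well over 1.5GB free, but the failing allocation comes from `__get_free_pages`, which hands back a directly addressable page, so free HighMem would not be expected to satisfy it.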
> Apr  8 20:29:12 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr  8 20:29:12 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr  8 20:29:12 werner kernel: Call Trace:
> Apr  8 20:29:12 werner kernel:  [<c10356ff>] ? printk+0x20/0x22
> Apr  8 20:29:12 werner kernel:  [<c10af32b>] dump_header+0x6f/0x95
> Apr  8 20:29:12 werner kernel:  [<c10af53f>] oom_kill_process+0x52/0x251
> Apr  8 20:29:13 werner kernel:  [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr  8 20:29:13 werner kernel:  [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr  8 20:29:13 werner kernel:  [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr  8 20:29:13 werner kernel:  [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr  8 20:29:13 werner kernel:  [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr  8 20:29:13 werner kernel:  [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr  8 20:29:13 werner kernel:  [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr  8 20:29:13 werner kernel:  [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr  8 20:29:14 werner kernel:  [<c208c05c>] syscall_call+0x7/0xb
> Apr  8 20:29:14 werner kernel: Mem-Info:
> Apr  8 20:29:14 werner kernel: DMA per-cpu:
> Apr  8 20:29:14 werner kernel: CPU    0: hi:    0, btch:  1 usd:   0
> Apr  8 20:29:14 werner kernel: Normal per-cpu:
> Apr  8 20:29:14 werner kernel: CPU    0: hi:  186, btch: 31 usd:  95
> Apr  8 20:29:14 werner kernel: HighMem per-cpu:
> Apr  8 20:29:14 werner kernel: CPU    0: hi:  186, btch: 31 usd: 154
> Apr  8 20:29:14 werner kernel: active_anon:35273 inactive_anon:32
> isolated_anon:0
> Apr  8 20:29:14 werner kernel:  active_file:15228 inactive_file:21481
> isolated_file:0
> Apr  8 20:29:14 werner kernel:  unevictable:0 dirty:1 writeback:0 unstable:0
> Apr  8 20:29:14 werner kernel:  free:490360 slab_reclaimable:3273
> slab_unreclaimable:37492
> Apr  8 20:29:14 werner kernel:  mapped:19048 shmem:673 pagetables:483
> bounce:0
> Apr  8 20:29:14 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr  8 20:29:14 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr  8 20:29:14 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:13024
> all_unreclaimable? yes
> Apr  8 20:29:14 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr  8 20:29:14 werner kernel: HighMem free:1913196kB min:512kB low:28056kB
> high:55600kB active_anon:141092kB inactive_anon:128kB active_file:60780kB
> inactive_file:85784kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:76192kB
> shmem:2692kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1932kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr  8 20:29:14 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr  8 20:29:14 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr  8 20:29:14 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr  8 20:29:14 werner kernel: HighMem: 6543*4kB 5584*8kB 3849*16kB
> 2545*32kB 1418*64kB 609*128kB 235*256kB 108*512kB 64*1024kB 21*2048kB
> 319*4096kB = 1913196kB
> Apr  8 20:29:14 werner kernel: 37382 total pagecache pages
> Apr  8 20:29:14 werner kernel: 0 pages in swap cache
> Apr  8 20:29:14 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr  8 20:29:14 werner kernel: Free swap  = 0kB
> Apr  8 20:29:14 werner kernel: Total swap = 0kB
> Apr  8 20:29:14 werner kernel: 786128 pages RAM
> Apr  8 20:29:14 werner kernel: 558818 pages HighMem
> Apr  8 20:29:14 werner kernel: 13582 pages reserved
> Apr  8 20:29:14 werner kernel: 104040 pages shared
> Apr  8 20:29:14 werner kernel: 179138 pages non-shared
> Apr  8 20:29:15 werner kernel: Out of memory: Kill process 11073 (httpd)
> score 4 or sacrifice child
> Apr  8 20:29:15 werner kernel: Killed process 11073 (httpd)
> total-vm:56424kB, anon-rss:9136kB, file-rss:3784kB
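
Each report also lists free memory per zone as counts of free blocks by size ("116*4kB 18*8kB ... = 4240kB"). As a sanity check on the log itself, the sketch below re-adds those products for the second report and compares them with the totals the kernel printed. The strings are copied from the report just above with the line wrapping undone; `parse_free_list` and `total_kb` are hypothetical helper names.

```python
import re

# Per-zone free lists from the second OOM report above (wrapping undone).
FREE_LISTS = {
    "DMA": "116*4kB 18*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB "
           "1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB",
    "Normal": "1527*4kB 535*8kB 279*16kB 177*32kB 65*64kB 23*128kB 6*256kB "
              "3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB",
    "HighMem": "6543*4kB 5584*8kB 3849*16kB 2545*32kB 1418*64kB 609*128kB "
               "235*256kB 108*512kB 64*1024kB 21*2048kB 319*4096kB = 1913196kB",
}


def parse_free_list(text):
    """Return ([(block_count, block_size_kB), ...], reported_total_kB)."""
    lhs, rhs = text.split("=")
    blocks = [(int(n), int(kb)) for n, kb in re.findall(r"(\d+)\*(\d+)kB", lhs)]
    reported = int(re.search(r"(\d+)kB", rhs).group(1))
    return blocks, reported


def total_kb(blocks):
    """Sum count * size over all block sizes."""
    return sum(n * kb for n, kb in blocks)


for zone, text in FREE_LISTS.items():
    blocks, reported = parse_free_list(text)
    single_pages = blocks[0][0]  # first entry is the 4kB (single-page) list
    print(
        f"{zone:8s} sum={total_kb(blocks)}kB reported={reported}kB "
        f"free 4kB blocks={single_pages}"
    )
```

The sums match the printed totals for all three zones. The Normal zone still holds 1527 free single pages, which suggests the order=0 request was refused because of the watermark rather than because no page of the right size existed.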
> Apr  8 20:29:15 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr  8 20:29:15 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr  8 20:29:15 werner kernel: Call Trace:
> Apr  8 20:29:15 werner kernel:  [<c10356ff>] ? printk+0x20/0x22
> Apr  8 20:29:15 werner kernel:  [<c10af32b>] dump_header+0x6f/0x95
> Apr  8 20:29:15 werner kernel:  [<c10af53f>] oom_kill_process+0x52/0x251
> Apr  8 20:29:15 werner kernel:  [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr  8 20:29:15 werner kernel:  [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr  8 20:29:15 werner kernel:  [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr  8 20:29:15 werner kernel:  [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr  8 20:29:15 werner kernel:  [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr  8 20:29:15 werner kernel:  [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr  8 20:29:15 werner kernel:  [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr  8 20:29:15 werner kernel:  [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr  8 20:29:15 werner kernel:  [<c208c05c>] syscall_call+0x7/0xb
> Apr  8 20:29:15 werner kernel: Mem-Info:
> Apr  8 20:29:15 werner kernel: DMA per-cpu:
> Apr  8 20:29:15 werner kernel: CPU    0: hi:    0, btch:  1 usd:   0
> Apr  8 20:29:15 werner kernel: Normal per-cpu:
> Apr  8 20:29:15 werner kernel: CPU    0: hi:  186, btch: 31 usd:  97
> Apr  8 20:29:15 werner kernel: HighMem per-cpu:
> Apr  8 20:29:15 werner kernel: CPU    0: hi:  186, btch: 31 usd: 169
> Apr  8 20:29:15 werner kernel: active_anon:34116 inactive_anon:32
> isolated_anon:0
> Apr  8 20:29:15 werner kernel:  active_file:15397 inactive_file:21312
> isolated_file:0
> Apr  8 20:29:15 werner kernel:  unevictable:0 dirty:1 writeback:0 unstable:0
> Apr  8 20:29:15 werner kernel:  free:491507 slab_reclaimable:3273
> slab_unreclaimable:37492
> Apr  8 20:29:15 werner kernel:  mapped:19048 shmem:673 pagetables:464
> bounce:0
> Apr  8 20:29:15 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr  8 20:29:15 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr  8 20:29:15 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:15584
> all_unreclaimable? yes
> Apr  8 20:29:15 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr  8 20:29:15 werner kernel: HighMem free:1917784kB min:512kB low:28056kB
> high:55600kB active_anon:136464kB inactive_anon:128kB active_file:61456kB
> inactive_file:85108kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:76192kB
> shmem:2692kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1856kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr  8 20:29:15 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr  8 20:29:15 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr  8 20:29:15 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr  8 20:29:15 werner kernel: HighMem: 6650*4kB 5596*8kB 3885*16kB
> 2556*32kB 1443*64kB 617*128kB 237*256kB 108*512kB 64*1024kB 21*2048kB
> 319*4096kB = 1917784kB
> Apr  8 20:29:15 werner kernel: 37382 total pagecache pages
> Apr  8 20:29:15 werner kernel: 0 pages in swap cache
> Apr  8 20:29:15 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr  8 20:29:15 werner kernel: Free swap  = 0kB
> Apr  8 20:29:15 werner kernel: Total swap = 0kB
> Apr  8 20:29:15 werner kernel: 786128 pages RAM
> Apr  8 20:29:15 werner kernel: 558818 pages HighMem
> Apr  8 20:29:15 werner kernel: 13582 pages reserved
> Apr  8 20:29:15 werner kernel: 101963 pages shared
> Apr  8 20:29:15 werner kernel: 177974 pages non-shared
> Apr  8 20:29:16 werner kernel: Out of memory: Kill process 10001 (named)
> score 3 or sacrifice child
> Apr  8 20:29:16 werner kernel: Killed process 10001 (named)
> total-vm:41360kB, anon-rss:9828kB, file-rss:2452kB
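
Every report shows the same request: gfp_mask=0x800d0, order=0. The sketch below splits that mask into individual bits. The bit-to-name table is an assumption, written from memory of include/linux/gfp.h in 3.x-era kernels, and should be checked against the actual tree before relying on it; `decode_gfp` is a hypothetical helper, not a kernel or library function.

```python
# ASSUMPTION: flag values as recalled from include/linux/gfp.h in 3.x-era
# kernels. Verify against the source tree for the kernel in question.
GFP_BITS = {
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x80000: "__GFP_RECLAIMABLE",
}


def decode_gfp(mask):
    """Return (names of known bits set in mask, leftover unknown bits)."""
    names = []
    leftover = mask
    for bit in sorted(GFP_BITS):
        if mask & bit:
            names.append(GFP_BITS[bit])
            leftover &= ~bit
    return names, leftover


names, leftover = decode_gfp(0x800D0)
print(" | ".join(names), f"(unknown bits: {leftover:#x})")
```

Under that assumed table the mask decodes to __GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_RECLAIMABLE with no bits left over, which appears to correspond to GFP_TEMPORARY in kernels of that era. __GFP_HIGHMEM is not set, consistent with the allocation being confined to the DMA and Normal zones.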
> Apr  8 20:29:16 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr  8 20:29:16 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr  8 20:29:16 werner kernel: Call Trace:
> Apr  8 20:29:16 werner kernel:  [<c10356ff>] ? printk+0x20/0x22
> Apr  8 20:29:16 werner kernel:  [<c10af32b>] dump_header+0x6f/0x95
> Apr  8 20:29:16 werner kernel:  [<c10af53f>] oom_kill_process+0x52/0x251
> Apr  8 20:29:16 werner kernel:  [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr  8 20:29:16 werner kernel:  [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr  8 20:29:16 werner kernel:  [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr  8 20:29:16 werner kernel:  [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr  8 20:29:16 werner kernel:  [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr  8 20:29:16 werner kernel:  [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr  8 20:29:16 werner kernel:  [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr  8 20:29:16 werner kernel:  [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr  8 20:29:16 werner kernel:  [<c208c05c>] syscall_call+0x7/0xb
> Apr  8 20:29:16 werner kernel: Mem-Info:
> Apr  8 20:29:16 werner kernel: DMA per-cpu:
> Apr  8 20:29:16 werner kernel: CPU    0: hi:    0, btch:  1 usd:   0
> Apr  8 20:29:16 werner kernel: Normal per-cpu:
> Apr  8 20:29:16 werner kernel: CPU    0: hi:  186, btch: 31 usd:  97
> Apr  8 20:29:16 werner kernel: HighMem per-cpu:
> Apr  8 20:29:16 werner kernel: CPU    0: hi:  186, btch: 31 usd: 169
> Apr  8 20:29:16 werner kernel: active_anon:34116 inactive_anon:32
> isolated_anon:0
> Apr  8 20:29:16 werner kernel:  active_file:15397 inactive_file:21312
> isolated_file:0
> Apr  8 20:29:16 werner kernel:  unevictable:0 dirty:1 writeback:0 unstable:0
> Apr  8 20:29:16 werner kernel:  free:491507 slab_reclaimable:3273
> slab_unreclaimable:37492
> Apr  8 20:29:16 werner kernel:  mapped:19048 shmem:673 pagetables:464
> bounce:0
> Apr  8 20:29:16 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr  8 20:29:16 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr  8 20:29:16 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:15840
> all_unreclaimable? yes
> Apr  8 20:29:16 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr  8 20:29:16 werner kernel: HighMem free:1917784kB min:512kB low:28056kB
> high:55600kB active_anon:136464kB inactive_anon:128kB active_file:61456kB
> inactive_file:85108kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:76192kB
> shmem:2692kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1856kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr  8 20:29:16 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr  8 20:29:16 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr  8 20:29:16 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr  8 20:29:16 werner kernel: HighMem: 6650*4kB 5596*8kB 3885*16kB
> 2556*32kB 1443*64kB 617*128kB 237*256kB 108*512kB 64*1024kB 21*2048kB
> 319*4096kB = 1917784kB
> Apr  8 20:29:16 werner kernel: 37382 total pagecache pages
> Apr  8 20:29:16 werner kernel: 0 pages in swap cache
> Apr  8 20:29:16 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr  8 20:29:16 werner kernel: Free swap  = 0kB
> Apr  8 20:29:16 werner kernel: Total swap = 0kB
> Apr  8 20:29:16 werner kernel: 786128 pages RAM
> Apr  8 20:29:16 werner kernel: 558818 pages HighMem
> Apr  8 20:29:16 werner kernel: 13582 pages reserved
> Apr  8 20:29:16 werner kernel: 101963 pages shared
> Apr  8 20:29:16 werner kernel: 177974 pages non-shared
> Apr  8 20:29:17 werner kernel: Out of memory: Kill process 10002 (named)
> score 4 or sacrifice child
> Apr  8 20:29:17 werner kernel: Killed process 10002 (named)
> total-vm:41360kB, anon-rss:9920kB, file-rss:2540kB
> Apr  8 20:29:17 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr  8 20:29:17 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr  8 20:29:17 werner kernel: Call Trace:
> Apr  8 20:29:17 werner kernel:  [<c10356ff>] ? printk+0x20/0x22
> Apr  8 20:29:17 werner kernel:  [<c10af32b>] dump_header+0x6f/0x95
> Apr  8 20:29:17 werner kernel:  [<c10af53f>] oom_kill_process+0x52/0x251
> Apr  8 20:29:17 werner kernel:  [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr  8 20:29:17 werner kernel:  [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr  8 20:29:17 werner kernel:  [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr  8 20:29:17 werner kernel:  [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr  8 20:29:17 werner kernel:  [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr  8 20:29:17 werner kernel:  [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr  8 20:29:17 werner kernel:  [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr  8 20:29:17 werner kernel:  [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr  8 20:29:17 werner kernel:  [<c208c05c>] syscall_call+0x7/0xb
> Apr  8 20:29:17 werner kernel: Mem-Info:
> Apr  8 20:29:17 werner kernel: DMA per-cpu:
> Apr  8 20:29:17 werner kernel: CPU    0: hi:    0, btch:  1 usd:   0
> Apr  8 20:29:17 werner kernel: Normal per-cpu:
> Apr  8 20:29:17 werner kernel: CPU    0: hi:  186, btch: 31 usd: 102
> Apr  8 20:29:17 werner kernel: HighMem per-cpu:
> Apr  8 20:29:17 werner kernel: CPU    0: hi:  186, btch: 31 usd: 161
> Apr  8 20:29:17 werner kernel: active_anon:31596 inactive_anon:32
> isolated_anon:0
> Apr  8 20:29:17 werner kernel:  active_file:15449 inactive_file:21260
> isolated_file:0
> Apr  8 20:29:17 werner kernel:  unevictable:0 dirty:1 writeback:0 unstable:0
> Apr  8 20:29:17 werner kernel:  free:494050 slab_reclaimable:3273
> slab_unreclaimable:37492
> Apr  8 20:29:17 werner kernel:  mapped:18630 shmem:673 pagetables:464
> bounce:0
> Apr  8 20:29:17 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr  8 20:29:17 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr  8 20:29:17 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:15968
> all_unreclaimable? yes
> Apr  8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr  8 20:29:17 werner kernel: HighMem free:1927956kB min:512kB low:28056kB
> high:55600kB active_anon:126384kB inactive_anon:128kB active_file:61664kB
> inactive_file:84900kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:74520kB
> shmem:2692kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1856kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr  8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr  8 20:29:17 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr  8 20:29:17 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr  8 20:29:17 werner kernel: HighMem: 6715*4kB 5597*8kB 3896*16kB
> 2572*32kB 1453*64kB 624*128kB 245*256kB 111*512kB 64*1024kB 21*2048kB
> 320*4096kB = 1927956kB
> Apr  8 20:29:17 werner kernel: 37382 total pagecache pages
> Apr  8 20:29:17 werner kernel: 0 pages in swap cache
> Apr  8 20:29:17 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr  8 20:29:17 werner kernel: Free swap  = 0kB
> Apr  8 20:29:17 werner kernel: Total swap = 0kB
> Apr  8 20:29:17 werner kernel: 786128 pages RAM
> Apr  8 20:29:17 werner kernel: 558818 pages HighMem
> Apr  8 20:29:17 werner kernel: 13582 pages reserved
> Apr  8 20:29:17 werner kernel: 101323 pages shared
> Apr  8 20:29:17 werner kernel: 175859 pages non-shared
> Apr  8 20:29:17 werner kernel: Out of memory: Kill process 11074 (httpd)
> score 3 or sacrifice child
> Apr  8 20:29:17 werner kernel: Killed process 11074 (httpd)
> total-vm:55376kB, anon-rss:8092kB, file-rss:3632kB
> Apr  8 20:29:17 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr  8 20:29:17 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr  8 20:29:17 werner kernel: Call Trace:
> Apr  8 20:29:17 werner kernel:  [<c10356ff>] ? printk+0x20/0x22
> Apr  8 20:29:17 werner kernel:  [<c10af32b>] dump_header+0x6f/0x95
> Apr  8 20:29:17 werner kernel:  [<c10af53f>] oom_kill_process+0x52/0x251
> Apr  8 20:29:17 werner kernel:  [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr  8 20:29:17 werner kernel:  [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr  8 20:29:17 werner kernel:  [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr  8 20:29:17 werner kernel:  [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr  8 20:29:17 werner kernel:  [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr  8 20:29:17 werner kernel:  [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr  8 20:29:17 werner kernel:  [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr  8 20:29:17 werner kernel:  [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr  8 20:29:17 werner kernel:  [<c208c05c>] syscall_call+0x7/0xb
> Apr  8 20:29:17 werner kernel: Mem-Info:
> Apr  8 20:29:17 werner kernel: DMA per-cpu:
> Apr  8 20:29:17 werner kernel: CPU    0: hi:    0, btch:  1 usd:   0
> Apr  8 20:29:17 werner kernel: Normal per-cpu:
> Apr  8 20:29:17 werner kernel: CPU    0: hi:  186, btch: 31 usd: 104
> Apr  8 20:29:17 werner kernel: HighMem per-cpu:
> Apr  8 20:29:17 werner kernel: CPU    0: hi:  186, btch: 31 usd: 163
> Apr  8 20:29:17 werner kernel: active_anon:30699 inactive_anon:32
> isolated_anon:0
> Apr  8 20:29:17 werner kernel:  active_file:15553 inactive_file:21156
> isolated_file:0
> Apr  8 20:29:17 werner kernel:  unevictable:0 dirty:1 writeback:0 unstable:0
> Apr  8 20:29:17 werner kernel:  free:494949 slab_reclaimable:3273
> slab_unreclaimable:37492
> Apr  8 20:29:17 werner kernel:  mapped:18630 shmem:673 pagetables:445
> bounce:0
> Apr  8 20:29:17 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr  8 20:29:17 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr  8 20:29:17 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:16864
> all_unreclaimable? yes
> Apr  8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr  8 20:29:17 werner kernel: HighMem free:1931552kB min:512kB low:28056kB
> high:55600kB active_anon:122796kB inactive_anon:128kB active_file:62080kB
> inactive_file:84484kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:74520kB
> shmem:2692kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1780kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr  8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr  8 20:29:17 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr  8 20:29:17 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr  8 20:29:17 werner kernel: HighMem: 6724*4kB 5598*8kB 3908*16kB
> 2589*32kB 1469*64kB 630*128kB 249*256kB 111*512kB 64*1024kB 21*2048kB
> 320*4096kB = 1931552kB
> Apr  8 20:29:17 werner kernel: 37382 total pagecache pages
> Apr  8 20:29:17 werner kernel: 0 pages in swap cache
> Apr  8 20:29:17 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr  8 20:29:17 werner kernel: Free swap  = 0kB
> Apr  8 20:29:17 werner kernel: Total swap = 0kB
> Apr  8 20:29:17 werner kernel: 786128 pages RAM
> Apr  8 20:29:17 werner kernel: 558818 pages HighMem
> Apr  8 20:29:17 werner kernel: 13582 pages reserved
> Apr  8 20:29:17 werner kernel: 99284 pages shared
> Apr  8 20:29:17 werner kernel: 174956 pages non-shared
> Apr  8 20:29:17 werner kernel: Out of memory: Kill process 11075 (httpd)
> score 3 or sacrifice child
> Apr  8 20:29:17 werner kernel: Killed process 11075 (httpd)
> total-vm:55120kB, anon-rss:7832kB, file-rss:3628kB
> Apr  8 20:29:17 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr  8 20:29:17 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr  8 20:29:17 werner kernel: Call Trace:
> Apr  8 20:29:17 werner kernel:  [<c10356ff>] ? printk+0x20/0x22
> Apr  8 20:29:17 werner kernel:  [<c10af32b>] dump_header+0x6f/0x95
> Apr  8 20:29:17 werner kernel:  [<c10af53f>] oom_kill_process+0x52/0x251
> Apr  8 20:29:17 werner kernel:  [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr  8 20:29:17 werner kernel:  [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr  8 20:29:17 werner kernel:  [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr  8 20:29:17 werner kernel:  [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr  8 20:29:17 werner kernel:  [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr  8 20:29:17 werner kernel:  [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr  8 20:29:17 werner kernel:  [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr  8 20:29:17 werner kernel:  [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr  8 20:29:17 werner kernel:  [<c208c05c>] syscall_call+0x7/0xb
> Apr  8 20:29:17 werner kernel: Mem-Info:
> Apr  8 20:29:17 werner kernel: DMA per-cpu:
> Apr  8 20:29:17 werner kernel: CPU    0: hi:    0, btch:  1 usd:   0
> Apr  8 20:29:17 werner kernel: Normal per-cpu:
> Apr  8 20:29:17 werner kernel: CPU    0: hi:  186, btch: 31 usd:  57
> Apr  8 20:29:17 werner kernel: HighMem per-cpu:
> Apr  8 20:29:17 werner kernel: CPU    0: hi:  186, btch: 31 usd: 121
> Apr  8 20:29:17 werner kernel: active_anon:29820 inactive_anon:29
> isolated_anon:0
> Apr  8 20:29:17 werner kernel:  active_file:15559 inactive_file:21150
> isolated_file:0
> Apr  8 20:29:17 werner kernel:  unevictable:0 dirty:0 writeback:0 unstable:0
> Apr  8 20:29:17 werner kernel:  free:495930 slab_reclaimable:3267
> slab_unreclaimable:37486
> Apr  8 20:29:17 werner kernel:  mapped:18629 shmem:671 pagetables:434
> bounce:0
> Apr  8 20:29:17 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr  8 20:29:17 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr  8 20:29:17 werner kernel: Normal free:44184kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:128kB
> inactive_file:144kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13044kB slab_unreclaimable:147760kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:447
> all_unreclaimable? yes
> Apr  8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr  8 20:29:17 werner kernel: HighMem free:1935296kB min:512kB low:28056kB
> high:55600kB active_anon:119280kB inactive_anon:116kB active_file:62108kB
> inactive_file:84456kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:0kB writeback:0kB mapped:74516kB
> shmem:2684kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1736kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr  8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr  8 20:29:17 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr  8 20:29:17 werner kernel: Normal: 1568*4kB 535*8kB 280*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44184kB
> Apr  8 20:29:17 werner kernel: HighMem: 6656*4kB 5630*8kB 3937*16kB
> 2602*32kB 1484*64kB 637*128kB 253*256kB 111*512kB 64*1024kB 21*2048kB
> 320*4096kB = 1935296kB
> Apr  8 20:29:17 werner kernel: 37380 total pagecache pages
> Apr  8 20:29:17 werner kernel: 0 pages in swap cache
> Apr  8 20:29:17 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr  8 20:29:17 werner kernel: Free swap  = 0kB
> Apr  8 20:29:17 werner kernel: Total swap = 0kB
> Apr  8 20:29:17 werner kernel: 786128 pages RAM
> Apr  8 20:29:17 werner kernel: 558818 pages HighMem
> Apr  8 20:29:17 werner kernel: 13582 pages reserved
> Apr  8 20:29:17 werner kernel: 96159 pages shared
> Apr  8 20:29:17 werner kernel: 174665 pages non-shared
> Apr  8 20:29:17 werner kernel: Out of memory: Kill process 11076 (httpd)
> score 3 or sacrifice child
> Apr  8 20:29:17 werner kernel: Killed process 11076 (httpd)
> total-vm:55380kB, anon-rss:8088kB, file-rss:3632kB
> Apr  8 20:29:28 werner kdm_greet[31163]: Can't open default user face
> Apr  8 20:55:07 werner hcid[9333]: Got disconnected from the system message
> bus
> ---
> Professional hosting for everyone - http://www.host.ru


* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09  2:42 v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs) Linus Torvalds
@ 2012-04-09  2:50 ` Andrew Morton
  2012-04-09  3:11   ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2012-04-09  2:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Rientjes, Rik van Riel, Hugh Dickins, werner, linux-kernel

On Sun, 8 Apr 2012 19:42:31 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Guys, there's something wrong in the VM. Most likely suspects added to
> the participants list.
> 
> Apparently things go south and the oom killer is invoked. X.org seems
> to get killed.
> 
> Any hints? Werner traditionally finds problems by enabling every
> single config option there is, I assume this is another of those
> kernels..
> 
>
> ...
>
> > Apr  8 20:29:11 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> > high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> > inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> > present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> > slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> > pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1376
> > all_unreclaimable? yes

That's claiming that 600MB of ZONE_NORMAL is being used for kernel stacks.
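
For scale, the figure can be worked out from the quoted line. This is a
minimal sketch in C, assuming 8 kB kernel stacks (the usual THREAD_SIZE on
32-bit x86; the report itself does not state it). The helper names are
hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Values copied from the "Normal" zone line of the quoted OOM report. */
#define KERNEL_STACK_KB   628952UL
#define NORMAL_PRESENT_KB 885944UL

/* Assumption: one kernel stack is 8 kB (not stated in the report). */
#define ASSUMED_STACK_KB  8UL

/* kernel_stack usage in whole MiB */
unsigned long stack_mib(void)
{
    return KERNEL_STACK_KB / 1024UL;
}

/* number of task stacks that usage corresponds to, under the assumption */
unsigned long stack_count(void)
{
    return KERNEL_STACK_KB / ASSUMED_STACK_KB;
}

/* share of the zone's present memory, in whole percent (rounded down) */
unsigned long stack_percent_of_zone(void)
{
    return KERNEL_STACK_KB * 100UL / NORMAL_PRESENT_KB;
}

void print_stack_summary(void)
{
    printf("kernel stacks: %lu MiB, ~%lu stacks, %lu%% of ZONE_NORMAL\n",
           stack_mib(), stack_count(), stack_percent_of_zone());
}
```

Under that assumption the zone would hold tens of thousands of task stacks,
which is far more than an almost idle desktop has tasks, so the number points
at stacks that were never freed rather than at real load.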



* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09  2:50 ` Andrew Morton
@ 2012-04-09  3:11   ` Linus Torvalds
  2012-04-09  7:04     ` Sven Joachim
  2012-04-09 10:15     ` David Rientjes
  0 siblings, 2 replies; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09  3:11 UTC (permalink / raw)
  To: Andrew Morton, werner
  Cc: David Rientjes, Rik van Riel, Hugh Dickins, linux-kernel, Oleg Nesterov

On Sun, Apr 8, 2012 at 7:50 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sun, 8 Apr 2012 19:42:31 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> > Apr  8 20:29:11 werner kernel: Normal free:44004kB min:44012kB low:55012kB
>> > high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
>> > inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
>> > present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
>> > slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
>> > pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1376
>> > all_unreclaimable? yes
>
> That's claiming that 600MB of ZONE_NORMAL is being used for kernel stacks.

Well, that would certainly eat up memory that is hard to get back.

Werner - if you can reproduce this, can you get a "ps axl" or similar
when it starts happening? Or probably even long before, since it
probably starts long long earlier.

Or does anybody see anything that keeps thread counts raised so that
"free_task()" doesn't get done. kernel/profile.c does that
"profile_handoff_task()" thing - but only oprofile and the android
low-memory-killer logic seems to use it though. But that's exactly the
kind of thing that Werner's "configure everything" might enable -
Werner?

What else would do this? I'd suspect the /proc code, but that grabs
the mm_struct, and those particular changes were pre-3.3 anyway.

Adding Oleg just in case he has any ideas about process code changes
(or some usermodehelper thing that leaks processes, or whatever).

                             Linus


* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09  3:11   ` Linus Torvalds
@ 2012-04-09  7:04     ` Sven Joachim
  2012-04-09 15:24       ` Linus Torvalds
  2012-04-09 15:57       ` Rik van Riel
  2012-04-09 10:15     ` David Rientjes
  1 sibling, 2 replies; 43+ messages in thread
From: Sven Joachim @ 2012-04-09  7:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, werner, David Rientjes, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov

On 2012-04-09 05:11 +0200, Linus Torvalds wrote:

> On Sun, Apr 8, 2012 at 7:50 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>> On Sun, 8 Apr 2012 19:42:31 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>
>>> > Apr  8 20:29:11 werner kernel: Normal free:44004kB min:44012kB low:55012kB
>>> > high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
>>> > inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
>>> > present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
>>> > slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
>>> > pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1376
>>> > all_unreclaimable? yes
>>
>> That's claiming that 600MB of ZONE_NORMAL is being used for kernel stacks.
>
> Well, that would certainly eat up memory that is hard to get back.

While I did not experience any crashes or instabilities (yet?), I'm also
seeing memory leaks.  On a system started this morning, with hardly
anything running:

,----
| $ pstree
| init-+-acpid
|      |-atd
|      |-cron
|      |-dbus-daemon
|      |-dhclient
|      |-dictd
|      |-5*[getty]
|      |-gpm
|      |-login---zsh---pstree
|      |-lpd
|      |-master-+-pickup
|      |        `-qmgr
|      |-named---4*[{named}]
|      |-rpc.statd
|      |-rpcbind
|      |-rsyslogd---3*[{rsyslogd}]
|      |-timidity
|      |-udevd---2*[udevd]
|      `-wpa_supplicant
`----

where I would expect no more than 50 MB used, 400 MB are actually in use:

,----
| $ free
|              total       used       free     shared    buffers     cached
| Mem:       3348400    1849712    1498688          0     328960    1119180
| -/+ buffers/cache:     401572    2946828
| Swap:      3719040          0    3719040
`----

Cheers,
       Sven

> Werner - if you can reproduce this, can you get a "ps axl" or similar
> when it starts happening? Or probably even long before, since it
> probably starts long long earlier.
>
> Or does anybody see anything that keeps thread counts raised so that
> "free_task()" doesn't get done. kernel/profile.c does that
> "profile_handoff_task()" thing - but only oprofile and the android
> low-memory-killer logic seems to use it though. But that's exactly the
> kind of thing that Werner's "configure everything" might enable -
> Werner?
>
> What else would do this? I'd suspect the /proc code, but that grabs
> the mm_struct, and those particular changes were pre-3.3 anyway.
>
> Adding Oleg just in case he has any ideas about process code changes
> (or some usermodehelper thing that leaks processes, or whatever).



* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09  3:11   ` Linus Torvalds
  2012-04-09  7:04     ` Sven Joachim
@ 2012-04-09 10:15     ` David Rientjes
  2012-04-09 15:39       ` Linus Torvalds
  1 sibling, 1 reply; 43+ messages in thread
From: David Rientjes @ 2012-04-09 10:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
	Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, Greg Kroah-Hartman, stable

On Sun, 8 Apr 2012, Linus Torvalds wrote:

> Or does anybody see anything that keeps thread counts raised so that
> "free_task()" doesn't get done. kernel/profile.c does that
> "profile_handoff_task()" thing - but only oprofile and the android
> low-memory-killer logic seems to use it though. But that's exactly the
> kind of thing that Werner's "configure everything" might enable -
> Werner?
> 

I think you nailed it.

I suspect the problem is 1eda5166c764 ("staging: android/lowmemorykiller: 
Don't unregister notifier from atomic context") merged during the 3.4 
merge window and, unfortunately, backported to stable.

Werner's config has CONFIG_ANDROID_LOW_MEMORY_KILLER=y so we never 
actually unregister the callback for the task handoff as a result of the 
patch.  It's supposed to take responsibility for doing free_task() itself 
when it's good and ready, usually by putting it into a list to free, but 
now we're just doing this:

	struct task_struct *task = data;

	if (task == lowmem_deathpending)
		lowmem_deathpending = NULL;

	return NOTIFY_OK;

whenever put_task_struct() decrements the refcount to 0 and thus they get 
leaked and bad things happen.

This is confirmed by Werner's oom log that shows extremely small values 
for the oom score of the task chosen to oom kill.  His first log showed X 
being killed with a score of 29.  That means it is the most memory-hogging 
task on the system and is only using 2.9% of total system memory.
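
The same reading applies to the kills in the log quoted earlier in this
thread (named with score 4, httpd with score 3). The sketch below is a
simplified approximation, not the kernel's actual computation: it treats the
score as resident pages per thousand of usable pages, and it assumes usable
memory is "pages RAM" minus "pages reserved" from the log. The real
calculation also counts page tables and swap and applies adjustments. The
function name is hypothetical.

```c
#include <stdio.h>

#define PAGE_KB 4UL   /* 4 kB pages, as in the buddy lists of the report */

/* Assumption: usable memory = "pages RAM" - "pages reserved" from the log. */
#define USABLE_PAGES (786128UL - 13582UL)

/*
 * Simplified score: resident pages expressed as parts per thousand of
 * usable pages, so a score of 29 would mean about 2.9% of memory.
 */
unsigned long approx_oom_score(unsigned long anon_rss_kb,
                               unsigned long file_rss_kb)
{
    unsigned long pages = (anon_rss_kb + file_rss_kb) / PAGE_KB;

    return pages * 1000UL / USABLE_PAGES;
}

void print_scores(void)
{
    /* rss figures copied from the "Killed process" lines of the log */
    printf("named 10002: %lu\n", approx_oom_score(9920, 2540));
    printf("httpd 11074: %lu\n", approx_oom_score(8092, 3632));
}
```

With these inputs the approximation gives the same small integers the log
reports, which fits the point that no single process holds a meaningful
share of memory when the oom killer fires.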

I can't actually see how the lowmemorykiller actually ever freed any 
task_struct after unregistering the notifier during the callback.  It 
seems like this has always leaked memory but it used to happen much more 
slowly because, prior to the patch, we did task_handoff_unregister() in 
the callback.  So I think the code was always wrong but now it's out of 
control because the notifier remains enabled indefinitely.  I can't say 
the 1eda5166c764 ("staging: android/lowmemorykiller: Don't unregister 
notifier from atomic context") commit is fully to blame, it just made the 
error much more egregious.

As it sits in 3.4-rc2, this whole lowmem_deathpending business seems to be 
storing a pointer to the task_struct of something sent a SIGKILL and it 
remains that way until the lowmem_deathpending_timeout expires and 
something else is killed instead.  lowmem_deathpending gets cleared on the 
task handoff if the task selected for kill just exited.  This ensures we 
only kill one thread at a time.

That's all fine and good but it seems like we're never freeing the 
task_struct itself on exit.  This seems like the most obvious fix but it 
would be really nice to revisit this and remove the dependency on 
CONFIG_PROFILING and just check if the lowmem_deathpending thread is found 
in the iteration for lowmem_shrink() prior to killing.


android, lowmemorykiller: free task struct on profiling handoff

The lowmemorykiller stores a pointer to a killed thread's task_struct in 
lowmem_deathpending when profiling is enabled.  When put_task_struct() 
results in the refcount going to 0, the task_notify_func() callback clears 
lowmem_deathpending if it is the thread that was killed last.  This 
prevents additional killing until lowmem_deathpending_timeout elapses.

The responsibility of every task handoff notifier is to free the tasks 
handed off to it, however, and this was being neglected, which results 
in a massive memory leak since no task_struct ever gets freed.

Fix that by freeing the task_struct since we no longer need a reference to 
it.

Reported-by: werner <w.landgraf@ru.ru>
Cc: stable@vger.kernel.org
Signed-off-by: David Rientjes <rientjes@google.com>
---
 drivers/staging/android/lowmemorykiller.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -78,6 +78,7 @@ task_notify_func(struct notifier_block *self, unsigned long val, void *data)
 
 	if (task == lowmem_deathpending)
 		lowmem_deathpending = NULL;
+	free_task(task);
 
 	return NOTIFY_OK;
 }


* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09  7:04     ` Sven Joachim
@ 2012-04-09 15:24       ` Linus Torvalds
  2012-04-09 15:43         ` Sven Joachim
  2012-04-09 15:57       ` Rik van Riel
  1 sibling, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09 15:24 UTC (permalink / raw)
  To: Sven Joachim
  Cc: Andrew Morton, werner, David Rientjes, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov

On Mon, Apr 9, 2012 at 12:04 AM, Sven Joachim <svenjoac@gmx.de> wrote:
>
> While I did not experience any crashes or instabilities (yet?), I'm also
> seeing memory leaks.  On a system started this morning, with hardly
> anything running:

Do you also have ANDROID support compiled in?  And
ANDROID_LOW_MEMORY_KILLER in particular?

                  Linus


* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 10:15     ` David Rientjes
@ 2012-04-09 15:39       ` Linus Torvalds
  2012-04-09 21:22         ` David Rientjes
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09 15:39 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
	Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, Greg Kroah-Hartman, stable

On Mon, Apr 9, 2012 at 3:15 AM, David Rientjes <rientjes@google.com> wrote:
>
> I think you nailed it.
>
> I suspect the problem is 1eda5166c764 ("staging: android/lowmemorykiller:
> Don't unregister notifier from atomic context") merged during the 3.4
> merge window and, unfortunately, backported to stable.

Ok. That does seem to match everything.

However, I think your patch is the wrong one.

The real bug is actually that those notifiers are a f*cking joke, and
the return value from the notifier is a mistake.

So I personally think that the real problem is this code in
profile_handoff_task:

        return (ret == NOTIFY_OK) ? 1 : 0;

and ask yourself two questions:

 - what the hell does NOTIFY_OK/NOTIFY_DONE mean?
 - what happens if there are multiple notifiers that all (or some)
return NOTIFY_OK?

I'll tell you what my answers are:

 (a) NOTIFY_DONE is the "ok, everything is fine, you can free the
task-struct". It's also what that handoff notifier thing returns if
there are no notifiers registered at all.

     So the fix to the Android lowmemorykiller is as simple as just
changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
will properly free the task struct.

     The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
"NOTIFY_OK means that I will free the task myself later". That's what
the oprofile uses, and it frees the task.

 (b) But the whole interface is a total f*cking mess. If *multiple*
people return NOTIFY_OK, they're royally fucked. And the whole (and
only) point of notifiers is that you can register multiple different
ones independently.

So quite frankly, the *real* bug is not in that android driver
(although I'd say that we should just make it return NOTIFY_DONE and
be done with it). The real bug is that the whole f*cking notifier is a
mistake, and checking the error return was the biggest mistake of all.

Werner: just test David's patch (do *not* change both the error value
*and* apply David's patch - that would free the task-struct twice). I
don't think his patch is what I want to apply eventually, but it
should fix the issue.

Sadly, I don't think we have anybody who really "owns"
kernel/profile.c - the thing is broken, it was misdesigned, and nobody
really cares. Which is why we'll probably have to fix this by just
making that Android thing return NOTIFY_DONE, and just accept that the
whole thing is a f*cking joke.

                         Linus
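
The two answers above can be checked against a small userspace model.
This is a sketch in Python, not kernel code. It assumes only what the
message states plus the usual notifier-chain rule that the chain returns
whatever its last callback returned (NOTIFY_DONE for an empty chain).
drop_last_reference(), oprofile_like() and the lowmem_like_*() callbacks
are hypothetical stand-ins that count frees instead of freeing anything.

```python
# Userspace model only; the callbacks and the caller are hypothetical.
NOTIFY_DONE = 0x0000       # "don't care"
NOTIFY_OK = 0x0001         # "suits me"
NOTIFY_STOP_MASK = 0x8000  # when set in a return value, stop the chain


def call_chain(callbacks, task):
    """Call each callback in order and return the last return value."""
    ret = NOTIFY_DONE  # an empty chain yields NOTIFY_DONE
    for cb in callbacks:
        ret = cb(task)
        if ret & NOTIFY_STOP_MASK:
            break
    return ret


def profile_handoff_task(callbacks, task):
    """Model of the quoted line: 1 means 'a notifier took the task'."""
    ret = call_chain(callbacks, task)
    return 1 if ret == NOTIFY_OK else 0


def drop_last_reference(callbacks, task):
    """Hypothetical caller: free the task unless it was handed off."""
    if not profile_handoff_task(callbacks, task):
        task["frees"] += 1


def oprofile_like(task):
    """Takes ownership and frees the task (immediately, for simplicity)."""
    task["frees"] += 1
    return NOTIFY_OK


def lowmem_like_buggy(task):
    """Claims ownership with NOTIFY_OK but never frees the task."""
    return NOTIFY_OK


def lowmem_like_fixed(task):
    """Leaves ownership with the caller."""
    return NOTIFY_DONE


def run(callbacks):
    """Return how many times one task gets freed with these callbacks."""
    task = {"frees": 0}
    drop_last_reference(callbacks, task)
    return task["frees"]


if __name__ == "__main__":
    print("no notifiers:      ", run([]))
    print("buggy lowmem only: ", run([lowmem_like_buggy]))
    print("fixed lowmem only: ", run([lowmem_like_fixed]))
    print("two owners:        ", run([oprofile_like, oprofile_like]))
```

Under these assumptions the task is freed exactly once with no notifiers
or with the NOTIFY_DONE variant, never with the NOTIFY_OK variant that
does not free (the leak), and twice when two callbacks both claim it.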

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 15:24       ` Linus Torvalds
@ 2012-04-09 15:43         ` Sven Joachim
  0 siblings, 0 replies; 43+ messages in thread
From: Sven Joachim @ 2012-04-09 15:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, werner, David Rientjes, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov

On 2012-04-09 17:24 +0200, Linus Torvalds wrote:

> On Mon, Apr 9, 2012 at 12:04 AM, Sven Joachim <svenjoac@gmx.de> wrote:
>>
>> While I did not experience any crashes or instabilities (yet?), I'm also
>> seeing memory leaks.  On a system started this morning, with hardly
>> anything running:
>
> Do you also have ANDROID support compiled in?  And
> ANDROID_LOW_MEMORY_KILLER in particular?

No, "grep ANDROID /boot/config-$(uname -r)" prints nothing.

Cheers,
       Sven

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09  7:04     ` Sven Joachim
  2012-04-09 15:24       ` Linus Torvalds
@ 2012-04-09 15:57       ` Rik van Riel
  2012-04-09 16:19         ` Sven Joachim
  1 sibling, 1 reply; 43+ messages in thread
From: Rik van Riel @ 2012-04-09 15:57 UTC (permalink / raw)
  To: Sven Joachim
  Cc: Linus Torvalds, Andrew Morton, werner, David Rientjes,
	Hugh Dickins, linux-kernel, Oleg Nesterov

On 04/09/2012 03:04 AM, Sven Joachim wrote:

> While I did not experience any crashes or instabilities (yet?), I'm also
> seeing memory leaks.  On a system started this morning, with hardly
> anything running:

> where I would expect no more than 50 MB used, 400 MB are actually in use:
>
> ,----
> | $ free
> |              total       used       free     shared    buffers     cached
> | Mem:       3348400    1849712    1498688          0     328960    1119180
> | -/+ buffers/cache:     401572    2946828
> | Swap:      3719040          0    3719040
> `----

Do you see any big memory users in /proc/meminfo or in
/proc/slabinfo?

-- 
All rights reversed

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 15:57       ` Rik van Riel
@ 2012-04-09 16:19         ` Sven Joachim
  2012-04-09 16:33           ` Rik van Riel
  0 siblings, 1 reply; 43+ messages in thread
From: Sven Joachim @ 2012-04-09 16:19 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, Andrew Morton, werner, David Rientjes,
	Hugh Dickins, linux-kernel, Oleg Nesterov

[-- Attachment #1: Type: text/plain, Size: 901 bytes --]

On 2012-04-09 17:57 +0200, Rik van Riel wrote:

> Do you see any big memory users in /proc/meminfo or in
> /proc/slabinfo?

Attaching these files, since I can't really make anything out of the
latter.  Note that I started a few memory hogs (X, Firefox, Emacs with
Gnus), so overall memory footprint has grown to 768 MB.


[-- Attachment #2: meminfo --]
[-- Type: text/plain, Size: 986 bytes --]

MemTotal:        3348400 kB
MemFree:          195560 kB
Buffers:          292688 kB
Cached:          2079648 kB
SwapCached:            0 kB
Active:          1443900 kB
Inactive:        1241544 kB
Active(anon):     219388 kB
Inactive(anon):    94668 kB
Active(file):    1224512 kB
Inactive(file):  1146876 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       3719040 kB
SwapFree:        3719040 kB
Dirty:                52 kB
Writeback:             0 kB
AnonPages:        313108 kB
Mapped:            70348 kB
Shmem:               948 kB
Slab:             407688 kB
SReclaimable:     393984 kB
SUnreclaim:        13704 kB
KernelStack:        1088 kB
PageTables:         2496 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     5393240 kB
Committed_AS:     790452 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      316040 kB
VmallocChunk:   34359370299 kB
DirectMap4k:      232380 kB
DirectMap2M:     3174400 kB

[-- Attachment #3: slabinfo --]
[-- Type: text/plain, Size: 16729 bytes --]

slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
fib6_nodes             5     59     64   59    1 : tunables  120   60    8 : slabdata      1      1      0
ip6_dst_cache          4     12    320   12    1 : tunables   54   27    8 : slabdata      1      1      0
RAWv6                  5      8   1024    4    1 : tunables   54   27    8 : slabdata      2      2      0
UDPLITEv6              0      0   1024    4    1 : tunables   54   27    8 : slabdata      0      0      0
UDPv6                  2      4   1024    4    1 : tunables   54   27    8 : slabdata      1      1      0
tw_sock_TCPv6          0      0    192   20    1 : tunables  120   60    8 : slabdata      0      0      0
request_sock_TCPv6      0      0    192   20    1 : tunables  120   60    8 : slabdata      0      0      0
TCPv6                  0      0   1792    2    1 : tunables   24   12    8 : slabdata      0      0      0
ext4_groupinfo_2k     27     28    136   28    1 : tunables  120   60    8 : slabdata      1      1      0
ext4_groupinfo_4k   1160   1176    136   28    1 : tunables  120   60    8 : slabdata     42     42      0
uhci_urb_priv          0      0     56   67    1 : tunables  120   60    8 : slabdata      0      0      0
flow_cache             0      0    104   37    1 : tunables  120   60    8 : slabdata      0      0      0
scsi_sense_cache      22     30    128   30    1 : tunables  120   60    8 : slabdata      1      1      0
scsi_cmd_cache        22     30    256   15    1 : tunables  120   60    8 : slabdata      2      2      0
sd_ext_cdb             2    112     32  112    1 : tunables  120   60    8 : slabdata      1      1      0
cfq_io_cq             43    148    104   37    1 : tunables  120   60    8 : slabdata      4      4      0
cfq_queue             44    119    232   17    1 : tunables  120   60    8 : slabdata      7      7      0
mqueue_inode_cache      1      9    832    9    2 : tunables   54   27    8 : slabdata      1      1      0
jbd2_transaction_s     27     30    256   15    1 : tunables  120   60    8 : slabdata      2      2      0
jbd2_inode         17996  19019     48   77    1 : tunables  120   60    8 : slabdata    247    247      0
jbd2_journal_handle     24    144     24  144    1 : tunables  120   60    8 : slabdata      1      1      0
jbd2_journal_head     54    340    112   34    1 : tunables  120   60    8 : slabdata     10     10      0
jbd2_revoke_table_s     10    202     16  202    1 : tunables  120   60    8 : slabdata      1      1      0
jbd2_revoke_record_s      0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
ext4_inode_cache  283945 284168    840    4    1 : tunables   54   27    8 : slabdata  71042  71042      0
ext4_xattr             0      0     88   44    1 : tunables  120   60    8 : slabdata      0      0      0
ext4_free_data         0      0     64   59    1 : tunables  120   60    8 : slabdata      0      0      0
ext4_allocation_context      0      0    136   28    1 : tunables  120   60    8 : slabdata      0      0      0
ext4_prealloc_space     18     74    104   37    1 : tunables  120   60    8 : slabdata      2      2      0
ext4_system_zone       0      0     40   92    1 : tunables  120   60    8 : slabdata      0      0      0
ext4_io_end            0      0   1128    3    1 : tunables   24   12    8 : slabdata      0      0      0
ext4_io_page           0      0     16  202    1 : tunables  120   60    8 : slabdata      0      0      0
kioctx                 0      0    384   10    1 : tunables   54   27    8 : slabdata      0      0      0
kiocb                  0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0
fanotify_response_event      0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
fsnotify_mark          0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
inotify_event_private_data      0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
inotify_inode_mark     16     28    136   28    1 : tunables  120   60    8 : slabdata      1      1      0
dnotify_mark           0      0    136   28    1 : tunables  120   60    8 : slabdata      0      0      0
dnotify_struct         0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
dio                    0      0    640    6    1 : tunables   54   27    8 : slabdata      0      0      0
fasync_cache           5     77     48   77    1 : tunables  120   60    8 : slabdata      1      1      0
posix_timers_cache      0      0    144   27    1 : tunables  120   60    8 : slabdata      0      0      0
uid_cache              9     30    128   30    1 : tunables  120   60    8 : slabdata      1      1      0
UNIX                 175    175    768    5    1 : tunables   54   27    8 : slabdata     35     35      0
UDP-Lite               0      0    832    9    2 : tunables   54   27    8 : slabdata      0      0      0
tcp_bind_bucket       10    112     32  112    1 : tunables  120   60    8 : slabdata      1      1      0
inet_peer_cache       15     40    192   20    1 : tunables  120   60    8 : slabdata      2      2      0
secpath_cache          0      0     64   59    1 : tunables  120   60    8 : slabdata      0      0      0
xfrm_dst_cache         0      0    384   10    1 : tunables   54   27    8 : slabdata      0      0      0
ip_fib_trie            8     67     56   67    1 : tunables  120   60    8 : slabdata      1      1      0
ip_fib_alias           9     77     48   77    1 : tunables  120   60    8 : slabdata      1      1      0
ip_dst_cache          31     45    256   15    1 : tunables  120   60    8 : slabdata      3      3      0
PING                   0      0    768    5    1 : tunables   54   27    8 : slabdata      0      0      0
RAW                    3      9    832    9    2 : tunables   54   27    8 : slabdata      1      1      0
UDP                   18     18    832    9    2 : tunables   54   27    8 : slabdata      2      2      0
tw_sock_TCP            0      0    192   20    1 : tunables  120   60    8 : slabdata      0      0      0
request_sock_TCP       0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
TCP                   10     15   1600    5    2 : tunables   24   12    8 : slabdata      3      3      0
eventpoll_pwq         94    212     72   53    1 : tunables  120   60    8 : slabdata      4      4      0
eventpoll_epi         94    180    128   30    1 : tunables  120   60    8 : slabdata      6      6      0
sgpool-128             2      2   4096    1    1 : tunables   24   12    8 : slabdata      2      2      0
sgpool-64              2      2   2048    2    1 : tunables   24   12    8 : slabdata      1      1      0
sgpool-32              2      4   1024    4    1 : tunables   54   27    8 : slabdata      1      1      0
sgpool-16              2      8    512    8    1 : tunables   54   27    8 : slabdata      1      1      0
sgpool-8              15     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0
scsi_data_buffer       0      0     24  144    1 : tunables  120   60    8 : slabdata      0      0      0
blkdev_queue           2      4   1688    4    2 : tunables   24   12    8 : slabdata      1      1      0
blkdev_requests       22     22    344   11    1 : tunables   54   27    8 : slabdata      2      2      0
blkdev_ioc            43    160     96   40    1 : tunables  120   60    8 : slabdata      4      4      0
fsnotify_event_holder      0      0     24  144    1 : tunables  120   60    8 : slabdata      0      0      0
fsnotify_event         1     34    112   34    1 : tunables  120   60    8 : slabdata      1      1      0
bio-0                 32     40    192   20    1 : tunables  120   60    8 : slabdata      2      2      0
biovec-256             2      2   4096    1    1 : tunables   24   12    8 : slabdata      2      2      0
biovec-128             0      0   2048    2    1 : tunables   24   12    8 : slabdata      0      0      0
biovec-64              0      0   1024    4    1 : tunables   54   27    8 : slabdata      0      0      0
biovec-16              0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0
sock_inode_cache     224    224    576    7    1 : tunables   54   27    8 : slabdata     32     32      0
skbuff_fclone_cache      0      8    448    8    1 : tunables   54   27    8 : slabdata      0      1      0
skbuff_head_cache    156    180    256   15    1 : tunables  120   60    8 : slabdata     12     12      0
file_lock_cache       34     40    192   20    1 : tunables  120   60    8 : slabdata      2      2      0
shmem_inode_cache   1364   1573    592   13    2 : tunables   54   27    8 : slabdata    121    121    135
Acpi-Operand        1432   1484     72   53    1 : tunables  120   60    8 : slabdata     28     28      0
Acpi-ParseExt          0      0     72   53    1 : tunables  120   60    8 : slabdata      0      0      0
Acpi-Parse             0      0     48   77    1 : tunables  120   60    8 : slabdata      0      0      0
Acpi-State             0      0     80   48    1 : tunables  120   60    8 : slabdata      0      0      0
Acpi-Namespace       691    736     40   92    1 : tunables  120   60    8 : slabdata      8      8      0
task_delay_info      169    306    112   34    1 : tunables  120   60    8 : slabdata      9      9      0
taskstats              2     12    328   12    1 : tunables   54   27    8 : slabdata      1      1      0
proc_inode_cache    1188   1188    592    6    1 : tunables   54   27    8 : slabdata    198    198      0
sigqueue              48     48    160   24    1 : tunables  120   60    8 : slabdata      2      2      0
bdev_cache            11     15    768    5    1 : tunables   54   27    8 : slabdata      3      3      0
sysfs_dir_cache     6770   6800    112   34    1 : tunables  120   60    8 : slabdata    200    200      0
mnt_cache             30     45    256   15    1 : tunables  120   60    8 : slabdata      3      3      0
filp                2362   3285    256   15    1 : tunables  120   60    8 : slabdata    219    219     60
inode_cache          651    651    528    7    1 : tunables   54   27    8 : slabdata     93     93      0
dentry            232248 238440    192   20    1 : tunables  120   60    8 : slabdata  11922  11922      0
names_cache            5      5   4096    1    1 : tunables   24   12    8 : slabdata      5      5      0
key_jar                1     20    192   20    1 : tunables  120   60    8 : slabdata      1      1      0
buffer_head       376815 406556    104   37    1 : tunables  120   60    8 : slabdata  10988  10988      0
nsproxy                1     77     48   77    1 : tunables  120   60    8 : slabdata      1      1      0
vm_area_struct      4348   4600    168   23    1 : tunables  120   60    8 : slabdata    200    200     60
mm_struct             76     76    896    4    1 : tunables   54   27    8 : slabdata     19     19      0
fs_cache              84    177     64   59    1 : tunables  120   60    8 : slabdata      3      3      0
files_cache           84    132    704   11    2 : tunables   54   27    8 : slabdata     12     12      0
signal_cache         125    140   1024    4    1 : tunables   54   27    8 : slabdata     35     35      0
sighand_cache        117    126   2112    3    2 : tunables   24   12    8 : slabdata     42     42      0
task_xstate          112    112    512    8    1 : tunables   54   27    8 : slabdata     14     14      0
task_struct          155    155   1472    5    2 : tunables   24   12    8 : slabdata     31     31      0
cred_jar             373    640    192   20    1 : tunables  120   60    8 : slabdata     32     32      0
anon_vma_chain      3520   5005     48   77    1 : tunables  120   60    8 : slabdata     65     65      0
anon_vma            2502   2950     64   59    1 : tunables  120   60    8 : slabdata     50     50      0
pid                  175    240    128   30    1 : tunables  120   60    8 : slabdata      8      8      0
radix_tree_node    28764  29099    560    7    1 : tunables   54   27    8 : slabdata   4157   4157      0
idr_layer_cache      327    357    544    7    1 : tunables   54   27    8 : slabdata     51     51      0
size-4194304(DMA)      0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
size-4194304           0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
size-2097152(DMA)      0      0 2097152    1  512 : tunables    1    1    0 : slabdata      0      0      0
size-2097152           0      0 2097152    1  512 : tunables    1    1    0 : slabdata      0      0      0
size-1048576(DMA)      0      0 1048576    1  256 : tunables    1    1    0 : slabdata      0      0      0
size-1048576           0      0 1048576    1  256 : tunables    1    1    0 : slabdata      0      0      0
size-524288(DMA)       0      0 524288    1  128 : tunables    1    1    0 : slabdata      0      0      0
size-524288            1      1 524288    1  128 : tunables    1    1    0 : slabdata      1      1      0
size-262144(DMA)       0      0 262144    1   64 : tunables    1    1    0 : slabdata      0      0      0
size-262144            0      0 262144    1   64 : tunables    1    1    0 : slabdata      0      0      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            3      3 131072    1   32 : tunables    8    4    0 : slabdata      3      3      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             5      5  65536    1   16 : tunables    8    4    0 : slabdata      5      5      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768             9      9  32768    1    8 : tunables    8    4    0 : slabdata      9      9      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             7      7  16384    1    4 : tunables    8    4    0 : slabdata      7      7      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192             23     23   8192    1    2 : tunables    8    4    0 : slabdata     23     23      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    8 : slabdata      0      0      0
size-4096            210    210   4096    1    1 : tunables   24   12    8 : slabdata    210    210      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    8 : slabdata      0      0      0
size-2048            276    276   2048    2    1 : tunables   24   12    8 : slabdata    138    138      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    8 : slabdata      0      0      0
size-1024            920    920   1024    4    1 : tunables   54   27    8 : slabdata    230    230      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    8 : slabdata      0      0      0
size-512             608    608    512    8    1 : tunables   54   27    8 : slabdata     76     76      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0
size-256             620    795    256   15    1 : tunables  120   60    8 : slabdata     53     53      0
size-192(DMA)          0      0    192   20    1 : tunables  120   60    8 : slabdata      0      0      0
size-192            1545   1960    192   20    1 : tunables  120   60    8 : slabdata     98     98      0
size-128(DMA)          0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
size-64(DMA)           0      0     64   59    1 : tunables  120   60    8 : slabdata      0      0      0
size-64             8307  13924     64   59    1 : tunables  120   60    8 : slabdata    236    236      0
size-32(DMA)           0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
size-128            3820   3840    128   30    1 : tunables  120   60    8 : slabdata    128    128      0
size-32             5625   5936     32  112    1 : tunables  120   60    8 : slabdata     53     53      0
kmem_cache           153    160    192   20    1 : tunables  120   60    8 : slabdata      8      8      0

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 16:19         ` Sven Joachim
@ 2012-04-09 16:33           ` Rik van Riel
  2012-04-09 17:00             ` Pekka Enberg
  2012-04-09 17:00             ` Sven Joachim
  0 siblings, 2 replies; 43+ messages in thread
From: Rik van Riel @ 2012-04-09 16:33 UTC (permalink / raw)
  To: Sven Joachim
  Cc: Linus Torvalds, Andrew Morton, werner, David Rientjes,
	Hugh Dickins, linux-kernel, Oleg Nesterov

On 04/09/2012 12:19 PM, Sven Joachim wrote:
> Attaching these files, since I can't really make anything out of the
> latter.  Note that I started a few memory hogs (X, Firefox, Emacs with
> Gnus), so overall memory footprint has grown to 768 MB.

Looks like the "missing" 400MB is all in filesystem caches,
specifically the dentry cache, the ext4 inode cache and
buffer heads.

That is perfectly fine, since those caches will be shrunk
when the system needs memory.

-- 
All rights reversed
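
The attribution to filesystem caches can be checked with a little
arithmetic on the attached slabinfo. The sketch below is Python and
assumes 4 kB pages (an assumption, not stated in the thread); the slab
counts and the SReclaimable figure are copied from the attachments, and
the function names are made up for illustration.

```python
PAGE_KB = 4  # assumption: 4 kB pages

# (name, num_slabs, pagesperslab), copied from the slabinfo attachment
CACHES = [
    ("ext4_inode_cache", 71042, 1),
    ("dentry", 11922, 1),
    ("buffer_head", 10988, 1),
]

SRECLAIMABLE_KB = 393984  # SReclaimable from the meminfo attachment


def slab_kb(num_slabs, pagesperslab, page_kb=PAGE_KB):
    """Memory held by one cache: slabs * pages per slab * page size."""
    return num_slabs * pagesperslab * page_kb


def top_caches_kb(caches=CACHES):
    """Total memory held by the listed caches, in kB."""
    return sum(slab_kb(n, p) for _name, n, p in caches)


if __name__ == "__main__":
    for name, n, p in CACHES:
        print(f"{name:18s} {slab_kb(n, p):8d} kB")
    total = top_caches_kb()
    print(f"{'total':18s} {total:8d} kB")
    print(f"share of SReclaimable: {total / SRECLAIMABLE_KB:.1%}")
```

With these inputs the three caches add up to 375808 kB, which is most of
the 393984 kB that meminfo reports as SReclaimable.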

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 16:33           ` Rik van Riel
@ 2012-04-09 17:00             ` Pekka Enberg
  2012-04-09 17:19               ` Sven Joachim
  2012-04-09 17:00             ` Sven Joachim
  1 sibling, 1 reply; 43+ messages in thread
From: Pekka Enberg @ 2012-04-09 17:00 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Sven Joachim, Linus Torvalds, Andrew Morton, werner,
	David Rientjes, Hugh Dickins, linux-kernel, Oleg Nesterov

On Mon, Apr 9, 2012 at 7:33 PM, Rik van Riel <riel@redhat.com> wrote:
>> Attaching these files, since I can't really make anything out of the
>> latter.  Note that I started a few memory hogs (X, Firefox, Emacs with
>> Gnus), so overall memory footprint has grown to 768 MB.
>
> Looks like the "missing" 400MB is all in filesystem caches,
> specifically the dentry cache, the ext4 inode cache and
> buffer heads.
>
> That is perfectly fine, since those caches will be shrunk
> when the system needs memory.

CONFIG_SLUB, right? It will merge caches so you don't necessarily see
leaks in /proc/slabinfo. You can use "slub_nomerge" kernel parameter
to disable the merging.

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 16:33           ` Rik van Riel
  2012-04-09 17:00             ` Pekka Enberg
@ 2012-04-09 17:00             ` Sven Joachim
  2012-04-09 17:20               ` Rik van Riel
  1 sibling, 1 reply; 43+ messages in thread
From: Sven Joachim @ 2012-04-09 17:00 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, Andrew Morton, werner, David Rientjes,
	Hugh Dickins, linux-kernel, Oleg Nesterov

On 2012-04-09 18:33 +0200, Rik van Riel wrote:

> Looks like the "missing" 400MB is all in filesystem caches,
> specifically the dentry cache, the ext4 inode cache and
> buffer heads.

Then why does free(1) report those in the "-/+ buffers/cache:" line?  It
did not do this with earlier kernels, AFAIK.

Cheers,
       Sven

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 17:00             ` Pekka Enberg
@ 2012-04-09 17:19               ` Sven Joachim
  0 siblings, 0 replies; 43+ messages in thread
From: Sven Joachim @ 2012-04-09 17:19 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Rik van Riel, Linus Torvalds, Andrew Morton, werner,
	David Rientjes, Hugh Dickins, linux-kernel, Oleg Nesterov

On 2012-04-09 19:00 +0200, Pekka Enberg wrote:

> On Mon, Apr 9, 2012 at 7:33 PM, Rik van Riel <riel@redhat.com> wrote:
>>> Attaching these files, since I can't really make anything out of the
>>> latter.  Note that I started a few memory hogs (X, Firefox, Emacs with
>>> Gnus), so overall memory footprint has grown to 768 MB.
>>
>> Looks like the "missing" 400MB is all in filesystem caches,
>> specifically the dentry cache, the ext4 inode cache and
>> buffer heads.
>>
>> That is perfectly fine, since those caches will be shrunk
>> when the system needs memory.
>
> CONFIG_SLUB, right?

Actually, no.  For some reason (probably historical…) I have CONFIG_SLAB.

> It will merge caches so you don't necessarily see
> leaks in /proc/slabinfo. You can use "slub_nomerge" kernel parameter
> to disable the merging.

Cheers,
       Sven

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 17:00             ` Sven Joachim
@ 2012-04-09 17:20               ` Rik van Riel
  0 siblings, 0 replies; 43+ messages in thread
From: Rik van Riel @ 2012-04-09 17:20 UTC (permalink / raw)
  To: Sven Joachim
  Cc: Linus Torvalds, Andrew Morton, werner, David Rientjes,
	Hugh Dickins, linux-kernel, Oleg Nesterov

On 04/09/2012 01:00 PM, Sven Joachim wrote:
> On 2012-04-09 18:33 +0200, Rik van Riel wrote:
>
>> Looks like the "missing" 400MB is all in filesystem caches,
>> specifically the dentry cache, the ext4 inode cache and
>> buffer heads.
>
> Then why does free(1) report those in the "-/+ buffers/cache:" line?  It
> did not do this with earlier kernels, AFAIK.

It has done so for over a decade. Reclaimable slab has never been
subtracted from "used" by the free utility.

-- 
All rights reversed
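
The free output quoted earlier in the thread is consistent with this.
A minimal Python sketch, assuming the "-/+ buffers/cache" line is
derived only from the "Mem:" line (the function name is made up for
illustration):

```python
def minus_plus_buffers_cache(used, free, buffers, cached):
    """Return (used, free) as shown on the '-/+ buffers/cache' line.

    Only buffers and page cache are moved from 'used' to 'free';
    reclaimable slab (dentries, inodes, ...) stays counted as used.
    """
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


if __name__ == "__main__":
    # Values from the "Mem:" line of the quoted free output, in kB.
    print(minus_plus_buffers_cache(
        used=1849712, free=1498688, buffers=328960, cached=1119180))
```

For the quoted "Mem:" line this gives (401572, 2946828), matching the
reported "-/+ buffers/cache" line, so the roughly 400 MB shown as used
still includes the reclaimable slab caches.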

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 15:39       ` Linus Torvalds
@ 2012-04-09 21:22         ` David Rientjes
  2012-04-09 22:09           ` Linus Torvalds
  2012-04-09 22:13             ` Colin Cross
  0 siblings, 2 replies; 43+ messages in thread
From: David Rientjes @ 2012-04-09 21:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
	Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, Greg Kroah-Hartman, stable

On Mon, 9 Apr 2012, Linus Torvalds wrote:

> The real bug is actually that those notifiers are a f*cking joke, and
> the return value from the notifier is a mistake.
> 
> So I personally think that the real problem is this code in
> profile_handoff_task:
> 
>         return (ret == NOTIFY_OK) ? 1 : 0;
> 
> and ask yourself two questions:
> 
>  - what the hell does NOTIFY_OK/NOTIFY_DONE mean?
>  - what happens if there are multiple notifiers that all (or some)
> return NOTIFY_OK?
> 

NOTIFY_OK should never be a valid response for this notifier the way it's
currently implemented; it should be NOTIFY_STOP, which stops iterating the
call chain and so avoids a double free.  Right now it doesn't matter because
only oprofile actually frees the task_struct, and lowmemorykiller should be
using NOTIFY_DONE.

Then we have a completeness issue if multiple callbacks want to return 
NOTIFY_STOP and an ordering issue if the oprofile callback is invoked 
before lowmemorykiller.

> I'll tell you what my answers are:
> 
>  (a) NOTIFY_DONE is the "ok, everything is fine, you can free the
> task-struct". It's also what that handoff notifier thing returns if
> there are no notifiers registered at all.
> 
>      So the fix to the Android lowmemorykiller is as simple as just
> changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
> will properly free the task struct.
> 

I don't think so for Werner's config, which also has CONFIG_OPROFILE=y, so
oprofile would return NOTIFY_OK and queue the task_struct for free, then 
the second notifier callback to the lowmemorykiller would return 
NOTIFY_DONE which would result in put_task_struct() doing free_task() 
itself for a double free.
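
To make the ordering problem concrete, here is a minimal userspace sketch, not kernel code. It assumes the chain keeps only the last callback's return value and that the caller frees the task unless that value is NOTIFY_OK. The names fake_task, oprofile_like, lowmem_like, call_chain_last and frees_for_order are made up for illustration, and the "free" is just a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in include/linux/notifier.h. */
#define NOTIFY_DONE 0x0000
#define NOTIFY_OK   0x0001

/* Hypothetical stand-in for task_struct: only counts how often it is freed. */
struct fake_task {
	int free_count;
};

typedef int (*handoff_fn)(struct fake_task *task);

/* Models oprofile: claims the task and frees it itself (deferred in reality). */
int oprofile_like(struct fake_task *task)
{
	task->free_count++;
	return NOTIFY_OK;
}

/* Models lowmemorykiller after changing its return to NOTIFY_DONE. */
int lowmem_like(struct fake_task *task)
{
	(void)task;
	return NOTIFY_DONE;
}

/* Like the unpatched chain walk: only the last return value survives. */
int call_chain_last(const handoff_fn *chain, size_t n, struct fake_task *task)
{
	int ret = NOTIFY_DONE;
	size_t i;

	assert(task != NULL);
	for (i = 0; i < n; i++)
		ret = chain[i](task);
	return ret;
}

/* Models the caller: free the task unless the chain reported NOTIFY_OK. */
int frees_for_order(const handoff_fn *chain, size_t n)
{
	struct fake_task task = { 0 };

	if (call_chain_last(chain, n, &task) != NOTIFY_OK)
		task.free_count++;
	return task.free_count;
}
```

With the chain ordered oprofile_like then lowmem_like, frees_for_order() counts two frees; in the opposite order it counts one.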

>      The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
> "NOTIFY_OK means that I will free the task myself later". That's what
> the oprofile uses, and it frees the task.
> 
>  (b) But the whole interface is a total f*cking mess. If *multiple*
> people return NOTIFY_OK, they're royally fucked. And the whole (and
> only) point of notifiers is that you can register multiple different
> ones independently.
> 
> So quite frankly, the *real* bug is not in that android driver
> (although I'd say that we should just make it return NOTIFY_DONE and
> be done with it). The real bug is that the whole f*cking notifier is a
> mistake, and checking the error return was the biggest mistake of all.
> 

Right, we can't handoff the freeing of the task_struct to more than one 
notifier.  It seems misdesigned from the beginning and what we really want 
is to hijack task->usage for __put_task_struct(task) if we have such a 
notifier callchain and require each one (currently just oprofile) to take 
a reference on task->usage for NOTIFY_OK and then be responsible for 
dropping the reference when it's done with it later instead of requiring 
it to free the task_struct itself.
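
A rough userspace sketch of the reference-count idea, under simplifying assumptions: the notifiers run while the exit path still holds its own reference (the real __put_task_struct() runs after the count has already dropped to zero, which is why the proposal speaks of hijacking task->usage). The names fake_task, get_fake_task, put_fake_task, notifier_wants_task and handoff_with_refcount are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for task_struct with a usage count like task->usage. */
struct fake_task {
	int usage;	/* outstanding references */
	int free_count;	/* how many times the final free ran */
};

/* Take a reference; only legal while at least one reference is still held. */
void get_fake_task(struct fake_task *t)
{
	assert(t->usage > 0);
	t->usage++;
}

/* Drop a reference; whoever drops the last one performs the single free. */
void put_fake_task(struct fake_task *t)
{
	assert(t->usage > 0);
	if (--t->usage == 0)
		t->free_count++;
}

/* A notifier that wants the task later pins it instead of claiming the free. */
void notifier_wants_task(struct fake_task *t)
{
	get_fake_task(t);
}

/* Two interested notifiers, then the exit path drops its own reference. */
int handoff_with_refcount(void)
{
	struct fake_task t = { .usage = 1, .free_count = 0 };
	int freed_early;

	notifier_wants_task(&t);	/* e.g. a profiler */
	notifier_wants_task(&t);	/* a second, independent user */
	put_fake_task(&t);		/* exit path is done: no free yet */
	freed_early = t.free_count;

	put_fake_task(&t);		/* first notifier is done */
	put_fake_task(&t);		/* second notifier is done: frees here */
	return freed_early == 0 ? t.free_count : -1;
}
```

In this model any number of notifiers can hold the task, and the free still happens exactly once, at the last put.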

That's _if_ we want to continue to have such an interface in the first 
place where it's only really necessary right now for oprofile (and, hence, 
wasn't implemented in an extendable way).  I'm thinking the 
lowmemorykiller, as I alluded to, could be written in a way where we can
detect if a thread we've already killed has exited yet before killing 
another one.  We can't just store a pointer to the task_struct of the 
killed task since it could be reused for a fork later, but we could use 
TIF_MEMDIE like the oom killer does.
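
As a loose illustration of the flag-based approach, the following userspace sketch marks the chosen victim with a per-task flag and refuses to pick another victim while any live task still carries it. This is only a model: fake_task, its fields and pick_victim are hypothetical, and the memdie field merely plays the role that TIF_MEMDIE would play in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; memdie plays the role of TIF_MEMDIE. */
struct fake_task {
	int alive;
	int memdie;
	int score;	/* higher means a better candidate to kill */
};

/*
 * Returns the index of the task chosen to be killed, or -1 if nothing
 * should be killed now (no candidates, or an earlier victim has not
 * finished exiting yet).
 */
int pick_victim(struct fake_task *tasks, size_t n)
{
	int victim = -1;
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!tasks[i].alive)
			continue;
		if (tasks[i].memdie)
			return -1;	/* previous victim still exiting: wait */
		if (victim < 0 || tasks[i].score > tasks[victim].score)
			victim = (int)i;
	}
	if (victim >= 0)
		tasks[victim].memdie = 1;
	return victim;
}
```

Because the state lives in the task itself rather than in a saved pointer, nothing dangles once the victim is gone and its memory is reused.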

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 21:22         ` David Rientjes
@ 2012-04-09 22:09           ` Linus Torvalds
  2012-04-09 23:25             ` David Rientjes
  2012-04-09 22:13             ` Colin Cross
  1 sibling, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09 22:09 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
	Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, Greg Kroah-Hartman, stable, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 2126 bytes --]

On Mon, Apr 9, 2012 at 2:22 PM, David Rientjes <rientjes@google.com> wrote:
>
> NOTIFY_OK should never be a valid response for this notifier the way it's
> currently implemented, it should be NOTIFY_STOP to stop iterating the call
> chain to avoid a double free.

No, that's no good either. That would mean that some people wouldn't
be notified about the death of the task at all.

So NOTIFY_STOP just implies *another* bug.

> Right, we can't handoff the freeing of the task_struct to more than one
> notifier.  It seems misdesigned from the beginning and what we really want
> is to hijack task->usage for __put_task_struct(task) if we have such a
> notifier callchain and require each one (currently just oprofile) to take
> a reference on task->usage for NOTIFY_OK and then be responsible for
> dropping the reference when it's done with it later instead of requiring
> it to free the task_struct itself.

We could make notifier.c just "or" all the return values together, and
then it's ok if *one* person returns NOTIFY_OK.

Of course, that's not how notifiers are documented to work, but quite
frankly, notifiers with non-zero values that don't say STOP are broken
as-is anyway, so you might as well do a logical "or" of the return
values and at least make things like this work.

I personally think every single notifier interface we have ever had in
the kernel has been a total f*cking disaster. The whole concept of
"run these random functions at this random event" is a broken concept
that just makes people do crazy broken things.

Oh well. So my suggestion right now would be something like the
attached. It's still horribly broken, it actively breaks documented
notifier behavior, but dammit, if the notifier people don't like
'or'ing return values together they should damn well return zero from
the notifier that doesn't do anything. And returning an error will
exit out, so..

Hmm? Who cares about that kernel/notifier.c code? Andrew? Ingo? We
don't have any actual maintainer for that crap, but judging by the
commits, it's one of you two.

                  Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 1047 bytes --]

 drivers/staging/android/lowmemorykiller.c |    2 +-
 kernel/notifier.c                         |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 052b43e4e505..142bfc2f84db 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -79,7 +79,7 @@ task_notify_func(struct notifier_block *self, unsigned long val, void *data)
 	if (task == lowmem_deathpending)
 		lowmem_deathpending = NULL;
 
-	return NOTIFY_OK;
+	return NOTIFY_DONE;
 }
 
 static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
diff --git a/kernel/notifier.c b/kernel/notifier.c
index 2d5cc4ccff7f..11fe956e8daf 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -90,7 +90,7 @@ static int __kprobes notifier_call_chain(struct notifier_block **nl,
 			continue;
 		}
 #endif
-		ret = nb->notifier_call(nb, val, v);
+		ret |= nb->notifier_call(nb, val, v);
 
 		if (nr_calls)
 			(*nr_calls)++;
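
For readers who want to see the effect of the one-line change in kernel/notifier.c in isolation, here is a simplified userspace model of the chain walk. It is a sketch only: the real notifier_call_chain() also handles RCU, nr_to_call and nr_calls, and the names notifier_fn, returns_ok, returns_done and call_chain are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

typedef int (*notifier_fn)(void);

int returns_ok(void)   { return NOTIFY_OK; }
int returns_done(void) { return NOTIFY_DONE; }

/*
 * Simplified model of the chain loop. With or_results set it accumulates
 * like the patched "ret |= ..." line; otherwise it overwrites like the
 * original "ret = ..." line.
 */
int call_chain(const notifier_fn *chain, size_t n, int or_results)
{
	int ret = NOTIFY_DONE;
	size_t i;

	assert(chain != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int r = chain[i]();

		ret = or_results ? (ret | r) : r;
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}
```

For a chain of returns_ok followed by returns_done, the overwriting variant yields NOTIFY_DONE while the accumulating variant yields NOTIFY_OK, so a single NOTIFY_OK is no longer lost when a later callback returns NOTIFY_DONE.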

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 21:22         ` David Rientjes
@ 2012-04-09 22:13             ` Colin Cross
  2012-04-09 22:13             ` Colin Cross
  1 sibling, 0 replies; 43+ messages in thread
From: Colin Cross @ 2012-04-09 22:13 UTC (permalink / raw)
  To: David Rientjes
  Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
	Christian Bejram, Paul E. McKenney, Anton Vorontsov,
	Greg Kroah-Hartman, stable

On Mon, Apr 9, 2012 at 2:22 PM, David Rientjes <rientjes@google.com> wrote:
> On Mon, 9 Apr 2012, Linus Torvalds wrote:
>
>> The real bug is actually that those notifiers are a f*cking joke, and
>> the return value from the notifier is a mistake.
>>
>> So I personally think that the real problem is this code in
>> profile_handoff_task:
>>
>>         return (ret == NOTIFY_OK) ? 1 : 0;
>>
>> and ask yourself two questions:
>>
>>  - what the hell does NOTIFY_OK/NOTIFY_DONE mean?
>>  - what happens if there are multiple notifiers that all (or some)
>> return NOTIFY_OK?
>>
> NOTIFY_OK should never be a valid response for this notifier the way it's
> currently implemented, it should be NOTIFY_STOP to stop iterating the call
> chain to avoid a double free.  Right now it doesn't matter because only
> oprofile is actually freeing the task_struct and lowmemorykiller should be
> using NOTIFY_DONE.
>
> Then we have a completeness issue if multiple callbacks want to return
> NOTIFY_STOP and an ordering issue if the oprofile callback is invoked
> before lowmemorykiller.
>
>> I'll tell you what my answers are:
>>
>>  (a) NOTIFY_DONE is the "ok, everything is fine, you can free the
>> task-struct". It's also what that handoff notifier thing returns if
>> there are no notifiers registered at all.
>>
>>      So the fix to the Android lowmemorykiller is as simple as just
>> changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
>> will properly free the task struct.
>>
>
> I don't think so for Werner's config, which also has CONFIG_OPROFILE=y, so
> oprofile would return NOTIFY_OK and queue the task_struct for free, then
> the second notifier callback to the lowmemorykiller would return
> NOTIFY_DONE which would result in put_task_struct() doing free_task()
> itself for a double free.
>
>>      The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
>> "NOTIFY_OK means that I will free the task myself later". That's what
>> the oprofile uses, and it frees the task.
>>
>>  (b) But the whole interface is a total f*cking mess. If *multiple*
>> people return NOTIFY_OK, they're royally fucked. And the whole (and
>> only) point of notifiers is that you can register multiple different
>> ones independently.
>>
>> So quite frankly, the *real* bug is not in that android driver
>> (although I'd say that we should just make it return NOTIFY_DONE and
>> be done with it). The real bug is that the whole f*cking notifier is a
>> mistake, and checking the error return was the biggest mistake of all.
>>
>
> Right, we can't handoff the freeing of the task_struct to more than one
> notifier.  It seems misdesigned from the beginning and what we really want
> is to hijack task->usage for __put_task_struct(task) if we have such a
> notifier callchain and require each one (currently just oprofile) to take
> a reference on task->usage for NOTIFY_OK and then be responsible for
> dropping the reference when it's done with it later instead of requiring
> it to free the task_struct itself.
>
> That's _if_ we want to continue to have such an interface in the first
> place where it's only really necessary right now for oprofile (and, hence,
> wasn't implemented in an extendable way).  I'm thinking the
> lowmemorykiller, as I alluded to, could be written in a way where we can
> detect if a thread we've already killed has exited yet before killing
> another one.  We can't just store a pointer to the task_struct of the
> killed task since it could be reused for a fork later, but we could use
> TIF_MEMDIE like the oom killer does.

This was a known issue in 2010, in the android tree the use of
task_handoff_register was dropped one day after it was added and
replaced with a new task_free_register hook.  I assume Greg dropped
the fix during the android tree refresh in 3.0 because it depended on
a change to kernel/fork.c.  The two relevant patches are (using
codeaurora's gitweb because we don't have one right now):

sched: Add a generic notifier when a task struct is about to be freed
https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=667dffa787a87ef4ea43cc65957ce96077fdcd0a

staging: android: lowmemorykiller: Fix task_struct leak
https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=af0240f095a704f75f032bbcc01f670c65c163ba

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 22:13             ` Colin Cross
@ 2012-04-09 22:21               ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 43+ messages in thread
From: Greg Kroah-Hartman @ 2012-04-09 22:21 UTC (permalink / raw)
  To: Colin Cross
  Cc: David Rientjes, Linus Torvalds, Andrew Morton, werner,
	Rik van Riel, Hugh Dickins, linux-kernel, Oleg Nesterov,
	Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, stable

On Mon, Apr 09, 2012 at 03:13:00PM -0700, Colin Cross wrote:
> On Mon, Apr 9, 2012 at 2:22 PM, David Rientjes <rientjes@google.com> wrote:
> > On Mon, 9 Apr 2012, Linus Torvalds wrote:
> >
> >> The real bug is actually that those notifiers are a f*cking joke, and
> >> the return value from the notifier is a mistake.
> >>
> >> So I personally think that the real problem is this code in
> >> profile_handoff_task:
> >>
> >>         return (ret == NOTIFY_OK) ? 1 : 0;
> >>
> >> and ask yourself two questions:
> >>
> >>  - what the hell does NOTIFY_OK/NOTIFY_DONE mean?
> >>  - what happens if there are multiple notifiers that all (or some)
> >> return NOTIFY_OK?
> >>
> > NOTIFY_OK should never be a valid response for this notifier the way it's
> > currently implemented, it should be NOTIFY_STOP to stop iterating the call
> > chain to avoid a double free.  Right now it doesn't matter because only
> > oprofile is actually freeing the task_struct and lowmemorykiller should be
> > using NOTIFY_DONE.
> >
> > Then we have a completeness issue if multiple callbacks want to return
> > NOTIFY_STOP and an ordering issue if the oprofile callback is invoked
> > before lowmemorykiller.
> >
> >> I'll tell you what my answers are:
> >>
> >>  (a) NOTIFY_DONE is the "ok, everything is fine, you can free the
> >> task-struct". It's also what that handoff notifier thing returns if
> >> there are no notifiers registered at all.
> >>
> >>      So the fix to the Android lowmemorykiller is as simple as just
> >> changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
> >> will properly free the task struct.
> >>
> >
> > I don't think so for Werner's config, which also has CONFIG_OPROFILE=y, so
> > oprofile would return NOTIFY_OK and queue the task_struct for free, then
> > the second notifier callback to the lowmemorykiller would return
> > NOTIFY_DONE which would result in put_task_struct() doing free_task()
> > itself for a double free.
> >
> >>      The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
> >> "NOTIFY_OK means that I will free the task myself later". That's what
> >> the oprofile uses, and it frees the task.
> >>
> >>  (b) But the whole interface is a total f*cking mess. If *multiple*
> >> people return NOTIFY_OK, they're royally fucked. And the whole (and
> >> only) point of notifiers is that you can register multiple different
> >> ones independently.
> >>
> >> So quite frankly, the *real* bug is not in that android driver
> >> (although I'd say that we should just make it return NOTIFY_DONE and
> >> be done with it). The real bug is that the whole f*cking notifier is a
> >> mistake, and checking the error return was the biggest mistake of all.
> >>
> >
> > Right, we can't handoff the freeing of the task_struct to more than one
> > notifier.  It seems misdesigned from the beginning and what we really want
> > is to hijack task->usage for __put_task_struct(task) if we have such a
> > notifier callchain and require each one (currently just oprofile) to take
> > a reference on task->usage for NOTIFY_OK and then be responsible for
> > dropping the reference when it's done with it later instead of requiring
> > it to free the task_struct itself.
> >
> > That's _if_ we want to continue to have such an interface in the first
> > place where it's only really necessary right now for oprofile (and, hence,
> > wasn't implemented in an extendable way).  I'm thinking the
> > lowmemorykiller, as I alluded to, could be written in a way where we can
> > detect if a thread we've already killed has exited yet before killing
> > another one.  We can't just store a pointer to the task_struct of the
> > killed task since it could be reused for a fork later, but we could use
> > TIF_MEMDIE like the oom killer does.
> 
> This was a known issue in 2010, in the android tree the use of
> task_handoff_register was dropped one day after it was added and
> replaced with a new task_free_register hook.  I assume Greg dropped
> the fix during the android tree refresh in 3.0 because it depended on
> a change to kernel/fork.c.  The two relevant patches are (using
> > codeaurora's gitweb because we don't have one right now):
> 
> sched: Add a generic notifier when a task struct is about to be freed
> https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=667dffa787a87ef4ea43cc65957ce96077fdcd0a

Yes, I can't add a patch like that for this driver, that is why I
thought everyone was getting together to "properly" determine how to
solve this oom notifier problem.  Has that work stalled somewhere?

greg k-h

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 22:13             ` Colin Cross
  (?)
  (?)
@ 2012-04-09 22:30             ` Linus Torvalds
  -1 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09 22:30 UTC (permalink / raw)
  To: Colin Cross
  Cc: David Rientjes, Andrew Morton, werner, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
	Christian Bejram, Paul E. McKenney, Anton Vorontsov,
	Greg Kroah-Hartman, stable

On Mon, Apr 9, 2012 at 3:13 PM, Colin Cross <ccross@google.com> wrote:
>
> sched: Add a generic notifier when a task struct is about to be freed
> https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=667dffa787a87ef4ea43cc65957ce96077fdcd0a

Oh, *HELL*NO*!

It's a fucking disaster in "Oh, one notifier was broken, SO LET'S ADD
ANOTHER RANDOM ONE TO FIX THAT".

The definition of insanity is doing the same thing over and over and
thinking you get a different result. Let's not do that kind of idiotic
thing.

Notifiers are evil crap. Let's make *fewer* of them, not add
yet-another-random-notifier-for-some-random-reason.

F*ck me, but how I hate those random notifiers. And I hate people who
add them willy nilly.

                             Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 22:21               ` Greg Kroah-Hartman
@ 2012-04-09 22:44                 ` john stultz
  -1 siblings, 0 replies; 43+ messages in thread
From: john stultz @ 2012-04-09 22:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Colin Cross, David Rientjes, Linus Torvalds, Andrew Morton,
	werner, Rik van Riel, Hugh Dickins, linux-kernel, Oleg Nesterov,
	Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, stable

On Mon, Apr 9, 2012 at 3:21 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Mon, Apr 09, 2012 at 03:13:00PM -0700, Colin Cross wrote:
>> sched: Add a generic notifier when a task struct is about to be freed
>> https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=667dffa787a87ef4ea43cc65957ce96077fdcd0a
>
> Yes, I can't add a patch like that for this driver, that is why I
> thought everyone was getting together to "properly" determine how to
> solve this oom notifier problem.  Has that work stalled somewhere?

Anton Vorontsov has been working on this (and just sent out some
related vmevent patches today). His hope is to use the vmevent or mem
cgroup interface to notify a userland killer to get the same or
improved behavior as the in-kernel lowmemory killer.

thanks
-john

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 22:09           ` Linus Torvalds
@ 2012-04-09 23:25             ` David Rientjes
  2012-04-09 23:55                 ` Linus Torvalds
                                 ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: David Rientjes @ 2012-04-09 23:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
	Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, Greg Kroah-Hartman, stable, Ingo Molnar

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3064 bytes --]

On Mon, 9 Apr 2012, Linus Torvalds wrote:

> > Right, we can't handoff the freeing of the task_struct to more than one
> > notifier.  It seems misdesigned from the beginning and what we really want
> > is to hijack task->usage for __put_task_struct(task) if we have such a
> > notifier callchain and require each one (currently just oprofile) to take
> > a reference on task->usage for NOTIFY_OK and then be responsible for
> > dropping the reference when it's done with it later instead of requiring
> > it to free the task_struct itself.
> 
> We could make notifier.c just "or" all the return values together, and
> then it's ok if *one* person returns NOTIFY_OK.
> 

You could do that if you also turned the check for "ret == NOTIFY_OK" in
profile_handoff_task() into "ret & NOTIFY_OK" in your patch, otherwise you 
get a double free from __put_task_struct() and oprofile.

> Of course, that's not how notifiers are documented to work, but quite
> frankly, notifiers with non-zero values that don't say STOP are broken
> as-is anyway, so you might as well do a logical "or" of the return
> values and at least make things like this work.
> 

It works fine if the callbacks are correctly implemented, it's just that 
the task handoff in kernel/profile.c is broken because it assumes only one 
callback will return NOTIFY_OK, meaning it will eventually free, and it's 
only checking the return value of the last notifier called to see if 
__put_task_struct() should immediately free.

In defense of notifiers, though, it works fine right now for memory 
hotplug.  The last issue I had with it was when slab lacked a callback 
when a node was onlined or offlined in 2.6.34 and then I added memory 
hotplug support for that allocator and it has since worked fine.  For 
things like MEM_GOING_OFFLINE, returning NOTIFY_BAD is great if the 
subsystem of interest can't allow the memory to go offline (in-use slab 
objects, for example).  In the memory hotplug usecase, we certainly don't 
want to stop at NOTIFY_OK because we need to notify every subsystem on the 
callchain.

> Oh well. So my suggestion right now would be something like the
> attached. It's still horribly broken, it actively breaks documented
> notifier behavior, but dammit, if the notifier people don't like
> 'or'ing return values together they should damn well return zero from
> the notifier that doesn't do anything. And returning an error will
> exit out, so..
> 

Instead of this and its possible bad interactions with other notifiers 
during the -rc cycle, I think it would be better to

 (1)  fix the lowmemorykiller so it doesn't need to use these notifiers at 
      all, which isn't difficult, for 3.4, then

 (2a) change the task handoff to a refcount on task->usage after the final
      put_task_struct() using the notifier and then allow it to be freed 
      after everybody does a put_handoff_task_struct() for 3.5

	or

 (2b) remove the task handoff notifier callchain entirely and just tie it
      directly to oprofile since android won't be using it anymore after 
      (1).

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 22:13             ` Colin Cross
                               ` (2 preceding siblings ...)
  (?)
@ 2012-04-09 23:37             ` David Rientjes
  2012-04-10  0:23                 ` Colin Cross
  -1 siblings, 1 reply; 43+ messages in thread
From: David Rientjes @ 2012-04-09 23:37 UTC (permalink / raw)
  To: Colin Cross
  Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
	Christian Bejram, Paul E. McKenney, Anton Vorontsov,
	Greg Kroah-Hartman, stable

On Mon, 9 Apr 2012, Colin Cross wrote:

> This was a known issue in 2010, in the android tree the use of
> task_handoff_register was dropped one day after it was added and
> replaced with a new task_free_register hook.

Why can't you just do this?  Are you concerned about the possibility of 
depleting all memory reserves?
---
 drivers/staging/android/lowmemorykiller.c |   47 ++++-------------------------
 1 file changed, 6 insertions(+), 41 deletions(-)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -55,7 +55,6 @@ static int lowmem_minfree[6] = {
 };
 static int lowmem_minfree_size = 4;
 
-static struct task_struct *lowmem_deathpending;
 static unsigned long lowmem_deathpending_timeout;
 
 #define lowmem_print(level, x...)			\
@@ -64,24 +63,6 @@ static unsigned long lowmem_deathpending_timeout;
 			printk(x);			\
 	} while (0)
 
-static int
-task_notify_func(struct notifier_block *self, unsigned long val, void *data);
-
-static struct notifier_block task_nb = {
-	.notifier_call	= task_notify_func,
-};
-
-static int
-task_notify_func(struct notifier_block *self, unsigned long val, void *data)
-{
-	struct task_struct *task = data;
-
-	if (task == lowmem_deathpending)
-		lowmem_deathpending = NULL;
-
-	return NOTIFY_OK;
-}
-
 static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 {
 	struct task_struct *tsk;
@@ -97,19 +78,6 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	int other_file = global_page_state(NR_FILE_PAGES) -
 						global_page_state(NR_SHMEM);
 
-	/*
-	 * If we already have a death outstanding, then
-	 * bail out right away; indicating to vmscan
-	 * that we have nothing further to offer on
-	 * this pass.
-	 *
-	 * Note: Currently you need CONFIG_PROFILING
-	 * for this to work correctly.
-	 */
-	if (lowmem_deathpending &&
-	    time_before_eq(jiffies, lowmem_deathpending_timeout))
-		return 0;
-
 	if (lowmem_adj_size < array_size)
 		array_size = lowmem_adj_size;
 	if (lowmem_minfree_size < array_size)
@@ -148,6 +116,11 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		if (!p)
 			continue;
 
+		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
+		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
+			task_unlock(p);
+			return 0;
+		}
 		oom_score_adj = p->signal->oom_score_adj;
 		if (oom_score_adj < min_score_adj) {
 			task_unlock(p);
@@ -174,15 +147,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		lowmem_print(1, "send sigkill to %d (%s), adj %d, size %d\n",
 			     selected->pid, selected->comm,
 			     selected_oom_score_adj, selected_tasksize);
-		/*
-		 * If CONFIG_PROFILING is off, then we don't want to stall
-		 * the killer by setting lowmem_deathpending.
-		 */
-#ifdef CONFIG_PROFILING
-		lowmem_deathpending = selected;
 		lowmem_deathpending_timeout = jiffies + HZ;
-#endif
 		send_sig(SIGKILL, selected, 0);
+		set_tsk_thread_flag(selected, TIF_MEMDIE);
 		rem -= selected_tasksize;
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
@@ -198,7 +165,6 @@ static struct shrinker lowmem_shrinker = {
 
 static int __init lowmem_init(void)
 {
-	task_handoff_register(&task_nb);
 	register_shrinker(&lowmem_shrinker);
 	return 0;
 }
@@ -206,7 +172,6 @@ static int __init lowmem_init(void)
 static void __exit lowmem_exit(void)
 {
 	unregister_shrinker(&lowmem_shrinker);
-	task_handoff_unregister(&task_nb);
 }
 
 module_param_named(cost, lowmem_shrinker.seeks, int, S_IRUGO | S_IWUSR);

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 23:25             ` David Rientjes
@ 2012-04-09 23:55                 ` Linus Torvalds
  2012-04-09 23:56               ` [patch] android, lowmemorykiller: remove task handoff notifier David Rientjes
       [not found]               ` <web-723076709@zbackend1.aha.ru>
  2 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09 23:55 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
	Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, Greg Kroah-Hartman, stable, Ingo Molnar

On Mon, Apr 9, 2012 at 4:25 PM, David Rientjes <rientjes@google.com> wrote:
>
> You could do that if you also turned the check for "ret == NOTIFY_OK" in
> profile_handoff_task() into "ret & NOTIFY_OK" in your patch, otherwise you
> get a double free from __put_task_struct() and oprofile.

Why? NOTIFY_DONE is zero.

I do agree that we *also* could do the "& NOTIFY_OK" and make it
clearer that we're oring bits together. And we could document the
stupid notifier interfaces to do this all, and just make the rules be
*sane* when you have multiple notifiers.

And sane rules would be either:

 - you always return an error return, and notifiers all return either
0 or a negative error number, and we stop on the first error and
return that.

 - you return a bitmask, and we or all bits together (and we can
certainly continue to have a "stop here" bit)

But the current notifier semantics are just insane. The whole "we
return the last return value" is crazy. It's by definition a random
number, since the whole point of notifiers is that there can be
multiple, and they aren't "ordered". So the whole "last return value"
is something I just look at and say: "Whoever designed that is a
f*cking moron".

(And if that happens to be some younger version of me, I am happy that
I got over it. But I'm pretty sure I have never touched that broken
notifier code in my life)

> It works fine if the callbacks are correctly implemented, it's just that
> the task handoff in kernel/profile.c is broken because it assumes only one
> callback will return NOTIFY_OK, meaning it will eventually free, and it's
> only checking the return value of the last notifier called to see if
> __put_task_struct() should immediately free.

We can easily document it as "only oprofile is allowed to return
NOTIFY_OK, this notifier is a big mess, don't even *think* about
returning anything but NOTIFY_DONE".

>  (1)  fix the lowmemorykiller so it doesn't need to use these notifiers at
>      all, which isn't difficult, for 3.4, then

I do think that that makes sense. Fixing people to not use notifiers
is always a good idea. Why would anybody sane even care about the
process going away anyway? If some lowmemorykiller decides to kill off
a process that no longer exists, kill() should happily return ESRCH,
and we're all good.

So it could just use a "pid", and test for existence with send_sig()
or lookup_pid() or something.

>  (2a) change the task handoff to a refcount on task->usage after the final
>      put_task_struct() using the notifier and then allow it to be freed
>      after everybody does a put_handoff_task_struct() for 3.5

The task handoff code runs too late right now. I guess we could easily
move it up, though.

At the same time, the *only* user of that stupid handoff thing is
oprofile, afaik, and if we use a refcount, why the hell doesn't
oprofile just use a refcount to begin with, instead of using that
notifier?: IOW, *both* users of the notifier seem to be just retarded.

So I'd rather just kill the stupid notifier entirely. In the meantime,
making lowmemorykiller just return zero instead just "fixes" it
(assuming we make the notifier semantics for multiple return codes
sane, which they clearly aren't).

Again, almost every notifier user has always been total crap. It's
just a stupid abstraction. "Something happened". "Oh, ok".

                    Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [patch] android, lowmemorykiller: remove task handoff notifier
  2012-04-09 23:25             ` David Rientjes
  2012-04-09 23:55                 ` Linus Torvalds
@ 2012-04-09 23:56               ` David Rientjes
  2012-04-10  1:23                   ` Colin Cross
       [not found]               ` <web-723076709@zbackend1.aha.ru>
  2 siblings, 1 reply; 43+ messages in thread
From: David Rientjes @ 2012-04-09 23:56 UTC (permalink / raw)
  To: Linus Torvalds, Greg Kroah-Hartman
  Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
	Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, stable, Ingo Molnar, Colin Cross

The task handoff notifier leaks task_struct since it never gets freed
after the callback returns NOTIFY_OK, which means it is responsible for
doing so.

It turns out the lowmemorykiller actually doesn't need this notifier at
all.  It's used to prevent unnecessary killing by waiting for a thread to
exit as a result of lowmem_shrink(), however, it's possible to do this in
the same way the kernel oom killer works by setting TIF_MEMDIE and avoid
killing if we're still waiting for it to exit.

The kernel oom killer will already automatically set TIF_MEMDIE for
threads that are attempting to allocate memory that have a fatal signal.
The thread selected by lowmem_shrink() will have such a signal after the
lowmemorykiller sends it a SIGKILL, so this won't result in an
unnecessary use of memory reserves for the thread to exit.

This has the added benefit that we don't have to rely on CONFIG_PROFILING
to prevent needlessly killing tasks.

Reported-by: werner <w.landgraf@ru.ru>
Cc: stable@vger.kernel.org
Signed-off-by: David Rientjes <rientjes@google.com>
---
 drivers/staging/android/lowmemorykiller.c |   48 +++++------------------------
 1 file changed, 7 insertions(+), 41 deletions(-)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -55,7 +55,6 @@ static int lowmem_minfree[6] = {
 };
 static int lowmem_minfree_size = 4;
 
-static struct task_struct *lowmem_deathpending;
 static unsigned long lowmem_deathpending_timeout;
 
 #define lowmem_print(level, x...)			\
@@ -64,24 +63,6 @@ static unsigned long lowmem_deathpending_timeout;
 			printk(x);			\
 	} while (0)
 
-static int
-task_notify_func(struct notifier_block *self, unsigned long val, void *data);
-
-static struct notifier_block task_nb = {
-	.notifier_call	= task_notify_func,
-};
-
-static int
-task_notify_func(struct notifier_block *self, unsigned long val, void *data)
-{
-	struct task_struct *task = data;
-
-	if (task == lowmem_deathpending)
-		lowmem_deathpending = NULL;
-
-	return NOTIFY_OK;
-}
-
 static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 {
 	struct task_struct *tsk;
@@ -97,19 +78,6 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	int other_file = global_page_state(NR_FILE_PAGES) -
 						global_page_state(NR_SHMEM);
 
-	/*
-	 * If we already have a death outstanding, then
-	 * bail out right away; indicating to vmscan
-	 * that we have nothing further to offer on
-	 * this pass.
-	 *
-	 * Note: Currently you need CONFIG_PROFILING
-	 * for this to work correctly.
-	 */
-	if (lowmem_deathpending &&
-	    time_before_eq(jiffies, lowmem_deathpending_timeout))
-		return 0;
-
 	if (lowmem_adj_size < array_size)
 		array_size = lowmem_adj_size;
 	if (lowmem_minfree_size < array_size)
@@ -148,6 +116,12 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		if (!p)
 			continue;
 
+		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
+		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
+			task_unlock(p);
+			rcu_read_unlock();
+			return 0;
+		}
 		oom_score_adj = p->signal->oom_score_adj;
 		if (oom_score_adj < min_score_adj) {
 			task_unlock(p);
@@ -174,15 +148,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		lowmem_print(1, "send sigkill to %d (%s), adj %d, size %d\n",
 			     selected->pid, selected->comm,
 			     selected_oom_score_adj, selected_tasksize);
-		/*
-		 * If CONFIG_PROFILING is off, then we don't want to stall
-		 * the killer by setting lowmem_deathpending.
-		 */
-#ifdef CONFIG_PROFILING
-		lowmem_deathpending = selected;
 		lowmem_deathpending_timeout = jiffies + HZ;
-#endif
 		send_sig(SIGKILL, selected, 0);
+		set_tsk_thread_flag(selected, TIF_MEMDIE);
 		rem -= selected_tasksize;
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
@@ -198,7 +166,6 @@ static struct shrinker lowmem_shrinker = {
 
 static int __init lowmem_init(void)
 {
-	task_handoff_register(&task_nb);
 	register_shrinker(&lowmem_shrinker);
 	return 0;
 }
@@ -206,7 +173,6 @@ static int __init lowmem_init(void)
 static void __exit lowmem_exit(void)
 {
 	unregister_shrinker(&lowmem_shrinker);
-	task_handoff_unregister(&task_nb);
 }
 
 module_param_named(cost, lowmem_shrinker.seeks, int, S_IRUGO | S_IWUSR);

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 23:55                 ` Linus Torvalds
@ 2012-04-10  0:04                   ` David Rientjes
  -1 siblings, 0 replies; 43+ messages in thread
From: David Rientjes @ 2012-04-10  0:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
	Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, Greg Kroah-Hartman, stable, Ingo Molnar

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1737 bytes --]

On Mon, 9 Apr 2012, Linus Torvalds wrote:

> > You could do that if you also turned the check for "ret == NOTIFY_OK" in
> > profile_handoff_task() into "ret & NOTIFY_OK" in your patch, otherwise you
> > get a double free from __put_task_struct() and oprofile.
> 
> Why? NOTIFY_DONE is zero.
> 

Oops, right.

> >  (1)  fix the lowmemorykiller so it doesn't need to use these notifiers at
> >      all, which isn't difficult, for 3.4, then
> 
> I do think that that makes sense. Fixing people to not use notifiers
> is always a good idea. Why would anybody sane even care about the
> process going away anyway? If some lowmemorykiller decides to kill off
> a process that no longer exists, kill() should happily return ESRCH,
> and we're all good.
> 

It's apparently waiting for a killed thread to exit before selecting 
another victim or the one second timeout expires.  (And you only get to 
prevent needless kills if you have CONFIG_PROFILING, otherwise it doesn't 
care.)

> At the same time, the *only* user of that stupid handoff thing is
> oprofile, afaik, and if we use a refcount, why the hell doesn't
> oprofile just use a refcount to begin with, instead of using that
> notifier?: IOW, *both* users of the notifier seem to be just retarded.
> 

Agreed and since the current implementation relies on CONFIG_PROFILING I 
think it's safe to remove the notifier and add a hook only for oprofile so 
it can do free_task() when it wants to.  No refcounting required.

I've already proposed a patch that removes the notifier for 
lowmemorykiller with the added benefit that it doesn't rely on 
CONFIG_PROFILING at all.  If that's merged for 3.4, I'll remove the task 
handoff callchain entirely for 3.5 since oprofile is the only user.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 23:37             ` David Rientjes
@ 2012-04-10  0:23                 ` Colin Cross
  0 siblings, 0 replies; 43+ messages in thread
From: Colin Cross @ 2012-04-10  0:23 UTC (permalink / raw)
  To: David Rientjes
  Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
	Christian Bejram, Paul E. McKenney, Anton Vorontsov,
	Greg Kroah-Hartman, stable

On Mon, Apr 9, 2012 at 4:37 PM, David Rientjes <rientjes@google.com> wrote:
> On Mon, 9 Apr 2012, Colin Cross wrote:
>
>> This was a known issue in 2010, in the android tree the use of
>> task_handoff_register was dropped one day after it was added and
>> replaced with a new task_free_register hook.
>
> Why can't you just do this?  Are you concerned about the possibility of
> depleting all memory reserves?

The point of the lowmem_deathpending patch was to avoid a stutter
where the cpu would spend its time looping through the tasks due to
repeated calls to lowmem_shrink instead of processing the kill signal
to the selected thread.  With this patch, it will still loop through
tasks until it finds the one that was previously killed and then
abort.  It's possible that the improvements Anton made to the task
loop reduce the performance impact enough that this whole mess could
just be dropped (by reverting 1eda516, e5d7965, and 4755b72).

This may have also been impacted by another bug that is on my list of
things to look at: when asked the size of its "cache", lowmemkiller
returns something on the order of all memory used by userspace, but
under some conditions will refuse to kill any of it due to the current
lowmem_minfree settings.  Due to the large size of the "cache", the
shrinker can call lowmem_shrink hundreds of times for a single
allocation, each time asking to reduce the size of the cache by 128
pages.  The original lowmem_deathpending patch may have been a
misguided "fix" for this bug.

> ---
>  drivers/staging/android/lowmemorykiller.c |   47 ++++-------------------------
>  1 file changed, 6 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ b/drivers/staging/android/lowmemorykiller.c
> @@ -55,7 +55,6 @@ static int lowmem_minfree[6] = {
>  };
>  static int lowmem_minfree_size = 4;
>
> -static struct task_struct *lowmem_deathpending;
>  static unsigned long lowmem_deathpending_timeout;
>
>  #define lowmem_print(level, x...)                      \
> @@ -64,24 +63,6 @@ static unsigned long lowmem_deathpending_timeout;
>                        printk(x);                      \
>        } while (0)
>
> -static int
> -task_notify_func(struct notifier_block *self, unsigned long val, void *data);
> -
> -static struct notifier_block task_nb = {
> -       .notifier_call  = task_notify_func,
> -};
> -
> -static int
> -task_notify_func(struct notifier_block *self, unsigned long val, void *data)
> -{
> -       struct task_struct *task = data;
> -
> -       if (task == lowmem_deathpending)
> -               lowmem_deathpending = NULL;
> -
> -       return NOTIFY_OK;
> -}
> -
>  static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
>  {
>        struct task_struct *tsk;
> @@ -97,19 +78,6 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
>        int other_file = global_page_state(NR_FILE_PAGES) -
>                                                global_page_state(NR_SHMEM);
>
> -       /*
> -        * If we already have a death outstanding, then
> -        * bail out right away; indicating to vmscan
> -        * that we have nothing further to offer on
> -        * this pass.
> -        *
> -        * Note: Currently you need CONFIG_PROFILING
> -        * for this to work correctly.
> -        */
> -       if (lowmem_deathpending &&
> -           time_before_eq(jiffies, lowmem_deathpending_timeout))
> -               return 0;
> -
>        if (lowmem_adj_size < array_size)
>                array_size = lowmem_adj_size;
>        if (lowmem_minfree_size < array_size)
> @@ -148,6 +116,11 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
>                if (!p)
>                        continue;
>
> +               if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> +                   time_before_eq(jiffies, lowmem_deathpending_timeout)) {
> +                       task_unlock(p);
> +                       return 0;
> +               }
>                oom_score_adj = p->signal->oom_score_adj;
>                if (oom_score_adj < min_score_adj) {
>                        task_unlock(p);
> @@ -174,15 +147,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
>                lowmem_print(1, "send sigkill to %d (%s), adj %d, size %d\n",
>                             selected->pid, selected->comm,
>                             selected_oom_score_adj, selected_tasksize);
> -               /*
> -                * If CONFIG_PROFILING is off, then we don't want to stall
> -                * the killer by setting lowmem_deathpending.
> -                */
> -#ifdef CONFIG_PROFILING
> -               lowmem_deathpending = selected;
>                lowmem_deathpending_timeout = jiffies + HZ;
> -#endif
>                send_sig(SIGKILL, selected, 0);
> +               set_tsk_thread_flag(selected, TIF_MEMDIE);
>                rem -= selected_tasksize;
>        }
>        lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
> @@ -198,7 +165,6 @@ static struct shrinker lowmem_shrinker = {
>
>  static int __init lowmem_init(void)
>  {
> -       task_handoff_register(&task_nb);
>        register_shrinker(&lowmem_shrinker);
>        return 0;
>  }
> @@ -206,7 +172,6 @@ static int __init lowmem_init(void)
>  static void __exit lowmem_exit(void)
>  {
>        unregister_shrinker(&lowmem_shrinker);
> -       task_handoff_unregister(&task_nb);
>  }
>
>  module_param_named(cost, lowmem_shrinker.seeks, int, S_IRUGO | S_IWUSR);

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
@ 2012-04-10  0:23                 ` Colin Cross
  0 siblings, 0 replies; 43+ messages in thread
From: Colin Cross @ 2012-04-10  0:23 UTC (permalink / raw)
  To: David Rientjes
  Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
	Christian Bejram, Paul E. McKenney, Anton Vorontsov,
	Greg Kroah-Hartman, stable

On Mon, Apr 9, 2012 at 4:37 PM, David Rientjes <rientjes@google.com> wrote:
> On Mon, 9 Apr 2012, Colin Cross wrote:
>
>> This was a known issue in 2010, in the android tree the use of
>> task_handoff_register was dropped one day after it was added and
>> replaced with a new task_free_register hook.
>
> Why can't you just do this?  Are you concerned about the possibility of
> depleting all memory reserves?

The point of the lowmem_deathpending patch was to avoid a stutter
where the cpu would spend its time looping through the tasks due to
repeated calls to lowmem_shrink instead of processing the kill signal
to the selected thread.  With this patch, it will still loop through
tasks until it finds the one that was previously killed and then
abort.  It's possible that the improvements Anton made to the task
loop reduce the performance impact enough that this whole mess could
just be dropped (by reverting 1eda516, e5d7965, and 4755b72).

This may have also been impacted by another bug that is on my list of
things to look at: when asked the size of it's "cache", lowmemkiller
returns something on the order of all memory used by userspace, but
under some conditions will refuse to kill any of it due to the current
lowmem_minfree settings.  Due to the large size of the "cache", the
shrinker can call lowmem_shrink hundreds of times for a single
allocation, each time asking to reduce the size of the cache by 128
pages.  The original lowmem_deathpending patch may have been a
misguided "fix" for this bug.

> ---
> �drivers/staging/android/lowmemorykiller.c | � 47 ++++-------------------------
> �1 file changed, 6 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ b/drivers/staging/android/lowmemorykiller.c
> @@ -55,7 +55,6 @@ static int lowmem_minfree[6] = {
> �};
> �static int lowmem_minfree_size = 4;
>
> -static struct task_struct *lowmem_deathpending;
> �static unsigned long lowmem_deathpending_timeout;
>
> �#define lowmem_print(level, x...) � � � � � � � � � � �\
> @@ -64,24 +63,6 @@ static unsigned long lowmem_deathpending_timeout;
> � � � � � � � � � � � �printk(x); � � � � � � � � � � �\
> � � � �} while (0)
>
> -static int
> -task_notify_func(struct notifier_block *self, unsigned long val, void *data);
> -
> -static struct notifier_block task_nb = {
> - � � � .notifier_call �= task_notify_func,
> -};
> -
> -static int
> -task_notify_func(struct notifier_block *self, unsigned long val, void *data)
> -{
> - � � � struct task_struct *task = data;
> -
> - � � � if (task == lowmem_deathpending)
> - � � � � � � � lowmem_deathpending = NULL;
> -
> - � � � return NOTIFY_OK;
> -}
> -
> �static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
> �{
> � � � �struct task_struct *tsk;
> @@ -97,19 +78,6 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
> � � � �int other_file = global_page_state(NR_FILE_PAGES) -
> � � � � � � � � � � � � � � � � � � � � � � � �global_page_state(NR_SHMEM);
>
> - � � � /*
> - � � � �* If we already have a death outstanding, then
> - � � � �* bail out right away; indicating to vmscan
> - � � � �* that we have nothing further to offer on
> - � � � �* this pass.
> - � � � �*
> - � � � �* Note: Currently you need CONFIG_PROFILING
> - � � � �* for this to work correctly.
> - � � � �*/
> - � � � if (lowmem_deathpending &&
> - � � � � � time_before_eq(jiffies, lowmem_deathpending_timeout))
> - � � � � � � � return 0;
> -
> � � � �if (lowmem_adj_size < array_size)
> � � � � � � � �array_size = lowmem_adj_size;
> � � � �if (lowmem_minfree_size < array_size)
> @@ -148,6 +116,11 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
> � � � � � � � �if (!p)
> � � � � � � � � � � � �continue;
>
> + � � � � � � � if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> + � � � � � � � � � time_before_eq(jiffies, lowmem_deathpending_timeout)) {
> + � � � � � � � � � � � task_unlock(p);
> + � � � � � � � � � � � return 0;
> + � � � � � � � }
> � � � � � � � �oom_score_adj = p->signal->oom_score_adj;
> � � � � � � � �if (oom_score_adj < min_score_adj) {
> � � � � � � � � � � � �task_unlock(p);
> @@ -174,15 +147,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
> � � � � � � � �lowmem_print(1, "send sigkill to %d (%s), adj %d, size %d\n",
> � � � � � � � � � � � � � � selected->pid, selected->comm,
> � � � � � � � � � � � � � � selected_oom_score_adj, selected_tasksize);
> - � � � � � � � /*
> - � � � � � � � �* If CONFIG_PROFILING is off, then we don't want to stall
> - � � � � � � � �* the killer by setting lowmem_deathpending.
> - � � � � � � � �*/
> -#ifdef CONFIG_PROFILING
> - � � � � � � � lowmem_deathpending = selected;
> � � � � � � � �lowmem_deathpending_timeout = jiffies + HZ;
> -#endif
> � � � � � � � �send_sig(SIGKILL, selected, 0);
> + � � � � � � � set_tsk_thread_flag(selected, TIF_MEMDIE);
> � � � � � � � �rem -= selected_tasksize;
> � � � �}
> � � � �lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
> @@ -198,7 +165,6 @@ static struct shrinker lowmem_shrinker = {
>
> �static int __init lowmem_init(void)
> �{
> - � � � task_handoff_register(&task_nb);
> � � � �register_shrinker(&lowmem_shrinker);
> � � � �return 0;
> �}
> @@ -206,7 +172,6 @@ static int __init lowmem_init(void)
> �static void __exit lowmem_exit(void)
> �{
> � � � �unregister_shrinker(&lowmem_shrinker);
> - � � � task_handoff_unregister(&task_nb);
> �}
>
> �module_param_named(cost, lowmem_shrinker.seeks, int, S_IRUGO | S_IWUSR);

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-10  0:23                 ` Colin Cross
  (?)
@ 2012-04-10  0:32                 ` David Rientjes
  2012-04-10  1:21                     ` Colin Cross
  -1 siblings, 1 reply; 43+ messages in thread
From: David Rientjes @ 2012-04-10  0:32 UTC (permalink / raw)
  To: Colin Cross
  Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
	Christian Bejram, Paul E. McKenney, Anton Vorontsov,
	Greg Kroah-Hartman, stable

On Mon, 9 Apr 2012, Colin Cross wrote:

> The point of the lowmem_deathpending patch was to avoid a stutter
> where the cpu would spend its time looping through the tasks due to
> repeated calls to lowmem_shrink instead of processing the kill signal
> to the selected thread.

What did you do to avoid this without CONFIG_PROFILING?

> With this patch, it will still loop through
> tasks until it finds the one that was previously killed and then
> abort.  It's possible that the improvements Anton made to the task
> loop reduce the performance impact enough that this whole mess could
> just be dropped (by reverting 1eda516, e5d7965, and 4755b72).
> 

I don't understand how calling shrink_slab() from direct reclaim or using 
drop_caches manually taking slightly longer because it has to iterate the 
tasklist to the point of the killed thread will significantly stall the 
thread from exiting.

Much more likely is the killed thread cannot exit because you've killed it 
in a lowmem situation without giving it access to memory reserves so that 
it may exit quickly as my patch does.  That has a higher likelihood of 
stalling the exit than doing for_each_process().
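
For readers following along, the "victim already dying, still within the
timeout" check under discussion can be modelled in user space.  This is
only a sketch under stated assumptions: struct fake_task, its memdie
field and should_skip_kill() are hypothetical stand-ins for task_struct,
TIF_MEMDIE and the early return in lowmem_shrink(), and
time_before_eq_ul() mirrors the wraparound-safe arithmetic of the
kernel's time_before_eq() without being the kernel macro itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; memdie models the TIF_MEMDIE flag. */
struct fake_task {
	bool memdie;
};

/*
 * Wraparound-safe "a is before or equal to b" for a free-running
 * counter such as jiffies: compare the signed difference, not the
 * raw values.
 */
static bool time_before_eq_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) >= 0;
}

/*
 * Return true if some task is already marked as dying and the grace
 * period has not expired yet, i.e. the caller should not pick a new
 * victim on this pass.
 */
static bool should_skip_kill(const struct fake_task *tasks, size_t n,
			     unsigned long now, unsigned long deadline)
{
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].memdie && time_before_eq_ul(now, deadline))
			return true;
	}
	return false;
}
```

With a deadline of jiffies + HZ, as in the patch, a check of this shape
makes the shrinker back off for at most one second per kill.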

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-10  0:32                 ` David Rientjes
@ 2012-04-10  1:21                     ` Colin Cross
  0 siblings, 0 replies; 43+ messages in thread
From: Colin Cross @ 2012-04-10  1:21 UTC (permalink / raw)
  To: David Rientjes
  Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
	Christian Bejram, Paul E. McKenney, Anton Vorontsov,
	Greg Kroah-Hartman, stable

On Mon, Apr 9, 2012 at 5:32 PM, David Rientjes <rientjes@google.com> wrote:
> On Mon, 9 Apr 2012, Colin Cross wrote:
>
>> The point of the lowmem_deathpending patch was to avoid a stutter
>> where the cpu would spend its time looping through the tasks due to
>> repeated calls to lowmem_shrink instead of processing the kill signal
>> to the selected thread.
>
> What did you do to avoid this without CONFIG_PROFILING?
>
>> With this patch, it will still loop through
>> tasks until it finds the one that was previously killed and then
>> abort.  It's possible that the improvements Anton made to the task
>> loop reduce the performance impact enough that this whole mess could
>> just be dropped (by reverting 1eda516, e5d7965, and 4755b72).
>>
>
> I don't understand how calling shrink_slab() from direct reclaim or using
> drop_caches manually taking slightly longer because it has to iterate the
> tasklist to the point of the killed thread will significantly stall the
> thread from exiting.

Before Anton's fix, iterating the tasklist involved taking every task
lock, which probably made it very expensive.  I tried a quick test
where I deliberately limited memory to the point that it was
triggering lowmemorykiller during boot, and it triggered about 5000
times taking on the order of 50ms total for all 5000 calls.  It was
about the same with your patch applied.

> Much more likely is the killed thread cannot exit because you've killed it
> in a lowmem situation without giving it access to memory reserves so that
> it may exit quickly as my patch does.  That has a higher likelihood of
> stalling the exit than doing for_each_process().

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch] android, lowmemorykiller: remove task handoff notifier
  2012-04-09 23:56               ` [patch] android, lowmemorykiller: remove task handoff notifier David Rientjes
@ 2012-04-10  1:23                   ` Colin Cross
  0 siblings, 0 replies; 43+ messages in thread
From: Colin Cross @ 2012-04-10  1:23 UTC (permalink / raw)
  To: David Rientjes
  Cc: Linus Torvalds, Greg Kroah-Hartman, Andrew Morton, werner,
	Rik van Riel, Hugh Dickins, linux-kernel, Oleg Nesterov,
	Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, stable, Ingo Molnar

On Mon, Apr 9, 2012 at 4:56 PM, David Rientjes <rientjes@google.com> wrote:
> The task handoff notifier leaks task_struct since it never gets freed
> after the callback returns NOTIFY_OK, which means it is responsible for
> doing so.
>
> It turns out the lowmemorykiller actually doesn't need this notifier at
> all.  It's used to prevent unnecessary killing by waiting for a thread to
> exit as a result of lowmem_shrink(), however, it's possible to do this in
> the same way the kernel oom killer works by setting TIF_MEMDIE and avoid
> killing if we're still waiting for it to exit.
>
> The kernel oom killer will already automatically set TIF_MEMDIE for
> threads that are attempting to allocate memory that have a fatal signal.
> The thread selected by lowmem_shrink() will have such a signal after the
> lowmemorykiller sends it a SIGKILL, so this won't result in an
> unnecessary use of memory reserves for the thread to exit.
>
> This has the added benefit that we don't have to rely on CONFIG_PROFILING
> to prevent needlessly killing tasks.
>
> Reported-by: werner <w.landgraf@ru.ru>
> Cc: stable@vger.kernel.org
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  drivers/staging/android/lowmemorykiller.c |   48 +++++------------------------
>  1 file changed, 7 insertions(+), 41 deletions(-)
>

I did a quick test to measure the difference in time spent inside
lowmem_shrink with and without this patch, and they were about the
same.  So,
Acked-by: Colin Cross <ccross@android.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-10  1:21                     ` Colin Cross
  (?)
@ 2012-04-10  1:33                     ` David Rientjes
  2012-04-10  1:37                       ` Colin Cross
  -1 siblings, 1 reply; 43+ messages in thread
From: David Rientjes @ 2012-04-10  1:33 UTC (permalink / raw)
  To: Colin Cross
  Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
	Christian Bejram, Paul E. McKenney, Anton Vorontsov,
	Greg Kroah-Hartman, stable

On Mon, 9 Apr 2012, Colin Cross wrote:

> Before Anton's fix, iterating the tasklist involved taking every task
> lock, which probably made it very expensive.

I'm not sure of the fix you're referring to, but it's not in 3.4-rc2 
because lowmem_shrink() still does find_lock_task_mm() for every user 
process on the system, which is necessary to safely do get_mm_rss().

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-10  1:33                     ` David Rientjes
@ 2012-04-10  1:37                       ` Colin Cross
  0 siblings, 0 replies; 43+ messages in thread
From: Colin Cross @ 2012-04-10  1:37 UTC (permalink / raw)
  To: David Rientjes
  Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
	Christian Bejram, Paul E. McKenney, Anton Vorontsov,
	Greg Kroah-Hartman, stable

On Mon, Apr 9, 2012 at 6:33 PM, David Rientjes <rientjes@google.com> wrote:
> On Mon, 9 Apr 2012, Colin Cross wrote:
>
>> Before Anton's fix, iterating the tasklist involved taking every task
>> lock, which probably made it very expensive.
>
> I'm not sure of the fix you're referring to, but it's not in 3.4-rc2
> because lowmem_shrink() still does find_lock_task_mm() for every user
> process on the system, which is necessary to safely do get_mm_rss().

I confused "staging: android/lowmemorykiller: Don't grab
tasklist_lock" and "staging: android/lowmemorykiller: Better mm
handling".  You're right, it still grabs the task lock.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
       [not found]                     ` <alpine.DEB.2.00.1204091707580.21813@chino.kir.corp.google.com>
@ 2012-04-10  7:09                       ` werner
  2012-04-10  7:10                       ` werner
  1 sibling, 0 replies; 43+ messages in thread
From: werner @ 2012-04-10  7:09 UTC (permalink / raw)
  To: David Rientjes, Colin Cross, Linus Torvalds, Andrew Morton,
	Rik van Riel, Hugh Dickins, linux-kernel, Oleg Nesterov,
	Rabin Vincent, Christian Bejram, Paul E. McKenney,
	Anton Vorontsov, Greg Kroah-Hartman, stable

After testing the first, one-line patch by D.R. for some hours, I have
now compiled his second patch (below) and started testing it.  I see he
has already committed it; it would have been better to wait for testing
first.

With this second patch, the loop suggested below gives 1560 kB, compared
with 1632 kB after the first patch and 1432 kB with kernel 3.3.  On the
other hand, 3.3 clearly has the same problem (even if it does not crash,
it often becomes very slow, and then kmemleak is running, which I have to
kill to return to normal speed), yet according to this 'test' it would
be good, so it is questionable whether this test is reliable.
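
Reading the number by hand is error-prone across several boots.  Below
is a minimal sketch in C of pulling the KernelStack value out of
/proc/meminfo text, assuming the usual "KernelStack:  N kB" line format;
kernel_stack_kb() is a hypothetical helper name, not an existing tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan meminfo-style text line by line and return the KernelStack
 * value in kB, or -1 if no such line is present.
 */
static long kernel_stack_kb(const char *meminfo)
{
	const char *line = meminfo;

	while (line && *line) {
		long kb;

		/* Whitespace in the format matches any run of blanks. */
		if (sscanf(line, "KernelStack: %ld kB", &kb) == 1)
			return kb;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Feeding it the contents of /proc/meminfo after each boot gives the
figures to compare; for the numbers above, 1632 - 1560 = 72 kB between
the two patches.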

As already reported, the first patch cured the problem, at least
subjectively.

To see whether this second patch is good, I now have to wait some hours
and observe whether the computer becomes slow or even crashes.


wl


=================================================

On Mon, 9 Apr 2012 17:11:45 -0700 (PDT)
  David Rientjes <rientjes@google.com> wrote:
> On Mon, 9 Apr 2012, werner wrote:
> 
>> I continue now testing your first patch a few hours, if it's good or not.
>> Then, I can make another patch.  So you have still time to think and put
>> all together what you want to be tested, and mail me that. Also explain
>> me, if you want other patchs ADDITIONALLY or INSTEAD your first patch --
>> the best would be, to send me always accumulating patchs including
>> everything together to be applied over the 'virgin' 3.4-rcX kernel.
>> For your information, I dont download the whole git, I download all
>> 3.X.Y-rcZ , and I recompile everything again (patched), instead of
>> compiling only the patched subroutines.
>> 
> 
> Ok, when you want to test the latest patch, try this:
> 
> - revert back to the vanilla 3.4-rc2 kernel,
> 
> - boot and do this on the command line:
> 
> 	for i in $(seq 1 10000); do sleep 0 & done
> 	grep KernelStack /proc/meminfo
> 
> - record that number,
> 
> - apply the patch at https://lkml.org/lkml/2012/4/9/428,
> 
> - boot and do the same two command lines,
> 
> - compare the number with the previous number from the first boot.
> 
> The number should be much lower after the patch is applied.
> 
> Thanks!
> 
> 

"werner" <w.landgraf@ru.ru>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
  2012-04-09 23:55                 ` Linus Torvalds
  (?)
  (?)
@ 2012-04-14 20:50                 ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 43+ messages in thread
From: Srivatsa S. Bhat @ 2012-04-14 20:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Rientjes, Andrew Morton, werner, Rik van Riel,
	Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
	Christian Bejram, Paul E. McKenney, Anton Vorontsov,
	Greg Kroah-Hartman, stable, Ingo Molnar, linux-kernel,
	Rafael J. Wysocki, Peter Zijlstra, Steven Rostedt

On 04/10/2012 05:25 AM, Linus Torvalds wrote:

> On Mon, Apr 9, 2012 at 4:25 PM, David Rientjes <rientjes@google.com> wrote:
>>
>> You could do that if you also turned the check for "ret == NOTIFY_OK" in
>> profile_handoff_task() into "ret & NOTIFY_OK" in your patch, otherwise you
>> get a double free from __put_task_struct() and oprofile.
> 
> Why? NOTIFY_DONE is zero.
> 
> I do agree that we *also* could do the "& NOTIFY_OK" and make it
> clearer that we're oring bits together. And we could document the
> stupid notifier interfaces to do this all, and just make the rules be
> *sane* when you have multiple notifiers.
> 
> And sane rules would be either:
> 
>  - you always return an error return, and notifiers all return either
> 0 or a negative error number, and we stop on the first error and
> return that.
> 
>  - you return a bitmask, and we or all bits together (and we can
> certainly continue to have a "stop here" bit)
> 


Even I think 'or'ing the bits makes more sense than returning the last
return value.

CPU hotplug and suspend/resume are two of the things that I know of,
that use notifiers quite a bit. However, neither of them actually care
about the exact return value - if it is an error return, no matter which
one or for what reason, they do the same error handling; and it works
for them. IOW, if we change the documented behaviour of notifiers to
return 'or' of all return values, that would continue to work well
with these users.

Of course, there are other users like profile_handoff_task() that do
care about exactly what the return value was, but I guess we can
gradually adapt such users to the better, saner rules for the notifier
return values, as you proposed.
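
The difference between the two rules can be sketched in user space as
below.  The NOTIFY_* values match include/linux/notifier.h;
call_chain_last(), call_chain_or() and the two callbacks are
hypothetical names for illustration, not kernel functions.

```c
#include <stddef.h>

#define NOTIFY_DONE		0x0000	/* don't care */
#define NOTIFY_OK		0x0001	/* suits me */
#define NOTIFY_STOP_MASK	0x8000	/* don't call further */

typedef int (*notifier_fn)(unsigned long event, void *data);

/* "Last return value" rule: the result is whatever ran last. */
static int call_chain_last(const notifier_fn *chain, size_t n,
			   unsigned long event, void *data)
{
	int ret = NOTIFY_DONE;

	for (size_t i = 0; i < n; i++) {
		ret = chain[i](event, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Bitmask rule: OR every callback's bits together, honour "stop". */
static int call_chain_or(const notifier_fn *chain, size_t n,
			 unsigned long event, void *data)
{
	int ret = NOTIFY_DONE;

	for (size_t i = 0; i < n; i++) {
		int r = chain[i](event, data);

		ret |= r;
		if (r & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* A callback that claims the object, e.g. takes ownership of a task. */
static int wants_ownership(unsigned long event, void *data)
{
	(void)event;
	(void)data;
	return NOTIFY_OK;
}

/* A callback that only observes the event. */
static int does_not_care(unsigned long event, void *data)
{
	(void)event;
	(void)data;
	return NOTIFY_DONE;
}
```

With the chain { wants_ownership, does_not_care }, call_chain_last()
yields NOTIFY_DONE while call_chain_or() still reports NOTIFY_OK, so the
answer no longer depends on registration order.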

> But the current notifier semantics are just insane. The whole "we
> return the last return value" is crazy. It's by definition a random
> number, since the whole point of notifiers is that there can be
> multiple, and they aren't "ordered". So the whole "last return value"
> is something I just look at and say: "Whoever designed that is a
> f*cking moron".
> 

[...]

> 
> Again, almost every notifier user has always been total crap. It's
> just a stupid abstraction.



> "Something happened". "Oh, ok".
> 


Never saw such a concise and apt definition of notifiers before ;-)

However, unfortunately, what other better mechanism do we have, to
deal with things that affect stuff across multiple subsystems, like
some of the users mentioned above?  Hmm...

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2012-04-14 20:50 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-09  2:42 v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs) Linus Torvalds
2012-04-09  2:50 ` Andrew Morton
2012-04-09  3:11   ` Linus Torvalds
2012-04-09  7:04     ` Sven Joachim
2012-04-09 15:24       ` Linus Torvalds
2012-04-09 15:43         ` Sven Joachim
2012-04-09 15:57       ` Rik van Riel
2012-04-09 16:19         ` Sven Joachim
2012-04-09 16:33           ` Rik van Riel
2012-04-09 17:00             ` Pekka Enberg
2012-04-09 17:19               ` Sven Joachim
2012-04-09 17:00             ` Sven Joachim
2012-04-09 17:20               ` Rik van Riel
2012-04-09 10:15     ` David Rientjes
2012-04-09 15:39       ` Linus Torvalds
2012-04-09 21:22         ` David Rientjes
2012-04-09 22:09           ` Linus Torvalds
2012-04-09 23:25             ` David Rientjes
2012-04-09 23:55               ` Linus Torvalds
2012-04-09 23:55                 ` Linus Torvalds
2012-04-10  0:04                 ` David Rientjes
2012-04-10  0:04                   ` David Rientjes
2012-04-14 20:50                 ` Srivatsa S. Bhat
2012-04-09 23:56               ` [patch] android, lowmemorykiller: remove task handoff notifier David Rientjes
2012-04-10  1:23                 ` Colin Cross
2012-04-10  1:23                   ` Colin Cross
     [not found]               ` <web-723076709@zbackend1.aha.ru>
     [not found]                 ` <alpine.DEB.2.00.1204091637280.21813@chino.kir.corp.google.com>
     [not found]                   ` <web-723082731@zbackend1.aha.ru>
     [not found]                     ` <alpine.DEB.2.00.1204091707580.21813@chino.kir.corp.google.com>
2012-04-10  7:09                       ` v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs) werner
2012-04-10  7:10                       ` werner
2012-04-09 22:13           ` Colin Cross
2012-04-09 22:13             ` Colin Cross
2012-04-09 22:21             ` Greg Kroah-Hartman
2012-04-09 22:21               ` Greg Kroah-Hartman
2012-04-09 22:44               ` john stultz
2012-04-09 22:44                 ` john stultz
2012-04-09 22:30             ` Linus Torvalds
2012-04-09 23:37             ` David Rientjes
2012-04-10  0:23               ` Colin Cross
2012-04-10  0:23                 ` Colin Cross
2012-04-10  0:32                 ` David Rientjes
2012-04-10  1:21                   ` Colin Cross
2012-04-10  1:21                     ` Colin Cross
2012-04-10  1:33                     ` David Rientjes
2012-04-10  1:37                       ` Colin Cross
