* v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
From: Linus Torvalds @ 2012-04-09 2:42 UTC
To: Andrew Morton, David Rientjes, Rik van Riel, Hugh Dickins, werner
Cc: linux-kernel
Guys, there's something wrong in the VM. Most likely suspects added to
the participants list.
Apparently things go south and the oom killer is invoked. X.org seems
to get killed.
Any hints? Werner traditionally finds problems by enabling every
single config option there is, so I assume this is another of those
kernels.
Linus
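
For anyone triaging the quoted log below, here is a minimal sketch (not part of the original report) of how the per-zone "Mem-Info" summaries can be checked mechanically. The helper names `parse_zone` and `below_min` are hypothetical, and the two sample strings are abbreviated copies of the first Normal and HighMem summaries in the log. It only compares `free` against the `min` watermark and prints `kernel_stack`; it makes no claim about the root cause.

```python
import re


def parse_zone(text):
    """Parse one Mem-Info zone summary into (zone_name, {field: kB})."""
    # The zone name is the word immediately before 'free:'.
    name = re.search(r"(\w+) free:", text).group(1)
    # Collect every 'key:<number>kB' pair; keys such as
    # 'isolated(anon)' contain parentheses.
    fields = {k: int(v) for k, v in re.findall(r"([\w()]+):(\d+)kB", text)}
    return name, fields


def below_min(fields):
    """True when the zone's free memory is under its min watermark."""
    return fields["free"] < fields["min"]


# Abbreviated copies of the first Normal and HighMem summaries in the log.
normal = ("Normal free:44004kB min:44012kB low:55012kB high:66016kB "
          "present:885944kB slab_unreclaimable:147784kB kernel_stack:628952kB")
highmem = ("HighMem free:1733236kB min:512kB low:28056kB high:55600kB "
           "present:2217808kB kernel_stack:0kB")

for text in (normal, highmem):
    name, f = parse_zone(text)
    status = "below min watermark" if below_min(f) else "ok"
    print(name, status, "kernel_stack=%dkB" % f["kernel_stack"])
```

On these two samples the Normal zone is flagged (free 44004kB against min 44012kB, with 628952kB reported as kernel_stack), while HighMem still has most of its memory free.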
On Sun, Apr 8, 2012 at 5:21 PM, werner <w.landgraf@ru.ru> wrote:
> 3.4-rc1 sticks and crashes
>
> The problem continues with 3.4-rc2. The computer boots and runs
> normally for hours, then suddenly everything freezes;
> in some cases the computer crashes. The computer wasn't working
> hard; it was almost idle.
>
> In the example below, it didn't crash completely in the end; only kde3 did.
> However, I then rebooted (last line).
>
> This didn't happen in 3.3, so it's a regression.
>
>
> Werner Landgraf
>
>
>
>
>
> Apr 8 13:55:10 werner kernel: Malformed early option 'acpi'
> Apr 8 13:55:10 werner kernel: ACPI: RSDP 000f6920 00014 (v00 GBT )
> Apr 8 13:55:10 werner kernel: ACPI: RSDT bfee3000 00038 (v01 GBT
> NVDAACPI 42302E31 NVDA 01010101)
> Apr 8 13:55:10 werner kernel: ACPI: FACP bfee3040 00074 (v01 GBT
> NVDAACPI 42302E31 NVDA 01010101)
> Apr 8 13:55:10 werner kernel: ACPI: DSDT bfee30c0 0469E (v01 GBT
> NVDAACPI 00001000 MSFT 03000000)
> Apr 8 13:55:10 werner kernel: ACPI: FACS bfee0000 00040
> Apr 8 13:55:10 werner kernel: ACPI: SSDT bfee7880 0028A (v01 PTLTD
> POWERNOW 00000001 LTP 00000001)
> Apr 8 13:55:10 werner kernel: ACPI: HPET bfee7b40 00038 (v01 GBT
> NVDAACPI 42302E31 NVDA 00000098)
> Apr 8 13:55:10 werner kernel: ACPI: MCFG bfee7b80 0003C (v01 GBT
> NVDAACPI 42302E31 NVDA 01010101)
> Apr 8 13:55:10 werner kernel: ACPI: APIC bfee7780 000D0 (v01 GBT
> NVDAACPI 42302E31 NVDA 01010101)
> Apr 8 13:55:10 werner kernel: Zone PFN ranges:
> Apr 8 13:55:10 werner kernel: DMA 0x00000010 -> 0x00001000
> Apr 8 13:55:10 werner kernel: Normal 0x00001000 -> 0x000377fe
> Apr 8 13:55:10 werner kernel: HighMem 0x000377fe -> 0x000bfee0
> Apr 8 13:55:10 werner kernel: Movable zone start PFN for each node
> Apr 8 13:55:10 werner kernel: Early memory PFN ranges
> Apr 8 13:55:10 werner kernel: 0: 0x00000010 -> 0x0000009f
> Apr 8 13:55:10 werner kernel: 0: 0x00000100 -> 0x000bfee0
> Apr 8 13:55:13 werner udevd-event[9104]: run_program: exec of program
> '/etc/rc.d/rc.media-daemon' failed
> Apr 8 13:55:13 werner kernel: Built 1 zonelists in Zone order, mobility
> grouping on. Total pages: 779889
> Apr 8 13:55:13 werner kernel: Fast TSC calibration using PIT
> Apr 8 13:55:13 werner kernel: Detected 2511.428 MHz processor.
> Apr 8 13:55:13 werner udevd-event[9099]: udev_rules_apply_format: unknown
> format variable '$modalias | grep -q eagle-usb || exit; while !
> /sbin/eaglectrl -p 2>/dev/null | /bin/grep -q Post-firmware; do sleep 2;
> done; /sbin/eaglectrl -d''
> Apr 8 13:55:13 werner kernel: ACPI: setting ELCR to 0200 (from 0c20)
> Apr 8 13:55:13 werner kernel: AMD PMU driver.
> Apr 8 13:55:13 werner kernel: raid6: int32x1 945 MB/s
> Apr 8 13:55:13 werner kernel: raid6: int32x2 933 MB/s
> Apr 8 13:55:13 werner kernel: raid6: int32x4 933 MB/s
> Apr 8 13:55:13 werner kernel: raid6: int32x8 613 MB/s
> Apr 8 13:55:13 werner kernel: raid6: mmxx1 2035 MB/s
> Apr 8 13:55:13 werner kernel: raid6: mmxx2 3582 MB/s
> Apr 8 13:55:13 werner kernel: raid6: sse1x1 2023 MB/s
> Apr 8 13:55:13 werner kernel: raid6: sse1x2 3496 MB/s
> Apr 8 13:55:13 werner kernel: raid6: sse2x1 3445 MB/s
> Apr 8 13:55:13 werner kernel: raid6: sse2x2 4582 MB/s
> Apr 8 13:55:13 werner kernel: raid6: using algorithm sse2x2 (4582 MB/s)
> Apr 8 13:55:13 werner kernel: Expanded resource reserved due to conflict
> with PCI Bus 0000:00
> Apr 8 13:55:13 werner udevd-event[9098]: udev_rules_apply_format: unknown
> format variable '$modalias | grep -q eagle-usb || exit; while !
> /sbin/eaglectrl -p 2>/dev/null | /bin/grep -q Post-firmware; do sleep 2;
> done; /sbin/eaglectrl -d''
> Apr 8 13:55:13 werner kernel: mdacon: MDA with 8K of memory detected.
> Apr 8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LUBA] enabled at
> IRQ 10
> Apr 8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LUB2] enabled at
> IRQ 11
> Apr 8 13:55:13 werner kernel: microcode: CPU0: family 15 not supported
> Apr 8 13:55:13 werner kernel: The force parameter has not been set to 1 so
> the Iris poweroff handler will not be installed.
> Apr 8 13:55:13 werner kernel: highmem bounce pool size: 64 pages
> Apr 8 13:55:13 werner kernel: Dquot-cache hash table entries: 1024 (order
> 0, 4096 bytes)
> Apr 8 13:55:13 werner kernel: DLM installed
> Apr 8 13:55:13 werner kernel: EFS: 1.0a - http://aeschi.ch.eu.org/efs/
> Apr 8 13:55:13 werner kernel: OCFS2 User DLM kernel interface loaded
> Apr 8 13:55:13 werner kernel: GFS2 installed
> Apr 8 13:55:13 werner kernel: acpiphp_ibm: ibm_acpiphp_init:
> acpi_walk_namespace failed
> Apr 8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LIGP] enabled at
> IRQ 10
> Apr 8 13:55:13 werner kernel: nvidiafb: CRTC0 analog found
> Apr 8 13:55:13 werner kernel: nvidiafb: CRTC1 analog not found
> Apr 8 13:55:13 werner kernel: i2c i2c-0: unable to read EDID block.
> Apr 8 13:55:13 werner last message repeated 2 times
> Apr 8 13:55:13 werner kernel: nvidiafb: EDID found from BUS2
> Apr 8 13:55:13 werner kernel: nvidiafb: CRTC 0 appears to have a CRT
> attached
> Apr 8 13:55:13 werner kernel: nvidiafb: Using CRT on CRTC 0
> Apr 8 13:55:13 werner kernel: Could not find Carillo Ranch MCH device.
> Apr 8 13:55:13 werner kernel: no IO addresses supplied
> Apr 8 13:55:13 werner kernel: hgafb: probe of hgafb.0 failed with error -22
> Apr 8 13:55:13 werner kernel: uvesafb: failed to execute /sbin/v86d
> Apr 8 13:55:13 werner kernel: uvesafb: make sure that the v86d helper is
> installed and executable
> Apr 8 13:55:13 werner kernel: uvesafb: Getting VBE info block failed
> (eax=0x4f00, err=-2)
> Apr 8 13:55:13 werner kernel: uvesafb: vbe_init() failed with -22
> Apr 8 13:55:13 werner kernel: uvesafb: probe of uvesafb.0 failed with error
> -22
> Apr 8 13:55:13 werner kernel: vesafb: cannot reserve video memory at
> 0xd0000000
> Apr 8 13:55:13 werner kernel: toshiba: not a supported Toshiba laptop
> Apr 8 13:55:13 werner kernel: [drm:i915_init] *ERROR* drm/i915 can't work
> without intel_agp module!
> Apr 8 13:55:13 werner kernel: Compaq SMART2 Driver (v 2.6.0)
> Apr 8 13:55:13 werner kernel: i2c-core: driver [isl29003] using legacy
> suspend method
> Apr 8 13:55:13 werner kernel: i2c-core: driver [isl29003] using legacy
> resume method
> Apr 8 13:55:13 werner kernel: amd74xx 0000:00:06.0: BIOS didn't set cable
> bits correctly. Enabling workaround.
> Apr 8 13:55:13 werner kernel: Loading Adaptec I2O RAID: Version 2.4 Build
> 5go
> Apr 8 13:55:13 werner kernel: scsi: <fdomain> Detection failed (no card)
> Apr 8 13:55:13 werner kernel: NCR53c406a: no available ports found
> Apr 8 13:55:13 werner kernel: qla2xxx [0000:00:00.0]-0005: : QLogic Fibre
> Channel HBA Driver: 8.03.07.13-k.
> Apr 8 13:55:13 werner kernel: Emulex LightPulse Fibre Channel SCSI driver
> 8.3.30
> Apr 8 13:55:13 werner kernel: Copyright(c) 2004-2009 Emulex. All rights
> reserved.
> Apr 8 13:55:13 werner kernel: Failed initialization of WD-7000 SCSI card!
> Apr 8 13:55:13 werner kernel: GDT-HA: Storage RAID Controller Driver.
> Version: 3.05
> Apr 8 13:55:13 werner kernel: 3ware Storage Controller device driver for
> Linux v1.26.02.003.
> Apr 8 13:55:13 werner kernel: 3ware 9000 Storage Controller device driver
> for Linux v2.26.02.014.
> Apr 8 13:55:13 werner kernel: imm: Version 2.05 (for Linux 2.4.0)
> Apr 8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LSID] enabled at
> IRQ 11
> Apr 8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LFID] enabled at
> IRQ 10
> Apr 8 13:55:13 werner kernel: Error: Driver 'pata_platform' is already
> registered, aborting...
> Apr 8 13:55:13 werner kernel: physmap-flash.0: failed to claim resource 0
> Apr 8 13:55:13 werner kernel: Failed to ioremap_nocache
> Apr 8 13:55:13 werner last message repeated 2 times
> Apr 8 13:55:13 werner kernel: SNAPGEAR: failed to ioremap() BOOTCS
> Apr 8 13:55:13 werner kernel: Generic platform RAM MTD, (c) 2004 Simtec
> Electronics
> Apr 8 13:55:13 werner kernel: [nandsim] warning: read_byte: unexpected data
> output cycle, state is STATE_READY return 0x0
> Apr 8 13:55:13 werner last message repeated 5 times
> Apr 8 13:55:13 werner kernel: flash size: 8 MiB
> Apr 8 13:55:13 werner kernel: page size: 512 bytes
> Apr 8 13:55:13 werner kernel: OOB area size: 16 bytes
> Apr 8 13:55:13 werner kernel: sector size: 8 KiB
> Apr 8 13:55:13 werner kernel: pages number: 16384
> Apr 8 13:55:13 werner kernel: pages per sector: 16
> Apr 8 13:55:13 werner kernel: bus width: 8
> Apr 8 13:55:13 werner kernel: bits in sector size: 13
> Apr 8 13:55:13 werner kernel: bits in page size: 9
> Apr 8 13:55:13 werner kernel: bits in OOB size: 4
> Apr 8 13:55:13 werner kernel: flash size with OOB: 8448 KiB
> Apr 8 13:55:13 werner kernel: page address bytes: 3
> Apr 8 13:55:13 werner kernel: sector address bytes: 2
> Apr 8 13:55:13 werner kernel: options: 0x62
> Apr 8 13:55:13 werner kernel: onenand_wait: timeout! ctrl=0x0000
> intr=0x0000
> Apr 8 13:55:13 werner kernel: DE600: port 0x378 busy
> Apr 8 13:55:13 werner kernel: paride: aten registered as protocol 0
> Apr 8 13:55:13 werner kernel: paride: bpck registered as protocol 1
> Apr 8 13:55:13 werner kernel: paride: comm registered as protocol 2
> Apr 8 13:55:13 werner kernel: paride: dstr registered as protocol 3
> Apr 8 13:55:13 werner kernel: paride: k951 registered as protocol 4
> Apr 8 13:55:13 werner kernel: paride: k971 registered as protocol 5
> Apr 8 13:55:13 werner kernel: paride: epat registered as protocol 6
> Apr 8 13:55:13 werner kernel: paride: epia registered as protocol 7
> Apr 8 13:55:13 werner kernel: paride: frpw registered as protocol 8
> Apr 8 13:55:13 werner kernel: paride: friq registered as protocol 9
> Apr 8 13:55:13 werner kernel: paride: fit2 registered as protocol 10
> Apr 8 13:55:13 werner kernel: paride: fit3 registered as protocol 11
> Apr 8 13:55:13 werner kernel: paride: on20 registered as protocol 12
> Apr 8 13:55:13 werner kernel: paride: on26 registered as protocol 13
> Apr 8 13:55:13 werner kernel: paride: ktti registered as protocol 14
> Apr 8 13:55:13 werner kernel: paride: bpck6 registered as protocol 15
> Apr 8 13:55:13 werner kernel: pd: pd version 1.05, major 45, cluster 64,
> nice 0
> Apr 8 13:55:13 werner kernel: pda: Autoprobe failed
> Apr 8 13:55:13 werner kernel: pd: no valid drive found
> Apr 8 13:55:13 werner kernel: pcd: pcd version 1.07, major 46, nice 0
> Apr 8 13:55:13 werner kernel: pcd0: Autoprobe failed
> Apr 8 13:55:13 werner kernel: pcd: No CD-ROM drive found
> Apr 8 13:55:13 werner kernel: pf: pf version 1.04, major 47, cluster 64,
> nice 0
> Apr 8 13:55:13 werner kernel: pf: No ATAPI disk detected
> Apr 8 13:55:13 werner kernel: pt: pt version 1.04, major 96
> Apr 8 13:55:13 werner kernel: sr0: scsi3-mmc drive: 48x/48x writer dvd-ram
> cd/rw xa/form2 cdda tray
> Apr 8 13:55:13 werner kernel: pt0: Autoprobe failed
> Apr 8 13:55:13 werner kernel: pt: No ATAPI tape drive detected
> Apr 8 13:55:13 werner kernel: pg: pg version 1.02, major 97
> Apr 8 13:55:13 werner kernel: pga: Autoprobe failed
> Apr 8 13:55:13 werner kernel: pg: No ATAPI device detected
> Apr 8 13:55:13 werner kernel: mk712: device not present
> Apr 8 13:55:13 werner kernel: wistron_btns: System unknown
> Apr 8 13:55:13 werner kernel: EISA: Cannot allocate resource for mainboard
> Apr 8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 1
> Apr 8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 2
> Apr 8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 3
> Apr 8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 4
> Apr 8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 5
> Apr 8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 6
> Apr 8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 7
> Apr 8 13:55:13 werner kernel: Cannot allocate resource for EISA slot 8
> Apr 8 13:55:13 werner kernel: asus_wmi: Management GUID not found
> Apr 8 13:55:13 werner kernel: asus_wmi: Management GUID not found
> Apr 8 13:55:13 werner kernel: compal_laptop: Motherboard not recognized
> (You could try the module's force-parameter)
> Apr 8 13:55:13 werner kernel: dell_wmi: No known WMI GUID found
> Apr 8 13:55:13 werner kernel: dell_wmi_aio: No known WMI GUID found
> Apr 8 13:55:13 werner kernel: acer_wmi: No or unsupported WMI interface,
> unable to load
> Apr 8 13:55:13 werner kernel: acerhdf: unknown (unsupported) BIOS version
> Gigabyte Technology Co., Ltd./M68M-S2P/FC, please report, aborting!
> Apr 8 13:55:13 werner kernel: hdaps: supported laptop not found!
> Apr 8 13:55:13 werner kernel: hdaps: driver init failed (ret=-19)!
> Apr 8 13:55:13 werner kernel: msi_wmi: This machine doesn't have
> MSI-hotkeys through WMI
> Apr 8 13:55:13 werner kernel: intel_oaktrail: Platform not recognized (You
> could try the module's force-parameter)
> Apr 8 13:55:13 werner kernel: OK
> Apr 8 13:55:13 werner kernel: OK
> Apr 8 13:55:13 werner kernel: register_blkdev: cannot get major 3 for hd
> Apr 8 13:55:13 werner kernel: drivers/rtc/hctosys.c: unable to open rtc
> device (rtc0)
> Apr 8 13:55:13 werner kernel: udevd (5372): /proc/5372/oom_adj is
> deprecated, please use /proc/5372/oom_score_adj instead.
> Apr 8 13:55:13 werner kernel: end_request: I/O error, dev fd0, sector 0
> Apr 8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LMAC] enabled at
> IRQ 11
> Apr 8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LAZA] enabled at
> IRQ 5
> Apr 8 13:55:13 werner kernel: ACPI: PCI Interrupt Link [LNK2] enabled at
> IRQ 5
> Apr 8 13:55:13 werner kernel: 2:3:1: cannot get freq at ep 0x84
> Apr 8 13:55:13 werner kernel: k8temp 0000:00:18.3: Temperature readouts
> might be wrong - check erratum #141
> Apr 8 13:55:13 werner kernel: EXT3-fs (sda1): warning: maximal mount count
> reached, running e2fsck is recommended
> Apr 8 13:55:22 werner apcupsd[9080]: apcupsd FATAL ERROR in smartsetup.c at
> line 171 PANIC! Cannot communicate with UPS via serial port. Please make
> sure the port specified on the DEVICE directive is correct, and that your
> cable specification on the UPSCABLE directive is correct.
> Apr 8 13:55:22 werner apcupsd[9080]: apcupsd error shutdown completed
> Apr 8 13:55:30 werner hpijs[9358]: prnt/hpijs/hpijs.cpp 614: unable to init
> hpijs server
> Apr 8 13:55:35 werner udevd[5372]: add_to_rules: unknown key 'MODALIAS' in
> /etc/udev/rules.d/80-eagle-usb.rules:1
> Apr 8 13:56:03 werner kdm_greet[9504]: Can't open default user face
> Apr 8 13:57:37 werner named[10001]: /etc/named.conf:3: option
> 'multiple-cnames' is obsolete
> Apr 8 20:29:10 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr 8 20:29:11 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr 8 20:29:11 werner kernel: Call Trace:
> Apr 8 20:29:11 werner kernel: [<c10356ff>] ? printk+0x20/0x22
> Apr 8 20:29:11 werner kernel: [<c10af32b>] dump_header+0x6f/0x95
> Apr 8 20:29:11 werner kernel: [<c10af53f>] oom_kill_process+0x52/0x251
> Apr 8 20:29:11 werner kernel: [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr 8 20:29:11 werner kernel: [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr 8 20:29:11 werner kernel: [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr 8 20:29:11 werner kernel: [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr 8 20:29:11 werner kernel: [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr 8 20:29:11 werner kernel: [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr 8 20:29:11 werner kernel: [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr 8 20:29:11 werner kernel: [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr 8 20:29:11 werner kernel: [<c208c05c>] syscall_call+0x7/0xb
> Apr 8 20:29:11 werner kernel: Mem-Info:
> Apr 8 20:29:11 werner kernel: DMA per-cpu:
> Apr 8 20:29:11 werner kernel: CPU 0: hi: 0, btch: 1 usd: 0
> Apr 8 20:29:11 werner kernel: Normal per-cpu:
> Apr 8 20:29:11 werner kernel: CPU 0: hi: 186, btch: 31 usd: 89
> Apr 8 20:29:11 werner kernel: HighMem per-cpu:
> Apr 8 20:29:11 werner kernel: CPU 0: hi: 186, btch: 31 usd: 87
> Apr 8 20:29:11 werner kernel: active_anon:80153 inactive_anon:71 isolated_anon:0
> Apr 8 20:29:11 werner kernel: active_file:15189 inactive_file:21520 isolated_file:0
> Apr 8 20:29:11 werner kernel: unevictable:0 dirty:1 writeback:0 unstable:0
> Apr 8 20:29:11 werner kernel: free:445370 slab_reclaimable:3273 slab_unreclaimable:37492
> Apr 8 20:29:11 werner kernel: mapped:19922 shmem:711 pagetables:597 bounce:0
> Apr 8 20:29:11 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr 8 20:29:11 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr 8 20:29:11 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1376
> all_unreclaimable? yes
> Apr 8 20:29:11 werner kdm[9483]: X server for display :0 terminated unexpectedly
> Apr 8 20:29:11 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr 8 20:29:11 werner kernel: HighMem free:1733236kB min:512kB low:28056kB
> high:55600kB active_anon:320612kB inactive_anon:284kB active_file:60624kB
> inactive_file:85940kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:79688kB
> shmem:2844kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:2388kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr 8 20:29:11 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr 8 20:29:11 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr 8 20:29:11 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr 8 20:29:11 werner kernel: HighMem: 6291*4kB 5495*8kB 3815*16kB
> 2528*32kB 1406*64kB 597*128kB 228*256kB 104*512kB 63*1024kB 18*2048kB
> 279*4096kB = 1733236kB
> Apr 8 20:29:11 werner kernel: 37420 total pagecache pages
> Apr 8 20:29:11 werner kernel: 0 pages in swap cache
> Apr 8 20:29:11 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr 8 20:29:11 werner kernel: Free swap = 0kB
> Apr 8 20:29:11 werner kernel: Total swap = 0kB
> Apr 8 20:29:11 werner kernel: 786128 pages RAM
> Apr 8 20:29:11 werner kernel: 558818 pages HighMem
> Apr 8 20:29:11 werner kernel: 13582 pages reserved
> Apr 8 20:29:11 werner kernel: 109538 pages shared
> Apr 8 20:29:11 werner kernel: 223297 pages non-shared
> Apr 8 20:29:11 werner kernel: Out of memory: Kill process 9499 (X) score 29
> or sacrifice child
> Apr 8 20:29:11 werner kernel: Killed process 9499 (X) total-vm:451020kB,
> anon-rss:178420kB, file-rss:3668kB
> Apr 8 20:29:12 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr 8 20:29:12 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr 8 20:29:12 werner kernel: Call Trace:
> Apr 8 20:29:12 werner kernel: [<c10356ff>] ? printk+0x20/0x22
> Apr 8 20:29:12 werner kernel: [<c10af32b>] dump_header+0x6f/0x95
> Apr 8 20:29:12 werner kernel: [<c10af53f>] oom_kill_process+0x52/0x251
> Apr 8 20:29:13 werner kernel: [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr 8 20:29:13 werner kernel: [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr 8 20:29:13 werner kernel: [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr 8 20:29:13 werner kernel: [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr 8 20:29:13 werner kernel: [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr 8 20:29:13 werner kernel: [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr 8 20:29:13 werner kernel: [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr 8 20:29:13 werner kernel: [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr 8 20:29:14 werner kernel: [<c208c05c>] syscall_call+0x7/0xb
> Apr 8 20:29:14 werner kernel: Mem-Info:
> Apr 8 20:29:14 werner kernel: DMA per-cpu:
> Apr 8 20:29:14 werner kernel: CPU 0: hi: 0, btch: 1 usd: 0
> Apr 8 20:29:14 werner kernel: Normal per-cpu:
> Apr 8 20:29:14 werner kernel: CPU 0: hi: 186, btch: 31 usd: 95
> Apr 8 20:29:14 werner kernel: HighMem per-cpu:
> Apr 8 20:29:14 werner kernel: CPU 0: hi: 186, btch: 31 usd: 154
> Apr 8 20:29:14 werner kernel: active_anon:35273 inactive_anon:32
> isolated_anon:0
> Apr 8 20:29:14 werner kernel: active_file:15228 inactive_file:21481
> isolated_file:0
> Apr 8 20:29:14 werner kernel: unevictable:0 dirty:1 writeback:0 unstable:0
> Apr 8 20:29:14 werner kernel: free:490360 slab_reclaimable:3273
> slab_unreclaimable:37492
> Apr 8 20:29:14 werner kernel: mapped:19048 shmem:673 pagetables:483
> bounce:0
> Apr 8 20:29:14 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr 8 20:29:14 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr 8 20:29:14 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:13024
> all_unreclaimable? yes
> Apr 8 20:29:14 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr 8 20:29:14 werner kernel: HighMem free:1913196kB min:512kB low:28056kB
> high:55600kB active_anon:141092kB inactive_anon:128kB active_file:60780kB
> inactive_file:85784kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:76192kB
> shmem:2692kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1932kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr 8 20:29:14 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr 8 20:29:14 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr 8 20:29:14 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr 8 20:29:14 werner kernel: HighMem: 6543*4kB 5584*8kB 3849*16kB
> 2545*32kB 1418*64kB 609*128kB 235*256kB 108*512kB 64*1024kB 21*2048kB
> 319*4096kB = 1913196kB
> Apr 8 20:29:14 werner kernel: 37382 total pagecache pages
> Apr 8 20:29:14 werner kernel: 0 pages in swap cache
> Apr 8 20:29:14 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr 8 20:29:14 werner kernel: Free swap = 0kB
> Apr 8 20:29:14 werner kernel: Total swap = 0kB
> Apr 8 20:29:14 werner kernel: 786128 pages RAM
> Apr 8 20:29:14 werner kernel: 558818 pages HighMem
> Apr 8 20:29:14 werner kernel: 13582 pages reserved
> Apr 8 20:29:14 werner kernel: 104040 pages shared
> Apr 8 20:29:14 werner kernel: 179138 pages non-shared
> Apr 8 20:29:15 werner kernel: Out of memory: Kill process 11073 (httpd)
> score 4 or sacrifice child
> Apr 8 20:29:15 werner kernel: Killed process 11073 (httpd)
> total-vm:56424kB, anon-rss:9136kB, file-rss:3784kB
> Apr 8 20:29:15 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr 8 20:29:15 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr 8 20:29:15 werner kernel: Call Trace:
> Apr 8 20:29:15 werner kernel: [<c10356ff>] ? printk+0x20/0x22
> Apr 8 20:29:15 werner kernel: [<c10af32b>] dump_header+0x6f/0x95
> Apr 8 20:29:15 werner kernel: [<c10af53f>] oom_kill_process+0x52/0x251
> Apr 8 20:29:15 werner kernel: [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr 8 20:29:15 werner kernel: [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr 8 20:29:15 werner kernel: [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr 8 20:29:15 werner kernel: [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr 8 20:29:15 werner kernel: [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr 8 20:29:15 werner kernel: [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr 8 20:29:15 werner kernel: [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr 8 20:29:15 werner kernel: [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr 8 20:29:15 werner kernel: [<c208c05c>] syscall_call+0x7/0xb
> Apr 8 20:29:15 werner kernel: Mem-Info:
> Apr 8 20:29:15 werner kernel: DMA per-cpu:
> Apr 8 20:29:15 werner kernel: CPU 0: hi: 0, btch: 1 usd: 0
> Apr 8 20:29:15 werner kernel: Normal per-cpu:
> Apr 8 20:29:15 werner kernel: CPU 0: hi: 186, btch: 31 usd: 97
> Apr 8 20:29:15 werner kernel: HighMem per-cpu:
> Apr 8 20:29:15 werner kernel: CPU 0: hi: 186, btch: 31 usd: 169
> Apr 8 20:29:15 werner kernel: active_anon:34116 inactive_anon:32
> isolated_anon:0
> Apr 8 20:29:15 werner kernel: active_file:15397 inactive_file:21312
> isolated_file:0
> Apr 8 20:29:15 werner kernel: unevictable:0 dirty:1 writeback:0 unstable:0
> Apr 8 20:29:15 werner kernel: free:491507 slab_reclaimable:3273
> slab_unreclaimable:37492
> Apr 8 20:29:15 werner kernel: mapped:19048 shmem:673 pagetables:464
> bounce:0
> Apr 8 20:29:15 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr 8 20:29:15 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr 8 20:29:15 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:15584
> all_unreclaimable? yes
> Apr 8 20:29:15 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr 8 20:29:15 werner kernel: HighMem free:1917784kB min:512kB low:28056kB
> high:55600kB active_anon:136464kB inactive_anon:128kB active_file:61456kB
> inactive_file:85108kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:76192kB
> shmem:2692kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1856kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr 8 20:29:15 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr 8 20:29:15 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr 8 20:29:15 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr 8 20:29:15 werner kernel: HighMem: 6650*4kB 5596*8kB 3885*16kB
> 2556*32kB 1443*64kB 617*128kB 237*256kB 108*512kB 64*1024kB 21*2048kB
> 319*4096kB = 1917784kB
> Apr 8 20:29:15 werner kernel: 37382 total pagecache pages
> Apr 8 20:29:15 werner kernel: 0 pages in swap cache
> Apr 8 20:29:15 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr 8 20:29:15 werner kernel: Free swap = 0kB
> Apr 8 20:29:15 werner kernel: Total swap = 0kB
> Apr 8 20:29:15 werner kernel: 786128 pages RAM
> Apr 8 20:29:15 werner kernel: 558818 pages HighMem
> Apr 8 20:29:15 werner kernel: 13582 pages reserved
> Apr 8 20:29:15 werner kernel: 101963 pages shared
> Apr 8 20:29:15 werner kernel: 177974 pages non-shared
> Apr 8 20:29:16 werner kernel: Out of memory: Kill process 10001 (named)
> score 3 or sacrifice child
> Apr 8 20:29:16 werner kernel: Killed process 10001 (named)
> total-vm:41360kB, anon-rss:9828kB, file-rss:2452kB
> Apr 8 20:29:16 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr 8 20:29:16 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr 8 20:29:16 werner kernel: Call Trace:
> Apr 8 20:29:16 werner kernel: [<c10356ff>] ? printk+0x20/0x22
> Apr 8 20:29:16 werner kernel: [<c10af32b>] dump_header+0x6f/0x95
> Apr 8 20:29:16 werner kernel: [<c10af53f>] oom_kill_process+0x52/0x251
> Apr 8 20:29:16 werner kernel: [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr 8 20:29:16 werner kernel: [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr 8 20:29:16 werner kernel: [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr 8 20:29:16 werner kernel: [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr 8 20:29:16 werner kernel: [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr 8 20:29:16 werner kernel: [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr 8 20:29:16 werner kernel: [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr 8 20:29:16 werner kernel: [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr 8 20:29:16 werner kernel: [<c208c05c>] syscall_call+0x7/0xb
> Apr 8 20:29:16 werner kernel: Mem-Info:
> Apr 8 20:29:16 werner kernel: DMA per-cpu:
> Apr 8 20:29:16 werner kernel: CPU 0: hi: 0, btch: 1 usd: 0
> Apr 8 20:29:16 werner kernel: Normal per-cpu:
> Apr 8 20:29:16 werner kernel: CPU 0: hi: 186, btch: 31 usd: 97
> Apr 8 20:29:16 werner kernel: HighMem per-cpu:
> Apr 8 20:29:16 werner kernel: CPU 0: hi: 186, btch: 31 usd: 169
> Apr 8 20:29:16 werner kernel: active_anon:34116 inactive_anon:32
> isolated_anon:0
> Apr 8 20:29:16 werner kernel: active_file:15397 inactive_file:21312
> isolated_file:0
> Apr 8 20:29:16 werner kernel: unevictable:0 dirty:1 writeback:0 unstable:0
> Apr 8 20:29:16 werner kernel: free:491507 slab_reclaimable:3273
> slab_unreclaimable:37492
> Apr 8 20:29:16 werner kernel: mapped:19048 shmem:673 pagetables:464
> bounce:0
> Apr 8 20:29:16 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr 8 20:29:16 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr 8 20:29:16 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:15840
> all_unreclaimable? yes
> Apr 8 20:29:16 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr 8 20:29:16 werner kernel: HighMem free:1917784kB min:512kB low:28056kB
> high:55600kB active_anon:136464kB inactive_anon:128kB active_file:61456kB
> inactive_file:85108kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:76192kB
> shmem:2692kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1856kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr 8 20:29:16 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr 8 20:29:16 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr 8 20:29:16 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr 8 20:29:16 werner kernel: HighMem: 6650*4kB 5596*8kB 3885*16kB
> 2556*32kB 1443*64kB 617*128kB 237*256kB 108*512kB 64*1024kB 21*2048kB
> 319*4096kB = 1917784kB
> Apr 8 20:29:16 werner kernel: 37382 total pagecache pages
> Apr 8 20:29:16 werner kernel: 0 pages in swap cache
> Apr 8 20:29:16 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr 8 20:29:16 werner kernel: Free swap = 0kB
> Apr 8 20:29:16 werner kernel: Total swap = 0kB
> Apr 8 20:29:16 werner kernel: 786128 pages RAM
> Apr 8 20:29:16 werner kernel: 558818 pages HighMem
> Apr 8 20:29:16 werner kernel: 13582 pages reserved
> Apr 8 20:29:16 werner kernel: 101963 pages shared
> Apr 8 20:29:16 werner kernel: 177974 pages non-shared
> Apr 8 20:29:17 werner kernel: Out of memory: Kill process 10002 (named)
> score 4 or sacrifice child
> Apr 8 20:29:17 werner kernel: Killed process 10002 (named)
> total-vm:41360kB, anon-rss:9920kB, file-rss:2540kB
> Apr 8 20:29:17 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr 8 20:29:17 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr 8 20:29:17 werner kernel: Call Trace:
> Apr 8 20:29:17 werner kernel: [<c10356ff>] ? printk+0x20/0x22
> Apr 8 20:29:17 werner kernel: [<c10af32b>] dump_header+0x6f/0x95
> Apr 8 20:29:17 werner kernel: [<c10af53f>] oom_kill_process+0x52/0x251
> Apr 8 20:29:17 werner kernel: [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr 8 20:29:17 werner kernel: [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr 8 20:29:17 werner kernel: [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr 8 20:29:17 werner kernel: [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr 8 20:29:17 werner kernel: [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr 8 20:29:17 werner kernel: [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr 8 20:29:17 werner kernel: [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr 8 20:29:17 werner kernel: [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr 8 20:29:17 werner kernel: [<c208c05c>] syscall_call+0x7/0xb
> Apr 8 20:29:17 werner kernel: Mem-Info:
> Apr 8 20:29:17 werner kernel: DMA per-cpu:
> Apr 8 20:29:17 werner kernel: CPU 0: hi: 0, btch: 1 usd: 0
> Apr 8 20:29:17 werner kernel: Normal per-cpu:
> Apr 8 20:29:17 werner kernel: CPU 0: hi: 186, btch: 31 usd: 102
> Apr 8 20:29:17 werner kernel: HighMem per-cpu:
> Apr 8 20:29:17 werner kernel: CPU 0: hi: 186, btch: 31 usd: 161
> Apr 8 20:29:17 werner kernel: active_anon:31596 inactive_anon:32
> isolated_anon:0
> Apr 8 20:29:17 werner kernel: active_file:15449 inactive_file:21260
> isolated_file:0
> Apr 8 20:29:17 werner kernel: unevictable:0 dirty:1 writeback:0 unstable:0
> Apr 8 20:29:17 werner kernel: free:494050 slab_reclaimable:3273
> slab_unreclaimable:37492
> Apr 8 20:29:17 werner kernel: mapped:18630 shmem:673 pagetables:464
> bounce:0
> Apr 8 20:29:17 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr 8 20:29:17 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr 8 20:29:17 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:15968
> all_unreclaimable? yes
> Apr 8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr 8 20:29:17 werner kernel: HighMem free:1927956kB min:512kB low:28056kB
> high:55600kB active_anon:126384kB inactive_anon:128kB active_file:61664kB
> inactive_file:84900kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:74520kB
> shmem:2692kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1856kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr 8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr 8 20:29:17 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr 8 20:29:17 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr 8 20:29:17 werner kernel: HighMem: 6715*4kB 5597*8kB 3896*16kB
> 2572*32kB 1453*64kB 624*128kB 245*256kB 111*512kB 64*1024kB 21*2048kB
> 320*4096kB = 1927956kB
> Apr 8 20:29:17 werner kernel: 37382 total pagecache pages
> Apr 8 20:29:17 werner kernel: 0 pages in swap cache
> Apr 8 20:29:17 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr 8 20:29:17 werner kernel: Free swap = 0kB
> Apr 8 20:29:17 werner kernel: Total swap = 0kB
> Apr 8 20:29:17 werner kernel: 786128 pages RAM
> Apr 8 20:29:17 werner kernel: 558818 pages HighMem
> Apr 8 20:29:17 werner kernel: 13582 pages reserved
> Apr 8 20:29:17 werner kernel: 101323 pages shared
> Apr 8 20:29:17 werner kernel: 175859 pages non-shared
> Apr 8 20:29:17 werner kernel: Out of memory: Kill process 11074 (httpd)
> score 3 or sacrifice child
> Apr 8 20:29:17 werner kernel: Killed process 11074 (httpd)
> total-vm:55376kB, anon-rss:8092kB, file-rss:3632kB
> Apr 8 20:29:17 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr 8 20:29:17 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr 8 20:29:17 werner kernel: Call Trace:
> Apr 8 20:29:17 werner kernel: [<c10356ff>] ? printk+0x20/0x22
> Apr 8 20:29:17 werner kernel: [<c10af32b>] dump_header+0x6f/0x95
> Apr 8 20:29:17 werner kernel: [<c10af53f>] oom_kill_process+0x52/0x251
> Apr 8 20:29:17 werner kernel: [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr 8 20:29:17 werner kernel: [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr 8 20:29:17 werner kernel: [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr 8 20:29:17 werner kernel: [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr 8 20:29:17 werner kernel: [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr 8 20:29:17 werner kernel: [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr 8 20:29:17 werner kernel: [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr 8 20:29:17 werner kernel: [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr 8 20:29:17 werner kernel: [<c208c05c>] syscall_call+0x7/0xb
> Apr 8 20:29:17 werner kernel: Mem-Info:
> Apr 8 20:29:17 werner kernel: DMA per-cpu:
> Apr 8 20:29:17 werner kernel: CPU 0: hi: 0, btch: 1 usd: 0
> Apr 8 20:29:17 werner kernel: Normal per-cpu:
> Apr 8 20:29:17 werner kernel: CPU 0: hi: 186, btch: 31 usd: 104
> Apr 8 20:29:17 werner kernel: HighMem per-cpu:
> Apr 8 20:29:17 werner kernel: CPU 0: hi: 186, btch: 31 usd: 163
> Apr 8 20:29:17 werner kernel: active_anon:30699 inactive_anon:32
> isolated_anon:0
> Apr 8 20:29:17 werner kernel: active_file:15553 inactive_file:21156
> isolated_file:0
> Apr 8 20:29:17 werner kernel: unevictable:0 dirty:1 writeback:0 unstable:0
> Apr 8 20:29:17 werner kernel: free:494949 slab_reclaimable:3273
> slab_unreclaimable:37492
> Apr 8 20:29:17 werner kernel: mapped:18630 shmem:673 pagetables:445
> bounce:0
> Apr 8 20:29:17 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr 8 20:29:17 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr 8 20:29:17 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:16864
> all_unreclaimable? yes
> Apr 8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr 8 20:29:17 werner kernel: HighMem free:1931552kB min:512kB low:28056kB
> high:55600kB active_anon:122796kB inactive_anon:128kB active_file:62080kB
> inactive_file:84484kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:4kB writeback:0kB mapped:74520kB
> shmem:2692kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1780kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr 8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr 8 20:29:17 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr 8 20:29:17 werner kernel: Normal: 1527*4kB 535*8kB 279*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44004kB
> Apr 8 20:29:17 werner kernel: HighMem: 6724*4kB 5598*8kB 3908*16kB
> 2589*32kB 1469*64kB 630*128kB 249*256kB 111*512kB 64*1024kB 21*2048kB
> 320*4096kB = 1931552kB
> Apr 8 20:29:17 werner kernel: 37382 total pagecache pages
> Apr 8 20:29:17 werner kernel: 0 pages in swap cache
> Apr 8 20:29:17 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr 8 20:29:17 werner kernel: Free swap = 0kB
> Apr 8 20:29:17 werner kernel: Total swap = 0kB
> Apr 8 20:29:17 werner kernel: 786128 pages RAM
> Apr 8 20:29:17 werner kernel: 558818 pages HighMem
> Apr 8 20:29:17 werner kernel: 13582 pages reserved
> Apr 8 20:29:17 werner kernel: 99284 pages shared
> Apr 8 20:29:17 werner kernel: 174956 pages non-shared
> Apr 8 20:29:17 werner kernel: Out of memory: Kill process 11075 (httpd)
> score 3 or sacrifice child
> Apr 8 20:29:17 werner kernel: Killed process 11075 (httpd)
> total-vm:55120kB, anon-rss:7832kB, file-rss:3628kB
> Apr 8 20:29:17 werner kernel: iwconfig invoked oom-killer:
> gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> Apr 8 20:29:17 werner kernel: Pid: 31155, comm: iwconfig Not tainted
> 3.4.0-rc2-i486-1sys #1
> Apr 8 20:29:17 werner kernel: Call Trace:
> Apr 8 20:29:17 werner kernel: [<c10356ff>] ? printk+0x20/0x22
> Apr 8 20:29:17 werner kernel: [<c10af32b>] dump_header+0x6f/0x95
> Apr 8 20:29:17 werner kernel: [<c10af53f>] oom_kill_process+0x52/0x251
> Apr 8 20:29:17 werner kernel: [<c10af803>] ? select_bad_process+0xc5/0x11c
> Apr 8 20:29:17 werner kernel: [<c10af99e>] out_of_memory+0x144/0x1b4
> Apr 8 20:29:17 werner kernel: [<c10b269c>]
> __alloc_pages_nodemask+0x501/0x63f
> Apr 8 20:29:17 werner kernel: [<c10b27f6>] __get_free_pages+0x1c/0x2d
> Apr 8 20:29:17 werner kernel: [<c113197d>] do_proc_readlink+0x27/0x7b
> Apr 8 20:29:17 werner kernel: [<c1133367>] proc_pid_readlink+0x48/0x5b
> Apr 8 20:29:17 werner kernel: [<c10ee9c2>] sys_readlinkat+0x81/0x95
> Apr 8 20:29:17 werner kernel: [<c10eea02>] sys_readlink+0x2c/0x2e
> Apr 8 20:29:17 werner kernel: [<c208c05c>] syscall_call+0x7/0xb
> Apr 8 20:29:17 werner kernel: Mem-Info:
> Apr 8 20:29:17 werner kernel: DMA per-cpu:
> Apr 8 20:29:17 werner kernel: CPU 0: hi: 0, btch: 1 usd: 0
> Apr 8 20:29:17 werner kernel: Normal per-cpu:
> Apr 8 20:29:17 werner kernel: CPU 0: hi: 186, btch: 31 usd: 57
> Apr 8 20:29:17 werner kernel: HighMem per-cpu:
> Apr 8 20:29:17 werner kernel: CPU 0: hi: 186, btch: 31 usd: 121
> Apr 8 20:29:17 werner kernel: active_anon:29820 inactive_anon:29
> isolated_anon:0
> Apr 8 20:29:17 werner kernel: active_file:15559 inactive_file:21150
> isolated_file:0
> Apr 8 20:29:17 werner kernel: unevictable:0 dirty:0 writeback:0 unstable:0
> Apr 8 20:29:17 werner kernel: free:495930 slab_reclaimable:3267
> slab_unreclaimable:37486
> Apr 8 20:29:17 werner kernel: mapped:18629 shmem:671 pagetables:434
> bounce:0
> Apr 8 20:29:17 werner kernel: DMA free:4240kB min:784kB low:980kB
> high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:24kB slab_unreclaimable:2184kB kernel_stack:9392kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> Apr 8 20:29:17 werner kernel: lowmem_reserve[]: 0 865 3031 3031
> Apr 8 20:29:17 werner kernel: Normal free:44184kB min:44012kB low:55012kB
> high:66016kB active_anon:0kB inactive_anon:0kB active_file:128kB
> inactive_file:144kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:13044kB slab_unreclaimable:147760kB kernel_stack:628952kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:447
> all_unreclaimable? yes
> Apr 8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 17326 17326
> Apr 8 20:29:17 werner kernel: HighMem free:1935296kB min:512kB low:28056kB
> high:55600kB active_anon:119280kB inactive_anon:116kB active_file:62108kB
> inactive_file:84456kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2217808kB mlocked:0kB dirty:0kB writeback:0kB mapped:74516kB
> shmem:2684kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:1736kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Apr 8 20:29:17 werner kernel: lowmem_reserve[]: 0 0 0 0
> Apr 8 20:29:17 werner kernel: DMA: 116*4kB 18*8kB 1*16kB 1*32kB 0*64kB
> 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4240kB
> Apr 8 20:29:17 werner kernel: Normal: 1568*4kB 535*8kB 280*16kB 177*32kB
> 65*64kB 23*128kB 6*256kB 3*512kB 1*1024kB 0*2048kB 3*4096kB = 44184kB
> Apr 8 20:29:17 werner kernel: HighMem: 6656*4kB 5630*8kB 3937*16kB
> 2602*32kB 1484*64kB 637*128kB 253*256kB 111*512kB 64*1024kB 21*2048kB
> 320*4096kB = 1935296kB
> Apr 8 20:29:17 werner kernel: 37380 total pagecache pages
> Apr 8 20:29:17 werner kernel: 0 pages in swap cache
> Apr 8 20:29:17 werner kernel: Swap cache stats: add 0, delete 0, find 0/0
> Apr 8 20:29:17 werner kernel: Free swap = 0kB
> Apr 8 20:29:17 werner kernel: Total swap = 0kB
> Apr 8 20:29:17 werner kernel: 786128 pages RAM
> Apr 8 20:29:17 werner kernel: 558818 pages HighMem
> Apr 8 20:29:17 werner kernel: 13582 pages reserved
> Apr 8 20:29:17 werner kernel: 96159 pages shared
> Apr 8 20:29:17 werner kernel: 174665 pages non-shared
> Apr 8 20:29:17 werner kernel: Out of memory: Kill process 11076 (httpd)
> score 3 or sacrifice child
> Apr 8 20:29:17 werner kernel: Killed process 11076 (httpd)
> total-vm:55380kB, anon-rss:8088kB, file-rss:3632kB
> Apr 8 20:29:28 werner kdm_greet[31163]: Can't open default user face
> Apr 8 20:55:07 werner hcid[9333]: Got disconnected from the system message
> bus
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 2:42 v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs) Linus Torvalds
@ 2012-04-09 2:50 ` Andrew Morton
2012-04-09 3:11 ` Linus Torvalds
0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2012-04-09 2:50 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Rientjes, Rik van Riel, Hugh Dickins, werner, linux-kernel
On Sun, 8 Apr 2012 19:42:31 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Guys, there's something wrong in the VM. Most likely suspects added to
> the participants list.
>
> Apparently things go south and the oom killer is invoked. X.org seems
> to get killed.
>
> Any hints? Werner traditionally finds problems by enabling every
> single config option there is, I assume this is another of those
> kernels..
>
>
> ...
>
> > Apr 8 20:29:11 werner kernel: Normal free:44004kB min:44012kB low:55012kB
> > high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
> > inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> > present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> > slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
> > pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1376
> > all_unreclaimable? yes
That's claiming that 600MB of ZONE_NORMAL is being used for kernel stacks.
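As a rough sanity check on that figure, the sketch below turns a kernel_stack: counter from the log into a number of stacks. The 8 kB stack size is an assumption (the usual THREAD_SIZE on 32-bit x86 kernels of this era), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: each kernel stack occupies 8 kB (32-bit x86 THREAD_SIZE). */
#define ASSUMED_STACK_KB 8UL

/* Number of kernel stacks implied by a "kernel_stack:" value given in kB. */
unsigned long stacks_in_use(unsigned long kernel_stack_kb, unsigned long stack_kb)
{
    assert(stack_kb > 0);
    return kernel_stack_kb / stack_kb;
}

/* Share of a zone (in whole percent) taken up by kernel stacks. */
unsigned long zone_percent(unsigned long kernel_stack_kb, unsigned long present_kb)
{
    assert(present_kb > 0);
    return kernel_stack_kb * 100UL / present_kb;
}

/* Print both numbers for one zone, using the values from the OOM report. */
void report_zone(const char *zone, unsigned long kernel_stack_kb,
                 unsigned long present_kb)
{
    printf("%s: %lu stacks, %lu%% of the zone\n", zone,
           stacks_in_use(kernel_stack_kb, ASSUMED_STACK_KB),
           zone_percent(kernel_stack_kb, present_kb));
}
```

Calling report_zone("Normal", 628952, 885944) from a main() gives 78619 stacks and roughly 70% of the zone. That is far more stacks than the few dozen tasks an idle desktop would have, which fits the idea that exited tasks are not being freed.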
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 2:50 ` Andrew Morton
@ 2012-04-09 3:11 ` Linus Torvalds
2012-04-09 7:04 ` Sven Joachim
2012-04-09 10:15 ` David Rientjes
0 siblings, 2 replies; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09 3:11 UTC (permalink / raw)
To: Andrew Morton, werner
Cc: David Rientjes, Rik van Riel, Hugh Dickins, linux-kernel, Oleg Nesterov
On Sun, Apr 8, 2012 at 7:50 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sun, 8 Apr 2012 19:42:31 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> > Apr 8 20:29:11 werner kernel: Normal free:44004kB min:44012kB low:55012kB
>> > high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
>> > inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
>> > present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
>> > slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
>> > pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1376
>> > all_unreclaimable? yes
>
> That's claiming that 600MB of ZONE_NORMAL is being used for kernel stacks.
Well, that would certainly eat up memory that is hard to get back.
Werner - if you can reproduce this, can you get a "ps axl" or similar
when it starts happening? Or probably even long before, since it
probably starts long long earlier.
Or does anybody see anything that keeps thread counts raised so that
"free_task()" doesn't get done? kernel/profile.c does that
"profile_handoff_task()" thing - but only oprofile and the android
low-memory-killer logic seem to use it. But that's exactly the
kind of thing that Werner's "configure everything" might enable -
Werner?
What else would do this? I'd suspect the /proc code, but that grabs
the mm_struct, and those particular changes were pre-3.3 anyway.
Adding Oleg just in case he has any ideas about process code changes
(or some usermodehelper thing that leaks processes, or whatever).
Linus
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 3:11 ` Linus Torvalds
@ 2012-04-09 7:04 ` Sven Joachim
2012-04-09 15:24 ` Linus Torvalds
2012-04-09 15:57 ` Rik van Riel
2012-04-09 10:15 ` David Rientjes
1 sibling, 2 replies; 43+ messages in thread
From: Sven Joachim @ 2012-04-09 7:04 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, werner, David Rientjes, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov
On 2012-04-09 05:11 +0200, Linus Torvalds wrote:
> On Sun, Apr 8, 2012 at 7:50 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>> On Sun, 8 Apr 2012 19:42:31 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>
>>> > Apr 8 20:29:11 werner kernel: Normal free:44004kB min:44012kB low:55012kB
>>> > high:66016kB active_anon:0kB inactive_anon:0kB active_file:132kB
>>> > inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
>>> > present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
>>> > slab_reclaimable:13068kB slab_unreclaimable:147784kB kernel_stack:628952kB
>>> > pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1376
>>> > all_unreclaimable? yes
>>
>> That's claiming that 600MB of ZONE_NORMAL is being used for kernel stacks.
>
> Well, that would certainly eat up memory that is hard to get back.
While I did not experience any crashes or instabilities (yet?), I'm also
seeing memory leaks. On a system started this morning, with hardly
anything running:
,----
| $ pstree
| init-+-acpid
|      |-atd
|      |-cron
|      |-dbus-daemon
|      |-dhclient
|      |-dictd
|      |-5*[getty]
|      |-gpm
|      |-login---zsh---pstree
|      |-lpd
|      |-master-+-pickup
|      |        `-qmgr
|      |-named---4*[{named}]
|      |-rpc.statd
|      |-rpcbind
|      |-rsyslogd---3*[{rsyslogd}]
|      |-timidity
|      |-udevd---2*[udevd]
|      `-wpa_supplicant
`----
where I would expect no more than 50 MB used, 400 MB are actually in use:
,----
| $ free
|              total       used       free     shared    buffers     cached
| Mem:       3348400    1849712    1498688          0     328960    1119180
| -/+ buffers/cache:     401572    2946828
| Swap:      3719040          0    3719040
`----
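For readers unfamiliar with the "-/+ buffers/cache" row, this small sketch reproduces it from the "Mem:" row, under the usual reading that buffers and page cache count as reclaimable. The struct and function names are illustrative only and are not taken from procps.

```c
#include <assert.h>
#include <stdio.h>

/* One "Mem:" row of free(1) output, all values in kB. */
struct mem_row {
    unsigned long total;
    unsigned long used;
    unsigned long free_kb;
    unsigned long buffers;
    unsigned long cached;
};

/* Memory in use that is neither buffers nor page cache. */
unsigned long used_excluding_cache(const struct mem_row *m)
{
    assert(m->used >= m->buffers + m->cached);
    return m->used - m->buffers - m->cached;
}

/* Memory that is free or could be reclaimed from buffers and cache. */
unsigned long free_including_cache(const struct mem_row *m)
{
    return m->free_kb + m->buffers + m->cached;
}

/* Print the derived row in the same order free(1) uses. */
void print_adjusted_row(const struct mem_row *m)
{
    printf("-/+ buffers/cache: %10lu %10lu\n",
           used_excluding_cache(m), free_including_cache(m));
}
```

Fed with the values above (used 1849712, buffers 328960, cached 1119180, free 1498688) the two helpers return 401572 and 2946828, matching the second row. The roughly 400 MB is therefore memory that cannot be explained by disk caching.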
Cheers,
Sven
> Werner - if you can reproduce this, can you get a "ps axl" or similar
> when it starts happening? Or probably even long before, since it
> probably starts long long earlier.
>
> Or does anybody see anything that keeps thread counts raised so that
> "free_task()" doesn't get done? kernel/profile.c does that
> "profile_handoff_task()" thing - but only oprofile and the android
> low-memory-killer logic seem to use it. But that's exactly the
> kind of thing that Werner's "configure everything" might enable -
> Werner?
>
> What else would do this? I'd suspect the /proc code, but that grabs
> the mm_struct, and those particular changes were pre-3.3 anyway.
>
> Adding Oleg just in case he has any ideas about process code changes
> (or some usermodehelper thing that leaks processes, or whatever).
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 3:11 ` Linus Torvalds
2012-04-09 7:04 ` Sven Joachim
@ 2012-04-09 10:15 ` David Rientjes
2012-04-09 15:39 ` Linus Torvalds
1 sibling, 1 reply; 43+ messages in thread
From: David Rientjes @ 2012-04-09 10:15 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, Greg Kroah-Hartman, stable
On Sun, 8 Apr 2012, Linus Torvalds wrote:
> Or does anybody see anything that keeps thread counts raised so that
> "free_task()" doesn't get done? kernel/profile.c does that
> "profile_handoff_task()" thing - but only oprofile and the android
> low-memory-killer logic seem to use it. But that's exactly the
> kind of thing that Werner's "configure everything" might enable -
> Werner?
>
I think you nailed it.
I suspect the problem is 1eda5166c764 ("staging: android/lowmemorykiller:
Don't unregister notifier from atomic context") merged during the 3.4
merge window and, unfortunately, backported to stable.
Werner's config has CONFIG_ANDROID_LOW_MEMORY_KILLER=y so we never
actually unregister the callback for the task handoff as a result of the
patch. It's supposed to take responsibility for doing free_task() itself
when it's good and ready, usually by putting it into a list to free, but
now we're just doing this:
	struct task_struct *task = data;

	if (task == lowmem_deathpending)
		lowmem_deathpending = NULL;

	return NOTIFY_OK;
whenever put_task_struct() decrements the refcount to 0 and thus they get
leaked and bad things happen.
This is confirmed by Werner's oom log that shows extremely small values
for the oom score of the task chosen to oom kill. His first log showed X
being killed with a score of 29. That means it is the most memory-hogging
task on the system and is only using 2.9% of total system memory.
I can't actually see how the lowmemorykiller actually ever freed any
task_struct after unregistering the notifier during the callback. It
seems like this has always leaked memory but it used to happen much more
slowly because, prior to the patch, we did task_handoff_unregister() in
the callback. So I think the code was always wrong but now it's out of
control because the notifier remains enabled indefinitely. I can't say
the 1eda5166c764 ("staging: android/lowmemorykiller: Don't unregister
notifier from atomic context") commit is fully to blame, it just made the
error much more egregious.
As it sits in 3.4-rc2, this whole lowmem_deathpending business seems to be
storing a pointer to the task_struct of something sent a SIGKILL and it
remains that way until the lowmem_deathpending_timeout expires and
something else is killed instead. lowmem_deathpending gets cleared on the
task handoff if the task selected for kill just exited. This ensures we
only kill one thread at a time.
That's all fine and good but it seems like we're never freeing the
task_struct itself on exit. This seems like the most obvious fix but it
would be really nice to revisit this and remove the dependency on
CONFIG_PROFILING and just check if the lowmem_deathpending thread is found
in the iteration for lowmem_shrink() prior to killing.
android, lowmemorykiller: free task struct on profiling handoff
The lowmemorykiller stores a pointer to a killed thread's task_struct in
lowmem_deathpending when profiling is enabled. When put_task_struct()
results in the refcount going to 0, the task_notify_func() callback clears
lowmem_deathpending if it is the thread that was killed last. This
prevents additional killing until lowmem_deathpending_timeout elapses.
The responsibility of every task handoff notifier is to free the tasks
handed off to it, however, and this was being neglected, which results
in a massive memory leak since no task_struct ever gets freed.
Fix that by freeing the task_struct since we no longer need a reference to
it.
Reported-by: werner <w.landgraf@ru.ru>
Cc: stable@vger.kernel.org
Signed-off-by: David Rientjes <rientjes@google.com>
---
drivers/staging/android/lowmemorykiller.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -78,6 +78,7 @@ task_notify_func(struct notifier_block *self, unsigned long val, void *data)
 	if (task == lowmem_deathpending)
 		lowmem_deathpending = NULL;
+	free_task(task);
 	return NOTIFY_OK;
 }
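To make the effect of that one-line change concrete, here is a small user-space model of the handoff path. It is only a sketch: the struct, the counter and all function names are stand-ins, and only the control flow (the last reference is dropped, the callback may claim the task, otherwise the caller frees it) follows the description in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as the kernel's notifier return codes. */
#define NOTIFY_DONE 0
#define NOTIFY_OK   1

/* Stand-in for task_struct; only the reference count matters here. */
struct task {
    int refcount;
};

typedef int (*handoff_fn)(struct task *);

static long live_tasks; /* tasks created but not yet freed */

/* Model of free_task(): just account for the release. */
void model_free_task(struct task *t)
{
    (void)t;
    live_tasks--;
}

/* Callback as in 3.4-rc2: claims the task (NOTIFY_OK) but never frees it. */
int notify_leaky(struct task *t)
{
    (void)t;
    return NOTIFY_OK;
}

/* Callback with the patch applied: claims the task and frees it at once. */
int notify_patched(struct task *t)
{
    model_free_task(t);
    return NOTIFY_OK;
}

/* Model of put_task_struct(): free unless a callback took ownership. */
void model_put_task(struct task *t, handoff_fn cb)
{
    if (--t->refcount > 0)
        return;
    if (cb == NULL || cb(t) != NOTIFY_OK)
        model_free_task(t);
}

/* Create and release n tasks; return how many were never freed. */
long leaked_after(long n, handoff_fn cb)
{
    live_tasks = 0;
    for (long i = 0; i < n; i++) {
        struct task t = { 1 };
        live_tasks++;
        model_put_task(&t, cb);
    }
    return live_tasks;
}
```

In this model leaked_after(1000, notify_leaky) returns 1000, while the patched callback and the no-callback case both return 0. Every exiting task is lost once the leaky notifier stays registered, which matches the steadily growing kernel_stack figure in the log.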
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 7:04 ` Sven Joachim
@ 2012-04-09 15:24 ` Linus Torvalds
2012-04-09 15:43 ` Sven Joachim
2012-04-09 15:57 ` Rik van Riel
1 sibling, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09 15:24 UTC (permalink / raw)
To: Sven Joachim
Cc: Andrew Morton, werner, David Rientjes, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov
On Mon, Apr 9, 2012 at 12:04 AM, Sven Joachim <svenjoac@gmx.de> wrote:
>
> While I did not experience any crashes or instabilities (yet?), I'm also
> seeing memory leaks. On a system started this morning, with hardly
> anything running:
Do you also have ANDROID support compiled in? And
ANDROID_LOW_MEMORY_KILLER in particular?
Linus
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 10:15 ` David Rientjes
@ 2012-04-09 15:39 ` Linus Torvalds
2012-04-09 21:22 ` David Rientjes
0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09 15:39 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, Greg Kroah-Hartman, stable
On Mon, Apr 9, 2012 at 3:15 AM, David Rientjes <rientjes@google.com> wrote:
>
> I think you nailed it.
>
> I suspect the problem is 1eda5166c764 ("staging: android/lowmemorykiller:
> Don't unregister notifier from atomic context") merged during the 3.4
> merge window and, unfortunately, backported to stable.
Ok. That does seem to match everything.
However, I think your patch is the wrong one.
The real bug is actually that those notifiers are a f*cking joke, and
the return value from the notifier is a mistake.
So I personally think that the real problem is this code in
profile_handoff_task:
	return (ret == NOTIFY_OK) ? 1 : 0;
and ask yourself two questions:
- what the hell does NOTIFY_OK/NOTIFY_DONE mean?
- what happens if there are multiple notifiers that all (or some)
return NOTIFY_OK?
I'll tell you what my answers are:
(a) NOTIFY_DONE is the "ok, everything is fine, you can free the
task-struct". It's also what that handoff notifier thing returns if
there are no notifiers registered at all.
So the fix to the Android lowmemorykiller is as simple as just
changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
will properly free the task struct.
The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
"NOTIFY_OK means that I will free the task myself later". That's what
the oprofile uses, and it frees the task.
(b) But the whole interface is a total f*cking mess. If *multiple*
people return NOTIFY_OK, they're royally fucked. And the whole (and
only) point of notifiers is that you can register multiple different
ones independently.
So quite frankly, the *real* bug is not in that android driver
(although I'd say that we should just make it return NOTIFY_DONE and
be done with it). The real bug is that the whole f*cking notifier is a
mistake, and checking the error return was the biggest mistake of all.
Werner: just test David's patch (do *not* change both the error value
*and* apply David's patch - that would free the task-struct twice). I
don't think his patch is what I want to apply eventually, but it
should fix the issue.
Sadly, I don't think we have anybody who really "owns"
kernel/profile.c - the thing is broken, it was misdesigned, and nobody
really cares. Which is why we'll probably have to fix this by just
making that Android thing return NOTIFY_DONE, and just accept that the
whole thing is a f*cking joke.
Linus
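The ownership problem described in (a) and (b) can be illustrated with a simplified user-space model of a notifier chain. This is a sketch under assumptions: it assumes the chain reports the return value of the last callback it ran (stop masks are ignored), and every name apart from NOTIFY_OK and NOTIFY_DONE is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as the kernel's notifier return codes. */
#define NOTIFY_DONE 0
#define NOTIFY_OK   1

/* Stand-in for task_struct: counts who believes they must free it. */
struct task {
    int owners;
};

typedef int (*notifier_fn)(struct task *);

/* oprofile-style callback: keeps the task and promises to free it later. */
int keeps_task(struct task *t)
{
    t->owners++;
    return NOTIFY_OK;
}

/* lowmemorykiller as in 3.4-rc2: says NOTIFY_OK but never frees the task. */
int claims_but_forgets(struct task *t)
{
    (void)t;
    return NOTIFY_OK;
}

/* Suggested behaviour: only observe, leave the freeing to the caller. */
int ignores_task(struct task *t)
{
    (void)t;
    return NOTIFY_DONE;
}

/* Simplified chain walk: every callback runs, the last return value wins. */
int call_chain(notifier_fn *chain, size_t n, struct task *t)
{
    int ret = NOTIFY_DONE;
    for (size_t i = 0; i < n; i++)
        ret = chain[i](t);
    return ret;
}

/* Mirrors the quoted line: return (ret == NOTIFY_OK) ? 1 : 0; */
int handoff_task(notifier_fn *chain, size_t n, struct task *t)
{
    assert(t != NULL);
    return (call_chain(chain, n, t) == NOTIFY_OK) ? 1 : 0;
}

/* Drop the last reference and report how many parties now own the task. */
int owners_after_put(notifier_fn *chain, size_t n)
{
    struct task t = { 0 };
    if (!handoff_task(chain, n, &t))
        t.owners++; /* nobody claimed it, so the caller frees it */
    return t.owners;
}
```

An owner count of 1 is the only correct outcome. In this model a lone claims_but_forgets notifier yields 0 (a leak), a lone ignores_task notifier yields 1, and two keeps_task notifiers, or keeps_task followed by ignores_task, yield 2 (a double free). That is why a single return code cannot work once more than one notifier is registered.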
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 15:24 ` Linus Torvalds
@ 2012-04-09 15:43 ` Sven Joachim
0 siblings, 0 replies; 43+ messages in thread
From: Sven Joachim @ 2012-04-09 15:43 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, werner, David Rientjes, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov
On 2012-04-09 17:24 +0200, Linus Torvalds wrote:
> On Mon, Apr 9, 2012 at 12:04 AM, Sven Joachim <svenjoac@gmx.de> wrote:
>>
>> While I did not experience any crashes or instabilities (yet?), I'm also
>> seeing memory leaks. On a system started this morning, with hardly
>> anything running:
>
> Do you also have ANDROID support compiled in? And
> ANDROID_LOW_MEMORY_KILLER in particular?
No, "grep ANDROID /boot/config-$(uname -r)" prints nothing.
Cheers,
Sven
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 7:04 ` Sven Joachim
2012-04-09 15:24 ` Linus Torvalds
@ 2012-04-09 15:57 ` Rik van Riel
2012-04-09 16:19 ` Sven Joachim
1 sibling, 1 reply; 43+ messages in thread
From: Rik van Riel @ 2012-04-09 15:57 UTC (permalink / raw)
To: Sven Joachim
Cc: Linus Torvalds, Andrew Morton, werner, David Rientjes,
Hugh Dickins, linux-kernel, Oleg Nesterov
On 04/09/2012 03:04 AM, Sven Joachim wrote:
> While I did not experience any crashes or instabilities (yet?), I'm also
> seeing memory leaks. On a system started this morning, with hardly
> anything running:
> where I would expect no more than 50 MB used, 400 MB are actually in use:
>
> ,----
> | $ free
> | total used free shared buffers cached
> | Mem: 3348400 1849712 1498688 0 328960 1119180
> | -/+ buffers/cache: 401572 2946828
> | Swap: 3719040 0 3719040
> `----
Do you see any big memory users in /proc/meminfo or in
/proc/slabinfo?
--
All rights reversed
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 15:57 ` Rik van Riel
@ 2012-04-09 16:19 ` Sven Joachim
2012-04-09 16:33 ` Rik van Riel
0 siblings, 1 reply; 43+ messages in thread
From: Sven Joachim @ 2012-04-09 16:19 UTC (permalink / raw)
To: Rik van Riel
Cc: Linus Torvalds, Andrew Morton, werner, David Rientjes,
Hugh Dickins, linux-kernel, Oleg Nesterov
[-- Attachment #1: Type: text/plain, Size: 901 bytes --]
On 2012-04-09 17:57 +0200, Rik van Riel wrote:
> On 04/09/2012 03:04 AM, Sven Joachim wrote:
>
>> While I did not experience any crashes or instabilities (yet?), I'm also
>> seeing memory leaks. On a system started this morning, with hardly
>> anything running:
>
>> where I would expect no more than 50 MB used, 400 MB are actually in use:
>>
>> ,----
>> | $ free
>> | total used free shared buffers cached
>> | Mem: 3348400 1849712 1498688 0 328960 1119180
>> | -/+ buffers/cache: 401572 2946828
>> | Swap: 3719040 0 3719040
>> `----
>
> Do you see any big memory users in /proc/meminfo or in
> /proc/slabinfo?
Attaching these files, since I can't really make anything out of the
latter. Note that I started a few memory hogs (X, Firefox, Emacs with
Gnus), so overall memory footprint has grown to 768 MB.
[-- Attachment #2: meminfo --]
[-- Type: text/plain, Size: 986 bytes --]
MemTotal: 3348400 kB
MemFree: 195560 kB
Buffers: 292688 kB
Cached: 2079648 kB
SwapCached: 0 kB
Active: 1443900 kB
Inactive: 1241544 kB
Active(anon): 219388 kB
Inactive(anon): 94668 kB
Active(file): 1224512 kB
Inactive(file): 1146876 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 3719040 kB
SwapFree: 3719040 kB
Dirty: 52 kB
Writeback: 0 kB
AnonPages: 313108 kB
Mapped: 70348 kB
Shmem: 948 kB
Slab: 407688 kB
SReclaimable: 393984 kB
SUnreclaim: 13704 kB
KernelStack: 1088 kB
PageTables: 2496 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 5393240 kB
Committed_AS: 790452 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 316040 kB
VmallocChunk: 34359370299 kB
DirectMap4k: 232380 kB
DirectMap2M: 3174400 kB
[-- Attachment #3: slabinfo --]
[-- Type: text/plain, Size: 16729 bytes --]
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
fib6_nodes 5 59 64 59 1 : tunables 120 60 8 : slabdata 1 1 0
ip6_dst_cache 4 12 320 12 1 : tunables 54 27 8 : slabdata 1 1 0
RAWv6 5 8 1024 4 1 : tunables 54 27 8 : slabdata 2 2 0
UDPLITEv6 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
UDPv6 2 4 1024 4 1 : tunables 54 27 8 : slabdata 1 1 0
tw_sock_TCPv6 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
request_sock_TCPv6 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
TCPv6 0 0 1792 2 1 : tunables 24 12 8 : slabdata 0 0 0
ext4_groupinfo_2k 27 28 136 28 1 : tunables 120 60 8 : slabdata 1 1 0
ext4_groupinfo_4k 1160 1176 136 28 1 : tunables 120 60 8 : slabdata 42 42 0
uhci_urb_priv 0 0 56 67 1 : tunables 120 60 8 : slabdata 0 0 0
flow_cache 0 0 104 37 1 : tunables 120 60 8 : slabdata 0 0 0
scsi_sense_cache 22 30 128 30 1 : tunables 120 60 8 : slabdata 1 1 0
scsi_cmd_cache 22 30 256 15 1 : tunables 120 60 8 : slabdata 2 2 0
sd_ext_cdb 2 112 32 112 1 : tunables 120 60 8 : slabdata 1 1 0
cfq_io_cq 43 148 104 37 1 : tunables 120 60 8 : slabdata 4 4 0
cfq_queue 44 119 232 17 1 : tunables 120 60 8 : slabdata 7 7 0
mqueue_inode_cache 1 9 832 9 2 : tunables 54 27 8 : slabdata 1 1 0
jbd2_transaction_s 27 30 256 15 1 : tunables 120 60 8 : slabdata 2 2 0
jbd2_inode 17996 19019 48 77 1 : tunables 120 60 8 : slabdata 247 247 0
jbd2_journal_handle 24 144 24 144 1 : tunables 120 60 8 : slabdata 1 1 0
jbd2_journal_head 54 340 112 34 1 : tunables 120 60 8 : slabdata 10 10 0
jbd2_revoke_table_s 10 202 16 202 1 : tunables 120 60 8 : slabdata 1 1 0
jbd2_revoke_record_s 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_inode_cache 283945 284168 840 4 1 : tunables 54 27 8 : slabdata 71042 71042 0
ext4_xattr 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_free_data 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_allocation_context 0 0 136 28 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_prealloc_space 18 74 104 37 1 : tunables 120 60 8 : slabdata 2 2 0
ext4_system_zone 0 0 40 92 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_io_end 0 0 1128 3 1 : tunables 24 12 8 : slabdata 0 0 0
ext4_io_page 0 0 16 202 1 : tunables 120 60 8 : slabdata 0 0 0
kioctx 0 0 384 10 1 : tunables 54 27 8 : slabdata 0 0 0
kiocb 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
fanotify_response_event 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
fsnotify_mark 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
inotify_event_private_data 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
inotify_inode_mark 16 28 136 28 1 : tunables 120 60 8 : slabdata 1 1 0
dnotify_mark 0 0 136 28 1 : tunables 120 60 8 : slabdata 0 0 0
dnotify_struct 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
dio 0 0 640 6 1 : tunables 54 27 8 : slabdata 0 0 0
fasync_cache 5 77 48 77 1 : tunables 120 60 8 : slabdata 1 1 0
posix_timers_cache 0 0 144 27 1 : tunables 120 60 8 : slabdata 0 0 0
uid_cache 9 30 128 30 1 : tunables 120 60 8 : slabdata 1 1 0
UNIX 175 175 768 5 1 : tunables 54 27 8 : slabdata 35 35 0
UDP-Lite 0 0 832 9 2 : tunables 54 27 8 : slabdata 0 0 0
tcp_bind_bucket 10 112 32 112 1 : tunables 120 60 8 : slabdata 1 1 0
inet_peer_cache 15 40 192 20 1 : tunables 120 60 8 : slabdata 2 2 0
secpath_cache 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
xfrm_dst_cache 0 0 384 10 1 : tunables 54 27 8 : slabdata 0 0 0
ip_fib_trie 8 67 56 67 1 : tunables 120 60 8 : slabdata 1 1 0
ip_fib_alias 9 77 48 77 1 : tunables 120 60 8 : slabdata 1 1 0
ip_dst_cache 31 45 256 15 1 : tunables 120 60 8 : slabdata 3 3 0
PING 0 0 768 5 1 : tunables 54 27 8 : slabdata 0 0 0
RAW 3 9 832 9 2 : tunables 54 27 8 : slabdata 1 1 0
UDP 18 18 832 9 2 : tunables 54 27 8 : slabdata 2 2 0
tw_sock_TCP 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
request_sock_TCP 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
TCP 10 15 1600 5 2 : tunables 24 12 8 : slabdata 3 3 0
eventpoll_pwq 94 212 72 53 1 : tunables 120 60 8 : slabdata 4 4 0
eventpoll_epi 94 180 128 30 1 : tunables 120 60 8 : slabdata 6 6 0
sgpool-128 2 2 4096 1 1 : tunables 24 12 8 : slabdata 2 2 0
sgpool-64 2 2 2048 2 1 : tunables 24 12 8 : slabdata 1 1 0
sgpool-32 2 4 1024 4 1 : tunables 54 27 8 : slabdata 1 1 0
sgpool-16 2 8 512 8 1 : tunables 54 27 8 : slabdata 1 1 0
sgpool-8 15 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
scsi_data_buffer 0 0 24 144 1 : tunables 120 60 8 : slabdata 0 0 0
blkdev_queue 2 4 1688 4 2 : tunables 24 12 8 : slabdata 1 1 0
blkdev_requests 22 22 344 11 1 : tunables 54 27 8 : slabdata 2 2 0
blkdev_ioc 43 160 96 40 1 : tunables 120 60 8 : slabdata 4 4 0
fsnotify_event_holder 0 0 24 144 1 : tunables 120 60 8 : slabdata 0 0 0
fsnotify_event 1 34 112 34 1 : tunables 120 60 8 : slabdata 1 1 0
bio-0 32 40 192 20 1 : tunables 120 60 8 : slabdata 2 2 0
biovec-256 2 2 4096 1 1 : tunables 24 12 8 : slabdata 2 2 0
biovec-128 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
biovec-64 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
biovec-16 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
sock_inode_cache 224 224 576 7 1 : tunables 54 27 8 : slabdata 32 32 0
skbuff_fclone_cache 0 8 448 8 1 : tunables 54 27 8 : slabdata 0 1 0
skbuff_head_cache 156 180 256 15 1 : tunables 120 60 8 : slabdata 12 12 0
file_lock_cache 34 40 192 20 1 : tunables 120 60 8 : slabdata 2 2 0
shmem_inode_cache 1364 1573 592 13 2 : tunables 54 27 8 : slabdata 121 121 135
Acpi-Operand 1432 1484 72 53 1 : tunables 120 60 8 : slabdata 28 28 0
Acpi-ParseExt 0 0 72 53 1 : tunables 120 60 8 : slabdata 0 0 0
Acpi-Parse 0 0 48 77 1 : tunables 120 60 8 : slabdata 0 0 0
Acpi-State 0 0 80 48 1 : tunables 120 60 8 : slabdata 0 0 0
Acpi-Namespace 691 736 40 92 1 : tunables 120 60 8 : slabdata 8 8 0
task_delay_info 169 306 112 34 1 : tunables 120 60 8 : slabdata 9 9 0
taskstats 2 12 328 12 1 : tunables 54 27 8 : slabdata 1 1 0
proc_inode_cache 1188 1188 592 6 1 : tunables 54 27 8 : slabdata 198 198 0
sigqueue 48 48 160 24 1 : tunables 120 60 8 : slabdata 2 2 0
bdev_cache 11 15 768 5 1 : tunables 54 27 8 : slabdata 3 3 0
sysfs_dir_cache 6770 6800 112 34 1 : tunables 120 60 8 : slabdata 200 200 0
mnt_cache 30 45 256 15 1 : tunables 120 60 8 : slabdata 3 3 0
filp 2362 3285 256 15 1 : tunables 120 60 8 : slabdata 219 219 60
inode_cache 651 651 528 7 1 : tunables 54 27 8 : slabdata 93 93 0
dentry 232248 238440 192 20 1 : tunables 120 60 8 : slabdata 11922 11922 0
names_cache 5 5 4096 1 1 : tunables 24 12 8 : slabdata 5 5 0
key_jar 1 20 192 20 1 : tunables 120 60 8 : slabdata 1 1 0
buffer_head 376815 406556 104 37 1 : tunables 120 60 8 : slabdata 10988 10988 0
nsproxy 1 77 48 77 1 : tunables 120 60 8 : slabdata 1 1 0
vm_area_struct 4348 4600 168 23 1 : tunables 120 60 8 : slabdata 200 200 60
mm_struct 76 76 896 4 1 : tunables 54 27 8 : slabdata 19 19 0
fs_cache 84 177 64 59 1 : tunables 120 60 8 : slabdata 3 3 0
files_cache 84 132 704 11 2 : tunables 54 27 8 : slabdata 12 12 0
signal_cache 125 140 1024 4 1 : tunables 54 27 8 : slabdata 35 35 0
sighand_cache 117 126 2112 3 2 : tunables 24 12 8 : slabdata 42 42 0
task_xstate 112 112 512 8 1 : tunables 54 27 8 : slabdata 14 14 0
task_struct 155 155 1472 5 2 : tunables 24 12 8 : slabdata 31 31 0
cred_jar 373 640 192 20 1 : tunables 120 60 8 : slabdata 32 32 0
anon_vma_chain 3520 5005 48 77 1 : tunables 120 60 8 : slabdata 65 65 0
anon_vma 2502 2950 64 59 1 : tunables 120 60 8 : slabdata 50 50 0
pid 175 240 128 30 1 : tunables 120 60 8 : slabdata 8 8 0
radix_tree_node 28764 29099 560 7 1 : tunables 54 27 8 : slabdata 4157 4157 0
idr_layer_cache 327 357 544 7 1 : tunables 54 27 8 : slabdata 51 51 0
size-4194304(DMA) 0 0 4194304 1 1024 : tunables 1 1 0 : slabdata 0 0 0
size-4194304 0 0 4194304 1 1024 : tunables 1 1 0 : slabdata 0 0 0
size-2097152(DMA) 0 0 2097152 1 512 : tunables 1 1 0 : slabdata 0 0 0
size-2097152 0 0 2097152 1 512 : tunables 1 1 0 : slabdata 0 0 0
size-1048576(DMA) 0 0 1048576 1 256 : tunables 1 1 0 : slabdata 0 0 0
size-1048576 0 0 1048576 1 256 : tunables 1 1 0 : slabdata 0 0 0
size-524288(DMA) 0 0 524288 1 128 : tunables 1 1 0 : slabdata 0 0 0
size-524288 1 1 524288 1 128 : tunables 1 1 0 : slabdata 1 1 0
size-262144(DMA) 0 0 262144 1 64 : tunables 1 1 0 : slabdata 0 0 0
size-262144 0 0 262144 1 64 : tunables 1 1 0 : slabdata 0 0 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 3 3 131072 1 32 : tunables 8 4 0 : slabdata 3 3 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 5 5 65536 1 16 : tunables 8 4 0 : slabdata 5 5 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 9 9 32768 1 8 : tunables 8 4 0 : slabdata 9 9 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 7 7 16384 1 4 : tunables 8 4 0 : slabdata 7 7 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 23 23 8192 1 2 : tunables 8 4 0 : slabdata 23 23 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0 0
size-4096 210 210 4096 1 1 : tunables 24 12 8 : slabdata 210 210 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
size-2048 276 276 2048 2 1 : tunables 24 12 8 : slabdata 138 138 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
size-1024 920 920 1024 4 1 : tunables 54 27 8 : slabdata 230 230 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
size-512 608 608 512 8 1 : tunables 54 27 8 : slabdata 76 76 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
size-256 620 795 256 15 1 : tunables 120 60 8 : slabdata 53 53 0
size-192(DMA) 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
size-192 1545 1960 192 20 1 : tunables 120 60 8 : slabdata 98 98 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
size-64 8307 13924 64 59 1 : tunables 120 60 8 : slabdata 236 236 0
size-32(DMA) 0 0 32 112 1 : tunables 120 60 8 : slabdata 0 0 0
size-128 3820 3840 128 30 1 : tunables 120 60 8 : slabdata 128 128 0
size-32 5625 5936 32 112 1 : tunables 120 60 8 : slabdata 53 53 0
kmem_cache 153 160 192 20 1 : tunables 120 60 8 : slabdata 8 8 0
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 16:19 ` Sven Joachim
@ 2012-04-09 16:33 ` Rik van Riel
2012-04-09 17:00 ` Pekka Enberg
2012-04-09 17:00 ` Sven Joachim
0 siblings, 2 replies; 43+ messages in thread
From: Rik van Riel @ 2012-04-09 16:33 UTC (permalink / raw)
To: Sven Joachim
Cc: Linus Torvalds, Andrew Morton, werner, David Rientjes,
Hugh Dickins, linux-kernel, Oleg Nesterov
On 04/09/2012 12:19 PM, Sven Joachim wrote:
> On 2012-04-09 17:57 +0200, Rik van Riel wrote:
>
>> On 04/09/2012 03:04 AM, Sven Joachim wrote:
>>
>>> While I did not experience any crashes or instabilities (yet?), I'm also
>>> seeing memory leaks. On a system started this morning, with hardly
>>> anything running:
>>
>>> where I would expect no more than 50 MB used, 400 MB are actually in use:
>>>
>>> ,----
>>> | $ free
>>> | total used free shared buffers cached
>>> | Mem: 3348400 1849712 1498688 0 328960 1119180
>>> | -/+ buffers/cache: 401572 2946828
>>> | Swap: 3719040 0 3719040
>>> `----
>>
>> Do you see any big memory users in /proc/meminfo or in
>> /proc/slabinfo?
>
> Attaching these files, since I can't really make anything out of the
> latter. Note that I started a few memory hogs (X, Firefox, Emacs with
> Gnus), so overall memory footprint has grown to 768 MB.
Looks like the "missing" 400MB is all in filesystem caches,
specifically the dentry cache, the ext4 inode cache and
buffer heads.
That is perfectly fine, since those caches will be shrunk
when the system needs memory.
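As a rough cross-check of that reading, the footprint of the three caches can be worked out from the <num_slabs> and <pagesperslab> columns of the attached slabinfo. This is only a sketch: the 4 KiB page size is an assumption (it is not stated in the thread), and slab_kb() and fs_cache_kb() are hypothetical helper names.

```c
/* Assumption: 4 KiB pages; the page size is not stated in the thread. */
#define PAGE_KB 4UL

/* Memory held by one cache in kB: num_slabs * pagesperslab * page size. */
unsigned long slab_kb(unsigned long num_slabs, unsigned long pages_per_slab)
{
	return num_slabs * pages_per_slab * PAGE_KB;
}

/* Sum of the three caches named above, using the <num_slabs> and
 * <pagesperslab> values from the attached slabinfo. */
unsigned long fs_cache_kb(void)
{
	return slab_kb(71042, 1)   /* ext4_inode_cache */
	     + slab_kb(11922, 1)   /* dentry */
	     + slab_kb(10988, 1);  /* buffer_head */
}
```

Under that assumption fs_cache_kb() comes to 375808 kB (367 MiB), which accounts for most of the 393984 kB that the attached meminfo reports as SReclaimable.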
--
All rights reversed
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 16:33 ` Rik van Riel
@ 2012-04-09 17:00 ` Pekka Enberg
2012-04-09 17:19 ` Sven Joachim
2012-04-09 17:00 ` Sven Joachim
1 sibling, 1 reply; 43+ messages in thread
From: Pekka Enberg @ 2012-04-09 17:00 UTC (permalink / raw)
To: Rik van Riel
Cc: Sven Joachim, Linus Torvalds, Andrew Morton, werner,
David Rientjes, Hugh Dickins, linux-kernel, Oleg Nesterov
On Mon, Apr 9, 2012 at 7:33 PM, Rik van Riel <riel@redhat.com> wrote:
>> Attaching these files, since I can't really make anything out of the
>> latter. Note that I started a few memory hogs (X, Firefox, Emacs with
>> Gnus), so overall memory footprint has grown to 768 MB.
>
> Looks like the "missing" 400MB is all in filesystem caches,
> specifically the dentry cache, the ext4 inode cache and
> buffer heads.
>
> That is perfectly fine, since those caches will be shrunk
> when the system needs memory.
CONFIG_SLUB, right? It will merge caches so you don't necessarily see
leaks in /proc/slabinfo. You can use "slub_nomerge" kernel parameter
to disable the merging.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 16:33 ` Rik van Riel
2012-04-09 17:00 ` Pekka Enberg
@ 2012-04-09 17:00 ` Sven Joachim
2012-04-09 17:20 ` Rik van Riel
1 sibling, 1 reply; 43+ messages in thread
From: Sven Joachim @ 2012-04-09 17:00 UTC (permalink / raw)
To: Rik van Riel
Cc: Linus Torvalds, Andrew Morton, werner, David Rientjes,
Hugh Dickins, linux-kernel, Oleg Nesterov
On 2012-04-09 18:33 +0200, Rik van Riel wrote:
> On 04/09/2012 12:19 PM, Sven Joachim wrote:
>> On 2012-04-09 17:57 +0200, Rik van Riel wrote:
>>
>>> On 04/09/2012 03:04 AM, Sven Joachim wrote:
>>>
>>>> While I did not experience any crashes or instabilities (yet?), I'm also
>>>> seeing memory leaks. On a system started this morning, with hardly
>>>> anything running:
>>>
>>>> where I would expect no more than 50 MB used, 400 MB are actually in use:
>>>>
>>>> ,----
>>>> | $ free
>>>> | total used free shared buffers cached
>>>> | Mem: 3348400 1849712 1498688 0 328960 1119180
>>>> | -/+ buffers/cache: 401572 2946828
>>>> | Swap: 3719040 0 3719040
>>>> `----
>>>
>>> Do you see any big memory users in /proc/meminfo or in
>>> /proc/slabinfo?
>>
>> Attaching these files, since I can't really make anything out of the
>> latter. Note that I started a few memory hogs (X, Firefox, Emacs with
>> Gnus), so overall memory footprint has grown to 768 MB.
>
> Looks like the "missing" 400MB is all in filesystem caches,
> specifically the dentry cache, the ext4 inode cache and
> buffer heads.
Then why does free(1) report those in the "-/+ buffers/cache:" line? It
did not do this with earlier kernels, AFAIK.
Cheers,
Sven
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 17:00 ` Pekka Enberg
@ 2012-04-09 17:19 ` Sven Joachim
0 siblings, 0 replies; 43+ messages in thread
From: Sven Joachim @ 2012-04-09 17:19 UTC (permalink / raw)
To: Pekka Enberg
Cc: Rik van Riel, Linus Torvalds, Andrew Morton, werner,
David Rientjes, Hugh Dickins, linux-kernel, Oleg Nesterov
On 2012-04-09 19:00 +0200, Pekka Enberg wrote:
> On Mon, Apr 9, 2012 at 7:33 PM, Rik van Riel <riel@redhat.com> wrote:
>>> Attaching these files, since I can't really make anything out of the
>>> latter. Note that I started a few memory hogs (X, Firefox, Emacs with
>>> Gnus), so overall memory footprint has grown to 768 MB.
>>
>> Looks like the "missing" 400MB is all in filesystem caches,
>> specifically the dentry cache, the ext4 inode cache and
>> buffer heads.
>>
>> That is perfectly fine, since those caches will be shrunk
>> when the system needs memory.
>
> CONFIG_SLUB, right?
Actually, no. For some reason (probably historical…) I have CONFIG_SLAB.
> It will merge caches so you don't necessarily see
> leaks in /proc/slabinfo. You can use "slub_nomerge" kernel parameter
> to disable the merging.
Cheers,
Sven
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 17:00 ` Sven Joachim
@ 2012-04-09 17:20 ` Rik van Riel
0 siblings, 0 replies; 43+ messages in thread
From: Rik van Riel @ 2012-04-09 17:20 UTC (permalink / raw)
To: Sven Joachim
Cc: Linus Torvalds, Andrew Morton, werner, David Rientjes,
Hugh Dickins, linux-kernel, Oleg Nesterov
On 04/09/2012 01:00 PM, Sven Joachim wrote:
> On 2012-04-09 18:33 +0200, Rik van Riel wrote:
>
>> On 04/09/2012 12:19 PM, Sven Joachim wrote:
>>> On 2012-04-09 17:57 +0200, Rik van Riel wrote:
>>>
>>>> On 04/09/2012 03:04 AM, Sven Joachim wrote:
>>>>
>>>>> While I did not experience any crashes or instabilities (yet?), I'm also
>>>>> seeing memory leaks. On a system started this morning, with hardly
>>>>> anything running:
>>>>
>>>>> where I would expect no more than 50 MB used, 400 MB are actually in use:
>>>>>
>>>>> ,----
>>>>> | $ free
>>>>> | total used free shared buffers cached
>>>>> | Mem: 3348400 1849712 1498688 0 328960 1119180
>>>>> | -/+ buffers/cache: 401572 2946828
>>>>> | Swap: 3719040 0 3719040
>>>>> `----
>>>>
>>>> Do you see any big memory users in /proc/meminfo or in
>>>> /proc/slabinfo?
>>>
>>> Attaching these files, since I can't really make anything out of the
>>> latter. Note that I started a few memory hogs (X, Firefox, Emacs with
>>> Gnus), so overall memory footprint has grown to 768 MB.
>>
>> Looks like the "missing" 400MB is all in filesystem caches,
>> specifically the dentry cache, the ext4 inode cache and
>> buffer heads.
>
> Then why does free(1) report those in the "-/+ buffers/cache:" line? It
> did not do this with earlier kernels, AFAIK.
It has done so for over a decade. Reclaimable slab has never been
subtracted from "used" by the free utility.
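The arithmetic behind that line can be sketched as follows. The formula is inferred from the quoted free output (1849712 - 328960 - 1119180 = 401572); the function names are hypothetical, and the second helper is only an illustration of what subtracting reclaimable slab would look like, not something free does.

```c
/* "used" on free's "-/+ buffers/cache:" line: only buffers and page
 * cache are subtracted; slab memory is not. All values are in kB. */
unsigned long free_used_kb(unsigned long total, unsigned long free_kb,
			   unsigned long buffers, unsigned long cached)
{
	return total - free_kb - buffers - cached;
}

/* Hypothetical variant that additionally subtracts reclaimable slab. */
unsigned long used_minus_sreclaimable_kb(unsigned long used,
					 unsigned long sreclaimable)
{
	return used - sreclaimable;
}
```

With the values from the attached meminfo, free_used_kb(3348400, 195560, 292688, 2079648) gives 780504 kB; taking off the 393984 kB of SReclaimable would leave 386520 kB.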
--
All rights reversed
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 15:39 ` Linus Torvalds
@ 2012-04-09 21:22 ` David Rientjes
2012-04-09 22:09 ` Linus Torvalds
2012-04-09 22:13 ` Colin Cross
0 siblings, 2 replies; 43+ messages in thread
From: David Rientjes @ 2012-04-09 21:22 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, Greg Kroah-Hartman, stable
On Mon, 9 Apr 2012, Linus Torvalds wrote:
> The real bug is actually that those notifiers are a f*cking joke, and
> the return value from the notifier is a mistake.
>
> So I personally think that the real problem is this code in
> profile_handoff_task:
>
> return (ret == NOTIFY_OK) ? 1 : 0;
>
> and ask yourself two questions:
>
> - what the hell does NOTIFY_OK/NOTIFY_DONE mean?
> - what happens if there are multiple notifiers that all (or some)
> return NOTIFY_OK?
>
NOTIFY_OK should never be a valid response for this notifier the way it's
currently implemented; it should be NOTIFY_STOP to stop iterating the call
chain to avoid a double free. Right now it doesn't matter because only
oprofile is actually freeing the task_struct and lowmemorykiller should be
using NOTIFY_DONE.
Then we have a completeness issue if multiple callbacks want to return
NOTIFY_STOP and an ordering issue if the oprofile callback is invoked
before lowmemorykiller.
> I'll tell you what my answers are:
>
> (a) NOTIFY_DONE is the "ok, everything is fine, you can free the
> task-struct". It's also what that handoff notifier thing returns if
> there are no notifiers registered at all.
>
> So the fix to the Android lowmemorykiller is as simple as just
> changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
> will properly free the task struct.
>
I don't think so for Werner's config, which also has CONFIG_OPROFILE=y, so
oprofile would return NOTIFY_OK and queue the task_struct for free, then
the second notifier callback to the lowmemorykiller would return
NOTIFY_DONE which would result in put_task_struct() doing free_task()
itself for a double free.
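The ordering problem can be modelled in user space. This is a sketch, not the kernel code: the loop mirrors how notifier_call_chain() overwrites ret on every callback, the NOTIFY_* values match include/linux/notifier.h, and oprofile_cb(), lowmem_cb() and frees_for_order() are hypothetical stand-ins.

```c
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

typedef int (*notifier_fn)(void);

/* Stand-in for oprofile: claims the task and frees it later. */
int oprofile_cb(void) { return NOTIFY_OK; }
/* Stand-in for lowmemorykiller with the one-line NOTIFY_DONE change. */
int lowmem_cb(void)   { return NOTIFY_DONE; }

/* Each callback overwrites ret, so the last one called decides. */
int call_chain_last_wins(const notifier_fn *chain, size_t n)
{
	int ret = NOTIFY_DONE;

	for (size_t i = 0; i < n; i++) {
		ret = chain[i]();
		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Number of times the task would be freed for a given call order. */
int frees_for_order(int oprofile_first)
{
	const notifier_fn a[] = { oprofile_cb, lowmem_cb };
	const notifier_fn b[] = { lowmem_cb, oprofile_cb };
	int ret = call_chain_last_wins(oprofile_first ? a : b, 2);
	int frees = 1;	/* oprofile always frees what it claimed */

	if (ret != NOTIFY_OK)
		frees++;	/* the caller frees the task as well */
	return frees;
}
```

In this model frees_for_order(1) yields two frees (the double free described above), while frees_for_order(0) yields one, so correctness depends on the order in which the callbacks run.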
> The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
> "NOTIFY_OK means that I will free the task myself later". That's what
> the oprofile uses, and it frees the task.
>
> (b) But the whole interface is a total f*cking mess. If *multiple*
> people return NOTIFY_OK, they're royally fucked. And the whole (and
> only) point of notifiers is that you can register multiple different
> ones independently.
>
> So quite frankly, the *real* bug is not in that android driver
> (although I'd say that we should just make it return NOTIFY_DONE and
> be done with it). The real bug is that the whole f*cking notifier is a
> mistake, and checking the error return was the biggest mistake of all.
>
Right, we can't handoff the freeing of the task_struct to more than one
notifier. It seems misdesigned from the beginning and what we really want
is to hijack task->usage for __put_task_struct(task) if we have such a
notifier callchain and require each one (currently just oprofile) to take
a reference on task->usage for NOTIFY_OK and then be responsible for
dropping the reference when it's done with it later instead of requiring
it to free the task_struct itself.
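A minimal user-space sketch of that reference-counting idea, under loud assumptions: struct task here has only a plain int counter (the kernel's task->usage is atomic), and task_new(), get_task(), put_task() and handoff_scenario() are hypothetical names, not kernel functions.

```c
#include <stdlib.h>

/* Hypothetical miniature of task_struct: only the reference count. */
struct task { int usage; };

int tasks_freed; /* counts calls to the stand-in for free_task() */

struct task *task_new(void)
{
	struct task *t = malloc(sizeof(*t));

	if (t)
		t->usage = 1; /* the caller's own reference */
	return t;
}

void get_task(struct task *t) { t->usage++; }

/* Drops one reference; whoever drops the last one frees the task. */
void put_task(struct task *t)
{
	if (--t->usage == 0) {
		free(t);
		tasks_freed++;
	}
}

/* n_holders callbacks each take a reference during the notification,
 * the caller drops its own, then the callbacks drop theirs.
 * Returns how many times the task was freed, or -1 on allocation failure. */
int handoff_scenario(int n_holders)
{
	struct task *t = task_new();

	if (!t)
		return -1;
	tasks_freed = 0;
	for (int i = 0; i < n_holders; i++)
		get_task(t);
	put_task(t);
	for (int i = 0; i < n_holders; i++)
		put_task(t);
	return tasks_freed;
}
```

Whatever the number of holders, the task is freed exactly once in this model, which is the property the return-value handoff cannot provide for more than one notifier.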
That's _if_ we want to continue to have such an interface in the first
place where it's only really necessary right now for oprofile (and, hence,
wasn't implemented in an extendable way). I'm thinking the
lowmemorykiller, as I alluded to, could be written in a way where we can
detect if a thread we've already killed has exited yet before killing
another one. We can't just store a pointer to the task_struct of the
killed task since it could be reused for a fork later, but we could use
TIF_MEMDIE like the oom killer does.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 21:22 ` David Rientjes
@ 2012-04-09 22:09 ` Linus Torvalds
2012-04-09 23:25 ` David Rientjes
2012-04-09 22:13 ` Colin Cross
1 sibling, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09 22:09 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, Greg Kroah-Hartman, stable, Ingo Molnar
[-- Attachment #1: Type: text/plain, Size: 2126 bytes --]
On Mon, Apr 9, 2012 at 2:22 PM, David Rientjes <rientjes@google.com> wrote:
>
> NOTIFY_OK should never be a valid response for this notifier the way it's
> currently implemented, it should be NOTIFY_STOP to stop iterating the call
> chain to avoid a double free.
No, that's no good either. That would mean that some people wouldn't
be notified about the death of the task at all.
So NOTIFY_STOP just implies *another* bug.
> Right, we can't handoff the freeing of the task_struct to more than one
> notifier. It seems misdesigned from the beginning and what we really want
> is to hijack task->usage for __put_task_struct(task) if we have such a
> notifier callchain and require each one (currently just oprofile) to take
> a reference on task->usage for NOTIFY_OK and then be responsible for
> dropping the reference when it's done with it later instead of requiring
> it to free the task_struct itself.
We could make notifier.c just "or" all the return values together, and
then it's ok if *one* person returns NOTIFY_OK.
Of course, that's not how notifiers are documented to work, but quite
frankly, notifiers with non-zero values that don't say STOP are broken
as-is anyway, so you might as well do a logical "or" of the return
values and at least make things like this work.
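A user-space sketch of the "or" idea, assuming the accumulation is a bitwise |= on ret. The NOTIFY_* values match include/linux/notifier.h; claims_task(), ignores_task(), call_chain_or() and handed_off() are hypothetical names for illustration.

```c
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

typedef int (*notifier_fn)(void);

int claims_task(void)  { return NOTIFY_OK; }   /* will free the task later */
int ignores_task(void) { return NOTIFY_DONE; } /* does nothing with it */

/* Accumulates the return values instead of overwriting them. */
int call_chain_or(const notifier_fn *chain, size_t n)
{
	int ret = NOTIFY_DONE;

	for (size_t i = 0; i < n; i++) {
		ret |= chain[i]();
		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* 1 if the chain result says "someone took the task", for either order. */
int handed_off(int claimer_first)
{
	const notifier_fn a[] = { claims_task, ignores_task };
	const notifier_fn b[] = { ignores_task, claims_task };

	return call_chain_or(claimer_first ? a : b, 2) == NOTIFY_OK;
}
```

With accumulation the result no longer depends on call order: handed_off() is 1 in both cases. Two callbacks that both return NOTIFY_OK would still each try to free the task, so this only covers the single-claimer case.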
I personally think every single notifier interface we have ever had in
the kernel has been a total f*cking disaster. The whole concept of
"run these random functions at this random event" is a broken concept
that just makes people do crazy broken things.
Oh well. So my suggestion right now would be something like the
attached. It's still horribly broken, it actively breaks documented
notifier behavior, but dammit, if the notifier people don't like
'or'ing return values together they should damn well return zero from
the notifier that doesn't do anything. And returning an error will
exit out, so..
Hmm? Who cares about that kernel/notifier.c code? Andrew? Ingo? We
don't have any actual maintainer for that crap, but judging by the
commits, it's one of you two.
Linus
[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 1047 bytes --]
drivers/staging/android/lowmemorykiller.c | 2 +-
kernel/notifier.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 052b43e4e505..142bfc2f84db 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -79,7 +79,7 @@ task_notify_func(struct notifier_block *self, unsigned long val, void *data)
if (task == lowmem_deathpending)
lowmem_deathpending = NULL;
- return NOTIFY_OK;
+ return NOTIFY_DONE;
}
static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
diff --git a/kernel/notifier.c b/kernel/notifier.c
index 2d5cc4ccff7f..11fe956e8daf 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -90,7 +90,7 @@ static int __kprobes notifier_call_chain(struct notifier_block **nl,
continue;
}
#endif
- ret = nb->notifier_call(nb, val, v);
+ ret |= nb->notifier_call(nb, val, v);
if (nr_calls)
(*nr_calls)++;
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 21:22 ` David Rientjes
@ 2012-04-09 22:13 ` Colin Cross
2012-04-09 22:13 ` Colin Cross
1 sibling, 0 replies; 43+ messages in thread
From: Colin Cross @ 2012-04-09 22:13 UTC (permalink / raw)
To: David Rientjes
Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
Christian Bejram, Paul E. McKenney, Anton Vorontsov,
Greg Kroah-Hartman, stable
On Mon, Apr 9, 2012 at 2:22 PM, David Rientjes <rientjes@google.com> wrote:
> On Mon, 9 Apr 2012, Linus Torvalds wrote:
>
>> The real bug is actually that those notifiers are a f*cking joke, and
>> the return value from the notifier is a mistake.
>>
>> So I personally think that the real problem is this code in
>> profile_handoff_task:
>>
>> return (ret == NOTIFY_OK) ? 1 : 0;
>>
>> and ask yourself two questions:
>>
>> - what the hell does NOTIFY_OK/NOTIFY_DONE mean?
>> - what happens if there are multiple notifiers that all (or some)
>> return NOTIFY_OK?
>>
> NOTIFY_OK should never be a valid response for this notifier the way it's
> currently implemented, it should be NOTIFY_STOP to stop iterating the call
> chain to avoid a double free. Right now it doesn't matter because only
> oprofile is actually freeing the task_struct and lowmemorykiller should be
> using NOTIFY_DONE.
>
> Then we have a completeness issue if multiple callbacks want to return
> NOTIFY_STOP and an ordering issue if the oprofile callback is invoked
> before lowmemorykiller.
>
>> I'll tell you what my answers are:
>>
>> (a) NOTIFY_DONE is the "ok, everything is fine, you can free the
>> task-struct". It's also what that handoff notifier thing returns if
>> there are no notifiers registered at all.
>>
>> So the fix to the Android lowmemorykiller is as simple as just
>> changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
>> will properly free the task struct.
>>
>
> I don't think so for Werner's config, which also has CONFIG_OPROFILE=y, so
> oprofile would return NOTIFY_OK and queue the task_struct for free, then
> the second notifier callback to the lowmemorykiller would return
> NOTIFY_DONE which would result in put_task_struct() doing free_task()
> itself for a double free.
>
>> The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
>> "NOTIFY_OK means that I will free the task myself later". That's what
>> the oprofile uses, and it frees the task.
>>
>> (b) But the whole interface is a total f*cking mess. If *multiple*
>> people return NOTIFY_OK, they're royally fucked. And the whole (and
>> only) point of notifiers is that you can register multiple different
>> ones independently.
>>
>> So quite frankly, the *real* bug is not in that android driver
>> (although I'd say that we should just make it return NOTIFY_DONE and
>> be done with it). The real bug is that the whole f*cking notifier is a
>> mistake, and checking the error return was the biggest mistake of all.
>>
>
> Right, we can't handoff the freeing of the task_struct to more than one
> notifier. It seems misdesigned from the beginning and what we really want
> is to hijack task->usage for __put_task_struct(task) if we have such a
> notifier callchain and require each one (currently just oprofile) to take
> a reference on task->usage for NOTIFY_OK and then be responsible for
> dropping the reference when it's done with it later instead of requiring
> it to free the task_struct itself.
>
> That's _if_ we want to continue to have such an interface in the first
> place where it's only really necessary right now for oprofile (and, hence,
> wasn't implemented in an extendable way). I'm thinking the
> lowmemorykiller, as I alluded to, could be written in a way where we can
> detect if a thread we've already killed has exited yet before killing
> another one. We can't just store a pointer to the task_struct of the
> killed task since it could be reused for a fork later, but we could use
> TIF_MEMDIE like the oom killer does.
This was a known issue in 2010. In the android tree, the use of
task_handoff_register was dropped one day after it was added and
replaced with a new task_free_register hook. I assume Greg dropped
the fix during the android tree refresh in 3.0 because it depended on
a change to kernel/fork.c. The two relevant patches are (using
codeaurora's gitweb because we don't have one right now):
sched: Add a generic notifier when a task struct is about to be freed
https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=667dffa787a87ef4ea43cc65957ce96077fdcd0a
staging: android: lowmemorykiller: Fix task_struct leak
https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=af0240f095a704f75f032bbcc01f670c65c163ba
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 22:13 ` Colin Cross
@ 2012-04-09 22:21 ` Greg Kroah-Hartman
-1 siblings, 0 replies; 43+ messages in thread
From: Greg Kroah-Hartman @ 2012-04-09 22:21 UTC (permalink / raw)
To: Colin Cross
Cc: David Rientjes, Linus Torvalds, Andrew Morton, werner,
Rik van Riel, Hugh Dickins, linux-kernel, Oleg Nesterov,
Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, stable
On Mon, Apr 09, 2012 at 03:13:00PM -0700, Colin Cross wrote:
> On Mon, Apr 9, 2012 at 2:22 PM, David Rientjes <rientjes@google.com> wrote:
> > On Mon, 9 Apr 2012, Linus Torvalds wrote:
> >
> >> The real bug is actually that those notifiers are a f*cking joke, and
> >> the return value from the notifier is a mistake.
> >>
> >> So I personally think that the real problem is this code in
> >> profile_handoff_task:
> >>
> >> return (ret == NOTIFY_OK) ? 1 : 0;
> >>
> >> and ask yourself two questions:
> >>
> >> - what the hell does NOTIFY_OK/NOTIFY_DONE mean?
> >> - what happens if there are multiple notifiers that all (or some)
> >> return NOTIFY_OK?
> >>
> > NOTIFY_OK should never be a valid response for this notifier the way it's
> > currently implemented, it should be NOTIFY_STOP to stop iterating the call
> > chain to avoid a double free. Right now it doesn't matter because only
> > oprofile is actually freeing the task_struct and lowmemorykiller should be
> > using NOTIFY_DONE.
> >
> > Then we have a completeness issue if multiple callbacks want to return
> > NOTIFY_STOP and an ordering issue if the oprofile callback is invoked
> > before lowmemorykiller.
> >
> >> I'll tell you what my answers are:
> >>
> >> (a) NOTIFY_DONE is the "ok, everything is fine, you can free the
> >> task-struct". It's also what that handoff notifier thing returns if
> >> there are no notifiers registered at all.
> >>
> >> So the fix to the Android lowmemorykiller is as simple as just
> >> changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
> >> will properly free the task struct.
> >>
> >
> > I don't think so for Werner's config who also has CONFIG_OPROFILE=y, so
> > oprofile would return NOTIFY_OK and queue the task_struct for free, then
> > the second notifier callback to the lowmemorykiller would return
> > NOTIFY_DONE which would result in put_task_struct() doing free_task()
> > itself for a double free.
> >
> >> The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
> >> "NOTIFY_OK means that I will free the task myself later". That's what
> >> the oprofile uses, and it frees the task.
> >>
> >> (b) But the whole interface is a total f*cking mess. If *multiple*
> >> people return NOTIFY_OK, they're royally fucked. And the whole (and
> >> only) point of notifiers is that you can register multiple different
> >> ones independently.
> >>
> >> So quite frankly, the *real* bug is not in that android driver
> >> (although I'd say that we should just make it return NOTIFY_DONE and
> >> be done with it). The real bug is that the whole f*cking notifier is a
> >> mistake, and checking the error return was the biggest mistake of all.
> >>
> >
> > Right, we can't handoff the freeing of the task_struct to more than one
> > notifier. It seems misdesigned from the beginning and what we really want
> > is to hijack task->usage for __put_task_struct(task) if we have such a
> > notifier callchain and require each one (currently just oprofile) to take
> > a reference on task->usage for NOTIFY_OK and then be responsible for
> > dropping the reference when it's done with it later instead of requiring
> > it to free the task_struct itself.
> >
> > That's _if_ we want to continue to have such an interface in the first
> > place where it's only really necessary right now for oprofile (and, hence,
> > wasn't implemented in an extendable way). I'm thinking the
> > lowmemorykiller, as I alluded to, could be written in a way where we can
> > detect if a thread we've already killed has exited yet before killing
> > another one. We can't just store a pointer to the task_struct of the
> > killed task since it could be reused for a fork later, but we could use
> > TIF_MEMDIE like the oom killer does.
>
> This was a known issue in 2010. In the android tree, the use of
> task_handoff_register was dropped one day after it was added and
> replaced with a new task_free_register hook. I assume Greg dropped
> the fix during the android tree refresh in 3.0 because it depended on
> a change to kernel/fork.c. The two relevant patches are (using
> codeaurora's gitweb because we don't have one right now):
>
> sched: Add a generic notifier when a task struct is about to be freed
> https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=667dffa787a87ef4ea43cc65957ce96077fdcd0a
Yes, I can't add a patch like that for this driver; that is why I
thought everyone was getting together to "properly" determine how to
solve this oom notifier problem. Has that work stalled somewhere?
greg k-h
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 22:13 ` Colin Cross
(?)
(?)
@ 2012-04-09 22:30 ` Linus Torvalds
-1 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09 22:30 UTC (permalink / raw)
To: Colin Cross
Cc: David Rientjes, Andrew Morton, werner, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
Christian Bejram, Paul E. McKenney, Anton Vorontsov,
Greg Kroah-Hartman, stable
On Mon, Apr 9, 2012 at 3:13 PM, Colin Cross <ccross@google.com> wrote:
>
> sched: Add a generic notifier when a task struct is about to be freed
> https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=667dffa787a87ef4ea43cc65957ce96077fdcd0a
Oh, *HELL*NO*!
It's a fucking disaster in "Oh, one notifier was broken, SO LET'S ADD
ANOTHER RANDOM ONE TO FIX THAT".
The definition of insanity is doing the same thing over and over and
thinking you get a different result. Let's not do that kind of idiotic
thing.
Notifiers are evil crap. Let's make *fewer* of them, not add
yet-another-random-notifier-for-some-random-reason.
F*ck me, but how I hate those random notifiers. And I hate people who
add them willy nilly.
Linus
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 22:21 ` Greg Kroah-Hartman
@ 2012-04-09 22:44 ` john stultz
-1 siblings, 0 replies; 43+ messages in thread
From: john stultz @ 2012-04-09 22:44 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Colin Cross, David Rientjes, Linus Torvalds, Andrew Morton,
werner, Rik van Riel, Hugh Dickins, linux-kernel, Oleg Nesterov,
Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, stable
On Mon, Apr 9, 2012 at 3:21 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Mon, Apr 09, 2012 at 03:13:00PM -0700, Colin Cross wrote:
>> sched: Add a generic notifier when a task struct is about to be freed
>> https://www.codeaurora.org/gitweb/quic/la/?p=kernel/common.git;a=commitdiff;h=667dffa787a87ef4ea43cc65957ce96077fdcd0a
>
> Yes, I can't add a patch like that for this driver; that is why I
> thought everyone was getting together to "properly" determine how to
> solve this oom notifier problem. Has that work stalled somewhere?
Anton Vorontsov has been working on this (and just sent out some
related vmevent patches today). His hope is to use the vmevent or mem
cgroup interface to notify a userland killer to get the same or
improved behavior as the in-kernel lowmemory killer.
thanks
-john
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 22:09 ` Linus Torvalds
@ 2012-04-09 23:25 ` David Rientjes
2012-04-09 23:55 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 43+ messages in thread
From: David Rientjes @ 2012-04-09 23:25 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, Greg Kroah-Hartman, stable, Ingo Molnar
[-- Attachment #1: Type: TEXT/PLAIN, Size: 3064 bytes --]
On Mon, 9 Apr 2012, Linus Torvalds wrote:
> > Right, we can't handoff the freeing of the task_struct to more than one
> > notifier. It seems misdesigned from the beginning and what we really want
> > is to hijack task->usage for __put_task_struct(task) if we have such a
> > notifier callchain and require each one (currently just oprofile) to take
> > a reference on task->usage for NOTIFY_OK and then be responsible for
> > dropping the reference when it's done with it later instead of requiring
> > it to free the task_struct itself.
>
> We could make notifier.c just "or" all the return values together, and
> then it's ok if *one* person returns NOTIFY_OK.
>
You could do that if you also turned the check for "ret == NOTIFY_OK" in
profile_handoff_task() into "ret & NOTIFY_OK" in your patch, otherwise you
get a double free from __put_task_struct() and oprofile.
> Of course, that's not how notifiers are documented to work, but quite
> frankly, notifiers with non-zero values that don't say STOP are broken
> as-is anyway, so you might as well do a logical "or" of the return
> values and at least make things like this work.
>
It works fine if the callbacks are correctly implemented, it's just that
the task handoff in kernel/profile.c is broken because it assumes only one
callback will return NOTIFY_OK, meaning it will eventually free, and it's
only checking the return value of the last notifier called to see if
__put_task_struct() should immediately free.
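
A minimal userspace model of that failure mode may help here. It is not the kernel code: the constants mirror include/linux/notifier.h, while call_chain(), handoff_says_keep(), the two callbacks and frees_for_order() are hypothetical stand-ins for notifier_call_chain(), profile_handoff_task(), and the oprofile and lowmemorykiller callbacks. It assumes the lowmemorykiller callback has already been changed to return NOTIFY_DONE.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

typedef int (*notifier_fn)(void);

/* Stand-in for the oprofile callback: it keeps the task and frees it later. */
static int oprofile_cb(void) { return NOTIFY_OK; }

/* Stand-in for a fixed lowmemorykiller callback: no claim on the task. */
static int lowmem_cb(void) { return NOTIFY_DONE; }

/* Model of the chain walk: only the last callback's return value survives. */
static int call_chain(const notifier_fn *chain, size_t n)
{
	int ret = NOTIFY_DONE;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = chain[i]();
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Model of profile_handoff_task(): 1 means "a notifier will free the task". */
static int handoff_says_keep(const notifier_fn *chain, size_t n)
{
	return call_chain(chain, n) == NOTIFY_OK ? 1 : 0;
}

/* Returns how many parties would free the task for a given chain order. */
int frees_for_order(int oprofile_first)
{
	notifier_fn chain[2];
	int frees = 1;		/* oprofile always frees what it claimed */

	chain[0] = oprofile_first ? oprofile_cb : lowmem_cb;
	chain[1] = oprofile_first ? lowmem_cb : oprofile_cb;
	if (!handoff_says_keep(chain, 2))
		frees++;	/* the caller frees as well */
	return frees;
}
```

Under these assumptions frees_for_order(1) is 2 (oprofile and the caller both free the task) and frees_for_order(0) is 1, so the outcome depends only on registration order.
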
In defense of notifiers, though, it works fine right now for memory
hotplug. The last issue I had with it was when slab lacked a callback
when a node was onlined or offlined in 2.6.34 and then I added memory
hotplug support for that allocator and it has since worked fine. For
things like MEM_GOING_OFFLINE, returning NOTIFY_BAD is great if the
subsystem of interest can't allow the memory to go offline (in-use slab
objects, for example). In the memory hotplug usecase, we certainly don't
want to stop at NOTIFY_OK because we need to notify every subsystem on the
callchain.
> Oh well. So my suggestion right now would be something like the
> attached. It's still horribly broken, it actively breaks documented
> notifier behavior, but dammit, if the notifier people don't like
> 'or'ing return values together they should damn well return zero from
> the notifier that doesn't do anything. And returning an error will
> exit out, so..
>
Instead of this and its possible bad interactions with other notifiers
during the -rc cycle, I think it would be better to
(1) fix the lowmemorykiller so it doesn't need to use these notifiers at
all, which isn't difficult, for 3.4, then
(2a) change the task handoff to a refcount on task->usage after the final
put_task_struct() using the notifier and then allow it to be freed
after everybody does a put_handoff_task_struct() for 3.5
or
(2b) remove the task handoff notifier callchain entirely and just tie it
directly to oprofile since android won't be using it anymore after
(1).
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 22:13 ` Colin Cross
` (2 preceding siblings ...)
(?)
@ 2012-04-09 23:37 ` David Rientjes
2012-04-10 0:23 ` Colin Cross
-1 siblings, 1 reply; 43+ messages in thread
From: David Rientjes @ 2012-04-09 23:37 UTC (permalink / raw)
To: Colin Cross
Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
Christian Bejram, Paul E. McKenney, Anton Vorontsov,
Greg Kroah-Hartman, stable
On Mon, 9 Apr 2012, Colin Cross wrote:
> This was a known issue in 2010, in the android tree the use of
> task_handoff_register was dropped one day after it was added and
> replaced with a new task_free_register hook.
Why can't you just do this? Are you concerned about the possibility of
depleting all memory reserves?
---
drivers/staging/android/lowmemorykiller.c | 47 ++++-------------------------
1 file changed, 6 insertions(+), 41 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -55,7 +55,6 @@ static int lowmem_minfree[6] = {
 };
 static int lowmem_minfree_size = 4;
 
-static struct task_struct *lowmem_deathpending;
 static unsigned long lowmem_deathpending_timeout;
 
 #define lowmem_print(level, x...) \
@@ -64,24 +63,6 @@ static unsigned long lowmem_deathpending_timeout;
 			printk(x); \
 	} while (0)
 
-static int
-task_notify_func(struct notifier_block *self, unsigned long val, void *data);
-
-static struct notifier_block task_nb = {
-	.notifier_call = task_notify_func,
-};
-
-static int
-task_notify_func(struct notifier_block *self, unsigned long val, void *data)
-{
-	struct task_struct *task = data;
-
-	if (task == lowmem_deathpending)
-		lowmem_deathpending = NULL;
-
-	return NOTIFY_OK;
-}
-
 static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 {
 	struct task_struct *tsk;
@@ -97,19 +78,6 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	int other_file = global_page_state(NR_FILE_PAGES) -
 						global_page_state(NR_SHMEM);
 
-	/*
-	 * If we already have a death outstanding, then
-	 * bail out right away; indicating to vmscan
-	 * that we have nothing further to offer on
-	 * this pass.
-	 *
-	 * Note: Currently you need CONFIG_PROFILING
-	 * for this to work correctly.
-	 */
-	if (lowmem_deathpending &&
-	    time_before_eq(jiffies, lowmem_deathpending_timeout))
-		return 0;
-
 	if (lowmem_adj_size < array_size)
 		array_size = lowmem_adj_size;
 	if (lowmem_minfree_size < array_size)
@@ -148,6 +116,11 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		if (!p)
 			continue;
 
+		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
+		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
+			task_unlock(p);
+			return 0;
+		}
 		oom_score_adj = p->signal->oom_score_adj;
 		if (oom_score_adj < min_score_adj) {
 			task_unlock(p);
@@ -174,15 +147,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		lowmem_print(1, "send sigkill to %d (%s), adj %d, size %d\n",
 			     selected->pid, selected->comm,
 			     selected_oom_score_adj, selected_tasksize);
-		/*
-		 * If CONFIG_PROFILING is off, then we don't want to stall
-		 * the killer by setting lowmem_deathpending.
-		 */
-#ifdef CONFIG_PROFILING
-		lowmem_deathpending = selected;
 		lowmem_deathpending_timeout = jiffies + HZ;
-#endif
 		send_sig(SIGKILL, selected, 0);
+		set_tsk_thread_flag(selected, TIF_MEMDIE);
 		rem -= selected_tasksize;
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
@@ -198,7 +165,6 @@ static struct shrinker lowmem_shrinker = {
 
 static int __init lowmem_init(void)
 {
-	task_handoff_register(&task_nb);
 	register_shrinker(&lowmem_shrinker);
 	return 0;
 }
@@ -206,7 +172,6 @@ static int __init lowmem_init(void)
 static void __exit lowmem_exit(void)
 {
 	unregister_shrinker(&lowmem_shrinker);
-	task_handoff_unregister(&task_nb);
 }
 
 module_param_named(cost, lowmem_shrinker.seeks, int, S_IRUGO | S_IWUSR);
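
The idea behind the patch can be modelled in userspace as follows. This is a sketch under stated assumptions, not the driver: struct task, its dying field (standing in for TIF_MEMDIE), shrink_pass() and the fixed timeout of 100 ticks are all hypothetical, and unlike the patch the model skips an already-flagged task once the timeout has expired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; `dying` plays the role of TIF_MEMDIE. */
struct task {
	bool present;	/* still in the task list */
	bool dying;	/* already selected and sent SIGKILL */
	int  size;	/* memory footprint used to pick a victim */
};

static unsigned long death_timeout;	/* models lowmem_deathpending_timeout */

/*
 * Returns the index of the task killed on this pass, or -1 when the pass
 * bails out because an earlier victim is still exiting.
 */
int shrink_pass(struct task *tasks, size_t n, unsigned long now)
{
	int victim = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!tasks[i].present)
			continue;
		/* A flagged task that has not exited yet: wait for it. */
		if (tasks[i].dying && now <= death_timeout)
			return -1;
		/* Simplification of this model: never re-select a flagged task. */
		if (tasks[i].dying)
			continue;
		if (victim < 0 || tasks[i].size > tasks[victim].size)
			victim = (int)i;
	}
	if (victim >= 0) {
		tasks[victim].dying = true;	/* set_tsk_thread_flag() in the patch */
		death_timeout = now + 100;	/* jiffies + HZ in the patch */
	}
	return victim;
}
```

Because the flag lives on the task itself, there is no stored task_struct pointer that could go stale, and no notifier is needed to clear it when the task exits.
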
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 23:25 ` David Rientjes
@ 2012-04-09 23:55 ` Linus Torvalds
2012-04-09 23:56 ` [patch] android, lowmemorykiller: remove task handoff notifier David Rientjes
[not found] ` <web-723076709@zbackend1.aha.ru>
2 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2012-04-09 23:55 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, Greg Kroah-Hartman, stable, Ingo Molnar
On Mon, Apr 9, 2012 at 4:25 PM, David Rientjes <rientjes@google.com> wrote:
>
> You could do that if you also turned the check for "ret == NOTIFY_OK" in
> profile_handoff_task() into "ret & NOTIFY_OK" in your patch, otherwise you
> get a double free from __put_task_struct() and oprofile.
Why? NOTIFY_DONE is zero.
I do agree that we *also* could do the "& NOTIFY_OK" and make it
clearer that we're oring bits together. And we could document the
stupid notifier interfaces to do this all, and just make the rules be
*sane* when you have multiple notifiers.
And sane rules would be either:
- you always return an error return, and notifiers all return either
0 or a negative error number, and we stop on the first error and
return that.
- you return a bitmask, and we or all bits together (and we can
certainly continue to have a "stop here" bit)
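
Both rules can be sketched in a few lines of userspace C. The names chain_first_error(), chain_or_bits(), EV_CLAIMED, EV_STOP and demo_sane_rules() are hypothetical and are not part of the kernel notifier API; the point is only that neither result depends on which callback happens to run last.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*cb_fn)(void);

#define EV_CLAIMED 0x0001	/* hypothetical "I will free it later" bit */
#define EV_STOP    0x8000	/* hypothetical "stop here" bit */

/* Rule 1: callbacks return 0 or a negative errno; the first error wins. */
int chain_first_error(const cb_fn *chain, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int err = chain[i]();

		if (err < 0)
			return err;
	}
	return 0;
}

/* Rule 2: callbacks return a bitmask; the results are OR-ed together. */
int chain_or_bits(const cb_fn *chain, size_t n)
{
	int bits = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		bits |= chain[i]();
		if (bits & EV_STOP)
			break;
	}
	return bits;
}

static int cb_nothing(void) { return 0; }
static int cb_busy(void)    { return -EBUSY; }
static int cb_claim(void)   { return EV_CLAIMED; }

/* Runs both rules over small example chains; 0 means they behave as described. */
int demo_sane_rules(void)
{
	const cb_fn errs[]   = { cb_nothing, cb_busy, cb_nothing };
	const cb_fn claim1[] = { cb_claim, cb_nothing };
	const cb_fn claim2[] = { cb_nothing, cb_claim };

	if (chain_first_error(errs, 3) != -EBUSY)
		return 1;
	/* The OR-ed result does not depend on callback order. */
	if (chain_or_bits(claim1, 2) != chain_or_bits(claim2, 2))
		return 2;
	return chain_or_bits(claim1, 2) == EV_CLAIMED ? 0 : 3;
}
```
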
But the current notifier semantics are just insane. The whole "we
return the last return value" is crazy. It's by definition a random
number, since the whole point of notifiers is that there can be
multiple, and they aren't "ordered". So the whole "last return value"
is something I just look at and say: "Whoever designed that is a
f*cking moron".
(And if that happens to be some younger version of me, I am happy that
I got over it. But I'm pretty sure I have never touched that broken
notifier code in my life)
> It works fine if the callbacks are correctly implemented, it's just that
> the task handoff in kernel/profile.c is broken because it assumes only one
> callback will return NOTIFY_OK, meaning it will eventually free, and it's
> only checking the return value of the last notifier called to see if
> __put_task_struct() should immediately free.
We can easily document it as "only oprofile is allowed to return
NOTIFY_OK, this notifier is a big mess, don't even *think* about
returning anything but NOTIFY_DONE".
> (1) fix the lowmemorykiller so it doesn't need to use these notifiers at
> all, which isn't difficult, for 3.4, then
I do think that that makes sense. Fixing people to not use notifiers
is always a good idea. Why would anybody sane even care about the
process going away anyway? If some lowmemorykiller decides to kill off
a process that no longer exists, kill() should happily return ESRCH,
and we're all good.
So it could just use a "pid", and test for existence with send_sig()
or lookup_pid() or something.
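
The userspace analogue of that check is kill() with signal 0, which POSIX defines to perform error checking without sending a signal. The sketch below assumes a POSIX system; pid_exists() and demo_stale_pid() are hypothetical helper names. It also shows the limit of the approach: a pid can be reused by a later fork, so a positive answer does not prove it is the same process.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a process with this pid exists, 0 if it does not,
 * and -1 on any other error. Signal 0 performs the checks without
 * delivering anything.
 */
static int pid_exists(pid_t pid)
{
	if (kill(pid, 0) == 0)
		return 1;
	if (errno == ESRCH)
		return 0;
	return errno == EPERM ? 1 : -1;	/* EPERM: exists, but not ours */
}

/* Fork a child, let it exit, reap it, then probe the stale pid. */
int demo_stale_pid(void)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);
	if (waitpid(pid, NULL, 0) != pid)
		return -1;
	return pid_exists(pid);
}
```

demo_stale_pid() is expected to return 0 because the reaped child's pid fails with ESRCH, barring the unlikely case that the pid was reused in between.
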
> (2a) change the task handoff to a refcount on task->usage after the final
> put_task_struct() using the notifier and then allow it to be freed
> after everybody does a put_handoff_task_struct() for 3.5
The task handoff code runs too late right now. I guess we could easily
move it up, though.
At the same time, the *only* user of that stupid handoff thing is
oprofile, afaik, and if we use a refcount, why the hell doesn't
oprofile just use a refcount to begin with, instead of using that
notifier? IOW, *both* users of the notifier seem to be just retarded.
So I'd rather just kill the stupid notifier entirely. In the meantime,
making lowmemorykiller just return zero instead just "fixes" it
(assuming we make the notifier semantics for multiple return codes
sane, which they clearly aren't).
Again, almost every notifier user has always been total crap. It's
just a stupid abstraction. "Something happened". "Oh, ok".
Linus
* [patch] android, lowmemorykiller: remove task handoff notifier
2012-04-09 23:25 ` David Rientjes
2012-04-09 23:55 ` Linus Torvalds
@ 2012-04-09 23:56 ` David Rientjes
2012-04-10 1:23 ` Colin Cross
[not found] ` <web-723076709@zbackend1.aha.ru>
2 siblings, 1 reply; 43+ messages in thread
From: David Rientjes @ 2012-04-09 23:56 UTC (permalink / raw)
To: Linus Torvalds, Greg Kroah-Hartman
Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, stable, Ingo Molnar, Colin Cross
The task handoff notifier leaks task_struct since it never gets freed
after the callback returns NOTIFY_OK, which means it is responsible for
doing so.
It turns out the lowmemorykiller actually doesn't need this notifier at
all. It's used to prevent unnecessary killing by waiting for a thread to
exit as a result of lowmem_shrink(), however, it's possible to do this in
the same way the kernel oom killer works by setting TIF_MEMDIE and avoid
killing if we're still waiting for it to exit.
The kernel oom killer will already automatically set TIF_MEMDIE for
threads that are attempting to allocate memory that have a fatal signal.
The thread selected by lowmem_shrink() will have such a signal after the
lowmemorykiller sends it a SIGKILL, so this won't result in an
unnecessary use of memory reserves for the thread to exit.
This has the added benefit that we don't have to rely on CONFIG_PROFILING
to prevent needlessly killing tasks.
Reported-by: werner <w.landgraf@ru.ru>
Cc: stable@vger.kernel.org
Signed-off-by: David Rientjes <rientjes@google.com>
---
drivers/staging/android/lowmemorykiller.c | 48 +++++------------------------
1 file changed, 7 insertions(+), 41 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -55,7 +55,6 @@ static int lowmem_minfree[6] = {
 };
 static int lowmem_minfree_size = 4;

-static struct task_struct *lowmem_deathpending;
 static unsigned long lowmem_deathpending_timeout;

 #define lowmem_print(level, x...)			\
@@ -64,24 +63,6 @@ static unsigned long lowmem_deathpending_timeout;
 			printk(x);			\
 	} while (0)

-static int
-task_notify_func(struct notifier_block *self, unsigned long val, void *data);
-
-static struct notifier_block task_nb = {
-	.notifier_call	= task_notify_func,
-};
-
-static int
-task_notify_func(struct notifier_block *self, unsigned long val, void *data)
-{
-	struct task_struct *task = data;
-
-	if (task == lowmem_deathpending)
-		lowmem_deathpending = NULL;
-
-	return NOTIFY_OK;
-}
-
 static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 {
 	struct task_struct *tsk;
@@ -97,19 +78,6 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	int other_file = global_page_state(NR_FILE_PAGES) -
 						global_page_state(NR_SHMEM);

-	/*
-	 * If we already have a death outstanding, then
-	 * bail out right away; indicating to vmscan
-	 * that we have nothing further to offer on
-	 * this pass.
-	 *
-	 * Note: Currently you need CONFIG_PROFILING
-	 * for this to work correctly.
-	 */
-	if (lowmem_deathpending &&
-	    time_before_eq(jiffies, lowmem_deathpending_timeout))
-		return 0;
-
 	if (lowmem_adj_size < array_size)
 		array_size = lowmem_adj_size;
 	if (lowmem_minfree_size < array_size)
@@ -148,6 +116,12 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		if (!p)
 			continue;

+		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
+		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
+			task_unlock(p);
+			rcu_read_unlock();
+			return 0;
+		}
 		oom_score_adj = p->signal->oom_score_adj;
 		if (oom_score_adj < min_score_adj) {
 			task_unlock(p);
@@ -174,15 +148,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		lowmem_print(1, "send sigkill to %d (%s), adj %d, size %d\n",
 			     selected->pid, selected->comm,
 			     selected_oom_score_adj, selected_tasksize);
-		/*
-		 * If CONFIG_PROFILING is off, then we don't want to stall
-		 * the killer by setting lowmem_deathpending.
-		 */
-#ifdef CONFIG_PROFILING
-		lowmem_deathpending = selected;
 		lowmem_deathpending_timeout = jiffies + HZ;
-#endif
 		send_sig(SIGKILL, selected, 0);
+		set_tsk_thread_flag(selected, TIF_MEMDIE);
 		rem -= selected_tasksize;
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
@@ -198,7 +166,6 @@ static struct shrinker lowmem_shrinker = {

 static int __init lowmem_init(void)
 {
-	task_handoff_register(&task_nb);
 	register_shrinker(&lowmem_shrinker);
 	return 0;
 }
@@ -206,7 +173,6 @@ static int __init lowmem_init(void)
 static void __exit lowmem_exit(void)
 {
 	unregister_shrinker(&lowmem_shrinker);
-	task_handoff_unregister(&task_nb);
 }

 module_param_named(cost, lowmem_shrinker.seeks, int, S_IRUGO | S_IWUSR);
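For illustration only, a user-space model of the "death pending" check this patch introduces: skip further kills while a previously selected task is still marked and the one-second deadline has not passed. ticks_before_eq() mirrors the wraparound-safe comparison of the kernel's time_before_eq(); struct task_model, its memdie field (standing in for TIF_MEMDIE) and death_pending() are hypothetical names for this sketch, and no locking or RCU is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Wraparound-safe "a <= b" for a free-running tick counter,
 * modelled on the kernel's time_before_eq(a, b).
 */
static bool ticks_before_eq(unsigned long a, unsigned long b)
{
	return (long)(b - a) >= 0;
}

struct task_model {
	int pid;
	bool memdie;	/* stands in for the TIF_MEMDIE thread flag */
};

/*
 * Returns true if some task has already been marked for death and the
 * deadline has not expired yet, i.e. the caller should not pick a new
 * victim on this pass.
 */
static bool death_pending(const struct task_model *tasks, size_t n,
			  unsigned long now, unsigned long deadline)
{
	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tasks[i].memdie && ticks_before_eq(now, deadline))
			return true;
	return false;
}
```

Because the comparison works on the signed difference, the check stays correct even when the deadline has wrapped past the maximum counter value while "now" has not.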
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 23:55 ` Linus Torvalds
@ 2012-04-10 0:04 ` David Rientjes
-1 siblings, 0 replies; 43+ messages in thread
From: David Rientjes @ 2012-04-10 0:04 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, werner, Rik van Riel, Hugh Dickins, linux-kernel,
Oleg Nesterov, Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, Greg Kroah-Hartman, stable, Ingo Molnar
On Mon, 9 Apr 2012, Linus Torvalds wrote:
> > You could do that if you also turned the check for "ret == NOTIFY_OK" in
> > profile_handoff_task() into "ret & NOTIFY_OK" in your patch, otherwise you
> > get a double free from __put_task_struct() and oprofile.
>
> Why? NOTIFY_DONE is zero.
>
Oops, right.
> > (1) fix the lowmemorykiller so it doesn't need to use these notifiers at
> > all, which isn't difficult, for 3.4, then
>
> I do think that that makes sense. Fixing people to not use notifiers
> is always a good idea. Why would anybody sane even care about the
> process going away anyway? If some lowmemorykiller decides to kill off
> a process that no longer exists, kill() should happily return ESRCH,
> and we're all good.
>
It's apparently waiting for a killed thread to exit, or for the one-second
timeout to expire, before selecting another victim. (And you only get to
prevent needless kills if you have CONFIG_PROFILING, otherwise it doesn't
care.)
> At the same time, the *only* user of that stupid handoff thing is
> oprofile, afaik, and if we use a refcount, why the hell doesn't
> oprofile just use a refcount to begin with, instead of using that
> notifier?: IOW, *both* users of the notifier seem to be just retarded.
>
Agreed, and since the current implementation relies on CONFIG_PROFILING, I
think it's safe to remove the notifier and add a hook only for oprofile so
it can do free_task() when it wants to. No refcounting required.
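For illustration only, a user-space sketch of what such a single-owner hook could look like. Nothing here is the eventual kernel interface: task_free_hook, put_task_model(), free_task_model() and deferring_hook() are hypothetical names, and the "freed" flag merely stands in for actually releasing memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task {
	int pid;
	bool freed;	/* stands in for the memory actually being released */
};

/* Hypothetical hook: returns true if it takes over freeing the task. */
typedef bool (*task_free_hook_fn)(struct task *t);

/* At most one owner (the profiler); NULL when none is registered. */
static task_free_hook_fn task_free_hook;

static void free_task_model(struct task *t)
{
	t->freed = true;
}

/* Final put: free immediately unless the single hook claims the task. */
static void put_task_model(struct task *t)
{
	assert(t != NULL);
	if (task_free_hook && task_free_hook(t))
		return;		/* the hook owner frees the task later */
	free_task_model(t);
}

/* Example hook that always defers the free to its owner. */
static bool deferring_hook(struct task *t)
{
	(void)t;
	return true;
}
```

With exactly one possible owner there is no ambiguity about who frees the task, which is the property the multi-callback chain could not guarantee.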
I've already proposed a patch that removes the notifier for
lowmemorykiller with the added benefit that it doesn't rely on
CONFIG_PROFILING at all. If that's merged for 3.4, I'll remove the task
handoff callchain entirely for 3.5 since oprofile is the only user.
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 23:37 ` David Rientjes
@ 2012-04-10 0:23 ` Colin Cross
0 siblings, 0 replies; 43+ messages in thread
From: Colin Cross @ 2012-04-10 0:23 UTC (permalink / raw)
To: David Rientjes
Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
Christian Bejram, Paul E. McKenney, Anton Vorontsov,
Greg Kroah-Hartman, stable
On Mon, Apr 9, 2012 at 4:37 PM, David Rientjes <rientjes@google.com> wrote:
> On Mon, 9 Apr 2012, Colin Cross wrote:
>
>> This was a known issue in 2010, in the android tree the use of
>> task_handoff_register was dropped one day after it was added and
>> replaced with a new task_free_register hook.
>
> Why can't you just do this? Are you concerned about the possibility of
> depleting all memory reserves?
The point of the lowmem_deathpending patch was to avoid a stutter
where the cpu would spend its time looping through the tasks due to
repeated calls to lowmem_shrink instead of processing the kill signal
to the selected thread. With this patch, it will still loop through
tasks until it finds the one that was previously killed and then
abort. It's possible that the improvements Anton made to the task
loop reduce the performance impact enough that this whole mess could
just be dropped (by reverting 1eda516, e5d7965, and 4755b72).
This may have also been impacted by another bug that is on my list of
things to look at: when asked the size of its "cache", lowmemorykiller
returns something on the order of all memory used by userspace, but
under some conditions will refuse to kill any of it due to the current
lowmem_minfree settings. Due to the large size of the "cache", the
shrinker can call lowmem_shrink hundreds of times for a single
allocation, each time asking to reduce the size of the cache by 128
pages. The original lowmem_deathpending patch may have been a
misguided "fix" for this bug.
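A rough worked example of the arithmetic, for illustration only. The batch of 128 pages comes from the paragraph above; the scan target of 64000 pages is a made-up figure (about 250 MiB assuming 4 KiB pages), and shrink_calls() is a hypothetical helper, not the shrinker's real accounting.

```c
#include <assert.h>

/*
 * Number of shrinker callbacks needed to work through a scan target
 * when each call asks for at most "batch" pages (rounded up).
 */
static unsigned long shrink_calls(unsigned long scan_target,
				  unsigned long batch)
{
	assert(batch > 0);
	return (scan_target + batch - 1) / batch;
}
```

With a scan target of 64000 pages and a batch of 128, shrink_calls() returns 500, which is in the "hundreds of calls for a single allocation" range described above if the reported "cache" is that large.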
> ---
> drivers/staging/android/lowmemorykiller.c | 47 ++++-------------------------
> 1 file changed, 6 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ b/drivers/staging/android/lowmemorykiller.c
> @@ -55,7 +55,6 @@ static int lowmem_minfree[6] = {
>  };
>  static int lowmem_minfree_size = 4;
>
> -static struct task_struct *lowmem_deathpending;
>  static unsigned long lowmem_deathpending_timeout;
>
>  #define lowmem_print(level, x...)			\
> @@ -64,24 +63,6 @@ static unsigned long lowmem_deathpending_timeout;
>  			printk(x);			\
>  	} while (0)
>
> -static int
> -task_notify_func(struct notifier_block *self, unsigned long val, void *data);
> -
> -static struct notifier_block task_nb = {
> -	.notifier_call	= task_notify_func,
> -};
> -
> -static int
> -task_notify_func(struct notifier_block *self, unsigned long val, void *data)
> -{
> -	struct task_struct *task = data;
> -
> -	if (task == lowmem_deathpending)
> -		lowmem_deathpending = NULL;
> -
> -	return NOTIFY_OK;
> -}
> -
>  static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
>  {
>  	struct task_struct *tsk;
> @@ -97,19 +78,6 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
>  	int other_file = global_page_state(NR_FILE_PAGES) -
>  						global_page_state(NR_SHMEM);
>
> -	/*
> -	 * If we already have a death outstanding, then
> -	 * bail out right away; indicating to vmscan
> -	 * that we have nothing further to offer on
> -	 * this pass.
> -	 *
> -	 * Note: Currently you need CONFIG_PROFILING
> -	 * for this to work correctly.
> -	 */
> -	if (lowmem_deathpending &&
> -	    time_before_eq(jiffies, lowmem_deathpending_timeout))
> -		return 0;
> -
>  	if (lowmem_adj_size < array_size)
>  		array_size = lowmem_adj_size;
>  	if (lowmem_minfree_size < array_size)
> @@ -148,6 +116,11 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
>  		if (!p)
>  			continue;
>
> +		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> +		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
> +			task_unlock(p);
> +			return 0;
> +		}
>  		oom_score_adj = p->signal->oom_score_adj;
>  		if (oom_score_adj < min_score_adj) {
>  			task_unlock(p);
> @@ -174,15 +147,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
>  		lowmem_print(1, "send sigkill to %d (%s), adj %d, size %d\n",
>  			     selected->pid, selected->comm,
>  			     selected_oom_score_adj, selected_tasksize);
> -		/*
> -		 * If CONFIG_PROFILING is off, then we don't want to stall
> -		 * the killer by setting lowmem_deathpending.
> -		 */
> -#ifdef CONFIG_PROFILING
> -		lowmem_deathpending = selected;
>  		lowmem_deathpending_timeout = jiffies + HZ;
> -#endif
>  		send_sig(SIGKILL, selected, 0);
> +		set_tsk_thread_flag(selected, TIF_MEMDIE);
>  		rem -= selected_tasksize;
>  	}
>  	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
> @@ -198,7 +165,6 @@ static struct shrinker lowmem_shrinker = {
>
>  static int __init lowmem_init(void)
>  {
> -	task_handoff_register(&task_nb);
>  	register_shrinker(&lowmem_shrinker);
>  	return 0;
>  }
> @@ -206,7 +172,6 @@ static int __init lowmem_init(void)
>  static void __exit lowmem_exit(void)
>  {
>  	unregister_shrinker(&lowmem_shrinker);
> -	task_handoff_unregister(&task_nb);
>  }
>
>  module_param_named(cost, lowmem_shrinker.seeks, int, S_IRUGO | S_IWUSR);
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-10 0:23 ` Colin Cross
(?)
@ 2012-04-10 0:32 ` David Rientjes
2012-04-10 1:21 ` Colin Cross
-1 siblings, 1 reply; 43+ messages in thread
From: David Rientjes @ 2012-04-10 0:32 UTC (permalink / raw)
To: Colin Cross
Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
Christian Bejram, Paul E. McKenney, Anton Vorontsov,
Greg Kroah-Hartman, stable
On Mon, 9 Apr 2012, Colin Cross wrote:
> The point of the lowmem_deathpending patch was to avoid a stutter
> where the cpu would spend its time looping through the tasks due to
> repeated calls to lowmem_shrink instead of processing the kill signal
> to the selected thread.
What did you do to avoid this without CONFIG_PROFILING?
> With this patch, it will still loop through
> tasks until it finds the one that was previously killed and then
> abort. It's possible that the improvements Anton made to the task
> loop reduce the performance impact enough that this whole mess could
> just be dropped (by reverting 1eda516, e5d7965, and 4755b72).
>
I don't understand how calling shrink_slab() from direct reclaim or using
drop_caches manually taking slightly longer because it has to iterate the
tasklist to the point of the killed thread will significantly stall the
thread from exiting.
Much more likely is the killed thread cannot exit because you've killed it
in a lowmem situation without giving it access to memory reserves so that
it may exit quickly as my patch does. That has a higher likelihood of
stalling the exit than doing for_each_process().
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-10 0:32 ` David Rientjes
@ 2012-04-10 1:21 ` Colin Cross
0 siblings, 0 replies; 43+ messages in thread
From: Colin Cross @ 2012-04-10 1:21 UTC (permalink / raw)
To: David Rientjes
Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
Christian Bejram, Paul E. McKenney, Anton Vorontsov,
Greg Kroah-Hartman, stable
On Mon, Apr 9, 2012 at 5:32 PM, David Rientjes <rientjes@google.com> wrote:
> On Mon, 9 Apr 2012, Colin Cross wrote:
>
>> The point of the lowmem_deathpending patch was to avoid a stutter
>> where the cpu would spend its time looping through the tasks due to
>> repeated calls to lowmem_shrink instead of processing the kill signal
>> to the selected thread.
>
> What did you do to avoid this without CONFIG_PROFILING?
>
>> With this patch, it will still loop through
>> tasks until it finds the one that was previously killed and then
>> abort. It's possible that the improvements Anton made to the task
>> loop reduce the performance impact enough that this whole mess could
>> just be dropped (by reverting 1eda516, e5d7965, and 4755b72).
>>
>
> I don't understand how calling shrink_slab() from direct reclaim or using
> drop_caches manually taking slightly longer because it has to iterate the
> tasklist to the point of the killed thread will significantly stall the
> thread from exiting.
Before Anton's fix, iterating the tasklist involved taking every task
lock, which probably made it very expensive. I tried a quick test
where I deliberately limited memory to the point that it was
triggering lowmemorykiller during boot, and it triggered about 5000
times taking on the order of 50ms total for all 5000 calls. It was
about the same with your patch applied.
> Much more likely is the killed thread cannot exit because you've killed it
> in a lowmem situation without giving it access to memory reserves so that
> it may exit quickly as my patch does. That has a higher likelihood of
> stalling the exit than doing for_each_process().
* Re: [patch] android, lowmemorykiller: remove task handoff notifier
2012-04-09 23:56 ` [patch] android, lowmemorykiller: remove task handoff notifier David Rientjes
@ 2012-04-10 1:23 ` Colin Cross
0 siblings, 0 replies; 43+ messages in thread
From: Colin Cross @ 2012-04-10 1:23 UTC (permalink / raw)
To: David Rientjes
Cc: Linus Torvalds, Greg Kroah-Hartman, Andrew Morton, werner,
Rik van Riel, Hugh Dickins, linux-kernel, Oleg Nesterov,
Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, stable, Ingo Molnar
On Mon, Apr 9, 2012 at 4:56 PM, David Rientjes <rientjes@google.com> wrote:
> The task handoff notifier leaks task_struct since it never gets freed
> after the callback returns NOTIFY_OK, which means it is responsible for
> doing so.
>
> It turns out the lowmemorykiller actually doesn't need this notifier at
> all. It's used to prevent unnecessary killing by waiting for a thread to
> exit as a result of lowmem_shrink(), however, it's possible to do this in
> the same way the kernel oom killer works by setting TIF_MEMDIE and avoid
> killing if we're still waiting for it to exit.
>
> The kernel oom killer will already automatically set TIF_MEMDIE for
> threads that are attempting to allocate memory that have a fatal signal.
> The thread selected by lowmem_shrink() will have such a signal after the
> lowmemorykiller sends it a SIGKILL, so this won't result in an
> unnecessary use of memory reserves for the thread to exit.
>
> This has the added benefit that we don't have to rely on CONFIG_PROFILING
> to prevent needlessly killing tasks.
>
> Reported-by: werner <w.landgraf@ru.ru>
> Cc: stable@vger.kernel.org
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
> drivers/staging/android/lowmemorykiller.c | 48 +++++------------------------
> 1 file changed, 7 insertions(+), 41 deletions(-)
>
I did a quick test to measure the difference in time spent inside
lowmem_shrink with and without this patch, and they were about the
same. So,
Acked-by: Colin Cross <ccross@android.com>
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-10 1:21 ` Colin Cross
(?)
@ 2012-04-10 1:33 ` David Rientjes
2012-04-10 1:37 ` Colin Cross
-1 siblings, 1 reply; 43+ messages in thread
From: David Rientjes @ 2012-04-10 1:33 UTC (permalink / raw)
To: Colin Cross
Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
Christian Bejram, Paul E. McKenney, Anton Vorontsov,
Greg Kroah-Hartman, stable
On Mon, 9 Apr 2012, Colin Cross wrote:
> Before Anton's fix, iterating the tasklist involved taking every task
> lock, which probably made it very expensive.
I'm not sure of the fix you're referring to, but it's not in 3.4-rc2
because lowmem_shrink() still does find_lock_task_mm() for every user
process on the system, which is necessary to safely do get_mm_rss().
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-10 1:33 ` David Rientjes
@ 2012-04-10 1:37 ` Colin Cross
0 siblings, 0 replies; 43+ messages in thread
From: Colin Cross @ 2012-04-10 1:37 UTC (permalink / raw)
To: David Rientjes
Cc: Linus Torvalds, Andrew Morton, werner, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
Christian Bejram, Paul E. McKenney, Anton Vorontsov,
Greg Kroah-Hartman, stable
On Mon, Apr 9, 2012 at 6:33 PM, David Rientjes <rientjes@google.com> wrote:
> On Mon, 9 Apr 2012, Colin Cross wrote:
>
>> Before Anton's fix, iterating the tasklist involved taking every task
>> lock, which probably made it very expensive.
>
> I'm not sure of the fix you're referring to, but it's not in 3.4-rc2
> because lowmem_shrink() still does find_lock_task_mm() for every user
> process on the system, which is necessary to safely do get_mm_rss().
I confused "staging: android/lowmemorykiller: Don't grab
tasklist_lock" and "staging: android/lowmemorykiller: Better mm
handling". You're right, it still grabs the task lock.
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
[not found] ` <alpine.DEB.2.00.1204091707580.21813@chino.kir.corp.google.com>
@ 2012-04-10 7:09 ` werner
2012-04-10 7:10 ` werner
1 sibling, 0 replies; 43+ messages in thread
From: werner @ 2012-04-10 7:09 UTC (permalink / raw)
To: David Rientjes, Colin Cross, Linus Torvalds, Andrew Morton,
Rik van Riel, Hugh Dickins, linux-kernel, Oleg Nesterov,
Rabin Vincent, Christian Bejram, Paul E. McKenney,
Anton Vorontsov, Greg Kroah-Hartman, stable
After testing the first, one-line patch by D.R. for some hours, I have
now compiled his second patch (below) and started testing it. I see he
has already committed it; it would have been better to wait for the
test results first.
The loop suggested below gives 1560 kB with this second patch, compared
with 1632 kB after the first patch and 1432 kB with kernel 3.3. On the
other hand, 3.3 clearly has the same problem (even if it does not crash,
it often becomes very slow, and kmemleak is then running, which I have
to kill to return to normal speed), yet according to this 'test' it
would be good, so it is questionable whether this test is reliable.
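For anyone who wants to repeat the comparison without copying numbers by
hand: the figure being compared is the KernelStack line of /proc/meminfo.
Below is a minimal userspace sketch in C that extracts it. The function
names kernel_stack_kb() and read_kernel_stack_kb() are made up for this
example and are not part of any kernel or libc API; the 8 kB buffer is an
assumption that is comfortably larger than a typical /proc/meminfo.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the KernelStack value (in kB) found in
 * meminfo-formatted text, or -1 if no such line exists.
 */
long kernel_stack_kb(const char *meminfo)
{
    const char *key = "KernelStack:";
    const char *p = meminfo;

    while (p && *p) {
        /* Each field starts at the beginning of a line. */
        if (strncmp(p, key, strlen(key)) == 0) {
            long kb;

            /* %ld skips the padding spaces before the number. */
            if (sscanf(p + strlen(key), "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        /* Advance to the start of the next line, if any. */
        p = strchr(p, '\n');
        if (p)
            p++;
    }
    return -1;
}

/*
 * Hypothetical helper: read /proc/meminfo and return KernelStack in kB,
 * or -1 if the file cannot be read or the field is missing.
 */
long read_kernel_stack_kb(void)
{
    char buf[8192]; /* assumed large enough for the whole file */
    FILE *f = fopen("/proc/meminfo", "r");
    size_t n;

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return kernel_stack_kb(buf);
}
```

Calling read_kernel_stack_kb() right after the "sleep 0" loop on each
kernel, and printing the result, gives the numbers to compare. A single
reading is noisy, so it only tells you something if the difference between
kernels is large.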
As already reported, the first patch cured the problem, at least
subjectively.
To see whether this second patch is good, I now have to wait some
hours and observe whether the computer becomes slow or even crashes.
wl
=================================================
On Mon, 9 Apr 2012 17:11:45 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:
> On Mon, 9 Apr 2012, werner wrote:
>
>> I will now continue testing your first patch for a few hours, to see
>> whether it is good or not. Then I can apply another patch. So you still
>> have time to think and put together everything you want tested, and mail
>> me that. Also tell me whether you want the other patches applied
>> ADDITIONALLY to or INSTEAD of your first patch -- the best would be to
>> always send me cumulative patches that include everything and apply on
>> top of the 'virgin' 3.4-rcX kernel.
>> For your information, I don't download the whole git tree; I download each
>> 3.X.Y-rcZ release and recompile everything again (patched), instead of
>> compiling only the patched subroutines.
>>
>
> Ok, when you want to test the latest patch, try this:
>
> - revert back to the vanilla 3.4-rc2 kernel,
>
> - boot and do this on the command line:
>
> for i in $(seq 1 10000); do sleep 0 & done
> grep KernelStack /proc/meminfo
>
> - record that number,
>
> - apply the patch at https://lkml.org/lkml/2012/4/9/428,
>
> - boot and do the same two command lines,
>
> - compare the number with the previous number from the first boot.
>
> The number should be much lower after the patch is applied.
>
> Thanks!
>
>
"werner" <w.landgraf@ru.ru>
* Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1 sticks-and-crashs)
2012-04-09 23:55 ` Linus Torvalds
(?)
(?)
@ 2012-04-14 20:50 ` Srivatsa S. Bhat
-1 siblings, 0 replies; 43+ messages in thread
From: Srivatsa S. Bhat @ 2012-04-14 20:50 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Rientjes, Andrew Morton, werner, Rik van Riel,
Hugh Dickins, linux-kernel, Oleg Nesterov, Rabin Vincent,
Christian Bejram, Paul E. McKenney, Anton Vorontsov,
Greg Kroah-Hartman, stable, Ingo Molnar, linux-kernel,
Rafael J. Wysocki, Peter Zijlstra, Steven Rostedt
On 04/10/2012 05:25 AM, Linus Torvalds wrote:
> On Mon, Apr 9, 2012 at 4:25 PM, David Rientjes <rientjes@google.com> wrote:
>>
>> You could do that if you also turned the check for "ret == NOTIFY_OK" in
>> profile_handoff_task() into "ret & NOTIFY_OK" in your patch, otherwise you
>> get a double free from __put_task_struct() and oprofile.
>
> Why? NOTIFY_DONE is zero.
>
> I do agree that we *also* could do the "& NOTIFY_OK" and make it
> clearer that we're oring bits together. And we could document the
> stupid notifier interfaces to do this all, and just make the rules be
> *sane* when you have multiple notifiers.
>
> And sane rules would be either:
>
> - you always return an error return, and notifiers all return either
> 0 or a negative error number, and we stop on the first error and
> return that.
>
> - you return a bitmask, and we or all bits together (and we can
> certainly continue to have a "stop here" bit)
>
Even I think 'or'ing the bits makes more sense than returning the last
return value.
CPU hotplug and suspend/resume are two of the things that I know of,
that use notifiers quite a bit. However, neither of them actually care
about the exact return value - if it is an error return, no matter which
one or for what reason, they do the same error handling; and it works
for them. IOW, if we change the documented behaviour of notifiers to
return 'or' of all return values, that would continue to work well
with these users.
Of course, there are other users like profile_handoff_task() that do
care about exactly what the return value was, but I guess we can
gradually adapt such users to the better, saner rules for the notifier
return values, as you proposed.
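To make the difference concrete, here is a small userspace model in C of
the two rules being compared: "return the last callback's value" versus
"OR all the return values together". It is only a sketch and not kernel
code. The NOTIFY_* values are the ones from include/linux/notifier.h, but
call_chain_last(), call_chain_or() and the two callbacks are names
invented for this illustration.

```c
#include <stddef.h>

/* Same values as in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000 /* don't care */
#define NOTIFY_OK        0x0001 /* suits me */
#define NOTIFY_STOP_MASK 0x8000 /* don't call further callbacks */

typedef int (*notifier_fn)(void *data);

/*
 * Model of the current rule: the caller sees whatever the last callback
 * that ran happened to return.
 */
int call_chain_last(const notifier_fn *chain, size_t n, void *data)
{
    int ret = NOTIFY_DONE;

    for (size_t i = 0; i < n; i++) {
        ret = chain[i](data);
        if (ret & NOTIFY_STOP_MASK)
            break;
    }
    return ret;
}

/*
 * Model of the proposed rule: OR the bits of every callback together,
 * still honouring the "stop here" bit.
 */
int call_chain_or(const notifier_fn *chain, size_t n, void *data)
{
    int ret = NOTIFY_DONE;

    for (size_t i = 0; i < n; i++) {
        int r = chain[i](data);

        ret |= r;
        if (r & NOTIFY_STOP_MASK)
            break;
    }
    return ret;
}

/* Hypothetical callback that claims the object (like a profiler would). */
int takes_ownership(void *data)
{
    (void)data;
    return NOTIFY_OK;
}

/* Hypothetical callback that does not care about the event. */
int not_interested(void *data)
{
    (void)data;
    return NOTIFY_DONE;
}
```

With the chain { takes_ownership, not_interested }, call_chain_last()
returns NOTIFY_DONE, so the NOTIFY_OK from the first callback is lost and
the result depends on registration order. call_chain_or() keeps the
NOTIFY_OK bit regardless of order, which is why a caller would then test
"ret & NOTIFY_OK" rather than "ret == NOTIFY_OK".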
> But the current notifier semantics are just insane. The whole "we
> return the last return value" is crazy. It's by definition a random
> number, since the whole point of notifiers is that there can be
> multiple, and they aren't "ordered". So the whole "last return value"
> is something I just look at and say: "Whoever designed that is a
> f*cking moron".
>
[...]
>
> Again, almost every notifier user has always been total crap. It's
> just a stupid abstraction.
> "Something happened". "Oh, ok".
>
Never saw such a concise and apt definition of notifiers before ;-)
However, unfortunately, what other better mechanism do we have, to
deal with things that affect stuff across multiple subsystems, like
some of the users mentioned above? Hmm...
Regards,
Srivatsa S. Bhat