* 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31
From: Rafael J. Wysocki @ 2009-10-01 19:53 UTC
To: Linux Kernel Mailing List
Cc: Andrew Morton, Linus Torvalds, Natalie Protasevich, Kernel Testers List, Network Development, Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List, DRI

[Notes:
 * Quite a number of new regressions from 2.6.30 have been reported during the
   last three weeks.
 * The number of unresolved regressions 2.6.30 -> 2.6.31 is now the second
   highest ever.]

This message contains a list of some regressions introduced between 2.6.30 and
2.6.31, for which there are no fixes in the mainline I know of.  If any of them
have been fixed already, please let me know.

If you know of any other unresolved regressions introduced between 2.6.30
and 2.6.31, please let me know as well and I'll add them to the list.
Also, please let me know if any of the entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.

Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2009-10-02      151       49          42
  2009-09-06      123       34          27
  2009-08-26      108       33          26
  2009-08-20      102       32          29
  2009-08-10       89       27          24
  2009-08-02       76       36          28
  2009-07-27       70       51          43
  2009-07-07       35       25          21
  2009-06-29       22       22          15


Unresolved regressions
----------------------

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14301
Subject         : WARNING: at net/ipv4/af_inet.c:154
Submitter       : Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
Date            : 2009-09-30 12:24 (2 days old)
References      : http://marc.info/?l=linux-kernel&m=125431350218137&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14294
Subject         : kernel BUG at drivers/ide/ide-disk.c:187
Submitter       : Santiago Garcia Mantinan <manty@manty.net>
Date            : 2009-09-30 11:05 (2 days old)
References      : http://marc.info/?l=linux-kernel&m=125430926311466&w=4
Handled-By      : David Miller <davem@davemloft.net>

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14270
Subject         : Cannot boot on a PIII Celeron
Submitter       : Michael Tokarev <mjt@tls.msk.ru>
Date            : 2009-09-28 15:26 (4 days old)
References      : http://marc.info/?l=linux-kernel&m=125415160524110&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14267
Subject         : Disassociating atheros wlan
Submitter       : Kristoffer Ericson <kristoffer.ericson@gmail.com>
Date            : 2009-09-24 10:16 (8 days old)
References      : http://marc.info/?l=linux-kernel&m=125378723723384&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14266
Subject         : regression in page writeback
Submitter       : Shaohua Li <shaohua.li@intel.com>
Date            : 2009-09-22 5:49 (10 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d7831a0bdf06b9f722b947bb0c205ff7d77cebd8
References      : http://marc.info/?l=linux-kernel&m=125359858117176&w=4
Handled-By      : Wu Fengguang <fengguang.wu@intel.com>

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14265
Subject         : ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100
Submitter       : Karol Lewandowski <karol.k.lewandowski@gmail.com>
Date            : 2009-09-15 12:05 (17 days old)
References      : http://marc.info/?l=linux-kernel&m=125301636509517&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14264
Subject         : ehci problem - mouse dead on scroll
Submitter       : Volker Armin Hemmann <volkerarmin@googlemail.com>
Date            : 2009-09-12 7:46 (20 days old)
References      : http://marc.info/?l=linux-kernel&m=125274202707893&w=4
Handled-By      : Alan Stern <stern@rowland.harvard.edu>

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14257
Subject         : Not able to boot on 32 bit System
Submitter       : Rishikesh <risrajak@linux.vnet.ibm.com>
Date            : 2009-09-21 15:25 (11 days old)
References      : http://marc.info/?l=linux-kernel&m=125354604314412&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14256
Subject         : kernel BUG at fs/ext3/super.c:435
Submitter       : Mikael Pettersson <mikpe@it.uu.se>
Date            : 2009-09-21 7:29 (11 days old)
References      : http://marc.info/?l=linux-kernel&m=125351816109264&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14255
Subject         : WARNING: at drivers/char/tty_io.c:1267
Submitter       : Heinz Diehl <htd@fancy-poultry.org>
Date            : 2009-09-20 11:37 (12 days old)
References      : http://marc.info/?l=linux-kernel&m=125344629506309&w=4
                  http://lkml.org/lkml/2009/9/8/393
Handled-By      : Linus Torvalds <torvalds@linux-foundation.org>

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14254
Subject         : Hibernation broken by clocksource: Save mult_orig in clocksource_disable()
Submitter       : Ondrej Zary <linux@rainbow-software.org>
Date            : 2009-09-19 19:55 (13 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c7121843685de2bf7f3afd3ae1d6a146010bf1fc
References      : http://marc.info/?l=linux-kernel&m=125339012527719&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14252
Subject         : WARNING: at include/linux/skbuff.h:1382 w/ e1000
Submitter       : Stephan von Krawczynski <skraw@ithnet.com>
Date            : 2009-09-20 11:26 (12 days old)
References      : http://marc.info/?l=linux-kernel&m=125344599006033&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14251
Subject         : 2.6.31: no login prompt
Submitter       : Frédéric L. W. Meunier <fredlwm@gmail.com>
Date            : 2009-09-19 22:43 (13 days old)
References      : http://marc.info/?l=linux-kernel&m=125340020804711&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14249
Subject         : BUG: oops in gss_validate on 2.6.31
Submitter       : Bastian Blank <bastian@waldi.eu.org>
Date            : 2009-09-16 10:29 (16 days old)
References      : http://marc.info/?l=linux-kernel&m=125309700417283&w=4
Handled-By      : Trond Myklebust <trond.myklebust@fys.uio.no>

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14248
Subject         : 2.6.31 wireless: WARNING: at net/wireless/ibss.c:34
Submitter       : Jurriaan <thunder8@xs4all.nl>
Date            : 2009-09-13 7:32 (19 days old)
References      : http://marc.info/?l=linux-kernel&m=125282721113553&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14222
Subject         : Hibernation oopses for the 2nd time with 2.6.31 (won't fit the screen)
Submitter       : Ondrej Zary <linux@rainbow-software.org>
Date            : 2009-09-24 14:07 (8 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c7121843685de2bf7f3afd3ae1d6a146010bf1fc

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14205
Subject         : Intel DX58SO mainboard - powering off takes really long
Submitter       : Tomasz Chmielewski <tch@wpkg.org>
Date            : 2009-09-22 10:14 (10 days old)

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14204
Subject         : MCE prevent booting on my computer(pentium iii @500Mhz)
Submitter       : GNUtoo <GNUtoo@no-log.org>
Date            : 2009-09-21 20:36 (11 days old)

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14185
Subject         : Oops in driversbasefirmware_class
Submitter       : <lars_ericsson@telia.com>
Date            : 2009-09-17 05:09 (15 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6e03a201bbe8137487f340d26aa662110e324b20

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14181
Subject         : b43 causes panic at system shutdown
Submitter       : Jeremy Huddleston <jeremyhu@freedesktop.org>
Date            : 2009-09-15 18:34 (17 days old)

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14157
Subject         : end_request: I/O error, dev cciss/cXdX, sector 0
Submitter       : <jiri.harcarik@gmail.com>
Date            : 2009-09-11 07:42 (21 days old)

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14143
Subject         : OOPS when setting nr_requests for md devices
Submitter       : aCaB <acab@clamav.net>
Date            : 2009-09-08 08:48 (24 days old)

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14141
Subject         : order 2 page allocation failures in iwlagn
Submitter       : Frans Pop <elendil@planet.nl>
Date            : 2009-09-06 7:40 (26 days old)
References      : http://marc.info/?l=linux-kernel&m=125222287419691&w=4
Handled-By      : Pekka Enberg <penberg@cs.helsinki.fi>

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14133
Subject         : WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule
Submitter       : Jens Axboe <jens.axboe@oracle.com>
Date            : 2009-08-31 20:43 (32 days old)
References      : http://marc.info/?l=linux-kernel&m=125175143918050&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14114
Subject         : Tuning a saa7134 based card is broken in kernel 2.6.31-rc7
Submitter       : Tsvety Petrov <Tsvetoslav.Petrov@itron.com>
Date            : 2009-09-03 21:06 (29 days old)

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14090
Subject         : WARNING: at fs/notify/inotify/inotify_user.c:394
Submitter       : Joerg Platte <bugzilla@jako.ping.de>
Date            : 2009-08-30 15:21 (33 days old)

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14070
Subject         : lockdep warning triggered by dup_fd
Submitter       : Bart Van Assche <bart.vanassche@gmail.com>
Date            : 2009-08-23 09:36 (40 days old)
References      : http://lkml.org/lkml/2009/8/23/8

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14058
Subject         : Oops in fsnotify
Submitter       : Grant Wilson <grant.wilson@zen.co.uk>
Date            : 2009-08-20 15:48 (43 days old)
References      : http://marc.info/?l=linux-kernel&m=125078450923133&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14013
Subject         : hd don't show up
Submitter       : Tim Blechmann <tim@klingt.org>
Date            : 2009-08-14 8:26 (49 days old)
References      : http://marc.info/?l=linux-kernel&m=125023842514480&w=4
Handled-By      : Tejun Heo <tj@kernel.org>

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13987
Subject         : Received NMI interrupt at resume
Submitter       : Christian Casteyde <casteyde.christian@free.fr>
Date            : 2009-08-15 07:55 (48 days old)

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13950
Subject         : Oops when USB Serial disconnected while in use
Submitter       : Bruno Prémont <bonbons@linux-vserver.org>
Date            : 2009-08-08 17:47 (55 days old)
References      : http://marc.info/?l=linux-kernel&m=124975432900466&w=4
Handled-By      : Alan Stern <stern@rowland.harvard.edu>

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13943
Subject         : WARNING: at net/mac80211/mlme.c:2292 with ath5k
Submitter       : Fabio Comolli <fabio.comolli@gmail.com>
Date            : 2009-08-06 20:15 (57 days old)
References      : http://marc.info/?l=linux-kernel&m=124958978600600&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13942
Subject         : Troubles with AoE and uninitialized object
Submitter       : Bruno Prémont <bonbons@linux-vserver.org>
Date            : 2009-08-04 10:12 (59 days old)
References      : http://marc.info/?l=linux-kernel&m=124938117104811&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13941
Subject         : x86 Geode issue
Submitter       : Martin-Éric Racine <q-funk@iki.fi>
Date            : 2009-08-03 12:58 (60 days old)
References      : http://marc.info/?l=linux-kernel&m=124930434732481&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13940
Subject         : iwlagn and sky2 stopped working, ACPI-related
Submitter       : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
Date            : 2009-08-07 22:33 (56 days old)
References      : http://marc.info/?l=linux-kernel&m=124968457731107&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13935
Subject         : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
Submitter       : Adrian Ulrich <kernel@blinkenlights.ch>
Date            : 2009-08-08 22:08 (55 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13906
Subject         : Huawei E169 GPRS connection causes Ooops
Submitter       : Clemens Eisserer <linuxhippy@gmail.com>
Date            : 2009-08-04 09:02 (59 days old)

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13869
Subject         : Radeon framebuffer (w/o KMS) corruption at boot.
Submitter       : Duncan <1i5t5.duncan@cox.net>
Date            : 2009-07-29 16:44 (65 days old)

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject         : suspend script fails, related to stdout?
Submitter       : Tomas M. <tmezzadra@gmail.com>
Date            : 2009-07-17 21:24 (77 days old)
References      : http://marc.info/?l=linux-kernel&m=124785853811667&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject         : oprofile: possible circular locking dependency detected
Submitter       : Jerome Marchand <jmarchan@redhat.com>
Date            : 2009-07-22 13:35 (72 days old)

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject         : 2.6.31-rc2: irq 16: nobody cared
Submitter       : Niel Lambrechts <niel.lambrechts@gmail.com>
Date            : 2009-07-06 18:32 (88 days old)
References      : http://marc.info/?l=linux-kernel&m=124690524027166&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject         : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter       : poornima nayak <mpnayak@linux.vnet.ibm.com>
Date            : 2009-06-17 17:56 (107 days old)
References      : http://lkml.org/lkml/2009/6/17/194


Regressions with patches
------------------------

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14275
Subject         : kernel>=2.6.31: ahci.c: do not force unconditionally sb600 to 32bit dma any more?
Submitter       : gabriele balducci <balducci@units.it>
Date            : 2009-09-30 15:02 (2 days old)
Patch           : http://bugzilla.kernel.org/show_bug.cgi?id=14275#c0

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14261
Subject         : e1000e jumbo frames no longer work: 'Unsupported MTU setting'
Submitter       : Nix <nix@esperi.org.uk>
Date            : 2009-09-26 11:16 (6 days old)
References      : http://marc.info/?l=linux-kernel&m=125396433321342&w=4
Handled-By      : Alexander Duyck <alexander.duyck@gmail.com>
Patch           : http://patchwork.kernel.org/patch/50277/

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14258
Subject         : Memory leak in SCSI initialization
Submitter       : Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date            : 2009-09-22 4:18 (10 days old)
References      : http://marc.info/?l=linux-kernel&m=125359311312243&w=4
Handled-By      : Michael Ellerman <michael@ellerman.id.au>
Patch           : http://patchwork.kernel.org/patch/49258/

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14253
Subject         : Oops in driversbasefirmware_class
Submitter       : Lars Ericsson <Lars_Ericsson@telia.com>
Date            : 2009-09-16 20:44 (16 days old)
References      : http://lkml.org/lkml/2009/9/16/461
Handled-By      : Frederik Deweerdt <frederik.deweerdt@xprog.eu>
Patch           : http://patchwork.kernel.org/patch/49914/

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14137
Subject         : usb console regressions
Submitter       : Jason Wessel <jason.wessel@windriver.com>
Date            : 2009-09-05 21:08 (27 days old)
References      : http://marc.info/?l=linux-kernel&m=125218501310512&w=4
Handled-By      : Jason Wessel <jason.wessel@windriver.com>
Patch           : http://patchwork.kernel.org/patch/45953/
                  http://patchwork.kernel.org/patch/45952/

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14017
Subject         : _end symbol missing from Symbol.map
Submitter       : Hannes Reinecke <hare@suse.de>
Date            : 2009-08-13 6:45 (50 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6
References      : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Handled-By      : Hannes Reinecke <hare@suse.de>
Patch           : http://marc.info/?l=linux-kernel&m=125014649102253&w=4

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13948
Subject         : ath5k broken after suspend-to-ram
Submitter       : Johannes Stezenbach <js@sig21.net>
Date            : 2009-08-07 21:51 (56 days old)
References      : http://marc.info/?l=linux-kernel&m=124968192727854&w=4
Handled-By      : Nick Kossifidis <mickflemm@gmail.com>
Patch           : http://patchwork.kernel.org/patch/38550/


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions introduced
between 2.6.30 and 2.6.31, unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=13615

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael
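
For anyone who wants to process the list above with a script rather than by hand, here is a rough sketch of how the entries could be parsed. It assumes the report is available as plain text with one "Label : value" field per line and a blank line between entries, which is an assumption about the layout of the message as it was sent. The helper names (parse_entries, bug_id, age_in_days) are made up for this illustration and are not part of any existing tool. The reference date 2009-10-02 is taken from the first row of the statistics table.

```python
import re
from datetime import date

# Field labels used in the report. One field per line is an assumption
# about the layout of the message as it was originally sent.
FIELD_RE = re.compile(
    r"^(Bug-Entry|Subject|Submitter|Date|First-Bad-Commit|"
    r"References|Handled-By|Patch)\s*:\s*(.*)$"
)

SAMPLE = """\
Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14255
Subject         : WARNING: at drivers/char/tty_io.c:1267
Submitter       : Heinz Diehl <htd@fancy-poultry.org>
Date            : 2009-09-20 11:37 (12 days old)
References      : http://marc.info/?l=linux-kernel&m=125344629506309&w=4
                  http://lkml.org/lkml/2009/9/8/393
Handled-By      : Linus Torvalds <torvalds@linux-foundation.org>

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject         : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter       : poornima nayak <mpnayak@linux.vnet.ibm.com>
Date            : 2009-06-17 17:56 (107 days old)
References      : http://lkml.org/lkml/2009/6/17/194
"""


def parse_entries(text):
    """Return one dict per Bug-Entry record, mapping label -> list of values."""
    entries = []
    current = None
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends any multi-line field.
            last_key = None
            continue
        match = FIELD_RE.match(line)
        if match:
            key, value = match.groups()
            if key == "Bug-Entry":
                current = {}
                entries.append(current)
            if current is not None:
                current.setdefault(key, []).append(value.strip())
                last_key = key
        elif current is not None and last_key is not None:
            # Continuation line, e.g. a second References or Patch URL.
            current[last_key].append(line)
    return entries


def bug_id(entry):
    """Extract the numeric Bugzilla id from the Bug-Entry URL."""
    return int(re.search(r"id=(\d+)", entry["Bug-Entry"][0]).group(1))


def age_in_days(entry, report_date):
    """Days between the entry's Date field and the given report date."""
    reported = date.fromisoformat(entry["Date"][0][:10])
    return (report_date - reported).days
```

Calling parse_entries(SAMPLE) yields two records. Section headings such as "Unresolved regressions" are skipped because they follow a blank line and do not start with a known label.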
* [Bug #13645] NULL pointer dereference at (null) (level2_spare_pgt)
From: Rafael J. Wysocki @ 2009-10-01 19:53 UTC
To: Linux Kernel Mailing List
Cc: Kernel Testers List, poornima nayak

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject         : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter       : poornima nayak <mpnayak@linux.vnet.ibm.com>
Date            : 2009-06-17 17:56 (107 days old)
References      : http://lkml.org/lkml/2009/6/17/194
* [Bug #13733] 2.6.31-rc2: irq 16: nobody cared
From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Niel Lambrechts

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject         : 2.6.31-rc2: irq 16: nobody cared
Submitter       : Niel Lambrechts <niel.lambrechts@gmail.com>
Date            : 2009-07-06 18:32 (88 days old)
References      : http://marc.info/?l=linux-kernel&m=124690524027166&w=4
* [Bug #13836] suspend script fails, related to stdout?
From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Tomas M.

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject         : suspend script fails, related to stdout?
Submitter       : Tomas M. <tmezzadra@gmail.com>
Date            : 2009-07-17 21:24 (77 days old)
References      : http://marc.info/?l=linux-kernel&m=124785853811667&w=4
* [Bug #13809] oprofile: possible circular locking dependency detected
From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Jerome Marchand

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject         : oprofile: possible circular locking dependency detected
Submitter       : Jerome Marchand <jmarchan@redhat.com>
Date            : 2009-07-22 13:35 (72 days old)
* [Bug #13906] Huawei E169 GPRS connection causes Ooops
From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Clemens Eisserer

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13906
Subject         : Huawei E169 GPRS connection causes Ooops
Submitter       : Clemens Eisserer <linuxhippy@gmail.com>
Date            : 2009-08-04 09:02 (59 days old)
* [Bug #13935] 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Adrian Ulrich, Jan Scholz, Jiri Kosina

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13935
Subject         : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
Submitter       : Adrian Ulrich <kernel@blinkenlights.ch>
Date            : 2009-08-08 22:08 (55 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343
* Re: [Bug #13935] 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
From: Jan Scholz @ 2009-10-02 12:51 UTC
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List, Adrian Ulrich, Jan Scholz, Jiri Kosina

Hi,

for me this bug is fixed by:

commit 42960a13001aa6df52ca9952ce996f94a744ea65
HID: completely remove apple mightymouse from blacklist

Cheers,
Jan

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.30 and 2.6.31.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.30 and 2.6.31.  Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13935
> Subject         : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
> Submitter       : Adrian Ulrich <kernel@blinkenlights.ch>
> Date            : 2009-08-08 22:08 (55 days old)
> First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343
* Re: [Bug #13935] 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
From: Jiri Kosina @ 2009-10-02 15:58 UTC
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List, Adrian Ulrich, Jan Scholz

On Thu, 1 Oct 2009, Rafael J. Wysocki wrote:

> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.30 and 2.6.31.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.30 and 2.6.31.  Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13935
> Subject         : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
> Submitter       : Adrian Ulrich <kernel@blinkenlights.ch>
> Date            : 2009-08-08 22:08 (55 days old)
> First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343

Fixed now in Linus' tree (42960a13) and submitted for stable. Please
close.

-- 
Jiri Kosina
SUSE Labs, Novell Inc.
* Re: [Bug #13935] 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
From: Rafael J. Wysocki @ 2009-10-02 17:16 UTC
To: Jiri Kosina
Cc: Linux Kernel Mailing List, Kernel Testers List, Adrian Ulrich, Jan Scholz

On Friday 02 October 2009, Jiri Kosina wrote:
> On Thu, 1 Oct 2009, Rafael J. Wysocki wrote:
>
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.30 and 2.6.31.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.30 and 2.6.31.  Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13935
> > Subject         : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
> > Submitter       : Adrian Ulrich <kernel@blinkenlights.ch>
> > Date            : 2009-08-08 22:08 (55 days old)
> > First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343
>
> Fixed now in Linus' tree (42960a13) and submitted for stable. Please
> close.

Done.

Thanks,
Rafael
* [Bug #13940] iwlagn and sky2 stopped working, ACPI-related 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (5 preceding siblings ...) 2009-10-01 19:55 ` [Bug #13935] 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version) Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` Rafael J. Wysocki ` (41 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Ricardo Jorge da Fonseca Marques Ferreira This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940 Subject : iwlagn and sky2 stopped working, ACPI-related Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net> Date : 2009-08-07 22:33 (56 days old) References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #13869] Radeon framebuffer (w/o KMS) corruption at boot. 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Duncan This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869 Subject : Radeon framebuffer (w/o KMS) corruption at boot. Submitter : Duncan <1i5t5.duncan@cox.net> Date : 2009-07-29 16:44 (65 days old) ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (7 preceding siblings ...) 2009-10-01 19:55 ` Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-02 7:12 ` Fabio Comolli 2009-10-01 19:55 ` [Bug #13948] ath5k broken after suspend-to-ram Rafael J. Wysocki ` (39 subsequent siblings) 48 siblings, 1 reply; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Fabio Comolli, Luis R. Rodriguez This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943 Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k Submitter : Fabio Comolli <fabio.comolli@gmail.com> Date : 2009-08-06 20:15 (57 days old) References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k 2009-10-01 19:55 ` [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k Rafael J. Wysocki @ 2009-10-02 7:12 ` Fabio Comolli 0 siblings, 0 replies; 384+ messages in thread From: Fabio Comolli @ 2009-10-02 7:12 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Luis R. Rodriguez Hi. I suppose this is still valid. I had to work around it by rfkill-ing the device during the suspend process and reenabling at resume time. I can try to reproduce it with 2.6.31.1 if you want it. On Thu, Oct 1, 2009 at 9:55 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote: > This message has been generated automatically as a part of a report > of regressions introduced between 2.6.30 and 2.6.31. > > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still should > be listed and let me know (either way). > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943 > Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k > Submitter : Fabio Comolli <fabio.comolli@gmail.com> > Date : 2009-08-06 20:15 (57 days old) > References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4 > > > ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k @ 2009-10-02 17:17 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-02 17:17 UTC (permalink / raw) To: Fabio Comolli Cc: Linux Kernel Mailing List, Kernel Testers List, Luis R. Rodriguez On Friday 02 October 2009, Fabio Comolli wrote: > Hi. > I suppose this is still valid. I had to work around it by rfkill-ing > the device during the suspend process and reenabling at resume time. Thanks for the update. > I can try to reproduce it with 2.6.31.1 if you want it. In fact I'm more interested in whether or not it's still present in the Linus' tree. Thanks, Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k @ 2009-10-02 21:37 ` Fabio Comolli 0 siblings, 0 replies; 384+ messages in thread From: Fabio Comolli @ 2009-10-02 21:37 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Luis R. Rodriguez Hi. On Fri, Oct 2, 2009 at 7:17 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote: > On Friday 02 October 2009, Fabio Comolli wrote: >> Hi. >> I suppose this is still valid. I had to work around it by rfkill-ing >> the device during the suspend process and reenabling at resume time. > > Thanks for the update. > >> I can try to reproduce it with 2.6.31.1 if you want it. > > In fact I'm more interested in whether or not it's still present in the Linus' > tree. > Well, I just tried 2.6.32-rc1-git3 and I have to say that it's mostly useless with my eeepc. The warning didn't show up after resume but it was impossible to reassociate with my AP and after some tentative the screen went blank. I was able to poweroff the netbook using the power button but I'm a little scared to try again. Maybe I'll try with -rc3 or something. > Thanks, > Rafael > Regards, Fabio ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k @ 2009-10-02 21:42 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-02 21:42 UTC (permalink / raw) To: Fabio Comolli Cc: Linux Kernel Mailing List, Kernel Testers List, Luis R. Rodriguez On Friday 02 October 2009, Fabio Comolli wrote: > Hi. > > On Fri, Oct 2, 2009 at 7:17 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote: > > On Friday 02 October 2009, Fabio Comolli wrote: > >> Hi. > >> I suppose this is still valid. I had to work around it by rfkill-ing > >> the device during the suspend process and reenabling at resume time. > > > > Thanks for the update. > > > >> I can try to reproduce it with 2.6.31.1 if you want it. > > > > In fact I'm more interested in whether or not it's still present in the Linus' > > tree. > > > > Well, I just tried 2.6.32-rc1-git3 and I have to say that it's mostly > useless with my eeepc. The warning didn't show up after resume but it > was impossible to reassociate with my AP and after some tentative the > screen went blank. > > I was able to poweroff the netbook using the power button but I'm a > little scared to try again. It shouldn't kill the system cold dead, so as long as you have your data backed up, you can do some debugging IMHO. > Maybe I'll try with -rc3 or something. I guess you should report the issues you have at the moment. Then, it's actually more likely that someone will take care of fixing them. Thanks, Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k @ 2009-10-03 13:36 ` Fabio Comolli 0 siblings, 0 replies; 384+ messages in thread From: Fabio Comolli @ 2009-10-03 13:36 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Luis R. Rodriguez Hi Rafael. On Fri, Oct 2, 2009 at 11:42 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote: > On Friday 02 October 2009, Fabio Comolli wrote: >> Hi. >> >> On Fri, Oct 2, 2009 at 7:17 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote: >> > On Friday 02 October 2009, Fabio Comolli wrote: >> >> Hi. >> >> I suppose this is still valid. I had to work around it by rfkill-ing >> >> the device during the suspend process and reenabling at resume time. >> > >> > Thanks for the update. >> > >> >> I can try to reproduce it with 2.6.31.1 if you want it. >> > >> > In fact I'm more interested in whether or not it's still present in the Linus' >> > tree. >> > >> >> Well, I just tried 2.6.32-rc1-git3 and I have to say that it's mostly >> useless with my eeepc. The warning didn't show up after resume but it >> was impossible to reassociate with my AP and after some tentative the >> screen went blank. >> >> I was able to poweroff the netbook using the power button but I'm a >> little scared to try again. > > It shouldn't kill the system cold dead, so as long as you have your data > backed up, you can do some debugging IMHO. > >> Maybe I'll try with -rc3 or something. > > I guess you should report the issues you have at the moment. Then, it's > actually more likely that someone will take care of fixing them. > OK. 
This is what I've been able to come up with so far:

* with 2.6.31.x the warning shows up more or less every suspend-to-ram cycle;
* with 2.6.32-rc the warning never shows up;
* with 2.6.31.x, when the warning shows up, wifi is unusable until an rfkill cycle;
* with 2.6.32-rc, after a suspend-to-ram cycle wifi is unusable and rfkill does not cure it unless I rfkill it off - suspend-to-ram - resume - rfkill it on. This seems to work.

When wifi does not work in 2.6.32-rc the messages show:

---------------------------------------------
[ 49.647233] wlan0: direct probe to AP xx:xx:xx:xx:xx:xx (try 1)
[ 49.649234] wlan0: direct probe responded
[ 49.649244] wlan0: authenticate with AP xx:xx:xx:xx:xx:xx (try 1)
[ 49.650546] wlan0: authenticated
[ 49.650581] wlan0: associate with AP xx:xx:xx:xx:xx:xx (try 1)
[ 49.652183] wlan0: RX AssocResp from xx:xx:xx:xx:xx:xx (capab=0x451 status=12 aid=1)
[ 49.652190] wlan0: AP denied association (code=12)
---------------------------------------------

The script I feed to pm-utils to have wifi work across the suspend-to-ram cycle is just:

---------------------------------------------
#!/bin/sh

RFKILL=`basename /sys/devices/platform/eeepc/rfkill/*`

case "$1" in
    hibernate|suspend)
        cat /sys/devices/platform/eeepc/rfkill/$RFKILL/state > /tmp/suspend
        echo 0 > /sys/devices/platform/eeepc/rfkill/$RFKILL/state
        ;;
    thaw|resume)
        cat /tmp/suspend > /sys/devices/platform/eeepc/rfkill/$RFKILL/state
        ;;
    *)
        exit $NA
        ;;
esac
---------------------------------------------

I can confirm that with 32-rc the screen sometimes flickers badly after resume, for example when running a simple dmesg command. However, nothing is written to the logs, neither messages nor Xorg.0.log. Chipset is i915.

Hope this helps. Please note that git is not an option for me on this machine.
> Thanks, > Rafael Regards, Fabio > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 384+ messages in thread
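The rfkill workaround described in the message above can be sketched a little more defensively. This is only an illustration under stated assumptions, not the script Fabio actually runs: the function names `rfkill_state_file` and `wifi_hook`, the `RFKILL_DIR` and `SAVE_FILE` variables, and the save-file path are all made up here, and the return value 254 is assumed to match what pm-utils calls `$NA` ("not applicable"). The sysfs path is the one from the original script.

```shell
#!/bin/sh
# Sketch of a pm-utils style hook that blocks wifi via rfkill before
# suspend and restores the previous state on resume.
# RFKILL_DIR and SAVE_FILE are hypothetical knobs added for illustration.
RFKILL_DIR="${RFKILL_DIR:-/sys/devices/platform/eeepc/rfkill}"
SAVE_FILE="${SAVE_FILE:-/tmp/rfkill-saved-state}"

# Print the path of the first rfkill "state" file found, or fail.
rfkill_state_file() {
    for d in "$RFKILL_DIR"/*; do
        if [ -f "$d/state" ]; then
            echo "$d/state"
            return 0
        fi
    done
    return 1
}

# wifi_hook ACTION, where ACTION is what pm-utils passes as "$1".
wifi_hook() {
    # 254 is assumed to be the pm-utils "not applicable" code ($NA).
    state_file=$(rfkill_state_file) || return 254
    case "$1" in
        hibernate|suspend)
            # Remember the current state, then switch the radio off.
            cat "$state_file" > "$SAVE_FILE"
            echo 0 > "$state_file"
            ;;
        thaw|resume)
            # Restore whatever state was saved before suspend.
            if [ -f "$SAVE_FILE" ]; then
                cat "$SAVE_FILE" > "$state_file"
            fi
            ;;
        *)
            return 254
            ;;
    esac
}

# In an actual hook file the last lines would be:
#   wifi_hook "$1"
#   exit $?
```

Compared with the original, the sketch quotes every path, checks that an rfkill entry really exists before touching it, and does not depend on `$NA` being defined by the caller's environment.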
* [Bug #13948] ath5k broken after suspend-to-ram 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (8 preceding siblings ...) 2009-10-01 19:55 ` [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` Rafael J. Wysocki ` (38 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Bob Copeland, Johannes Stezenbach, Nick Kossifidis This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948 Subject : ath5k broken after suspend-to-ram Submitter : Johannes Stezenbach <js@sig21.net> Date : 2009-08-07 21:51 (56 days old) References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4 Handled-By : Nick Kossifidis <mickflemm@gmail.com> Patch : http://patchwork.kernel.org/patch/38550/ ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #13941] x86 Geode issue 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Al Viro, Martin-Éric Racine This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941 Subject : x86 Geode issue Submitter : Martin-Éric Racine <q-funk@iki.fi> Date : 2009-08-03 12:58 (60 days old) References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #13942] Troubles with AoE and uninitialized object 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Bruno Prémont This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942 Subject : Troubles with AoE and uninitialized object Submitter : Bruno Prémont <bonbons@linux-vserver.org> Date : 2009-08-04 10:12 (59 days old) References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13942] Troubles with AoE and uninitialized object 2009-10-01 19:55 ` Rafael J. Wysocki @ 2009-10-02 19:36 ` Bruno Prémont -1 siblings, 0 replies; 384+ messages in thread From: Bruno Prémont @ 2009-10-02 19:36 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Kernel Testers List On Thu, 01 October 2009 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > This message has been generated automatically as a part of a report > of regressions introduced between 2.6.30 and 2.6.31. > > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still > should be listed and let me know (either way). > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942 > Subject : Troubles with AoE and uninitialized object > Submitter : Bruno Prémont <bonbons@linux-vserver.org> > Date : 2009-08-04 10:12 (59 days old) > References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4 This should have been fixed with commits: 18d8217bc441630c3c5ec7416c5a65c69e8a0979 aoe: end barrier bios with EOPNOTSUPP This addresses the trace on unmounting XFS 7135a71b19be1faf48b7148d77844d03bc0717d6 aoe: allocate unused request_queue for sysfs This addresses the NULL kobject part I think the second one made it into 2.6.31 but first one didn't, please double-check! I've not seen them on stable though (which might be worth especially for the first one) Bruno ^ permalink raw reply [flat|nested] 384+ messages in thread
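The "please double-check" request above, whether a given fix is already contained in a release, can be answered from a local clone of Linus' tree. A minimal sketch, assuming such a clone with the release tags (for example `v2.6.31`) is available and a reasonably recent git; the helper name `commit_in_release` is invented for illustration, while `git merge-base --is-ancestor` is a stock git command.

```shell
# commit_in_release COMMIT TAG
# Exit status 0 if COMMIT is an ancestor of (that is, contained in) TAG,
# 1 if it is not, and another non-zero value on error (unknown commit or tag).
commit_in_release() {
    git merge-base --is-ancestor "$1" "$2"
}

# Possible use inside a kernel clone, with the hashes quoted in the message:
#   commit_in_release 7135a71b19be1faf48b7148d77844d03bc0717d6 v2.6.31 \
#       && echo "contained in v2.6.31"
#   commit_in_release 18d8217bc441630c3c5ec7416c5a65c69e8a0979 v2.6.31 \
#       || echo "not contained in v2.6.31 (or lookup failed)"
```

`git describe --contains COMMIT` is an alternative that names the first tag containing the commit, which is handy when the question is "which release first shipped this fix" rather than a yes/no check.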
* Re: [Bug #13942] Troubles with AoE and uninitialized object @ 2009-10-02 21:24 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-02 21:24 UTC (permalink / raw) To: Bruno Prémont; +Cc: Linux Kernel Mailing List, Kernel Testers List On Friday 02 October 2009, Bruno Prémont wrote: > On Thu, 01 October 2009 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > This message has been generated automatically as a part of a report > > of regressions introduced between 2.6.30 and 2.6.31. > > > > The following bug entry is on the current list of known regressions > > introduced between 2.6.30 and 2.6.31. Please verify if it still > > should be listed and let me know (either way). > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942 > > Subject : Troubles with AoE and uninitialized object > > Submitter : Bruno Prémont <bonbons@linux-vserver.org> > > Date : 2009-08-04 10:12 (59 days old) > > References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4 > > This should have been fixed with commits: > > 18d8217bc441630c3c5ec7416c5a65c69e8a0979 > aoe: end barrier bios with EOPNOTSUPP > > This addresses the trace on unmounting XFS > > > 7135a71b19be1faf48b7148d77844d03bc0717d6 > aoe: allocate unused request_queue for sysfs > > This addresses the NULL kobject part > > > I think the second one made it into 2.6.31 but first one didn't, Yes, it did. > please double-check! I've not seen them on stable though (which might > be worth especially for the first one) Thanks, closed. Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13942] Troubles with AoE and uninitialized object 2009-10-01 19:55 ` Rafael J. Wysocki @ 2009-10-02 19:57 ` David Rientjes -1 siblings, 0 replies; 384+ messages in thread From: David Rientjes @ 2009-10-02 19:57 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, onbons, Ed Cashin, Jens Axboe [-- Attachment #1: Type: TEXT/PLAIN, Size: 706 bytes --] On Thu, 1 Oct 2009, Rafael J. Wysocki wrote: > This message has been generated automatically as a part of a report > of regressions introduced between 2.6.30 and 2.6.31. > > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still should > be listed and let me know (either way). > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942 > Subject : Troubles with AoE and uninitialized object > Submitter : Bruno Prémont <bonbons@linux-vserver.org> > Date : 2009-08-04 10:12 (59 days old) > References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4 > This should be fixed with 18d8217 in Linus' tree. ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #13950] Oops when USB Serial disconnected while in use 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Alan Stern, Bruno Prémont This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950 Subject : Oops when USB Serial disconnected while in use Submitter : Bruno Prémont <bonbons@linux-vserver.org> Date : 2009-08-08 17:47 (55 days old) References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4 Handled-By : Alan Stern <stern@rowland.harvard.edu> ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13950] Oops when USB Serial disconnected while in use 2009-10-01 19:55 ` Rafael J. Wysocki @ 2009-10-02 19:45 ` Bruno Prémont -1 siblings, 0 replies; 384+ messages in thread From: Bruno Prémont @ 2009-10-02 19:45 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Alan Stern On Thu, 01 October 2009 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > This message has been generated automatically as a part of a report > of regressions introduced between 2.6.30 and 2.6.31. > > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still > should be listed and let me know (either way). > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950 > Subject : Oops when USB Serial disconnected while in use > Submitter : Bruno Prémont <bonbons@linux-vserver.org> > Date : 2009-08-08 17:47 (55 days old) > References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4 > Handled-By : Alan Stern <stern@rowland.harvard.edu> This has been addressed with the following commits: 41bd34ddd7aa46dbc03b5bb33896e0fa8100fe7b usb-serial: change referencing of port and serial structures f5b0953a89fa3407fb293cc54ead7d8feec489e4 usb-serial: put subroutines in logical order 8bc2c1b2daf95029658868cb1427baea2da87139 usb-serial: change logic of serial lookups cc56cd0157753c04a987888a2f793803df661a40 usb-serial: acquire references when a new tty is installed 7e29bb4b779f4f35385e6f21994758845bf14d23 usb-serial: fix termios initialization logic 74556123e034c8337b69a3ebac2f3a5fc0a97032 usb-serial: rename subroutines ff8324df1187b7280e507c976777df76c73a1ef1 usb-serial: add missing tests and debug lines 320348c8d5c9b591282633ddb8959b42f7fc7a1c usb-serial: straighten out serial_open They went into 2.6.32-rc1 and are probably queued for 2.6.31.2 stable. Bruno ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13950] Oops when USB Serial disconnected while in use @ 2009-10-02 21:26 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-02 21:26 UTC (permalink / raw) To: Bruno Prémont Cc: Linux Kernel Mailing List, Kernel Testers List, Alan Stern On Friday 02 October 2009, Bruno Prémont wrote: > On Thu, 01 October 2009 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > This message has been generated automatically as a part of a report > > of regressions introduced between 2.6.30 and 2.6.31. > > > > The following bug entry is on the current list of known regressions > > introduced between 2.6.30 and 2.6.31. Please verify if it still > > should be listed and let me know (either way). > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950 > > Subject : Oops when USB Serial disconnected while in use > > Submitter : Bruno Prémont <bonbons@linux-vserver.org> > > Date : 2009-08-08 17:47 (55 days old) > > References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4 > > Handled-By : Alan Stern <stern@rowland.harvard.edu> > > This has been addressed with the following commits: > 41bd34ddd7aa46dbc03b5bb33896e0fa8100fe7b > usb-serial: change referencing of port and serial structures > > f5b0953a89fa3407fb293cc54ead7d8feec489e4 > usb-serial: put subroutines in logical order > > 8bc2c1b2daf95029658868cb1427baea2da87139 > usb-serial: change logic of serial lookups > > cc56cd0157753c04a987888a2f793803df661a40 > usb-serial: acquire references when a new tty is installed > > 7e29bb4b779f4f35385e6f21994758845bf14d23 > usb-serial: fix termios initialization logic > > 74556123e034c8337b69a3ebac2f3a5fc0a97032 > usb-serial: rename subroutines > > ff8324df1187b7280e507c976777df76c73a1ef1 > usb-serial: add missing tests and debug lines > > 320348c8d5c9b591282633ddb8959b42f7fc7a1c > usb-serial: straighten out serial_open > > They went into 2.6.32-rc1 and are probably queued for 2.6.31.2 stable. 
Thanks a lot for the detailed info, bug closed. Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #13987] Received NMI interrupt at resume 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christian Casteyde This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13987 Subject : Received NMI interrupt at resume Submitter : Christian Casteyde <casteyde.christian@free.fr> Date : 2009-08-15 07:55 (48 days old) ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14017] _end symbol missing from Symbol.map 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (13 preceding siblings ...) 2009-10-01 19:55 ` Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` Rafael J. Wysocki ` (33 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Hannes Reinecke This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14017 Subject : _end symbol missing from Symbol.map Submitter : Hannes Reinecke <hare@suse.de> Date : 2009-08-13 6:45 (50 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6 References : http://marc.info/?l=linux-kernel&m=125014649102253&w=4 Handled-By : Hannes Reinecke <hare@suse.de> Patch : http://marc.info/?l=linux-kernel&m=125014649102253&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14013] hd don't show up 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Tejun Heo, Tim Blechmann This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14013 Subject : hd don't show up Submitter : Tim Blechmann <tim@klingt.org> Date : 2009-08-14 8:26 (49 days old) References : http://marc.info/?l=linux-kernel&m=125023842514480&w=4 Handled-By : Tejun Heo <tj@kernel.org> ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14058] Oops in fsnotify 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Eric Paris, Grant Wilson This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14058 Subject : Oops in fsnotify Submitter : Grant Wilson <grant.wilson@zen.co.uk> Date : 2009-08-20 15:48 (43 days old) References : http://marc.info/?l=linux-kernel&m=125078450923133&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14058] Oops in fsnotify 2009-10-01 19:55 ` Rafael J. Wysocki @ 2009-10-02 7:14 ` Jaswinder Singh Rajput -1 siblings, 0 replies; 384+ messages in thread From: Jaswinder Singh Rajput @ 2009-10-02 7:14 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Eric Paris, Grant Wilson, Andrew Morton, Andreas Dilger, Theodore Ts'o, paulmck Hello Grant, On Thu, 2009-10-01 at 21:55 +0200, Rafael J. Wysocki wrote: > This message has been generated automatically as a part of a report > of regressions introduced between 2.6.30 and 2.6.31. > > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still should > be listed and let me know (either way). > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14058 > Subject : Oops in fsnotify > Submitter : Grant Wilson <grant.wilson@zen.co.uk> > Date : 2009-08-20 15:48 (43 days old) > References : http://marc.info/?l=linux-kernel&m=125078450923133&w=4 > > Are you still facing this oops in the latest kernel? If yes, can you do a git bisect and specify the commit? Thanks, -- JSR ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14070] lockdep warning triggered by dup_fd 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (16 preceding siblings ...) 2009-10-01 19:55 ` Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` Rafael J. Wysocki ` (30 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Bart Van Assche This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14070 Subject : lockdep warning triggered by dup_fd Submitter : Bart Van Assche <bart.vanassche@gmail.com> Date : 2009-08-23 09:36 (40 days old) References : http://lkml.org/lkml/2009/8/23/8 ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14090] WARNING: at fs/notify/inotify/inotify_user.c:394 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Joerg Platte This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14090 Subject : WARNING: at fs/notify/inotify/inotify_user.c:394 Submitter : Joerg Platte <bugzilla@jako.ping.de> Date : 2009-08-30 15:21 (33 days old) ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14133] WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (18 preceding siblings ...) 2009-10-01 19:55 ` Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-02 7:00 ` Jaswinder Singh Rajput 2009-10-01 19:55 ` [Bug #14114] Tuning a saa7134 based card is broken in kernel 2.6.31-rc7 Rafael J. Wysocki ` (28 subsequent siblings) 48 siblings, 1 reply; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Américo Wang, Jens Axboe This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14133 Subject : WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule Submitter : Jens Axboe <jens.axboe@oracle.com> Date : 2009-08-31 20:43 (32 days old) References : http://marc.info/?l=linux-kernel&m=125175143918050&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14133] WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule 2009-10-01 19:55 ` [Bug #14133] WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule Rafael J. Wysocki @ 2009-10-02 7:00 ` Jaswinder Singh Rajput 0 siblings, 0 replies; 384+ messages in thread From: Jaswinder Singh Rajput @ 2009-10-02 7:00 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Américo Wang, Jens Axboe, Ingo Molnar, x86 maintainers, Gautham R Shenoy Hello Jens, On Thu, 2009-10-01 at 21:55 +0200, Rafael J. Wysocki wrote: > This message has been generated automatically as a part of a report > of regressions introduced between 2.6.30 and 2.6.31. > > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still should > be listed and let me know (either way). > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14133 > Subject : WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule > Submitter : Jens Axboe <jens.axboe@oracle.com> > Date : 2009-08-31 20:43 (32 days old) > References : http://marc.info/?l=linux-kernel&m=125175143918050&w=4 > > Are you still getting this warning in the latest -tip? If yes, can you do a git bisect and specify the commit? Thanks, -- JSR ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14133] WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule 2009-10-02 7:00 ` Jaswinder Singh Rajput (?) @ 2009-10-02 7:34 ` Jens Axboe 2009-10-02 17:21 ` Rafael J. Wysocki -1 siblings, 1 reply; 384+ messages in thread From: Jens Axboe @ 2009-10-02 7:34 UTC (permalink / raw) To: Jaswinder Singh Rajput Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Américo Wang, Ingo Molnar, x86 maintainers, Gautham R Shenoy On Fri, Oct 02 2009, Jaswinder Singh Rajput wrote: > Hello Jens, > > On Thu, 2009-10-01 at 21:55 +0200, Rafael J. Wysocki wrote: > > This message has been generated automatically as a part of a report > > of regressions introduced between 2.6.30 and 2.6.31. > > > > The following bug entry is on the current list of known regressions > > introduced between 2.6.30 and 2.6.31. Please verify if it still should > > be listed and let me know (either way). > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14133 > > Subject : WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule > > Submitter : Jens Axboe <jens.axboe@oracle.com> > > Date : 2009-08-31 20:43 (32 days old) > > References : http://marc.info/?l=linux-kernel&m=125175143918050&w=4 > > > > > > Are you still getting this warning in latest -tip. If yes, can you do > git bisect and specify the commit. Nope, it seems to have disappeared. -- Jens Axboe ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14133] WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule @ 2009-10-02 17:21 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-02 17:21 UTC (permalink / raw) To: Jens Axboe Cc: Jaswinder Singh Rajput, Linux Kernel Mailing List, Kernel Testers List, Américo Wang, Ingo Molnar, x86 maintainers, Gautham R Shenoy On Friday 02 October 2009, Jens Axboe wrote: > On Fri, Oct 02 2009, Jaswinder Singh Rajput wrote: > > Hello Jens, > > > > On Thu, 2009-10-01 at 21:55 +0200, Rafael J. Wysocki wrote: > > > This message has been generated automatically as a part of a report > > > of regressions introduced between 2.6.30 and 2.6.31. > > > > > > The following bug entry is on the current list of known regressions > > > introduced between 2.6.30 and 2.6.31. Please verify if it still should > > > be listed and let me know (either way). > > > > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14133 > > > Subject : WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule > > > Submitter : Jens Axboe <jens.axboe@oracle.com> > > > Date : 2009-08-31 20:43 (32 days old) > > > References : http://marc.info/?l=linux-kernel&m=125175143918050&w=4 > > > > > > > > > > Are you still getting this warning in latest -tip. If yes, can you do > > git bisect and specify the commit. > > Nope, it seems to have disappeared. OK, closed. Thanks, Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14114] Tuning a saa7134 based card is broken in kernel 2.6.31-rc7 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (19 preceding siblings ...) 2009-10-01 19:55 ` [Bug #14133] WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #14137] usb console regressions Rafael J. Wysocki ` (27 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Tsvety Petrov This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14114 Subject : Tuning a saa7134 based card is broken in kernel 2.6.31-rc7 Submitter : Tsvety Petrov <Tsvetoslav.Petrov@itron.com> Date : 2009-09-03 21:06 (29 days old) ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14137] usb console regressions 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (20 preceding siblings ...) 2009-10-01 19:55 ` [Bug #14114] Tuning a saa7134 based card is broken in kernel 2.6.31-rc7 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` Rafael J. Wysocki ` (26 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Jason Wessel This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14137 Subject : usb console regressions Submitter : Jason Wessel <jason.wessel@windriver.com> Date : 2009-09-05 21:08 (27 days old) References : http://marc.info/?l=linux-kernel&m=125218501310512&w=4 Handled-By : Jason Wessel <jason.wessel@windriver.com> Patch : http://patchwork.kernel.org/patch/45953/ http://patchwork.kernel.org/patch/45952/ ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14157] end_request: I/O error, dev cciss/cXdX, sector 0 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, jiri.harcarik This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14157 Subject : end_request: I/O error, dev cciss/cXdX, sector 0 Submitter : <jiri.harcarik@gmail.com> Date : 2009-09-11 07:42 (21 days old) ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14143] OOPS when setting nr_requests for md devices 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (22 preceding siblings ...) 2009-10-01 19:55 ` Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` Rafael J. Wysocki ` (24 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, aCaB This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14143 Subject : OOPS when setting nr_requests for md devices Submitter : aCaB <acab@clamav.net> Date : 2009-09-08 08:48 (24 days old) ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14181] b43 causes panic at system shutdown 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Jeremy Huddleston This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14181 Subject : b43 causes panic at system shutdown Submitter : Jeremy Huddleston <jeremyhu@freedesktop.org> Date : 2009-09-15 18:34 (17 days old) ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:55 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Frans Pop, Pekka Enberg This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14141 Subject : order 2 page allocation failures in iwlagn Submitter : Frans Pop <elendil@planet.nl> Date : 2009-09-06 7:40 (26 days old) References : http://marc.info/?l=linux-kernel&m=125222287419691&w=4 Handled-By : Pekka Enberg <penberg@cs.helsinki.fi> ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-01 19:55 ` Rafael J. Wysocki @ 2009-10-02 9:11 ` Frans Pop -1 siblings, 0 replies; 384+ messages in thread From: Frans Pop @ 2009-10-02 9:11 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Mel Gorman, Bartlomiej Zolnierkiewicz On Thursday 01 October 2009, Rafael J. Wysocki wrote: > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still should > be listed and let me know (either way). > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14141 > Subject : order 2 page allocation failures in iwlagn > Submitter : Frans Pop <elendil@planet.nl> > Date : 2009-09-06 7:40 (26 days old) > References : http://marc.info/?l=linux-kernel&m=125222287419691&w=4 > Handled-By : Pekka Enberg <penberg@cs.helsinki.fi> I'm not sure about this. The error messages from failed allocations should now be a lot less as a result of this commit: commit f82a924cc88a5541df1d4b9d38a0968cd077a051 Author: Reinette Chatre <reinette.chatre@intel.com> Date: Thu Sep 17 10:43:56 2009 -0700 iwlwifi: reduce noise when skb allocation fails That commit is in mainline, and I'm not sure if it is important enough for a stable update (AFAICT it's not listed for 2.6.31.2). That commit is mostly cosmetic, but possibly the real regression is not in iwlagn but in the way memory is freed/defragmented. That aspect was also reported by Bartlomiej (#14016) and was extensively discussed (without a clear conclusion) here: http://lkml.org/lkml/2009/8/26/140. My own feeling is that Bartlomiej is correct and that something has changed since .29 and that on average we do have less higher order areas available after the system has been in use for some time, but I can't substantiate that. I do know that before .30 I had never seen the SKB allocation errors. 
Main problem is that it's hard to deliberately and reproducibly get the system in a state where the errors occur. I certainly do feel that the kernel should try to make sure higher order allocations remain possible during system use. They are not only needed shortly after boot: drivers can be loaded/unloaded at any time. OTOH Mel probably does have a point that really high order GFP_ATOMIC allocations by drivers make no sense [1]. Anyway, I have no problems with this BR being closed. Cheers, FJP [1] <20090921133704.GO12726@csn.ul.ie> ^ permalink raw reply [flat|nested] 384+ messages in thread
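[Editorial sketch, not part of the original message.] The claim above, that fewer higher-order areas remain free once a system has been in use for a while, is something that can at least be sampled from /proc/buddyinfo, which lists the number of free blocks per order for each zone. The snippet below is a minimal illustration under stated assumptions: 4 KiB pages (as on x86), made-up numbers in SAMPLE, and hypothetical helper names (block_bytes, parse_buddyinfo, free_blocks_at_or_above). It is not a tool anyone in this thread used.

```python
# Sketch: summarise free blocks per allocation order from
# /proc/buddyinfo-style text. Helper names are hypothetical.

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86

# Made-up sample in the /proc/buddyinfo layout: one line per zone,
# followed by the count of free blocks for order 0, 1, 2, ...
SAMPLE = """\
Node 0, zone      DMA      3      2      1      0      0      0      0      0      0      0      0
Node 0, zone   Normal   1520    310     12      0      0      0      0      0      0      0      0
"""


def block_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of one physically contiguous block of the given order."""
    return (1 << order) * page_size


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node", "0,", "zone", "<name>", counts...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def free_blocks_at_or_above(counts, order):
    """Free blocks that could satisfy an allocation of this order.

    A larger free block can be split, so every order >= `order` counts.
    """
    return sum(counts[order:])


if __name__ == "__main__":
    try:
        with open("/proc/buddyinfo") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE  # fall back to the made-up sample
    print("an order-2 block is", block_bytes(2), "bytes")
    for (node, zone), counts in sorted(parse_buddyinfo(text).items()):
        print(node, zone, "blocks usable for order 2:",
              free_blocks_at_or_above(counts, 2))
```

An order-2 request needs four physically contiguous pages, so a zone can hold plenty of free memory in order-0 and order-1 blocks and still fail it. Comparing such snapshots across kernel versions after similar workloads would be one way to put numbers on the impression, though it says nothing about the cause.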
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-02 9:32 ` Mel Gorman 0 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-02 9:32 UTC (permalink / raw) To: Frans Pop Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski On Fri, Oct 02, 2009 at 11:11:52AM +0200, Frans Pop wrote: > On Thursday 01 October 2009, Rafael J. Wysocki wrote: > > The following bug entry is on the current list of known regressions > > introduced between 2.6.30 and 2.6.31. Please verify if it still should > > be listed and let me know (either way). > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14141 > > Subject : order 2 page allocation failures in iwlagn > > Submitter : Frans Pop <elendil@planet.nl> > > Date : 2009-09-06 7:40 (26 days old) > > References : http://marc.info/?l=linux-kernel&m=125222287419691&w=4 > > Handled-By : Pekka Enberg <penberg@cs.helsinki.fi> > > I'm not sure about this. > > The error messages from failed allocations should now be a lot less as a > result of this commit: > commit f82a924cc88a5541df1d4b9d38a0968cd077a051 > Author: Reinette Chatre <reinette.chatre@intel.com> > Date: Thu Sep 17 10:43:56 2009 -0700 > iwlwifi: reduce noise when skb allocation fails > > That commit is in mainline, and I'm not sure if it is important enough for > a stable update (AFAICT it's not listed for 2.6.31.2). > > That commit is mostly cosmetic, but possibly the real regression is not in > iwlagn but in the way memory is freed/defragmented. That aspect was also > reported by Bartlomiej (#14016) and was extensively discussed (without a > clear conclusion) here: http://lkml.org/lkml/2009/8/26/140. > > My own feeling is that Bartlomiej is correct and that something has changed > since .29 and that on average we do have less higher order areas available > after the system has been in use for some time, but I can't substantiate > that. 
I do know that before .30 I had never seen the SKB allocation > errors. > > Main problem is that it's hard to deliberately and reproducibly get the > system in a state where the errors occur. > Apparently, Karol Lewandowski (cc added) has a reliable reproduction case for when the firmware loading problem occurs (http://lkml.org/lkml/2009/9/30/242). While it's not the same problem exactly, it's probable they're related. I'm hoping the problem commit can be identified by his bisection whenever he gets around to it. > I certainly do feel that the kernel should try to make sure higher order > allocations remain possible during system use. They are not only needed > shortly after boot: drivers can be loaded/unloaded at any time. OTOH Mel > probably does have a point that really high order GFP_ATOMIC allocations > by drivers make no sense [1]. > While they don't make sense, I accept that the problem is apparently occurring more now than it did so something has changed that is not obvious to normal testing. Hopefully Karol will be able to help us out. > Anyway, I have no problems with this BR being closed. > > Cheers, > FJP > > [1] <20090921133704.GO12726@csn.ul.ie> > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-02 10:01 ` Frans Pop 0 siblings, 0 replies; 384+ messages in thread From: Frans Pop @ 2009-10-02 10:01 UTC (permalink / raw) To: Mel Gorman Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski On Friday 02 October 2009, Mel Gorman wrote: > On Fri, Oct 02, 2009 at 11:11:52AM +0200, Frans Pop wrote: > > My own feeling is that Bartlomiej is correct and that something has > > changed since .29 and that on average we do have less higher order > > areas available after the system has been in use for some time, but I > > can't substantiate that. I do know that before .30 I had never seen > > the SKB allocation errors. > > > > Main problem is that it's hard to deliberately and reproducibly get > > the system in a state where the errors occur. > > Apparently, Karol Lewandowski (cc added) has a reliable > reproduction case for when the firmware loading problem occurs > (http://lkml.org/lkml/2009/9/30/242). While it's not the same problem > exactly, it's probable they're related. I'm hoping the problem commit > can be identified by his bisection whenever he gets around to it. That does indeed look like a third independent report for basically the same issue. > [...], I accept that the problem is apparently occurring more now than it > did so something has changed that is not obvious to normal testing. Cool, that's progress ;-) ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-02 20:01 ` Karol Lewandowski 0 siblings, 0 replies; 384+ messages in thread From: Karol Lewandowski @ 2009-10-02 20:01 UTC (permalink / raw) To: Mel Gorman Cc: Frans Pop, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski On Fri, Oct 02, 2009 at 10:32:26AM +0100, Mel Gorman wrote: > On Fri, Oct 02, 2009 at 11:11:52AM +0200, Frans Pop wrote: > > My own feeling is that Bartlomiej is correct and that something has changed > > since .29 and that on average we do have less higher order areas available > > after the system has been in use for some time, but I can't substantiate > > that. I do know that before .30 I had never seen the SKB allocation > > errors. > > > > Main problem is that it's hard to deliberately and reproducibly get the > > system in a state where the errors occur. > > > > Apparently, Karol Lewandowski (cc added) has a reliable > reproduction case for when the firmware loading problem occurs > (http://lkml.org/lkml/2009/9/30/242). While it's not the same problem exactly, > it's probable they're related. I'm hoping the problem commit can be identified > by his bisection whenever he gets around to it. Unfortunately, I've had little success with bisecting this problem. I've spent a fair amount of time today trying to reproduce this problem, but I'm unable to do so even on kernels that failed "easily" before. Nothing has changed in hardware/software. I've changed methodology somewhat from suspend-and-look-for-failure-on-resume to rmmod, fill memory, modprobe-and-see-it-fail... but well, a few days ago it failed in either case. Damn. ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-02 20:01 ` Karol Lewandowski (?) @ 2009-10-04 19:28 ` Karol Lewandowski -1 siblings, 0 replies; 384+ messages in thread From: Karol Lewandowski @ 2009-10-04 19:28 UTC (permalink / raw) To: Karol Lewandowski Cc: Mel Gorman, Frans Pop, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz On Fri, Oct 02, 2009 at 10:01:43PM +0200, Karol Lewandowski wrote: > On Fri, Oct 02, 2009 at 10:32:26AM +0100, Mel Gorman wrote: > > Apparently, Karol Lewandowski (cc added) has a reliable > > reproduction case for when the firmware loading problem occurs > > (http://lkml.org/lkml/2009/9/30/242). While it's not the same problem exactly, > > it's probable they're related. I'm hoping the problem commit can be identified > > by his bisection whenever he gets around to it. > > Unfortunately, I've had little success with bisecting this problem. > I've spend fair amount of time today trying to reproduce this problem, > but I'm unable to do so even on kernels that failed "easily" before. I've been able to reproduce this problem on 2.6.32-rc1. No "luck" with bisecting, though. ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-02 9:11 ` Frans Pop
@ 2009-10-05 5:13 ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-05 5:13 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, linux-mm

On Friday 02 October 2009, Frans Pop wrote:
> On Thursday 01 October 2009, Rafael J. Wysocki wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14141
> > Subject : order 2 page allocation failures in iwlagn
> > Submitter : Frans Pop <elendil@planet.nl>
> > Date : 2009-09-06 7:40 (26 days old)
> > References : http://marc.info/?l=linux-kernel&m=125222287419691&w=4
> > Handled-By : Pekka Enberg <penberg@cs.helsinki.fi>
>
> I'm not sure about this.
>
> The error messages from failed allocations should now be a lot less as a
> result of this commit:
> commit f82a924cc88a5541df1d4b9d38a0968cd077a051
> Author: Reinette Chatre <reinette.chatre@intel.com>
> Date: Thu Sep 17 10:43:56 2009 -0700
> iwlwifi: reduce noise when skb allocation fails
>
> That commit is in mainline, and I'm not sure if it is important enough
> for a stable update (AFAICT it's not listed for 2.6.31.2).
>
> That commit is mostly cosmetic, but possibly the real regression is not
> in iwlagn but in the way memory is freed/defragmented. That aspect was
> also reported by Bartlomiej (#14016) and was extensively discussed
> (without a clear conclusion) here: http://lkml.org/lkml/2009/8/26/140.

I may be getting somewhere with this. I just got the allocation failures
included below on .32-rc3.
Note that these are not the "fixable" failures that got suppressed with the
commit referenced above, but the "this could affect networking" failures
that are still reported.

What I was doing when I got them is also interesting:
- a kernel build
- a gitk for the kernel tree (with full history this uses ~40% of memory)
- by mistake I then started a _second_ gitk

The second gitk (which shows as 'wish8.5' in top) caused a massive swap out
which brought the system to a standstill for a while (with huge latencies
as well, including a completely stuck mouse cursor, which happens rarely).

The system has 2GB RAM + 2GB swap, so IIUC there is no danger of getting
into an OOM as the first gitk can be swapped out completely.

I'll dig into this a bit more as it looks like this should be
reproducible, probably even without the kernel build. Next step is to
see how .30 behaves in the same situation.

Even if it is reproducible with .30, I wonder if the kernel shouldn't be
more robust in this situation. Currently it seems to allow one single
process to claim so much memory before swapping out that "normal" operation
of other processes is affected. I can understand that such a situation may
be hard to avoid on a very busy system where multiple processes start
claiming (a lot of) memory at roughly the same time, but I'd say it should
be avoidable if a single process is the culprit.

BTW, the system recovered completely, although that took some time (the
first gitk remained visible in top long after I closed its window; I think
because the system was busy swapping it back in before terminating it).

Cheers,
FJP

kcryptd: page allocation failure. order:2, mode:0x4020
Pid: 1483, comm: kcryptd Not tainted 2.6.32-rc3 #22
Call Trace:
 <IRQ> [<ffffffff8107c3d5>] __alloc_pages_nodemask+0x5a2/0x5ec
 [<ffffffff81264892>] ? _spin_unlock+0x9/0xb
 [<ffffffff811e73cd>] ? __alloc_skb+0x3c/0x15b
 [<ffffffffa03202cb>] ? iwl_rx_allocate+0x8f/0x305 [iwlcore]
 [<ffffffff8107c431>] __get_free_pages+0x12/0x41
 [<ffffffff8109cb1a>] __kmalloc_track_caller+0x3b/0xed
 [<ffffffff811e73f7>] __alloc_skb+0x66/0x15b
 [<ffffffffa03202cb>] iwl_rx_allocate+0x8f/0x305 [iwlcore]
 [<ffffffffa0320557>] iwl_rx_replenish_now+0x16/0x23 [iwlcore]
 [<ffffffffa035c0c8>] iwl_rx_handle+0x3a8/0x3c1 [iwlagn]
 [<ffffffff81051add>] ? sched_clock_local+0x1c/0x80
 [<ffffffffa035c60d>] iwl_irq_tasklet_legacy+0x52c/0x7a4 [iwlagn]
 [<ffffffffa0317aaf>] ? __iwl_read32+0xa5/0xb4 [iwlcore]
 [<ffffffff8103efb8>] tasklet_action+0x71/0xbc
 [<ffffffff8103f837>] __do_softirq+0x96/0x11b
 [<ffffffff8100cabc>] call_softirq+0x1c/0x28
 [<ffffffff8100e5ef>] do_softirq+0x33/0x6b
 [<ffffffff8103f5c5>] irq_exit+0x36/0x75
 [<ffffffff8100dcf1>] do_IRQ+0xa3/0xba
 [<ffffffff8100c353>] ret_from_intr+0x0/0xa
 <EOI> [<ffffffff811199dd>] ? scatterwalk_start+0x11/0x19
 [<ffffffff8111bbca>] ? blkcipher_walk_first+0x173/0x196
 [<ffffffff8111b67b>] ? blkcipher_walk_done+0xe6/0x1b8
 [<ffffffff8111bc35>] ? blkcipher_walk_virt+0x1a/0x1d
 [<ffffffffa02001cf>] ? crypto_cbc_encrypt+0x43/0x18e [cbc]
 [<ffffffff81127efd>] ? blk_recount_segments+0x1b/0x2c
 [<ffffffffa021371e>] ? aes_encrypt+0x0/0xf [aes_x86_64]
 [<ffffffff8111af64>] ? async_encrypt+0x38/0x3a
 [<ffffffffa01f7b54>] ? crypt_convert+0x1f9/0x28b [dm_crypt]
 [<ffffffffa01f8009>] ? kcryptd_crypt+0x423/0x449 [dm_crypt]
 [<ffffffffa01f7be6>] ? kcryptd_crypt+0x0/0x449 [dm_crypt]
 [<ffffffff81049bfd>] ? worker_thread+0x146/0x1d8
 [<ffffffff8104d706>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81049ab7>] ? worker_thread+0x0/0x1d8
 [<ffffffff8104d3f4>] ? kthread+0x7d/0x85
 [<ffffffff8100c9ba>] ? child_rip+0xa/0x20
 [<ffffffff8104d377>] ? kthread+0x0/0x85
 [<ffffffff8100c9b0>] ? child_rip+0x0/0x20
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 171
CPU 1: hi: 186, btch: 31 usd: 177
active_anon:298532 inactive_anon:100163 isolated_anon:52
 active_file:3993 inactive_file:4001 isolated_file:12
 unevictable:399 dirty:0 writeback:76102 unstable:0 buffer:125
 free:14107 slab_reclaimable:4510 slab_unreclaimable:20421
 mapped:7949 shmem:0 pagetables:4437 bounce:0
DMA free:7928kB min:40kB low:48kB high:60kB active_anon:3340kB inactive_anon:3608kB active_file:384kB inactive_file:472kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB writeback:80kB mapped:256kB shmem:0kB slab_reclaimable:12kB slab_unreclaimable:104kB kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 1976 1976 1976
DMA32 free:48500kB min:5664kB low:7080kB high:8496kB active_anon:1190788kB inactive_anon:397044kB active_file:15588kB inactive_file:15532kB unevictable:1596kB isolated(anon):208kB isolated(file):48kB present:2023748kB mlocked:1596kB dirty:0kB writeback:304328kB mapped:31540kB shmem:0kB slab_reclaimable:18028kB slab_unreclaimable:81496kB kernel_stack:1672kB pagetables:17732kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 19*4kB 13*8kB 3*16kB 7*32kB 11*64kB 11*128kB 5*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 7940kB
DMA32: 9299*4kB 1341*8kB 4*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 48500kB
98572 total pagecache pages
90213 pages in swap cache
Swap cache stats: add 175874, delete 85661, find 7850/8731
Free swap = 1425944kB
Total swap = 2097144kB
518064 pages RAM
10350 pages reserved
82388 pages shared
437481 pages non-shared
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.

swapper: page allocation failure. order:2, mode:0x4020
Pid: 0, comm: swapper Not tainted 2.6.32-rc3 #22
Call Trace:
 <IRQ> [<ffffffff8107c3d5>] __alloc_pages_nodemask+0x5a2/0x5ec
 [<ffffffff81264892>] ? _spin_unlock+0x9/0xb
 [<ffffffff811e73cd>] ? __alloc_skb+0x3c/0x15b
 [<ffffffffa03202cb>] ? iwl_rx_allocate+0x8f/0x305 [iwlcore]
 [<ffffffff8107c431>] __get_free_pages+0x12/0x41
 [<ffffffff8109cb1a>] __kmalloc_track_caller+0x3b/0xed
 [<ffffffff811e73f7>] __alloc_skb+0x66/0x15b
 [<ffffffffa03202cb>] iwl_rx_allocate+0x8f/0x305 [iwlcore]
 [<ffffffffa0320557>] iwl_rx_replenish_now+0x16/0x23 [iwlcore]
 [<ffffffffa035c0c8>] iwl_rx_handle+0x3a8/0x3c1 [iwlagn]
 [<ffffffffa035c60d>] iwl_irq_tasklet_legacy+0x52c/0x7a4 [iwlagn]
 [<ffffffffa0317aaf>] ? __iwl_read32+0xa5/0xb4 [iwlcore]
 [<ffffffff8103efb8>] tasklet_action+0x71/0xbc
 [<ffffffff8103f837>] __do_softirq+0x96/0x11b
 [<ffffffff8100cabc>] call_softirq+0x1c/0x28
 [<ffffffff8100e5ef>] do_softirq+0x33/0x6b
 [<ffffffff8103f5c5>] irq_exit+0x36/0x75
 [<ffffffff8100dcf1>] do_IRQ+0xa3/0xba
 [<ffffffff8100c353>] ret_from_intr+0x0/0xa
 <EOI> [<ffffffffa0278ec7>] ? acpi_idle_enter_simple+0xf9/0x127 [processor]
 [<ffffffffa0278ebd>] ? acpi_idle_enter_simple+0xef/0x127 [processor]
 [<ffffffff811da545>] ? cpuidle_idle_call+0x8c/0xc7
 [<ffffffff8100ae2e>] ? cpu_idle+0x55/0x8d
 [<ffffffff8125432d>] ? rest_init+0x61/0x63
 [<ffffffff81436c3e>] ? start_kernel+0x348/0x353
 [<ffffffff8143629a>] ? x86_64_start_reservations+0xaa/0xae
 [<ffffffff8143637f>] ? x86_64_start_kernel+0xe1/0xe8
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 171
CPU 1: hi: 186, btch: 31 usd: 155
active_anon:297901 inactive_anon:99948 isolated_anon:52
 active_file:3920 inactive_file:3948 isolated_file:12
 unevictable:399 dirty:0 writeback:34634 unstable:0 buffer:125
 free:23390 slab_reclaimable:4510 slab_unreclaimable:11714
 mapped:7819 shmem:0 pagetables:4437 bounce:0
DMA free:7908kB min:40kB low:48kB high:60kB active_anon:3340kB inactive_anon:3608kB active_file:384kB inactive_file:472kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB writeback:36kB mapped:256kB shmem:0kB slab_reclaimable:12kB slab_unreclaimable:12kB kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 1976 1976 1976
DMA32 free:85652kB min:5664kB low:7080kB high:8496kB active_anon:1188264kB inactive_anon:396184kB active_file:15296kB inactive_file:15320kB unevictable:1596kB isolated(anon):208kB isolated(file):48kB present:2023748kB mlocked:1596kB dirty:0kB writeback:138500kB mapped:31020kB shmem:0kB slab_reclaimable:18028kB slab_unreclaimable:46844kB kernel_stack:1672kB pagetables:17732kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 17*4kB 12*8kB 4*16kB 6*32kB 11*64kB 11*128kB 5*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 7908kB
DMA32: 12419*4kB 4439*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 85652kB
97616 total pagecache pages
89394 pages in swap cache
Swap cache stats: add 175906, delete 86512, find 7850/8733
Free swap = 1425864kB
Total swap = 2097144kB
518064 pages RAM
10350 pages reserved
82282 pages shared
428383 pages non-shared
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
^ permalink raw reply [flat|nested] 384+ messages in thread
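[Editorial note, not part of the original thread.] The per-zone free lists in the two allocation failure reports above can be checked mechanically. The Python sketch below parses a free-list line of the form printed in those reports and counts how many free blocks are large enough for an order-2 (16kB) request. The function names are made up for illustration, and a 4kB page size is assumed. It ignores watermarks and lowmem reserves, so it only illustrates fragmentation; it does not predict whether a particular GFP_ATOMIC request would succeed.

```python
import re


def parse_free_list(line, page_kb=4):
    """Parse a zone free-list line such as 'DMA32: 9299*4kB 1341*8kB ...'.

    Returns a dict mapping allocation order -> number of free blocks.
    Assumes block sizes are page_kb * 2**order (hypothetical helper).
    """
    blocks = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        order = (int(size_kb) // page_kb).bit_length() - 1
        blocks[order] = int(count)
    return blocks


def total_kb(blocks, page_kb=4):
    """Total free memory in kB represented by the parsed free list."""
    return sum(n * (page_kb << order) for order, n in blocks.items())


def blocks_at_or_above(blocks, order):
    """Number of free blocks whose order is >= the requested order."""
    return sum(n for o, n in blocks.items() if o >= order)


# DMA32 free list copied from the first (kcryptd) report above.
dma32_first = ("DMA32: 9299*4kB 1341*8kB 4*16kB 0*32kB 0*64kB 0*128kB "
               "0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 48500kB")

blocks = parse_free_list(dma32_first)
print(total_kb(blocks))               # matches the 48500kB total in the report
print(blocks_at_or_above(blocks, 2))  # free blocks usable for an order-2 request
```

For the first report this yields 48500kB free in DMA32 but only 5 free blocks of order 2 or higher; nearly all free memory sits in 4kB and 8kB blocks.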
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-05 5:13 ` Frans Pop
@ 2009-10-05 6:50 ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-05 6:50 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, linux-mm

On Monday 05 October 2009, Frans Pop wrote:
> I'll dig into this a bit more as it looks like this should be
> reproducible, probably even without the kernel build. Next step is to
> see how .30 behaves in the same situation.

This looks conclusive. I tested .30 and .32-rc3 from clean reboots and
only starting gitk. I only started music playing in the background
(amarok) from an NFS share to ensure network activity.

With .32-rc3 I got 4 SKB allocation errors while starting the *second* gitk
instance. And the system was completely frozen with music stopped until gitk
finished loading.

With .30 I was able to start *three* gitk's (which meant 2 of them got
(partially) swapped out) without any allocation errors. And with the system
remaining relatively responsive. There was a short break in the music while
I started the 2nd instance, but it just continued playing afterwards. There
was also some mild latency in the mouse cursor, but nothing like the full
desktop freeze I get with .32-rc3.

With .30 I looked at /proc/buddyinfo while the 3rd gitk was being started,
and that looked fairly healthy all the time:
Node 0, zone DMA 5 9 22 20 21 11 0 0 0 0 1
Node 0, zone DMA32 579 67 25 8 5 1 1 0 1 1 0
Node 0, zone DMA 5 9 22 20 21 11 0 0 0 0 1
Node 0, zone DMA32 276 54 13 15 8 10 3 1 1 1 0
Node 0, zone DMA 4 9 22 20 21 11 0 0 0 0 1
Node 0, zone DMA32 119 45 24 18 12 4 5 2 1 1 0
Node 0, zone DMA 4 9 22 20 21 11 0 0 0 0 1
Node 0, zone DMA32 527 13 9 5 5 3 2 1 1 1 0
Node 0, zone DMA 5 9 22 20 21 11 0 0 0 0 1
Node 0, zone DMA32 1375 24 7 7 8 5 1 1 0 1 0
Node 0, zone DMA 5 9 22 20 21 11 0 0 0 0 1
Node 0, zone DMA32 329 21 3 3 17 8 5 1 0 1 0

With .32 it was obviously impossible to get that info due to the total
freeze of the desktop. Not sure if the scheduler changes in .32
contribute to this. Guess I could find out by doing the same test with
.31.

One thing I should mention: my swap is an LVM volume that's in a VG that's
on a LUKS encrypted partition.

Does this give you enough info to go on, or should I try a bisection?

Cheers,
FJP

^ permalink raw reply [flat|nested] 384+ messages in thread
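[Editorial note, not part of the original thread.] Each /proc/buddyinfo row in the message above lists, per zone, the number of free blocks at order 0, 1, 2 and so on. The Python sketch below sums the columns from a given order upward, which is one rough way to see how many blocks could satisfy an order-2 request. The helper names are hypothetical; the sample rows are copied from the first .30 snapshot in the message.

```python
def parse_buddyinfo(text):
    """Parse /proc/buddyinfo-style text into (node, zone, counts) tuples.

    counts[i] is the number of free blocks of order i.
    """
    rows = []
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        # Expected shape: Node <n> zone <name> <order-0 count> <order-1 count> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        rows.append((int(parts[1]), parts[3], [int(c) for c in parts[4:]]))
    return rows


def free_blocks_from_order(counts, min_order):
    """Count free blocks of order >= min_order."""
    return sum(counts[min_order:])


# First snapshot taken on .30 while the third gitk was starting.
sample = """\
Node 0, zone DMA 5 9 22 20 21 11 0 0 0 0 1
Node 0, zone DMA32 579 67 25 8 5 1 1 0 1 1 0
"""

for node, zone, counts in parse_buddyinfo(sample):
    print(node, zone, free_blocks_from_order(counts, 2))
```

On a live system the same function could be fed the contents of /proc/buddyinfo directly. For the DMA32 row of this snapshot it reports 42 free blocks of order 2 or higher.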
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-05 6:50 ` Frans Pop
@ 2009-10-05 8:54 ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-05 8:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, linux-mm

On Monday 05 October 2009, Frans Pop wrote:
> With .32 it was obviously impossible to get that info due to the total
> freeze of the desktop. Not sure if the scheduler changes in .32
> contribute to this. Guess I could find out by doing the same test with
> .31.

I've tried with .31.1 too now and there does seem to be a scheduler
component too. With .31.1 I also get the SKB allocation errors, but the
desktop freeze seems to be less severe than with .32-rc3. I would suggest
looking into that _after_ the allocation issue has been traced/solved.

I did manage to really (partially) hang up the desktop with .31.1: music
did not come back and the task manager of the KDE desktop remained frozen,
but I could still use konsole [1]. I suspect this is because I also got an
OOPS in between the SKB failures:

IP: [<ffffffffa0444ea2>] rpcauth_checkverf+0x4e/0x5a[sunrpc]
PGD 77b83067 PUD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/class/power_supply/C23D/charge_full
CPU 0
Modules linked in: i915 drm i2c_algo_bit i2c_core ppdev parport_pc lp parport cpufreq_conservative cpufreq_userspace cpufreq_stats cpufreq_powersave ipv6 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc ext2 coretemp hp_wmi acpi_cpufreq loop snd_hda_codec_analog snd_hda_intel snd_hda_codec arc4 snd_pcm_oss snd_mixer_oss ecb snd_pcm snd_seq_dummy snd_seq_oss iwlagn iwlcore snd_seq_midi pcmcia mac80211 snd_rawmidi usblp snd_seq_midi_event snd_seq pcspkr cfg80211 yenta_socket rsrc_nonstatic pcmcia_core psmouse snd_timer snd_seq_device rfkill serio_raw snd soundcore snd_page_alloc hp_accel lis3lv02d video container output wmi intel_agp input_polldev battery ac processor button joydev evdev ext3 jbd mbcache sha256_generic aes_x86_64 aes_generic cbc usbhid hid dm_crypt dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sg sr_mod sd_mod cdrom ide_pci_generic piix ide_core pata_acpi uhci_hcd ata_piix ohci1394 sdhci_pci sdhci mmc_core led_class ieee1394 ricoh_mmc ata_generic ehci_hcd libta e1000e scsi_mod thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Pid: 3226, comm: rpciod/0 Not tainted 2.6.31.1 #20 HP Compaq 2510p Notebook PC
RIP: 0010:[<ffffffffa0444ea2>] [<ffffffffa0444ea2>]rpcauth_checkverf+0x4e/0x5a [sunrpc]
RSP: 0018:ffff88007aafbda0 EFLAGS: 00010246
RAX: 0000000400001000 RBX: ffff88003a718e40 RCX: 0000000000000001
RDX: ffff880038b821bc RSI: ffff880038b821c8 RDI: ffff8800618358c8
RBP: ffff88007aafbdc0 R08: 0000000000000000 R09: 0000000000000000
R10: ffff880001514d80 R11: ffff8800536401f0 R12: ffff8800618358c8
R13: ffff880038b821c8 R14: ffff880037bb4bd0 R15: ffffffffa04bf52b
FS: 0000000000000000(0000) GS:ffff880001504000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000400001038 CR3: 0000000067ee5000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rpciod/0 (pid: 3226, threadinfo ffff88007aafa000, task ffff88007c431670)
Stack:
 ffff88007aafbde0 ffff880037bb4bd0 ffff8800618358c8 ffff880061835958
<0> ffff88007aafbe00 ffffffffa043e24a ffff88007c4319e0 ffff8800618358c8
<0> ffff880061835970 ffff880061835958 0000000000000000 0000000000000001
Call Trace:
 [<ffffffffa043e24a>] call_decode+0x374/0x68e [sunrpc]
 [<ffffffffa044430e>] __rpc_execute+0x86/0x244 [sunrpc]
 [<ffffffffa04444f8>] ? rpc_async_schedule+0x0/0x12 [sunrpc]
 [<ffffffffa0444508>] rpc_async_schedule+0x10/0x12 [sunrpc]
 [<ffffffff81048bd5>] worker_thread+0x132/0x1ca
 [<ffffffff8104c657>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff81048aa3>] ? worker_thread+0x0/0x1ca
 [<ffffffff8104c335>] kthread+0x8f/0x97
 [<ffffffff8100ca7a>] child_rip+0xa/0x20
 [<ffffffff8104c2a6>] ? kthread+0x0/0x97
 [<ffffffff8100ca70>] ? child_rip+0x0/0x20
Code: 30 0f b7 b7 06 01 00 00 48 89 d9 48 c7 c7 30 42 45 a0 48 8b 40 10 48 8b 50 10 31 c0 e8 73 f8 e0 e0 48 8b 43 38 4c 89 ee 4c 89 e7 <ff> 50 38 41 59 5b 41 5c 41 5d c9 c3 55 48 89 e5 41 55 49 89 f5
RIP [<ffffffffa0444ea2>] rpcauth_checkverf+0x4e/0x5a [sunrpc]
 RSP <ffff88007aafbda0>
CR2: 0000000400001038

Not sure whether it's worth following up on that as a separate issue.

Cheers,
FJP

[1] KDE's task manager freezing for short periods is normal for me while
amarok is blocked by NFS. This normally only happens when I start amarok
for the first time, but it does explain how the NFS oops can have the same
effect.

^ permalink raw reply [flat|nested] 384+ messages in thread
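[Editorial note, not part of the original thread.] The register dump in the oops above is internally consistent in a way that can be checked by hand. The bytes at the `<ff>` marker in the Code: line, `ff 50 38`, are the x86-64 encoding of an indirect call through RAX plus an 8-bit displacement of 0x38, and RAX + 0x38 equals the faulting address in CR2. The Python sketch below performs that check. The decoder is a deliberately tiny illustration that handles only this one instruction form (no prefixes, no SIB byte); it is not a general disassembler, and what the pointer in RAX was supposed to refer to is not something the dump alone establishes.

```python
def decode_indirect_call_disp8(code):
    """Decode 'ff /2' with mod=01: call *disp8(%reg).

    Returns (register_number, displacement), or None if the bytes are not
    of this exact form. Register number 0 is RAX when no REX prefix is used.
    """
    if len(code) < 3 or code[0] != 0xFF:
        return None
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod=01 means an 8-bit displacement follows; reg=2 selects CALL;
    # rm=4 would mean a SIB byte follows, which this sketch does not handle.
    if mod != 0b01 or reg != 2 or rm == 4:
        return None
    disp = code[2] - 256 if code[2] >= 128 else code[2]  # sign-extend disp8
    return rm, disp


rax = 0x0000000400001000            # RAX from the register dump
cr2 = 0x0000000400001038            # CR2: the address that faulted
code_at_rip = bytes.fromhex("ff5038")  # bytes starting at the <ff> marker

decoded = decode_indirect_call_disp8(code_at_rip)
print(decoded)
if decoded is not None:
    reg, disp = decoded
    print(reg == 0 and rax + disp == cr2)
```

The check succeeds, so the fault is consistent with the kernel trying to read a function pointer at offset 0x38 from a value in RAX that is not a valid kernel address.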
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-05  6:50 ` Frans Pop
@ 2009-10-05  8:57 ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-05  8:57 UTC (permalink / raw)
  To: Frans Pop
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz,
	Karol Lewandowski, linux-mm

On Mon, Oct 05, 2009 at 08:50:58AM +0200, Frans Pop wrote:
> On Monday 05 October 2009, Frans Pop wrote:
> > I'll dig into this a bit more as it looks like this should be
> > reproducible, probably even without the kernel build. Next step is to
> > see how .30 behaves in the same situation.
>
> This looks conclusive. I tested .30 and .32-rc3 from clean reboots and
> only starting gitk. I only started music playing in the background
> (amarok) from an NFS share to ensure network activity.
>
> With .32-rc3 I got 4 SKB allocation errors while starting the *second* gitk
> instance. And the system was completely frozen with music stopped until gitk
> finished loading.
>
> With .30 I was able to start *three* gitk's (which meant 2 of them got
> (partially) swapped out) without any allocation errors. And with the system
> remaining relatively responsive. There was a short break in the music while
> I started the 2nd instance, but it just continued playing afterwards. There
> was also some mild latency in the mouse cursor, but nothing like the full
> desktop freeze I get with .32-rc3.
>
> With .30 I looked at /proc/buddyinfo while the 3rd gitk was being started,
> and that looked fairly healthy all the time:
> Node 0, zone   DMA      5    9   22   20   21   11    0    0    0    0    1
> Node 0, zone   DMA32  579   67   25    8    5    1    1    0    1    1    0
> Node 0, zone   DMA      5    9   22   20   21   11    0    0    0    0    1
> Node 0, zone   DMA32  276   54   13   15    8   10    3    1    1    1    0
> Node 0, zone   DMA      4    9   22   20   21   11    0    0    0    0    1
> Node 0, zone   DMA32  119   45   24   18   12    4    5    2    1    1    0
> Node 0, zone   DMA      4    9   22   20   21   11    0    0    0    0    1
> Node 0, zone   DMA32  527   13    9    5    5    3    2    1    1    1    0
> Node 0, zone   DMA      5    9   22   20   21   11    0    0    0    0    1
> Node 0, zone   DMA32 1375   24    7    7    8    5    1    1    0    1    0
> Node 0, zone   DMA      5    9   22   20   21   11    0    0    0    0    1
> Node 0, zone   DMA32  329   21    3    3   17    8    5    1    0    1    0
>
> With .32 it was obviously impossible to get that info due to the total
> freeze of the desktop. Not sure if the scheduler changes in .32 contribute
> to this. Guess I could find out by doing the same test with .31.
>
> One thing I should mention: my swap is an LVM volume that's in a VG that's
> on a LUKS encrypted partition.
>
> Does this give you enough info to go on, or should I try a bisection?
>

I'll be trying to reproduce it, but it's unlikely I'll manage to
reproduce it reliably as there may be a specific combination of hardware
necessary as well. What I'm going to try is writing a module that
allocates order-5 every second GFP_ATOMIC and see can I reproduce using
scenarios similar to yours but it'll take some time with no guarantee of
success. If you could bisect it, it would be fantastic.

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 384+ messages in thread
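[Editorial aside, not part of the original thread.] The /proc/buddyinfo rows quoted in the message above list, per zone, the number of free blocks at each order from 0 to 10, so the count of blocks large enough for an order-2 (or order-5) request can be summed directly. Below is a minimal Python sketch under that assumption; the function names are hypothetical, and the sample row is the first DMA32 line quoted by Frans. With 4 KiB pages (an assumption about the reporter's x86-64 machine), an order-5 block is 32 contiguous pages, i.e. 128 KiB.

```python
def parse_buddyinfo_line(line):
    """Split one /proc/buddyinfo row into (node, zone, counts).

    counts[i] is the number of free blocks of order i (2**i pages each).
    Hypothetical helper: assumes the "Node N, zone NAME c0 c1 ..." layout.
    """
    fields = line.replace(",", " ").split()
    # fields looks like: ["Node", "0", "zone", "DMA32", "579", "67", ...]
    node = int(fields[1])
    zone = fields[3]
    counts = [int(value) for value in fields[4:]]
    return node, zone, counts


def blocks_at_or_above(counts, min_order):
    """Number of free blocks that can satisfy a request of min_order."""
    return sum(counts[min_order:])


# First DMA32 row quoted in the thread (taken on 2.6.30).
row = "Node 0, zone DMA32 579 67 25 8 5 1 1 0 1 1 0"
node, zone, counts = parse_buddyinfo_line(row)

print(zone, blocks_at_or_above(counts, 2))  # DMA32 42
print(zone, blocks_at_or_above(counts, 5))  # DMA32 4
```

Note that this only counts blocks that are already free at a given order; an atomic allocation cannot wait for reclaim, so a low count at order 2 and above is what makes the SKB allocations in the report fail.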
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-05  8:57 ` Mel Gorman
@ 2009-10-05 21:34 ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-05 21:34 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz,
	Karol Lewandowski, linux-mm, David Rientjes

On Monday 05 October 2009, Mel Gorman wrote:
> On Mon, Oct 05, 2009 at 08:50:58AM +0200, Frans Pop wrote:
> > On Monday 05 October 2009, Frans Pop wrote:
> > > I'll dig into this a bit more as it looks like this should be
> > > reproducible, probably even without the kernel build. Next step is
> > > to see how .30 behaves in the same situation.
> >
> > This looks conclusive. I tested .30 and .32-rc3 from clean reboots and
> > only starting gitk. I only started music playing in the background
> > (amarok) from an NFS share to ensure network activity.
> >
> > With .32-rc3 I got 4 SKB allocation errors while starting the *second*
> > gitk instance. And the system was completely frozen with music stopped
> > until gitk finished loading.
> >
> > With .30 I was able to start *three* gitk's (which meant 2 of them got
> > (partially) swapped out) without any allocation errors. And with the
> > system remaining relatively responsive. There was a short break in the
> > music while I started the 2nd instance, but it just continued playing
> > afterwards. There was also some mild latency in the mouse cursor, but
> > nothing like the full desktop freeze I get with .32-rc3.
> >
> > One thing I should mention: my swap is an LVM volume that's in a VG
> > that's on a LUKS encrypted partition.
> >
> > Does this give you enough info to go on, or should I try a bisection?
>
> I'll be trying to reproduce it, but it's unlikely I'll manage to
> reproduce it reliably as there may be a specific combination of hardware
> necessary as well. What I'm going to try is writing a module that
> allocates order-5 every second GFP_ATOMIC and see can I reproduce using
> scenarios similar to yours but it'll take some time with no guarantee of
> success. If you could bisect it, it would be fantastic.

And the winner is:
2ff05b2b4eac2e63d345fc731ea151a060247f53 is first bad commit
commit 2ff05b2b4eac2e63d345fc731ea151a060247f53
Author: David Rientjes <rientjes@google.com>
Date:   Tue Jun 16 15:32:56 2009 -0700

    oom: move oom_adj value from task_struct to mm_struct

I'm confident that the bisection is good. The test case was very reliable
while zooming in on the merge from akpm.

Cheers,
FJP

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-05 21:34 ` Frans Pop
@ 2009-10-06  0:04 ` David Rientjes
  -1 siblings, 0 replies; 384+ messages in thread
From: David Rientjes @ 2009-10-06  0:04 UTC (permalink / raw)
  To: Frans Pop
  Cc: Mel Gorman, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Pekka Enberg, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, linux-mm

On Mon, 5 Oct 2009, Frans Pop wrote:

> And the winner is:
> 2ff05b2b4eac2e63d345fc731ea151a060247f53 is first bad commit
> commit 2ff05b2b4eac2e63d345fc731ea151a060247f53
> Author: David Rientjes <rientjes@google.com>
> Date:   Tue Jun 16 15:32:56 2009 -0700
>
>     oom: move oom_adj value from task_struct to mm_struct
>
> I'm confident that the bisection is good. The test case was very reliable
> while zooming in on the merge from akpm.
>

I doubt it for two reasons: (i) this commit was reverted in 0753ba0 since
2.6.31-rc7 and is no longer in the kernel, and (ii) these are GFP_ATOMIC
allocations which would be unaffected by oom killer scores.

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-06  0:04 ` David Rientjes
@ 2009-10-06  1:25 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 384+ messages in thread
From: KOSAKI Motohiro @ 2009-10-06  1:25 UTC (permalink / raw)
  To: David Rientjes
  Cc: kosaki.motohiro, Frans Pop, Mel Gorman, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	linux-mm

> On Mon, 5 Oct 2009, Frans Pop wrote:
>
> > And the winner is:
> > 2ff05b2b4eac2e63d345fc731ea151a060247f53 is first bad commit
> > commit 2ff05b2b4eac2e63d345fc731ea151a060247f53
> > Author: David Rientjes <rientjes@google.com>
> > Date:   Tue Jun 16 15:32:56 2009 -0700
> >
> >     oom: move oom_adj value from task_struct to mm_struct
> >
> > I'm confident that the bisection is good. The test case was very reliable
> > while zooming in on the merge from akpm.
> >
>
> I doubt it for two reasons: (i) this commit was reverted in 0753ba0 since
> 2.6.31-rc7 and is no longer in the kernel, and (ii) these are GFP_ATOMIC
> allocations which would be unaffected by oom killer scores.

I agree. this patch is pretty obvious correct. it was reverted by one
unfortunately regression.

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-06  0:04 ` David Rientjes
@ 2009-10-06  8:53 ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-06  8:53 UTC (permalink / raw)
  To: David Rientjes
  Cc: Frans Pop, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Pekka Enberg, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, linux-mm

On Mon, Oct 05, 2009 at 05:04:55PM -0700, David Rientjes wrote:
> On Mon, 5 Oct 2009, Frans Pop wrote:
>
> > And the winner is:
> > 2ff05b2b4eac2e63d345fc731ea151a060247f53 is first bad commit
> > commit 2ff05b2b4eac2e63d345fc731ea151a060247f53
> > Author: David Rientjes <rientjes@google.com>
> > Date:   Tue Jun 16 15:32:56 2009 -0700
> >
> >     oom: move oom_adj value from task_struct to mm_struct
> >
> > I'm confident that the bisection is good. The test case was very reliable
> > while zooming in on the merge from akpm.
> >
>
> I doubt it for two reasons: (i) this commit was reverted in 0753ba0 since
> 2.6.31-rc7 and is no longer in the kernel, and (ii) these are GFP_ATOMIC
> allocations which would be unaffected by oom killer scores.
>

However, the problem was reported to start showing up in 2.6.31-rc1 so
while it might not be *the* patch, it might be making the type of change
that caused more fragmentation. This patch adjusted the size of
mm_struct and maybe it was enough to change the "order" required for the
slab. Maybe there are other slabs that have changed size as well in that
timeframe.

Frans, what is the size of mm_struct before and after this patch was
applied? Find it with either

grep mm_struct /proc/slabinfo

and if the information is not available there, try

cat /sys/kernel/slab/mm_struct/slab_size and
/sys/kernel/slab/mm_struct/order

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-06  8:53 ` Mel Gorman
@ 2009-10-06  9:14 ` David Rientjes
  -1 siblings, 0 replies; 384+ messages in thread
From: David Rientjes @ 2009-10-06  9:14 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Pekka Enberg, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, linux-mm

On Tue, 6 Oct 2009, Mel Gorman wrote:

> > > And the winner is:
> > > 2ff05b2b4eac2e63d345fc731ea151a060247f53 is first bad commit
> > > commit 2ff05b2b4eac2e63d345fc731ea151a060247f53
> > > Author: David Rientjes <rientjes@google.com>
> > > Date:   Tue Jun 16 15:32:56 2009 -0700
> > >
> > >     oom: move oom_adj value from task_struct to mm_struct
> > >
> > > I'm confident that the bisection is good. The test case was very reliable
> > > while zooming in on the merge from akpm.
> > >
>
> > I doubt it for two reasons: (i) this commit was reverted in 0753ba0 since
> > 2.6.31-rc7 and is no longer in the kernel, and (ii) these are GFP_ATOMIC
> > allocations which would be unaffected by oom killer scores.
> >
>
> However, the problem was reported to start showing up in 2.6.31-rc1 so
> while it might not be *the* patch, it might be making the type of change
> that caused more fragmentation. This patch adjusted the size of
> mm_struct and maybe it was enough to change the "order" required for the
> slab. Maybe there are other slabs that have changed size as well in that
> timeframe.
>
> Frans, what is the size of mm_struct before and after this patch was
> applied? Find it with either
>
> grep mm_struct /proc/slabinfo
>
> and if the information is not available there, try
>
> cat /sys/kernel/slab/mm_struct/slab_size and
> /sys/kernel/slab/mm_struct/order
>

If that's the case and the problem still persists in 2.6.31-rc7 as
reported, then you'd need to compare the current slab order for both
mm_struct and signal_struct to the previously known working kernel
since the latter is where oom_adj was moved. (You'd still have to check
the former to see if there were any mm_struct additions between rc1 and
rc7 between the commit and revert, though.)

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-06  9:14 ` David Rientjes
@ 2009-10-06  9:22 ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-06  9:22 UTC (permalink / raw)
  To: David Rientjes
  Cc: Frans Pop, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Pekka Enberg, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, linux-mm

On Tue, Oct 06, 2009 at 02:14:26AM -0700, David Rientjes wrote:
> On Tue, 6 Oct 2009, Mel Gorman wrote:
>
> > > > And the winner is:
> > > > 2ff05b2b4eac2e63d345fc731ea151a060247f53 is first bad commit
> > > > commit 2ff05b2b4eac2e63d345fc731ea151a060247f53
> > > > Author: David Rientjes <rientjes@google.com>
> > > > Date:   Tue Jun 16 15:32:56 2009 -0700
> > > >
> > > >     oom: move oom_adj value from task_struct to mm_struct
> > > >
> > > > I'm confident that the bisection is good. The test case was very reliable
> > > > while zooming in on the merge from akpm.
> > > >
> >
> > > I doubt it for two reasons: (i) this commit was reverted in 0753ba0 since
> > > 2.6.31-rc7 and is no longer in the kernel, and (ii) these are GFP_ATOMIC
> > > allocations which would be unaffected by oom killer scores.
> > >
>
> > However, the problem was reported to start showing up in 2.6.31-rc1 so
> > while it might not be *the* patch, it might be making the type of change
> > that caused more fragmentation. This patch adjusted the size of
> > mm_struct and maybe it was enough to change the "order" required for the
> > slab. Maybe there are other slabs that have changed size as well in that
> > timeframe.
> >
> > Frans, what is the size of mm_struct before and after this patch was
> > applied? Find it with either
> >
> > grep mm_struct /proc/slabinfo
> >
> > and if the information is not available there, try
> >
> > cat /sys/kernel/slab/mm_struct/slab_size and
> > /sys/kernel/slab/mm_struct/order
> >
>
> If that's the case and the problem still persists in 2.6.31-rc7 as
> reported, then you'd need to compare the current slab order for both
> mm_struct and signal_struct to the previously known working kernel
> since the latter is where oom_adj was moved. (You'd still have to check
> the former to see if there were any mm_struct additions between rc1 and
> rc7 between the commit and revert, though.)
>

Best to just grab all of slabinfo for a poke around. I know task_struct
has increases in size since 2.6.29 but not enough on the machines I've
changed to make a difference to the order of pages requested. It might
be different on the problem machines.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 384+ messages in thread
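[Editorial aside, not part of the original thread.] The suggestion to "grab all of slabinfo" and compare kernels can be done mechanically once a copy of /proc/slabinfo has been saved from each kernel. The following is a minimal Python sketch, assuming the version 2.1 slabinfo layout in which the fourth column is the object size and the sixth is pages per slab; the function names are hypothetical, and the two sample snapshots are invented for illustration, not measurements from the affected machine.

```python
def parse_slabinfo(text):
    """Map cache name -> (object size in bytes, pages per slab).

    Hypothetical helper: assumes the slabinfo 2.1 column order
    name, active_objs, num_objs, objsize, objperslab, pagesperslab.
    """
    caches = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner and the column-header comment.
        if not line.strip() or line.startswith(("slabinfo -", "#")):
            continue
        fields = line.split()
        caches[fields[0]] = (int(fields[3]), int(fields[5]))
    return caches


def changed_caches(old_text, new_text):
    """Return caches present in both snapshots whose size or slab pages differ."""
    old, new = parse_slabinfo(old_text), parse_slabinfo(new_text)
    return {name: (old[name], new[name])
            for name in sorted(old.keys() & new.keys())
            if old[name] != new[name]}


def slab_order(pages_per_slab):
    """Page allocation order implied by a power-of-two pages-per-slab value."""
    return pages_per_slab.bit_length() - 1


# Invented sample snapshots, NOT data from this thread.
OLD = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> ...
mm_struct 90 90 832 9 2 : tunables 0 0 0 : slabdata 10 10 0
signal_cache 108 108 896 9 2 : tunables 0 0 0 : slabdata 12 12 0
"""
NEW = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> ...
mm_struct 90 90 896 9 2 : tunables 0 0 0 : slabdata 10 10 0
signal_cache 108 108 896 9 2 : tunables 0 0 0 : slabdata 12 12 0
"""

for name, (before, after) in changed_caches(OLD, NEW).items():
    print(name, before, "->", after, "order", slab_order(after[1]))
```

In practice the two inputs would come from files saved on the known-good and the failing kernel (reading /proc/slabinfo may require root). A change in pages per slab, rather than in object size alone, is what would indicate that a cache now requests a higher-order allocation.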
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-06  0:04 ` David Rientjes
@ 2009-10-06 10:23   ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-06 10:23 UTC (permalink / raw)
  To: David Rientjes
  Cc: Mel Gorman, Rafael J. Wysocki, Linux Kernel Mailing List,
      Kernel Testers List, Pekka Enberg, Reinette Chatre,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, linux-mm

On Tuesday 06 October 2009, David Rientjes wrote:
> On Mon, 5 Oct 2009, Frans Pop wrote:
> > And the winner is:
> > 2ff05b2b4eac2e63d345fc731ea151a060247f53 is first bad commit
> > commit 2ff05b2b4eac2e63d345fc731ea151a060247f53
> > Author: David Rientjes <rientjes@google.com>
> > Date:   Tue Jun 16 15:32:56 2009 -0700
> >
> >     oom: move oom_adj value from task_struct to mm_struct
> >
> > I'm confident that the bisection is good. The test case was very
> > reliable while zooming in on the merge from akpm.
>
> I doubt it for two reasons: (i) this commit was reverted in 0753ba0
> since 2.6.31-rc7 and is no longer in the kernel, and (ii) these are
> GFP_ATOMIC allocations which would be unaffected by oom killer scores.

OK. Looks like I have been getting some false "good" results. I've been
redoing part of the bisect and am getting close to a new candidate. Will
explain further when I have that.

Cheers,
FJP

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-05  6:50 ` Frans Pop
@ 2009-10-11 23:10   ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-11 23:10 UTC (permalink / raw)
  To: Mel Gorman
  Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
      linux-mm

Sorry for going quiet on this issue for a few days, but I have been
spending *a lot* of time on it. I've done what amounts to 5 bisection
rounds at ~20 minutes per iteration and in total over 80 boots.

The problem with my first bisection was that there are *at least two*
changes at the root of this issue, both committed between .30 and .31-rc1.
Because of this a normal bisection will not lead to a reliable result and
even with my last effort I can only narrow it down to two different areas,
and not 100% to specific commits.

The two identified areas are:
1) a wireless merge which causes the SKB errors to appear in the first
   place, but not always;
2) an mm merge which makes the SKB errors occur *much* quicker; IMHO this
   is the change that also causes the regressions reported by Pekka and
   Karol.

So below are my results. The issue is both complex and subtle. Now it's up
to you, domain experts for both mm *and* wireless/networking, to make sense
of it all and come up with suggestions on how to proceed.

I've improved my test and it's now a lot more reliable, but there are still
timing influences. Also, because this is all merge-window stuff, I'm
hitting quite a few minor and major regressions between commits that can
affect tests.

Please study the information below carefully. I know it's long, but I think
this issue justifies that.

On Monday 05 October 2009, Frans Pop wrote:
> This looks conclusive. I tested .30 and .32-rc3 from clean reboots and
> only starting gitk. I only started music playing in the background
> (amarok) from an NFS share to ensure network activity.
>
> With .32-rc3 I got 4 SKB allocation errors while starting the *second*
> gitk instance. And the system was completely frozen with music stopped
> until gitk finished loading.

With .32-rc3, .31.1 and vanilla .31 I will get multiple SKB allocation
errors the *first time* I run the test, *every* time.

> With .30 I was able to start *three* gitk's (which meant 2 of them got
> (partially) swapped out) without any allocation errors. And with the
> system remaining relatively responsive. There was a short break in the
> music while I started the 2nd instance, but it just continued playing
> afterwards. There was also some mild latency in the mouse cursor, but
> nothing like the full desktop freeze I get with .32-rc3.

With both .30.2 and vanilla .30 I have *never* been able to get any SKB
allocation errors. No matter how often I repeat the test.

So, the start and end position are 100% reproducible. Problem is that this
changes during the bisection. At some point the test will pass (no SKB
errors) the first time I run it, but it will fail on the second or third
attempt.
Apparently at some point memory must already be fragmented (or higher
orders already used up) to some extent for the errors to trigger.

TEST METHOD
-----------
As a normal bisection (I tried 3 times...) did not lead anywhere, I had to
think of an alternative approach. I decided to start by manually selecting
merges by Linus into mainline. The advantage is that that makes the
bisection linear and makes it a lot easier to see patterns.
After narrowing down to a specific merge, I bisected (again semi-manually)
inside that merge.

Because I suspected there were multiple changes involved, I deliberately
tried to find two points:
- where do I first start seeing SKB errors at all, even if it is only at
  the second or third try;
- where do I start getting SKB errors reliably on the first try.

I worked from "good" to "bad", i.e. I started at .30. The merges were not
chosen completely randomly. From the first 3 bisections I strongly
suspected the first 'net-next' merge and the first 'akpm' merge, but I did
make sure to confirm that suspicion.

TEST DESCRIPTION
----------------
The test I've ended up using is:
1) clean boot
2) start music in amarok from NFS share; use very long song to avoid file
   changes and thus ensure a fluent stream of network data during the test
3) start 'gitk v2.6.29..master &' - to use up some memory
4) start first 'gitk master &' - after this all normal memory is as good as
   used up, with minor swap; this never resulted in SKB errors
5) start second 'gitk master &' - this causes heavy swapping (>700 MB) and
   is the real test
6) if there were no SKB errors after 5), kill the gitk processes and repeat
   steps 3) to 5). I've done this up to 4 times in some cases
7) if the results are not clear or when there is doubt later, repeat from
   step 1) with same kernel

Memory after initial 'gitk v2.6.29..master &':
             total       used       free     shared    buffers     cached
Mem:       2030776    1153008     877768          0      41572     333968
-/+ buffers/cache:     777468    1253308
Swap:      2097144          0    2097144

Memory after first 'gitk master &':
             total       used       free     shared    buffers     cached
Mem:       2030776    1979040      51736          0      35684     238420
-/+ buffers/cache:    1704936     325840
Swap:      2097144      21876    2075268

Memory after second 'gitk master &' (with .30.2):
             total       used       free     shared    buffers     cached
Mem:       2030776    2011608      19168          0      21836      92336
-/+ buffers/cache:    1897436     133340
Swap:      2097144     776160    1320984

OVERVIEW OF RESULTS
-------------------
Below I list the most relevant merges and commits. Note that they are
listed in commit order; my kernel version shows the order of testing.

For the commits I tested the test results are listed on the next line.
The first number on that line consists of the test series + the iteration
(and also identifies the kernel I used).
A "+" means I got no SKB errors, a "-" that I did get them. A "|" means I
rebooted for a second series of tests.

v2.6.30-2330-gdb8e7f1 'x86-fixes-for-linus' of linux-2.6-tip
    1.1  +++      iwlagn sw-error during first test
v2.6.30-4127-g0fa2133 'merge' of powerpc (last merge before net-next-2.6)
    1.2  +++
v2.6.30-5398-g2ed0e21 net-next-2.6 (mega-merge!)
    1.4  +-       system reboot fails after testing
v2.6.30-5517-g609106b 'merge' of powerpc
    1.3  +-       system reboot fails after testing
v2.6.30-5927-gf83b1e6 'for-linus' of linux1394-2.6 (last merge before akpm)
    2.2  ++-
v2.6.30-6111-g517d086 'akpm'
    2.1  -|-

BISECTION OF net-next-2.6 MERGE
-------------------------------
Note that this merge was based not on .30 vanilla, but partly on
v2.6.30-rc1 and partly on v2.6.30-rc6.
I think this had an influence on the latencies I saw (i.e. because some
post-rc6 bug fixes were not present it changes the general behavior of the
system during the swapping). For example: with v2.6.30-4127-g0fa2133 the
system remained more responsive (smaller music skips) than with
v2.6.30-rc1-1219-g82d0481.

I started again by testing merges, this time those by David.

v2.6.30-rc1-1219-g82d0481 'master' of wireless-next-2.6
    1.5  ++++     bad latencies
v2.6.30-rc6-660-gbb803cf 'master' of net-2.6
v2.6.30-rc6-808-g45ea4ea 'master' of wireless-next-2.6
v2.6.30-rc6-850-gc649c0e 'master' of net-2.6
v2.6.30-rc6-922-g3f1f39c 'linux-2.6.31.y' of wimax
v2.6.30-rc6-999-gb2f8f75 'master' of net-2.6
v2.6.30-rc6-1028-ga8c617e 'net-next' of lksctp-dev
    1.7  ++++|++++|++++
         I went back to this one twice because the bisection inside the
         next merge (see below) did not give a clear result.
v2.6.30-rc6-1103-gb1bc81a 'master' of wireless-next-2.6
    1.8  +-
v2.6.30-rc6-1224-g84503dd 'master' of wireless-next-2.6
    1.6  +-

So the problem started in the v2.6.30-rc6-1103-gb1bc81a merge. I was unable
to narrow it down to an exact commit; AFAICT the remaining ones (between
v2.6.30-rc6-1028-g8fc0fee and v2.6.30-rc6-1032-g7ba10a8) are uninteresting.
But it *must* be in this area!

For a good overview of the area, use 'gitk 3f1f39c4..b1bc81a0'.

v2.6.30-rc6-1028-g8fc0fee cfg80211: use key size constants
    1.11 ++++
v2.6.30-rc6-1031-g1bb5633 iwmc3200wifi: fix printk format
    1.14 +++-     not quite conclusive...
v2.6.30-rc6-1032-g7ba10a8 mac80211: fix transposed min/max CW values
    1.13 -
         This is a bugfix for aa837ee1d from an earlier merge! Could this
         maybe influence the test results in between? There are various
         SKB related changes there, for example: dfbf97f3..e5b9215e.
v2.6.30-rc6-1037-g2c5b9e5 wireless: libertas: fix unaligned accesses
    1.12 +-
v2.6.30-rc6-1044-g729e9c7 cfg80211: fix for duplicate userspace replies
    1.10 +-
v2.6.30-rc6-1075-gc587de0 iwlwifi: unify station management
    1.9  ++-|+-
v2.6.30-rc6-1076-gd14d444 iwl3945: port allow skb allocation in tasklet
         I thought this was a prime candidate, but as you can see several
         commits before failed too. Still worth looking at I think!

BISECTION of akpm (mm) MERGE
----------------------------
So here I went looking for "where does the test start failing on the first
try". Again, I was unable to narrow it down to a single commit.

For a good overview of the area, use 'gitk f83b1e61..517d0869'.

v2.6.30-5466-ga1dd268 mm: use alloc_pages_exact in alloc_large_system_hash
    2.3  +-
v2.6.30-5478-ge9bb35d mm: setup_per_zone_inactive_ratio - fix comment and..
    2.5  +-
v2.6.30-5486-g35282a2 migration: only migrate_prep() once per move_pages()
    2.6  -|+|-    not quite conclusive...
v2.6.30-5492-gbce7394 page-allocator: reset wmark_min and inactive ratio..
    2.4  -|-

WHERE NEXT?
===========
I think the results confirm there is definitely an issue here and that my
test is reliable and consistent enough to show it. And as it currently is
the only test we have...

I hope that the info above is enough for the mm and wireless domain experts
to identify likely candidates in the areas I've identified.

The next step could be trying specific reverts or debug patches, either on
top of current git, or 2.6.31, or inside the identified areas.
I'll run anything you care to throw at me and will try to provide any
additional info you need, but at this point it's up to you.

Cheers,
FJP

^ permalink raw reply [flat|nested] 384+ messages in thread
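The result notation used in the overview ("+", "-" and "|") maps onto the two points the bisection was looking for: where SKB errors first appear at all, and where they appear reliably on the first try. A rough sketch of that mapping follows. The function names and the three category labels are hypothetical and were chosen only for this illustration.

```python
# Sketch only: helper names and category labels are invented for this
# illustration; the notation itself is the one described in the report.

def parse_result(result):
    """Split a string like '++-|+-' into one list of runs per boot.

    Each run is True when SKB allocation errors were seen ('-') and
    False when none were seen ('+'). A '|' marks a reboot.
    """
    series = []
    for chunk in result.split("|"):
        if not chunk or set(chunk) - {"+", "-"}:
            raise ValueError("unexpected result notation: %r" % result)
        series.append([mark == "-" for mark in chunk])
    return series


def classify(result):
    """Classify one kernel's result line into one of three categories."""
    series = parse_result(result)
    if not any(any(runs) for runs in series):
        return "none"          # never any SKB errors
    if all(runs[0] for runs in series):
        return "first-try"     # errors on the first run after every boot
    return "intermittent"      # errors only after repeats, or not every boot


for line in ("+++", "+-", "++-|+-", "-|+|-", "-|-"):
    print(line, classify(line))
```

With this reading, a kernel tagged "intermittent" already shows the first effect (errors appear at all), while "first-try" corresponds to the second effect (errors appear much sooner).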
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-11 23:10   ` Frans Pop
@ 2009-10-11 23:36     ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-11 23:36 UTC (permalink / raw)
  To: Mel Gorman
  Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
      linux-mm

On Monday 12 October 2009, Frans Pop wrote:
> BISECTION of akpm (mm) MERGE
> ----------------------------
> So here I went looking for "where does the test start failing on the
> first try". Again, I was unable to narrow it down to a single commit.

Note that this merge is based on mainline at v2.6.30-5415-g03347e2, so a
number of merges "drop out" once I started bisecting into this merge.
But that point is still *after* the net-next-2.6 merge, which is all that's
really relevant for this issue.

> For a good overview of the area, use 'gitk f83b1e61..517d0869'.
>
> v2.6.30-5466-ga1dd268 mm: use alloc_pages_exact in alloc_large_system_hash
>     2.3  +-
> v2.6.30-5478-ge9bb35d mm: setup_per_zone_inactive_ratio - fix comment and..
>     2.5  +-
> v2.6.30-5486-g35282a2 migration: only migrate_prep() once per move_pages()
>     2.6  -|+|-    not quite conclusive...
> v2.6.30-5492-gbce7394 page-allocator: reset wmark_min and inactive ratio..
>     2.4  -|-

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-11 23:10   ` Frans Pop
  (?)
@ 2009-10-12 13:43   ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-12 13:43 UTC (permalink / raw)
  To: Frans Pop
  Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
      Mohamed Abbas, John W. Linville, linux-mm

On Mon, Oct 12, 2009 at 01:10:25AM +0200, Frans Pop wrote:
> Sorry for going quiet on this issue for a few days, but I have been
> spending *a lot* of time on it. I've done what amounts to 5 bisection
> rounds at ~20 minutes per iteration and in total over 80 boots.
>
> The problem with my first bisection was that there are *at least two*
> changes at the root of this issue, both committed between .30 and .31-rc1.
> Because of this a normal bisection will not lead to a reliable result and
> even with my last effort I can only narrow it down to two different areas,
> and not 100% to specific commits.
>

Thanks very much for your detailed work on this.

> The two identified areas are:
> 1) a wireless merge which causes the SKB errors to appear in the first
>    place, but not always;
> 2) an mm merge which makes the SKB errors occur *much* quicker; IMHO this
>    is the change that also causes the regressions reported by Pekka and
>    Karol.
>
> So below are my results. The issue is both complex and subtle. Now it's up
> to you, domain experts for both mm *and* wireless/networking, to make sense
> of it all and come up with suggestions on how to proceed.
>
> I've improved my test and it's now a lot more reliable, but there are still
> timing influences.

The timing influences are probably because kswapd is working from the time
memory gets full. High-order allocation failures would cause it to start
reclaiming at that order, so it's always a race to see whether it can do
its work before an atomic allocation fails.

> Also, because this is all merge-window stuff, I'm
> hitting quite a few minor and major regressions between commits that can
> affect tests.
>
> Please study the information below carefully. I know it's long, but I think
> this issue justifies that.
>

Agreed. I'll be looking at commits, both wireless and mm but obviously
anything I say about wireless needs to be taken with a generous dose of
salt.

> On Monday 05 October 2009, Frans Pop wrote:
> > This looks conclusive. I tested .30 and .32-rc3 from clean reboots and
> > only starting gitk. I only started music playing in the background
> > (amarok) from an NFS share to ensure network activity.
> >
> > With .32-rc3 I got 4 SKB allocation errors while starting the *second*
> > gitk instance. And the system was completely frozen with music stopped
> > until gitk finished loading.
>
> With .32-rc3, .31.1 and vanilla .31 I will get multiple SKB allocation
> errors the *first time* I run the test, *every* time.
>

So, this remains a current problem that wasn't solved by accident.

> > With .30 I was able to start *three* gitk's (which meant 2 of them got
> > (partially) swapped out) without any allocation errors. And with the
> > system remaining relatively responsive. There was a short break in the
> > music while I started the 2nd instance, but it just continued playing
> > afterwards. There was also some mild latency in the mouse cursor, but
> > nothing like the full desktop freeze I get with .32-rc3.
>
> With both .30.2 and vanilla .30 I have *never* been able to get any SKB
> allocation errors. No matter how often I repeat the test.
>
> So, the start and end position are 100% reproducible. Problem is that this
> changes during the bisection. At some point the test will pass (no SKB
> errors) the first time I run it, but it will fail on the second or third
> attempt.
> Apparently at some point memory must already be fragmented (or higher
> orders already used up) to some extent for the errors to trigger.
>

That is a reasonable assessment. It could be because

1. Something in the intervening commits greatly increases the number of
   GFP_ATOMIC allocations that are occurring. It's a pity that the
   allocator tracepoints are not available in those kernels. It would
   have made investigating this theory easier.
2. kswapd is no longer reclaiming high-order pages as well as it used
   to, due to changes in kswapd itself or lumpy reclaim
3. Fragmentation avoidance has been broken in some subtle manner

I think 3 is particularly unlikely and am expecting it to be 1 or 2.

> TEST METHOD
> -----------
> As a normal bisection (I tried 3 times...) did not lead anywhere, I had to
> think of an alternative approach. I decided to start by manually selecting
> merges by Linus into mainline. The advantage is that that makes the
> bisection linear and makes it a lot easier to see patterns.
> After narrowing down to a specific merge, I bisected (again semi-manually)
> inside that merge.
>
> Because I suspected there were multiple changes involved, I deliberately
> tried to find two points:
> - where do I first start seeing SKB errors at all, even if it is only at
>   the second or third try;
> - where do I start getting SKB errors reliably on the first try.
>
> I worked from "good" to "bad", i.e. I started at .30. The merges were not
> chosen completely randomly. From the first 3 bisections I strongly
> suspected the first 'net-next' merge and the first 'akpm' merge, but I did
> make sure to confirm that suspicion.
>

A very good approach.
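
One way to look at the "higher orders already used up" part of the assessment above is to count free blocks per order in /proc/buddyinfo before and after a test run. The following is only a sketch: the sample text is invented, the helper names are hypothetical, and zone watermarks (which also decide whether an atomic allocation succeeds) are ignored.

```python
# Sketch only: sample numbers are invented; watermark checks are ignored.

def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected layout: Node <n>, zone <name> <count per order ...>
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def free_pages(counts):
    """Total free pages: a block of order k holds 2**k pages."""
    return sum(n << order for order, n in enumerate(counts))


def blocks_at_or_above(counts, order):
    """Free blocks large enough to satisfy an allocation of this order."""
    return sum(counts[order:])


SAMPLE = (
    "Node 0, zone      DMA      3      2      1      0      0      0\n"
    "Node 0, zone   Normal    812    140      0      0      0      0\n"
)

for key, counts in parse_buddyinfo(SAMPLE).items():
    print(key, free_pages(counts), blocks_at_or_above(counts, 2))
```

In the invented Normal zone there are over a thousand free pages but no free block of order 2 or higher, which is the kind of state in which an order-2 atomic allocation cannot be served from that zone until something is reclaimed or merged.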

> TEST DESCRIPTION
> ----------------
> The test I've ended up using is:
> 1) clean boot
> 2) start music in amarok from NFS share; use very long song to avoid file
>    changes and thus ensure a fluent stream of network data during the test
> 3) start 'gitk v2.6.29..master &' - to use up some memory
> 4) start first 'gitk master &' - after this all normal memory is as good as
>    used up, with minor swap; this never resulted in SKB errors
> 5) start second 'gitk master &' - this causes heavy swapping (>700 MB) and
>    is the real test
> 6) if there were no SKB errors after 5), kill the gitk processes and repeat
>    steps 3) to 5). I've done this up to 4 times in some cases
> 7) if the results are not clear or when there is doubt later, repeat from
>    step 1) with same kernel
>
> Memory after initial 'gitk v2.6.29..master &':
>              total       used       free     shared    buffers     cached
> Mem:       2030776    1153008     877768          0      41572     333968
> -/+ buffers/cache:     777468    1253308
> Swap:      2097144          0    2097144
>
> Memory after first 'gitk master &':
>              total       used       free     shared    buffers     cached
> Mem:       2030776    1979040      51736          0      35684     238420
> -/+ buffers/cache:    1704936     325840
> Swap:      2097144      21876    2075268
>
> Memory after second 'gitk master &' (with .30.2):
>              total       used       free     shared    buffers     cached
> Mem:       2030776    2011608      19168          0      21836      92336
> -/+ buffers/cache:    1897436     133340
> Swap:      2097144     776160    1320984
>
> OVERVIEW OF RESULTS
> -------------------
> Below I list the most relevant merges and commits. Note that they are
> listed in commit order; my kernel version shows the order of testing.
>
> For the commits I tested the test results are listed on the next line.
> The first number on that line consists of the test series + the iteration
> (and also identifies the kernel I used).
> A "+" means I got no SKB errors, a "-" that I did get them. A "|" means I
> rebooted for a second series of tests.
>
> v2.6.30-2330-gdb8e7f1 'x86-fixes-for-linus' of linux-2.6-tip
>    1.1 +++	iwlagn sw-error during first test
> v2.6.30-4127-g0fa2133 'merge' of powerpc (last merge before net-next-2.6)
>    1.2 +++
> v2.6.30-5398-g2ed0e21 net-next-2.6 (mega-merge!)
>    1.4 +-	system reboot fails after testing
> v2.6.30-5517-g609106b 'merge' of powerpc
>    1.3 +-	system reboot fails after testing
> v2.6.30-5927-gf83b1e6 'for-linus' of linux1394-2.6 (last merge before akpm)
>    2.2 ++-
> v2.6.30-6111-g517d086 'akpm'
>    2.1 -|-
>
> BISECTION OF net-next-2.6 MERGE
> -------------------------------
> Note that this merge was based not on .30 vanilla, but partly on
> v2.6.30-rc1 and partly on v2.6.30-rc6.
> I think this had an influence on the latencies I saw (i.e. because some
> post-rc6 bug fixes were not present it changes the general behavior of the
> system during the swapping). For example: with v2.6.30-4127-g0fa2133 the
> system remained more responsive (smaller music skips) than with
> v2.6.30-rc1-1219-g82d0481.
>
> I started again by testing merges, this time those by David.
>
> v2.6.30-rc1-1219-g82d0481 'master' of wireless-next-2.6
>    1.5 ++++	bad latencies

The bad latencies might imply that there are a lot more allocations going on
than there used to be. Maybe it was just because of a wireless bug though
that was later fixed.

> v2.6.30-rc6-660-gbb803cf 'master' of net-2.6
> v2.6.30-rc6-808-g45ea4ea 'master' of wireless-next-2.6
> v2.6.30-rc6-850-gc649c0e 'master' of net-2.6
> v2.6.30-rc6-922-g3f1f39c 'linux-2.6.31.y' of wimax
> v2.6.30-rc6-999-gb2f8f75 'master' of net-2.6
> v2.6.30-rc6-1028-ga8c617e 'net-next' of lksctp-dev
>    1.7 ++++|++++|++++
>    I went back to this one twice because the bisection inside the
>    next merge (see below) did not give a clear result.
> v2.6.30-rc6-1103-gb1bc81a 'master' of wireless-next-2.6
>    1.8 +-
> v2.6.30-rc6-1224-g84503dd 'master' of wireless-next-2.6
>    1.6 +-
>
> So the problem started in the v2.6.30-rc6-1103-gb1bc81a merge.
> I was unable to narrow it down to an exact commit; AFAICT the remaining
> ones (between v2.6.30-rc6-1028-g8fc0fee and v2.6.30-rc6-1032-g7ba10a8) are
> uninteresting. But it *must* be in this area!
>
> For a good overview of the area, use 'gitk 3f1f39c4..b1bc81a0'.
>
> v2.6.30-rc6-1028-g8fc0fee cfg80211: use key size constants
>    1.11 ++++
> v2.6.30-rc6-1031-g1bb5633 iwmc3200wifi: fix printk format
>    1.14 +++-	not quite conclusive...
> v2.6.30-rc6-1032-g7ba10a8 mac80211: fix transposed min/max CW values
>    1.13 -
>    This is a bugfix for aa837ee1d from an earlier merge! Could this maybe
>    influence the test results in between? There are various SKB related
>    changes there, for example: dfbf97f3..e5b9215e.

Maybe. Your commit IDs are different to what I see. Maybe it's because your
tree has been shuffled around a bit but after some digging around in this
general area, I saw this patch

4752c93c30 iwlcore: Allow skb allocation from tasklet

This patch increases the number of GFP_ATOMIC allocations that can occur by
allocating GFP_ATOMIC in some cases and GFP_KERNEL in others. Previously,
only GFP_KERNEL was used and I didn't realise this allocation method was so
recent. Problems of this sort have cropped up before and while there are
later changes that suppress some of these warnings, I believe this is a
strong candidate for where the allocation failures started appearing.

> v2.6.30-rc6-1037-g2c5b9e5 wireless: libertas: fix unaligned accesses
>    1.12 +-
> v2.6.30-rc6-1044-g729e9c7 cfg80211: fix for duplicate userspace replies
>    1.10 +-
> v2.6.30-rc6-1075-gc587de0 iwlwifi: unify station management
>    1.9 ++-|+-
> v2.6.30-rc6-1076-gd14d444 iwl3945: port allow skb allocation in tasklet
>    I thought this was a prime candidate, but as you can see several commits
>    before failed too. Still worth looking at I think!
>

Your commit IDs are different to what I see but it's the merge commit at
b1bc81a0ef86b86fa410dd303d84c8c7bd09a64d.
I agree that the last commit (d14d44407b9f06e3cf967fcef28ccb780caf0583)
could make the problem worse because it expands the use of GFP_ATOMIC for
another driver.

> BISECTION of akpm (mm) MERGE
> ----------------------------
> So here I went looking for "where does the test start failing on the first
> try". Again, I was unable to narrow it down to a single commit.
>
> For a good overview of the area, use 'gitk f83b1e61..517d0869'.
>
> v2.6.30-5466-ga1dd268 mm: use alloc_pages_exact in alloc_large_system_hash
>    2.3 +-
> v2.6.30-5478-ge9bb35d mm: setup_per_zone_inactive_ratio - fix comment and..
>    2.5 +-
> v2.6.30-5486-g35282a2 migration: only migrate_prep() once per move_pages()
>    2.6 -|+|-	not quite conclusive...
> v2.6.30-5492-gbce7394 page-allocator: reset wmark_min and inactive ratio..
>    2.4 -|-
>

While I didn't spot anything too out of the ordinary here, they did occur
shortly after a number of other page allocator related patches. One small
thing I noticed there is that kswapd is getting woken up less now than it
did previously. Generally, I wouldn't have expected it to make a difference
but it's possible that kswapd is not being woken up to reclaim at a higher
order as often as it was previously. I have a patch for this below. It'd be
nice if you could apply it and see whether fewer allocation failures occur
on current mainline.

> WHERE NEXT?
> ===========
> I think the results confirm there is definitely an issue here and that my
> test is reliable and consistent enough to show it. And as it currently is
> the only test we have...
>
> I hope that the info above is enough for the mm and wireless domain
> experts to identify likely candidates in the areas I've identified.
>
> The next step could be trying specific reverts or debug patches, either on
> top of current git, or 2.6.31, or inside the identified areas.
> I'll run anything you care to throw at me and will try to provide any
> additional info you need, but at this point it's up to you.
>

For the wireless people in mainline - iwl_rx_replenish_now() is doing a
GFP_ATOMIC allocation that does not use __GFP_NOWARN. As part of
investigating allocation failures, iwl_rx_allocate() was taught to
distinguish between a benign and a serious allocation failure - serious
being that there are very few RX buffers left and packet loss could occur
soon (see commit f82a924cc88a5541df1d4b9d38a0968cd077a051). I think this GFP
mask should be made GFP_ATOMIC|__GFP_NOWARN so that warnings only appear when
the failure is serious; dump the stack after the warning if you need it. I
have a feeling that almost all these warnings have been benign and are
related to the introduction of GFP_ATOMIC being used so heavily to move more
expensive allocations to the tasklet (presumably to reduce user-visible
latency).

Frans, could you try the following kswapd-related patch please? I'd be
interested in seeing if the number of allocation failure warnings is reduced
with it. After that, could you edit drivers/net/wireless/iwlwifi/iwl-rx.c and
make the GFP_ATOMIC in iwl_rx_replenish_now() GFP_ATOMIC|__GFP_NOWARN and see
whether any of the "serious" allocation failure messages appear.

Thanks again for your persistence.

==== CUT HERE ====
From 5296f50ce7ee6b276723ca21fa50d6db3d266075 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mel@csn.ul.ie>
Date: Mon, 12 Oct 2009 14:21:52 +0100
Subject: [PATCH] page-allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed

If a direct reclaim makes no forward progress, it considers whether it
should go OOM or not. Whether OOM is triggered or not, it may retry the
allocation afterwards. In times past, this would always wake kswapd as well
but currently, kswapd is not woken up after direct reclaim fails.
For order-0 allocations, this makes little difference but if there is a
heavy mix of higher-order allocations that direct reclaim is failing for, it
might mean that kswapd is not reclaiming for higher orders as much as it did
previously.

This patch wakes up kswapd when an allocation is being retried after a
direct reclaim failure. It would be expected that kswapd is already awake,
but this has the effect of telling kswapd to reclaim at the higher order as
well.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/page_alloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf72055..dfa4362 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1817,9 +1817,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
 		goto nopage;
 
+restart:
 	wake_all_kswapd(order, zonelist, high_zoneidx);
 
-restart:
 	/*
 	 * OK, we're below the kswapd watermark and have kicked background
 	 * reclaim.  Now things get more complex, so set up alloc_flags according
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-12 13:43 ` Mel Gorman 0 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-12 13:43 UTC (permalink / raw) To: Frans Pop Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm On Mon, Oct 12, 2009 at 01:10:25AM +0200, Frans Pop wrote: > Sorry for going quiet on this issue for a few days, but I have been > spending *a lot* of time on it. I've done what amounts to 5 bisection > rounds at ~20 minutes per iteration and in total over 80 boots. > > The problem with my first bisection was that there are *at least two* > changes at the root of this issue, both committed between .30 and .30-rc1. > Because of this a normal bisection will not lead to a reliable result and > even with my last effort I can only narrow it down to two different areas, > and not 100% to specific commits. > Thanks very much for your detailed work on this. > The two identified areas are: > 1) a wireless merge which causes the SKB errors to appear in the first > place, but not always; > 2) an mm merge which makes the SKB errors occur *much* quicker; IMHO this > is the change that also causes the regressions reported by Pekka and > Karol. > > So below my results. The issue is both complex and subtle. Now it's up to > you, domain experts for both mm *and* wireless/networking, to make sense of > it all and come up with suggestions on how to proceed. > > I've improved my test and it's now a lot more reliable, but there are still > timing influences. The timing influences is probably because kswapd is working from the time memory gets full. High-order allocation failures would cause it to start reclaiming at that order so it's a race always to see can it do its work before an atomic allocation fails or not. 
> Also, because this is all merge-window stuff, I'm > hitting quite a few minor and major regressions between commits that can > affect tests. > > Please study the information below carefully. I know it's long, but I think > this issue justifies that. > Agreed. I'll be looking at commits, both wireless and mm but obviously anything I saw about wireless needs to be taken with a generous dose of salt. > On Monday 05 October 2009, Frans Pop wrote: > > This looks conclusive. I tested .30 and .32-rc3 from clean reboots and > > only starting gitk. I only started music playing in the background > > (amarok) from an NFS share to ensure network activity. > > > > With .32-rc3 I got 4 SKB allocation errors while starting the *second* > > gitk instance. And the system was completely frozen with music stopped > > until gitk finished loading. > > With .32-rc3, .31.1 and vanilla .31 I will get multiple SKB allocation > errors the *first time* I run the test, *every* time. > So, this remains a current problem that wasn't solved by accident. > > With .30 I was able to start *three* gitk's (which meant 2 of them got > > (partially) swapped out) without any allocation errors. And with the > > system remaining relatively responsive. There was a short break in the > > music while I started the 2nd instance, but it just continued playing > > afterwards. There was also some mild latency in the mouse cursor, but > > nothing like the full desktop freeze I get with .32-rc3. > > With both .30.2 and vanilla .30 I have *never* been able to get any SKB > allocation errors. No matter how often I repeat the test. > > So, the start and end position are 100% reproducible. Problem is that this > changes during the bisection. At some point the test will fail (no SKB > errors) the first time I run it, but it will fail on the second or third > attempt. > Apparently at some point memory must already be fragmented (or higher > orders already used up) to some extend for the errors to trigger. 
> That is a reasonable assessment. It could be because 1. Something in the intevening commits greatly increases the number of GFP_ATOMIC allocations that are occuring. It's a pity that the allocator tracepoints are not available in those kernels. It would have made investigating this theory easier. 2. kswapd is no longer reclaiming high-order pages as well as it used to be it due to changes in kswapd itself or lumpy reclaim 3. Fragmentation avoidance has been broken in some subtle manner I think 3 is particularly unlikely and am expecting it to be 1 or 2. > TEST METHOD > ----------- > As a normal bisection (I tried 3 times...) did not lead anywhere, I had to > think of an alternative approach. I decided to start by manually selecting > merges by Linus into mainline. The advantage is that that makes the > bisection linear and makes it a lot easier to see patterns. > After narrowing down to a specific merge, I bisected (again semi-manually) > inside that merge. > > Because I suspected there were multiple changes involved, I deliberately > tried to find two points: > - where do I first start seeing SKB errors at all, even if it is only at > the second or third try; > - where do I start getting SKB errors reliably on the first try. > > I worked from "good" to "bad", i.e. I started at .30. The merges were not > chosen completely randomly. From the first 3 bisections I strongly > suspected the first 'net-next' merge and the first 'akpm' merge, but I did > make sure to confirm that suspicion. > A very good approach. 
> TEST DESCRIPTION > ---------------- > The test I've ended up using is: > 1) clean boot > 2) start music in amarok from NFS share; use very long song to avoid file > changes and thus ensure a fluent stream of network data during the test > 3) start 'gitk v2.6.29..master &' - to use up some memory > 4) start first 'gitk master &' - after this all normal memory is as good as > used up, with minor swap; this never resulted in SKB errors > 5) start second 'gitk master &' - this causes heavy swapping (>700 MB) and > is the real test > 6) if there were no SKB errors after 5), kill the gitk processes and repeat > steps 3) to 5). I've done this up to 4 times in some cases > 7) if the results are not clear or when there is doubt later, repeat from > step 1) with same kernel > > Memory after initial 'gitk v2.6.29..master &': > total used free shared buffers cached > Mem: 2030776 1153008 877768 0 41572 333968 > -/+ buffers/cache: 777468 1253308 > Swap: 2097144 0 2097144 > > Memory after first 'gitk master &': > total used free shared buffers cached > Mem: 2030776 1979040 51736 0 35684 238420 > -/+ buffers/cache: 1704936 325840 > Swap: 2097144 21876 2075268 > > Memory after second 'gitk master &' (with .30.2): > total used free shared buffers cached > Mem: 2030776 2011608 19168 0 21836 92336 > -/+ buffers/cache: 1897436 133340 > Swap: 2097144 776160 1320984 > > OVERVIEW OF RESULTS > ------------------- > Below I list the most relevant merges and commits. Note that they are > listed in commit order; my kernel version shows the order of testing. > > For the commits I tested the test results are listed on the next line. > The first number on that line consists of the test series + the iteration > (and also identifies the kernel I used). > A "+" means I got no SKB errors, a "-" that I did get them. A "|" means I > rebooted for a second series of tests. 
> > v2.6.30-2330-gdb8e7f1 'x86-fixes-for-linus' of linux-2.6-tip > 1.1 +++ iwlagn sw-error during first test > v2.6.30-4127-g0fa2133 'merge' of powerpc (last merge before net-next-2.6) > 1.2 +++ > v2.6.30-5398-g2ed0e21 net-next-2.6 (mega-merge!) > 1.4 +- system reboot fails after testing > v2.6.30-5517-g609106b 'merge' of powerpc > 1.3 +- system reboot fails after testing > v2.6.30-5927-gf83b1e6 'for-linus' of linux1394-2.6 (last merge before akpm) > 2.2 ++- > v2.6.30-6111-g517d086 'akpm' > 2.1 -|- > > BISECTION OF net-next-2.6 MERGE > ------------------------------- > Note that this merge was based not on .30 vanilla, but partly on > v2.6.30-rc1 and partly on v2.6.30-rc6. > I think this had an influence on the latencies I saw (i.e. because some > post-rc6 bug fixes were not present it changes the general behavior of the > system during the swapping). For example: with v2.6.30-4127-g0fa2133 the > system remained more responsive (smaller music skips) than with > v2.6.30-rc1-1219-g82d0481. > > I started again by testing merges, this time those by David. > > v2.6.30-rc1-1219-g82d0481 'master' of wireless-next-2.6 > 1.5 ++++ bad latencies The bad latencies might imply that there are a lot more allocations going on than there used to be. Maybe it was just because of a wireless bug though that was later fixed. > v2.6.30-rc6-660-gbb803cf 'master' of net-2.6 > v2.6.30-rc6-808-g45ea4ea 'master' of wireless-next-2.6 > v2.6.30-rc6-850-gc649c0e 'master' of net-2.6 > v2.6.30-rc6-922-g3f1f39c 'linux-2.6.31.y' of wimax > v2.6.30-rc6-999-gb2f8f75 'master' of net-2.6 > v2.6.30-rc6-1028-ga8c617e 'net-next' of lksctp-dev > 1.7 ++++|++++|++++ > I went back to this one twice because the bisection inside the > next merge (see below) did not give a clear result. > v2.6.30-rc6-1103-gb1bc81a 'master' of wireless-next-2.6 > 1.8 +- > v2.6.30-rc6-1224-g84503dd 'master' of wireless-next-2.6 > 1.6 +- > > So the problem started in the v2.6.30-rc6-1103-gb1bc81a merge. 
> I was unable to narrow it down to an exact commit; AFAICT the remaining > ones (between v2.6.30-rc6-1028-g8fc0fee and v2.6.30-rc6-1032-g7ba10a8) are > uninteresting. But it *must* be in this area! > > For a good overview of the area, use 'gitk 3f1f39c4..b1bc81a0'. > > v2.6.30-rc6-1028-g8fc0fee cfg80211: use key size constants > 1.11 ++++ > v2.6.30-rc6-1031-g1bb5633 iwmc3200wifi: fix printk format > 1.14 +++- not quite conclusive... > v2.6.30-rc6-1032-g7ba10a8 mac80211: fix transposed min/max CW values > 1.13 - > This is a bugfix for aa837ee1d from an earlier merge! Could this maybe > influence the test results in between? There are various SKB related > changes there, for example: dfbf97f3..e5b9215e. Maybe. Your commit id's are different to what I see. Maybe it's because your tree has been shuffled around a bit but after some digging around in this general area, I saw this patch 4752c93c30 iwlcore: Allow skb allocation from tasklet This patch increases the number of GFP_ATOMIC allocations that can occur by allocating GFP_ATOMIC in some cases and GFP_KERNEL in others. Previously, only GFP_KERNEL was used and I didn't realise this allocation method was so recent. Problems of this sort have cropped up before and while there are later changes that suppress some of these warnings, I believe this is a strong candidate for where the allocation failures started appearing. > v2.6.30-rc6-1037-g2c5b9e5 wireless: libertas: fix unaligned accesses > 1.12 +- > v2.6.30-rc6-1044-g729e9c7 cfg80211: fix for duplicate userspace replies > 1.10 +- > v2.6.30-rc6-1075-gc587de0 iwlwifi: unify station management > 1.9 ++-|+- > v2.6.30-rc6-1076-gd14d444 iwl3945: port allow skb allocation in tasklet > I thought this was a prime candidate, but as you can see several commits > before failed too. Still worth looking at I think! > Your commit IDs are different to what I see but it's the commit merge at b1bc81a0ef86b86fa410dd303d84c8c7bd09a64d. 
I agree that the last commit (d14d44407b9f06e3cf967fcef28ccb780caf0583) could make the problem worse because it expands the use of GFP_ATOMIC for another driver. > BISECTION of akpm (mm) MERGE > ---------------------------- > So here I went looking for "where does the test start failing on the first > try". Again, I was unable to narrow it down to a single commit. > > For a good overview of the area, use 'gitk f83b1e61..517d0869'. > > v2.6.30-5466-ga1dd268 mm: use alloc_pages_exact in alloc_large_system_hash > 2.3 +- > v2.6.30-5478-ge9bb35d mm: setup_per_zone_inactive_ratio - fix comment and.. > 2.5 +- > v2.6.30-5486-g35282a2 migration: only migrate_prep() once per move_pages() > 2.6 -|+|- not quite conclusive... > v2.6.30-5492-gbce7394 page-allocator: reset wmark_min and inactive ratio.. > 2.4 -|- > While I didn't spot anything too out of the ordinary here, they did occur shortly after a number of other page allocator related patches. One small thing I noticed there is that kswapd is getting woken up less now than it did previously. Generally, I wouldn't have expected it to make a difference but it's possible that kswapd is not being woken up to reclaim at a higher order than it was previously. I have a patch for this below. It'd be nice if you could apply it and see do fewer allocation failures occur on current mainline. > WHERE NEXT? > =========== > I think the results confirm there is definitely an issue here and that my > test is reliable and consistent enough to show it. And as it currently is > the only test we have... > > I hope that the info above is enough for the mm and wireless domain > experts to identify likely candidates in the areas I've identified. > > The next step could be trying specific reverts or debug patches, either on > top of current git, or 2.6.31, or inside the identified areas. > I'll run anything you care to throw at me and will try to provide any > additional info you need, but at this point it's up to you. 
> For the wireless people in mainline - iwl_rx_replenish_now() is doing a GFP_ATOMIC allocation that does not use __GFP_NOWARN. As part of investigating allocation failures, iwl_rx_allocate() was taught to distinguish between a benign and serious allocation failure - serious being there are very few RX buffers left and packet loss could occur soon (see commit f82a924cc88a5541df1d4b9d38a0968cd077a051). I think this GFP mask should be made GFP_ATOMIC|__GFP_NOWARN so that warnings only appear when the failure is serious, dump stack after the warning if you need it. I have a feeling that almost all these warnings have been benign and are related to the introduction of GFP_ATOMIC being used so heavily to move more expensive allocations to the tasklet (presumably to reduce user-visible latency). Frans, could you try the following kswapd-related patch please? I'd be interested in seeing if the number of allocation failure warnings are reduced with it. After that, could you edit drivers/net/wireless/iwlwifi/iwl-rx.c and make the GFP_ATOMIC in iwl_rx_replenish_now() GFP_ATOMIC|__GFP_NOWARN and see do any of the "serious" allocation failure messages appear. Thanks again for your persistence. ==== CUT HERE ==== From 5296f50ce7ee6b276723ca21fa50d6db3d266075 Mon Sep 17 00:00:00 2001 From: Mel Gorman <mel@csn.ul.ie> Date: Mon, 12 Oct 2009 14:21:52 +0100 Subject: [PATCH] page-allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed If a direct reclaim makes no forward progress, it considers whether it should go OOM or not. Whether OOM is triggered or not, it may retry the application afterwards. In times past, this would always wake kswapd as well but currently, kswapd is not woken up after direct reclaim fails. 
For order-0 allocations, this makes little difference but if there is a heavy mix of higher-order allocations that direct reclaim is failing for, it might mean that kswapd is not reclaiming for higher orders as much as it did previously. This patch wakes up kswapd when an allocation is being retried after a direct reclaim failure. It would be expected that kswapd is already awake, but this has the effect of telling kswapd to reclaim at the higher order as well. Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bf72055..dfa4362 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1817,9 +1817,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE) goto nopage; +restart: wake_all_kswapd(order, zonelist, high_zoneidx); -restart: /* * OK, we're below the kswapd watermark and have kicked background * reclaim. Now things get more complex, so set up alloc_flags according -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-12 13:43 ` Mel Gorman 0 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-12 13:43 UTC (permalink / raw) To: Frans Pop Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm On Mon, Oct 12, 2009 at 01:10:25AM +0200, Frans Pop wrote: > Sorry for going quiet on this issue for a few days, but I have been > spending *a lot* of time on it. I've done what amounts to 5 bisection > rounds at ~20 minutes per iteration and in total over 80 boots. > > The problem with my first bisection was that there are *at least two* > changes at the root of this issue, both committed between .30 and .30-rc1. > Because of this a normal bisection will not lead to a reliable result and > even with my last effort I can only narrow it down to two different areas, > and not 100% to specific commits. > Thanks very much for your detailed work on this. > The two identified areas are: > 1) a wireless merge which causes the SKB errors to appear in the first > place, but not always; > 2) an mm merge which makes the SKB errors occur *much* quicker; IMHO this > is the change that also causes the regressions reported by Pekka and > Karol. > > So below my results. The issue is both complex and subtle. Now it's up to > you, domain experts for both mm *and* wireless/networking, to make sense of > it all and come up with suggestions on how to proceed. > > I've improved my test and it's now a lot more reliable, but there are still > timing influences. The timing influences is probably because kswapd is working from the time memory gets full. High-order allocation failures would cause it to start reclaiming at that order so it's a race always to see can it do its work before an atomic allocation fails or not. 
> Also, because this is all merge-window stuff, I'm > hitting quite a few minor and major regressions between commits that can > affect tests. > > Please study the information below carefully. I know it's long, but I think > this issue justifies that. > Agreed. I'll be looking at commits, both wireless and mm but obviously anything I saw about wireless needs to be taken with a generous dose of salt. > On Monday 05 October 2009, Frans Pop wrote: > > This looks conclusive. I tested .30 and .32-rc3 from clean reboots and > > only starting gitk. I only started music playing in the background > > (amarok) from an NFS share to ensure network activity. > > > > With .32-rc3 I got 4 SKB allocation errors while starting the *second* > > gitk instance. And the system was completely frozen with music stopped > > until gitk finished loading. > > With .32-rc3, .31.1 and vanilla .31 I will get multiple SKB allocation > errors the *first time* I run the test, *every* time. > So, this remains a current problem that wasn't solved by accident. > > With .30 I was able to start *three* gitk's (which meant 2 of them got > > (partially) swapped out) without any allocation errors. And with the > > system remaining relatively responsive. There was a short break in the > > music while I started the 2nd instance, but it just continued playing > > afterwards. There was also some mild latency in the mouse cursor, but > > nothing like the full desktop freeze I get with .32-rc3. > > With both .30.2 and vanilla .30 I have *never* been able to get any SKB > allocation errors. No matter how often I repeat the test. > > So, the start and end position are 100% reproducible. Problem is that this > changes during the bisection. At some point the test will fail (no SKB > errors) the first time I run it, but it will fail on the second or third > attempt. > Apparently at some point memory must already be fragmented (or higher > orders already used up) to some extend for the errors to trigger. 
> That is a reasonable assessment. It could be because 1. Something in the intevening commits greatly increases the number of GFP_ATOMIC allocations that are occuring. It's a pity that the allocator tracepoints are not available in those kernels. It would have made investigating this theory easier. 2. kswapd is no longer reclaiming high-order pages as well as it used to be it due to changes in kswapd itself or lumpy reclaim 3. Fragmentation avoidance has been broken in some subtle manner I think 3 is particularly unlikely and am expecting it to be 1 or 2. > TEST METHOD > ----------- > As a normal bisection (I tried 3 times...) did not lead anywhere, I had to > think of an alternative approach. I decided to start by manually selecting > merges by Linus into mainline. The advantage is that that makes the > bisection linear and makes it a lot easier to see patterns. > After narrowing down to a specific merge, I bisected (again semi-manually) > inside that merge. > > Because I suspected there were multiple changes involved, I deliberately > tried to find two points: > - where do I first start seeing SKB errors at all, even if it is only at > the second or third try; > - where do I start getting SKB errors reliably on the first try. > > I worked from "good" to "bad", i.e. I started at .30. The merges were not > chosen completely randomly. From the first 3 bisections I strongly > suspected the first 'net-next' merge and the first 'akpm' merge, but I did > make sure to confirm that suspicion. > A very good approach. 
> TEST DESCRIPTION
> ----------------
> The test I've ended up using is:
> 1) clean boot
> 2) start music in amarok from NFS share; use very long song to avoid file
>    changes and thus ensure a fluent stream of network data during the test
> 3) start 'gitk v2.6.29..master &' - to use up some memory
> 4) start first 'gitk master &' - after this all normal memory is as good as
>    used up, with minor swap; this never resulted in SKB errors
> 5) start second 'gitk master &' - this causes heavy swapping (>700 MB) and
>    is the real test
> 6) if there were no SKB errors after 5), kill the gitk processes and repeat
>    steps 3) to 5). I've done this up to 4 times in some cases
> 7) if the results are not clear or when there is doubt later, repeat from
>    step 1) with same kernel
>
> Memory after initial 'gitk v2.6.29..master &':
>              total       used       free     shared    buffers     cached
> Mem:       2030776    1153008     877768          0      41572     333968
> -/+ buffers/cache:     777468    1253308
> Swap:      2097144          0    2097144
>
> Memory after first 'gitk master &':
>              total       used       free     shared    buffers     cached
> Mem:       2030776    1979040      51736          0      35684     238420
> -/+ buffers/cache:    1704936     325840
> Swap:      2097144      21876    2075268
>
> Memory after second 'gitk master &' (with .30.2):
>              total       used       free     shared    buffers     cached
> Mem:       2030776    2011608      19168          0      21836      92336
> -/+ buffers/cache:    1897436     133340
> Swap:      2097144     776160    1320984
>
> OVERVIEW OF RESULTS
> -------------------
> Below I list the most relevant merges and commits. Note that they are
> listed in commit order; my kernel version shows the order of testing.
>
> For the commits I tested the test results are listed on the next line.
> The first number on that line consists of the test series + the iteration
> (and also identifies the kernel I used).
> A "+" means I got no SKB errors, a "-" that I did get them. A "|" means I
> rebooted for a second series of tests.
>
> v2.6.30-2330-gdb8e7f1 'x86-fixes-for-linus' of linux-2.6-tip
>     1.1 +++     iwlagn sw-error during first test
> v2.6.30-4127-g0fa2133 'merge' of powerpc (last merge before net-next-2.6)
>     1.2 +++
> v2.6.30-5398-g2ed0e21 net-next-2.6 (mega-merge!)
>     1.4 +-      system reboot fails after testing
> v2.6.30-5517-g609106b 'merge' of powerpc
>     1.3 +-      system reboot fails after testing
> v2.6.30-5927-gf83b1e6 'for-linus' of linux1394-2.6 (last merge before akpm)
>     2.2 ++-
> v2.6.30-6111-g517d086 'akpm'
>     2.1 -|-
>
> BISECTION OF net-next-2.6 MERGE
> -------------------------------
> Note that this merge was based not on .30 vanilla, but partly on
> v2.6.30-rc1 and partly on v2.6.30-rc6.
> I think this had an influence on the latencies I saw (i.e. because some
> post-rc6 bug fixes were not present it changes the general behavior of the
> system during the swapping). For example: with v2.6.30-4127-g0fa2133 the
> system remained more responsive (smaller music skips) than with
> v2.6.30-rc1-1219-g82d0481.
>
> I started again by testing merges, this time those by David.
>
> v2.6.30-rc1-1219-g82d0481 'master' of wireless-next-2.6
>     1.5 ++++    bad latencies

The bad latencies might imply that there are a lot more allocations
going on than there used to be. Maybe it was just because of a wireless
bug though that was later fixed.

> v2.6.30-rc6-660-gbb803cf 'master' of net-2.6
> v2.6.30-rc6-808-g45ea4ea 'master' of wireless-next-2.6
> v2.6.30-rc6-850-gc649c0e 'master' of net-2.6
> v2.6.30-rc6-922-g3f1f39c 'linux-2.6.31.y' of wimax
> v2.6.30-rc6-999-gb2f8f75 'master' of net-2.6
> v2.6.30-rc6-1028-ga8c617e 'net-next' of lksctp-dev
>     1.7 ++++|++++|++++
>         I went back to this one twice because the bisection inside the
>         next merge (see below) did not give a clear result.
> v2.6.30-rc6-1103-gb1bc81a 'master' of wireless-next-2.6
>     1.8 +-
> v2.6.30-rc6-1224-g84503dd 'master' of wireless-next-2.6
>     1.6 +-
>
> So the problem started in the v2.6.30-rc6-1103-gb1bc81a merge.
> I was unable to narrow it down to an exact commit; AFAICT the remaining
> ones (between v2.6.30-rc6-1028-g8fc0fee and v2.6.30-rc6-1032-g7ba10a8) are
> uninteresting. But it *must* be in this area!
>
> For a good overview of the area, use 'gitk 3f1f39c4..b1bc81a0'.
>
> v2.6.30-rc6-1028-g8fc0fee cfg80211: use key size constants
>     1.11 ++++
> v2.6.30-rc6-1031-g1bb5633 iwmc3200wifi: fix printk format
>     1.14 +++-   not quite conclusive...
> v2.6.30-rc6-1032-g7ba10a8 mac80211: fix transposed min/max CW values
>     1.13 -
>         This is a bugfix for aa837ee1d from an earlier merge! Could this maybe
>         influence the test results in between? There are various SKB related
>         changes there, for example: dfbf97f3..e5b9215e.

Maybe. Your commit id's are different to what I see. Maybe it's because
your tree has been shuffled around a bit but after some digging around
in this general area, I saw this patch

4752c93c30 iwlcore: Allow skb allocation from tasklet

This patch increases the number of GFP_ATOMIC allocations that can occur
by allocating GFP_ATOMIC in some cases and GFP_KERNEL in others.
Previously, only GFP_KERNEL was used and I didn't realise this
allocation method was so recent. Problems of this sort have cropped up
before and while there are later changes that suppress some of these
warnings, I believe this is a strong candidate for where the allocation
failures started appearing.

> v2.6.30-rc6-1037-g2c5b9e5 wireless: libertas: fix unaligned accesses
>     1.12 +-
> v2.6.30-rc6-1044-g729e9c7 cfg80211: fix for duplicate userspace replies
>     1.10 +-
> v2.6.30-rc6-1075-gc587de0 iwlwifi: unify station management
>     1.9 ++-|+-
> v2.6.30-rc6-1076-gd14d444 iwl3945: port allow skb allocation in tasklet
>         I thought this was a prime candidate, but as you can see several commits
>         before failed too. Still worth looking at I think!
>

Your commit IDs are different to what I see but it's the commit merge at
b1bc81a0ef86b86fa410dd303d84c8c7bd09a64d.
I agree that the last commit (d14d44407b9f06e3cf967fcef28ccb780caf0583)
could make the problem worse because it expands the use of GFP_ATOMIC
for another driver.

> BISECTION of akpm (mm) MERGE
> ----------------------------
> So here I went looking for "where does the test start failing on the first
> try". Again, I was unable to narrow it down to a single commit.
>
> For a good overview of the area, use 'gitk f83b1e61..517d0869'.
>
> v2.6.30-5466-ga1dd268 mm: use alloc_pages_exact in alloc_large_system_hash
>     2.3 +-
> v2.6.30-5478-ge9bb35d mm: setup_per_zone_inactive_ratio - fix comment and..
>     2.5 +-
> v2.6.30-5486-g35282a2 migration: only migrate_prep() once per move_pages()
>     2.6 -|+|-   not quite conclusive...
> v2.6.30-5492-gbce7394 page-allocator: reset wmark_min and inactive ratio..
>     2.4 -|-
>

While I didn't spot anything too out of the ordinary here, they did
occur shortly after a number of other page allocator related patches.
One small thing I noticed there is that kswapd is getting woken up less
now than it did previously. Generally, I wouldn't have expected it to
make a difference but it's possible that kswapd is not being woken up to
reclaim at a higher order than it was previously. I have a patch for
this below. It'd be nice if you could apply it and see do fewer
allocation failures occur on current mainline.

> WHERE NEXT?
> ===========
> I think the results confirm there is definitely an issue here and that my
> test is reliable and consistent enough to show it. And as it currently is
> the only test we have...
>
> I hope that the info above is enough for the mm and wireless domain
> experts to identify likely candidates in the areas I've identified.
>
> The next step could be trying specific reverts or debug patches, either on
> top of current git, or 2.6.31, or inside the identified areas.
> I'll run anything you care to throw at me and will try to provide any
> additional info you need, but at this point it's up to you.
>

For the wireless people in mainline - iwl_rx_replenish_now() is doing a
GFP_ATOMIC allocation that does not use __GFP_NOWARN. As part of
investigating allocation failures, iwl_rx_allocate() was taught to
distinguish between a benign and a serious allocation failure, serious
meaning that very few RX buffers are left and packet loss could occur
soon (see commit f82a924cc88a5541df1d4b9d38a0968cd077a051). I think this
GFP mask should be made GFP_ATOMIC|__GFP_NOWARN so that warnings only
appear when the failure is serious; dump the stack after the warning if
you need it. I have a feeling that almost all these warnings have been
benign and are related to GFP_ATOMIC now being used so heavily to move
more expensive allocations to the tasklet (presumably to reduce
user-visible latency).

Frans, could you try the following kswapd-related patch please? I'd be
interested in seeing if the number of allocation failure warnings is
reduced with it. After that, could you edit
drivers/net/wireless/iwlwifi/iwl-rx.c and make the GFP_ATOMIC in
iwl_rx_replenish_now() GFP_ATOMIC|__GFP_NOWARN and see do any of the
"serious" allocation failure messages appear.

Thanks again for your persistence.

==== CUT HERE ====
^ permalink raw reply	[flat|nested] 384+ messages in thread
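The "+", "-" and "|" result notation from the quoted OVERVIEW OF RESULTS
can be checked mechanically, which helps when separating "fails at all"
from "fails on the first try". What follows is only a minimal Python
sketch under the assumption that the result strings are copied by hand
from the listings; the function names are invented for illustration and
are not part of any tool used in this thread.

```python
def parse_results(notation):
    """Split a result string such as '++-|+-' into per-boot series.

    True means the run produced no SKB allocation errors ('+'),
    False means errors were seen ('-'). A '|' marks a reboot, so each
    '|'-separated chunk is one series started from a clean boot.
    """
    series = []
    for chunk in notation.split("|"):
        runs = []
        for mark in chunk.strip():
            if mark not in {"+", "-"}:
                raise ValueError(f"unexpected mark {mark!r} in {notation!r}")
            runs.append(mark == "+")
        series.append(runs)
    return series


def first_failure(notation):
    """For each boot series, return the 1-based attempt at which SKB
    errors first appeared, or None if the whole series was clean."""
    result = []
    for runs in parse_results(notation):
        result.append(runs.index(False) + 1 if False in runs else None)
    return result


def fails_on_first_try(notation):
    """True only if every boot series already failed on attempt 1."""
    return all(attempt == 1 for attempt in first_failure(notation))
```

With the strings from the listings, first_failure("++-|+-") gives the
attempt numbers per boot, and fails_on_first_try("-|-") distinguishes
the 'akpm' merge result from a kernel that only fails on a later try.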
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-12 13:43 ` Mel Gorman
@ 2009-10-12 17:32 ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-12 17:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
      Mohamed Abbas, John W. Linville, linux-mm

On Monday 12 October 2009, Mel Gorman wrote:
> Maybe. Your commit id's are different to what I see. Maybe it's because
> your tree has been shuffled around a bit

No, the commit IDs should be identical. My tree is just plain mainline.

Just to make sure... You did remove the "g" from the IDs, right?
So v2.6.30-rc6-1103-gb1bc81a becomes 'b1bc81a' and if you do
'git describe b1bc81a' you really should end up with the same IDs I have.

> but after some digging around in this general area, I saw this patch
>
> 4752c93c30 iwlcore: Allow skb allocation from tasklet

That is v2.6.30-rc6-773-g4752c93, which is part of the first wireless
merge I tested and where I saw no issues. But see below.

> This patch increases the number of GFP_ATOMIC allocations that can occur
> by allocating GFP_ATOMIC in some cases and GFP_KERNEL in others.
> Previously, only GFP_KERNEL was used and I didn't realise this
> allocation method was so recent. Problems of this sort have cropped up
> before and while there are later changes that suppress some of these
> warnings, I believe this is a strong candidate for where the allocation
> failures started appearing.
>
> > v2.6.30-rc6-1032-g7ba10a8 mac80211: fix transposed min/max CW values
> >     1.13 -
> >         This is a bugfix for aa837ee1d from an earlier merge! Could this maybe

There's a typo here. That ID should be: aa837e1d.

> >         influence the test results in between? There are various SKB related
> >         changes there, for example: dfbf97f3..e5b9215e.
> > v2.6.30-rc6-1037-g2c5b9e5 wireless: libertas: fix unaligned accesses
> >     1.12 +-
> > v2.6.30-rc6-1044-g729e9c7 cfg80211: fix for duplicate userspace replies
> >     1.10 +-
> > v2.6.30-rc6-1075-gc587de0 iwlwifi: unify station management
> >     1.9 ++-|+-
> > v2.6.30-rc6-1076-gd14d444 iwl3945: port allow skb allocation in tasklet
> >         I thought this was a prime candidate, but as you can see
> >         several commits before failed too. Still worth looking at I think!
>
> Your commit IDs are different to what I see but it's the commit merge at
> b1bc81a0ef86b86fa410dd303d84c8c7bd09a64d. I agree that the last commit
> (d14d44407b9f06e3cf967fcef28ccb780caf0583) could make the problem worse
> because it expands the use of GFP_ATOMIC for another driver.

No, that was a mistake of mine. d14d444 is in a driver I don't even compile.
The one you identified (which is the same change for iwlagn) is much more
interesting.

I really do think that v2.6.30-rc6-1032-g7ba10a8 could play a role here.
That's a fix for v2.6.30-rc1-1131-gaa837e1. So that bug was introduced
_before_ the merge 82d0481 and may thus well explain both the latencies I
saw _and_ why that merge tested without problems. And that would also go a
long way to explain my test results.
So I'm going to retest 82d0481 with 7ba10a8 cherry-picked on top.

> > BISECTION of akpm (mm) MERGE
> > ----------------------------
[...]
> While I didn't spot anything too out of the ordinary here, they did
> occur shortly after a number of other page allocator related patches.
> One small thing I noticed there is that kswapd is getting woken up less
> now than it did previously. Generally, I wouldn't have expected it to
> make a difference but it's possible that kswapd is not being woken up to
> reclaim at a higher order than it was previously. I have a patch for
> this below. It'd be nice if you could apply it and see do fewer
> allocation failures occur on current mainline.

I'll give that patch a try and report back.
^ permalink raw reply [flat|nested] 384+ messages in thread
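The kernel names used throughout the thread follow the usual
'git describe' layout: nearest tag, number of commits since that tag,
then "g" plus the abbreviated commit ID. The mix-up discussed above came
from the "g" prefix, so a small helper that splits the name apart may be
useful. This is a minimal Python sketch; the helper name and the regular
expression are illustrative assumptions, not something taken from git
itself.

```python
import re

# Layout assumed here: "<tag>-<commits since tag>-g<abbreviated hash>"
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def split_describe(name):
    """Return (tag, commits_since_tag, abbreviated_hash) for a
    'git describe' style name. The leading 'g' is not part of the hash."""
    match = DESCRIBE_RE.match(name)
    if match is None:
        raise ValueError(f"not a describe-style name: {name!r}")
    return match.group("tag"), int(match.group("count")), match.group("sha")


if __name__ == "__main__":
    # Names taken from the test listings in this thread.
    for name in ("v2.6.30-rc6-1103-gb1bc81a", "v2.6.30-rc1-1131-gaa837e1"):
        tag, count, sha = split_describe(name)
        print(f"{name}: tag={tag} commits={count} id={sha}")
```

The extracted ID can then be passed to 'git describe' or 'git show' in
a mainline tree to confirm that it resolves to the same name.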
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-12 17:32 ` Frans Pop
@ 2009-10-12 18:43 ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-12 18:43 UTC (permalink / raw)
  To: Frans Pop
  Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
      Mohamed Abbas, John W. Linville, linux-mm

On Mon, Oct 12, 2009 at 07:32:11PM +0200, Frans Pop wrote:
> On Monday 12 October 2009, Mel Gorman wrote:
> > Maybe. Your commit id's are different to what I see. Maybe it's because
> > your tree has been shuffled around a bit
>
> No, the commit IDs should be identical. My tree is just plain mainline.
>
> Just to make sure... You did remove the "g" from the IDs, right?
> So v2.6.30-rc6-1103-gb1bc81a becomes 'b1bc81a' and if you do
> 'git describe b1bc81a' you really should end up with the same IDs I have.
>

Bah, that's what I was doing all right. No excuse, that was just plain
stupid of me.

> > but after some digging around in this general area, I saw this patch
> >
> > 4752c93c30 iwlcore: Allow skb allocation from tasklet
>
> That is v2.6.30-rc6-773-g4752c93, which is part of the first wireless
> merge I tested and where I saw no issues. But see below.
>

While there were no issues at that point, I think it might have been the
beginning of a few patches that made things progressively worse. It is
possible there is more than one patch causing trouble here and bisecting
each of them is unlikely to be an option. More on this later though.

> > This patch increases the number of GFP_ATOMIC allocations that can occur
> > by allocating GFP_ATOMIC in some cases and GFP_KERNEL in others.
> > Previously, only GFP_KERNEL was used and I didn't realise this
> > allocation method was so recent. Problems of this sort have cropped up
> > before and while there are later changes that suppress some of these
> > warnings, I believe this is a strong candidate for where the allocation
> > failures started appearing.
> >
> > > v2.6.30-rc6-1032-g7ba10a8 mac80211: fix transposed min/max CW values
> > >     1.13 -
> > >         This is a bugfix for aa837ee1d from an earlier merge! Could this maybe
>
> There's a typo here. That ID should be: aa837e1d.
>
> > >         influence the test results in between? There are various SKB related
> > >         changes there, for example: dfbf97f3..e5b9215e.
> > > v2.6.30-rc6-1037-g2c5b9e5 wireless: libertas: fix unaligned accesses
> > >     1.12 +-
> > > v2.6.30-rc6-1044-g729e9c7 cfg80211: fix for duplicate userspace replies
> > >     1.10 +-
> > > v2.6.30-rc6-1075-gc587de0 iwlwifi: unify station management
> > >     1.9 ++-|+-
> > > v2.6.30-rc6-1076-gd14d444 iwl3945: port allow skb allocation in tasklet
> > >         I thought this was a prime candidate, but as you can see
> > >         several commits before failed too. Still worth looking at I think!
> >
> > Your commit IDs are different to what I see but it's the commit merge at
> > b1bc81a0ef86b86fa410dd303d84c8c7bd09a64d. I agree that the last commit
> > (d14d44407b9f06e3cf967fcef28ccb780caf0583) could make the problem worse
> > because it expands the use of GFP_ATOMIC for another driver.
>
> No, that was a mistake of mine. d14d444 is in a driver I don't even compile.
> The one you identified (which is the same change for iwlagn) is much more
> interesting.
>

I had forgotten what model your card was and assumed it must have been
based on this driver for the problem to get worse for you at that point.

> I really do think that v2.6.30-rc6-1032-g7ba10a8 could play a role here.
> That's a fix for v2.6.30-rc1-1131-gaa837e1. So that bug was introduced
> _before_ the merge 82d0481 and may thus well explain both the latencies I
> saw _and_ why that merge tested without problems. And that would also go a
> long way to explain my test results.

Very good point.

> So I'm going to retest 82d0481 with 7ba10a8 cherry-picked on top.
>

Great.

> > > BISECTION of akpm (mm) MERGE
> > > ----------------------------
> [...]
> > While I didn't spot anything too out of the ordinary here, they did
> > occur shortly after a number of other page allocator related patches.
> > One small thing I noticed there is that kswapd is getting woken up less
> > now than it did previously. Generally, I wouldn't have expected it to
> > make a difference but it's possible that kswapd is not being woken up to
> > reclaim at a higher order than it was previously. I have a patch for
> > this below. It'd be nice if you could apply it and see do fewer
> > allocation failures occur on current mainline.
>
> I'll give that patch a try and report back.
>

Thanks a lot.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-12 17:32 ` Frans Pop
@ 2009-10-13 20:38 ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-13 20:38 UTC (permalink / raw)
  To: Mel Gorman
  Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
      Mohamed Abbas, John W. Linville, linux-mm

On Monday 12 October 2009, Frans Pop wrote:
> On Monday 12 October 2009, Mel Gorman wrote:
> > but after some digging around in this general area, I saw this patch
> >
> > 4752c93c30 iwlcore: Allow skb allocation from tasklet
>
> That is v2.6.30-rc6-773-g4752c93, which is part of the first wireless
> merge I tested and where I saw no issues. But see below.
>
> > This patch increases the number of GFP_ATOMIC allocations that can
> > occur by allocating GFP_ATOMIC in some cases and GFP_KERNEL in others.
> > Previously, only GFP_KERNEL was used and I didn't realise this
> > allocation method was so recent. Problems of this sort have cropped up
> > before and while there are later changes that suppress some of these
> > warnings, I believe this is a strong candidate for where the
> > allocation failures started appearing.

I have tried reverting this patch and that does make a significant
difference, but the results are still not really conclusive.

I tested the revert on top of:
- the first net-next-2.6 merge (2ed0e21), i.e. before the mm merge
- 2.6.31.1

In both cases I no longer get SKB errors, but instead (?) I get firmware
errors:
  iwlagn 0000:10:00.0: Microcode SW error detected.  Restarting 0x2000000.

So on the wireless side it does look as if there is more than one change
involved. Remember that with .30 I don't get any errors, only relatively
mild latencies and skips in the music.

> I really do think that v2.6.30-rc6-1032-g7ba10a8 could play a role here.
> That's a fix for v2.6.30-rc1-1131-gaa837e1. So that bug was introduced
> _before_ the merge 82d0481 and may thus well explain both the latencies
> I saw _and_ why that merge tested without problems. And that would also
> go a long way to explain my test results.
> So I'm going to retest 82d0481 with 7ba10a8 cherry-picked on top.
                         ^^^^^^^-- should be 45ea4ea

I've tried this but still don't get any SKB errors, so that bug does not
seem to make a difference.

> > > BISECTION of akpm (mm) MERGE
> > > ----------------------------
> > While I didn't spot anything too out of the ordinary here, they did
> > occur shortly after a number of other page allocator related patches.
> > One small thing I noticed there is that kswapd is getting woken up
> > less now than it did previously. Generally, I wouldn't have expected
> > it to make a difference but it's possible that kswapd is not being
> > woken up to reclaim at a higher order than it was previously. I have a
> > patch for this below. It'd be nice if you could apply it and see do
> > fewer allocation failures occur on current mainline.
>
> I'll give that patch a try and report back.

With your patch on .32-rc4 I still get the SKB errors, so it does not seem
to help.
The only change there may have been is that the desktop was frozen longer
than without the patch, but that is an impression, not a hard fact.

Although identifying the problem on the wireless side is important, I still
feel that tracing the mm change should have priority as it influences much
more than just iwlagn, as the other reports prove.

> > After that, could you edit drivers/net/wireless/iwlwifi/iwl-rx.c and
> > make the GFP_ATOMIC in iwl_rx_replenish_now() GFP_ATOMIC|__GFP_NOWARN
> > and see do any of the "serious" allocation failure messages appear.

For the above reason I've not yet tried this. It seems to me that this
change will not really solve anything, but just suppress errors.

Cheers,
FJP
^ permalink raw reply	[flat|nested] 384+ messages in thread
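Whether GFP_ATOMIC|__GFP_NOWARN only filters noise or hides a real
problem depends on what is still reported afterwards. The following is a
toy Python model of the reporting policy described earlier in the thread
(benign versus serious failures), not driver code. The threshold value
and every name in it are assumptions made for illustration; the thread
does not state the value the driver uses, and printk rate limiting is
ignored.

```python
from dataclasses import dataclass

# Assumed threshold for "very few RX buffers left"; hypothetical value.
LOW_RX_BUFFERS = 8


@dataclass
class AllocFailure:
    """One failed RX buffer allocation as seen by the modelled driver."""
    free_rx_buffers: int  # buffers still available when the failure hit


def is_serious(failure, low=LOW_RX_BUFFERS):
    """Serious means packet loss could occur soon: few buffers remain."""
    return failure.free_rx_buffers <= low


def reported(failures, nowarn, low=LOW_RX_BUFFERS):
    """Return the failures that would show up in the log.

    nowarn=False models plain GFP_ATOMIC: the allocator warns on every
    failure. nowarn=True models GFP_ATOMIC|__GFP_NOWARN: only the
    driver's own check for a serious failure reports anything.
    """
    if not nowarn:
        return list(failures)
    return [f for f in failures if is_serious(f, low)]


if __name__ == "__main__":
    sample = [AllocFailure(200), AllocFailure(50), AllocFailure(4)]
    print("without NOWARN:", len(reported(sample, nowarn=False)))
    print("with NOWARN:   ", len(reported(sample, nowarn=True)))
```

In this model the flag changes what is logged and nothing else: the
number of failed allocations is identical in both cases, which is the
concern raised in the message above.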
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-13 20:38 ` Frans Pop
@ 2009-10-14 10:30 ` Mel Gorman
From: Mel Gorman @ 2009-10-14 10:30 UTC
To: Frans Pop
Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
    Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
    Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
    Mohamed Abbas, John W. Linville, linux-mm

On Tue, Oct 13, 2009 at 10:38:37PM +0200, Frans Pop wrote:
> On Monday 12 October 2009, Frans Pop wrote:
> > On Monday 12 October 2009, Mel Gorman wrote:
> > > but after some digging around in this general area, I saw this patch
> > >
> > > 4752c93c30 iwlcore: Allow skb allocation from tasklet
> >
> > That is v2.6.30-rc6-773-g4752c93, which is part of the first wireless
> > merge I tested and where I saw no issues. But see below.
> >
> > > This patch increases the number of GFP_ATOMIC allocations that can
> > > occur by allocating GFP_ATOMIC in some cases and GFP_KERNEL in others.
> > > Previously, only GFP_KERNEL was used and I didn't realise this
> > > allocation method was so recent. Problems of this sort have cropped up
> > > before and while there are later changes that suppress some of these
> > > warnings, I believe this is a strong candidate for where the
> > > allocation failures started appearing.
>
> I have tried reverting this patch and that does make a significant
> difference, but the results are still not really conclusive.
> I tested the revert on top of:
> - the first net-next-2.6 merge (2ed0e21), i.e. before the mm merge
> - 2.6.31.1
>

I think this is very significant. Either that change needs to be backed
out or, more likely, __GFP_NOWARN needs to be specified and warnings
*only* printed when the RX buffers are really low. My expectation would
be that some GFP_ATOMIC allocations fail during refill but the fact they
fail wakes kswapd to reclaim order-2 pages while the RX buffers in the
pool are consumed.

> In both cases I no longer get SKB errors, but instead (?) I get firmware
> errors:
>   iwlagn 0000:10:00.0: Microcode SW error detected. Restarting 0x2000000.
>

I am no wireless expert, but that looks like a separate problem to me.
I don't see how an allocation failure could trigger errors in the
microcode. I really really hate to say it, but this might need a separate
bisection with 4752c93c30 either reverted or patched as I do below.

> So on the wireless side it does look as if there is more than one change
> involved. Remember that with .30 I don't get any errors, only relatively
> mild latencies and skips in the music.
>

2.6.31 does not appear to have done wireless any favours.

> > I really do think that v2.6.30-rc6-1032-g7ba10a8 could play a role here.
> > That's a fix for v2.6.30-rc1-1131-gaa837e1. So that bug was introduced
> > _before_ the merge 82d0481 and may thus well explain both the latencies
> > I saw _and_ why that merge tested without problems. And that would also
> > go a long way to explain my test results.
> > So I'm going to retest 82d0481 with 7ba10a8 cherry-picked on top.
>                                         ^^^^^^^-- should be 45ea4ea
>
> I've tried this but still don't get any SKB errors, so that bug does not
> seem to make a difference.
>
> > > > BISECTION of akpm (mm) MERGE
> > > > ----------------------------
> > > While I didn't spot anything too out of the ordinary here, they did
> > > occur shortly after a number of other page allocator related patches.
> > > One small thing I noticed there is that kswapd is getting woken up
> > > less now than it did previously. Generally, I wouldn't have expected
> > > it to make a difference but it's possible that kswapd is not being
> > > woken up to reclaim at a higher order than it was previously. I have a
> > > patch for this below. It'd be nice if you could apply it and see do
> > > fewer allocation failures occur on current mainline.
> >
> > I'll give that patch a try and report back.
>
> With your patch on .32-rc4 I still get the SKB errors, so it does not seem
> to help. The only change there may have been is that the desktop was
> frozen longer than without the patch, but that is an impression, not a
> hard fact.
>

Actually, that's fairly interesting and I think justifies pushing the
patch. Direct reclaim can stall processes in a user-visible manner which
kswapd is meant to avoid in the majority of cases but is tricky to
quantify without instrumenting the kernel to measure direct reclaim
frequency and latency (I have WIP tracepoints for this but it's still a
WIP). If you notice shorter stalls with the patch applied, it means that
kswapd really did need to be informed of the problems.

> Although identifying the problem on the wireless side is important, I still
> feel that tracing the mm change should have priority as it influences much
> more than just iwlagn, as the other reports prove.
>

There still has not been a mm-change identified that makes fragmentation
significantly worse. The majority of the wireless reports have been in
this driver and I think we have the problem commit there. The only other
is a firmware loading problem in e100 after resume that fails to make an
atomic order-5 allocation. It's possible that something has changed in
resume in the 2.6.31 window there - maybe something like drivers now
reload during resume where they didn't previously or less memory being
pushed to swap during resume.

> > > After that, could you edit drivers/net/wireless/iwlwifi/iwl-rx.c and
> > > make the GFP_ATOMIC in iwl_rx_replenish_now() GFP_ATOMIC|__GFP_NOWARN
> > > and see do any of the "serious" allocation failure messages appear.
>
> For the above reason I've not yet tried this. It seems to me that this
> change will not really solve anything, but just suppress errors.
>

I disagree. Harmless allocation errors get suppressed but it still warns
when things get really bad. See the following patch that suppresses the
warnings from GFP_ATOMIC but warns for GFP_KERNEL failures and dumps a
stack on serious allocation failure. We either need a patch like this or
the GFP_ATOMIC-direct-with-refills-from-tasklet patch needs to be
reverted.

=== CUT HERE ===

From 5fb9f897117bf2701f9fdebe4d008dbe34358ab9 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mel@csn.ul.ie>
Date: Wed, 14 Oct 2009 11:19:57 +0100
Subject: [PATCH] iwlwifi: Suppress warnings related to GFP_ATOMIC allocations that do not matter

iwlwifi refills RX buffers in two ways - a direct method using GFP_ATOMIC
and a tasklet method using GFP_KERNEL. There are a number of RX buffers
and there are only serious issues when there are no RX buffers left. The
driver explicitly warns when refills are failing and the buffers are low
but it always warns when a GFP_ATOMIC allocation fails even when there is
no packet loss as a result.

This patch specifies __GFP_NOWARN for the direct refill method that uses
GFP_ATOMIC. To help identify where allocation failures might be coming
from, the stack is dumped when the RX queue is dangerously low.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 drivers/net/wireless/iwlwifi/iwl-rx.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl b/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl
old mode 100644
new mode 100755
diff --git a/drivers/net/wireless/iwlwifi/iwl-rx.c b/drivers/net/wireless/iwlwifi/iwl-rx.c
index 8e1bb53..f91a108 100644
--- a/drivers/net/wireless/iwlwifi/iwl-rx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-rx.c
@@ -260,10 +260,12 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority)
 			if (net_ratelimit())
 				IWL_DEBUG_INFO(priv, "Failed to allocate SKB buffer.\n");
 			if ((rxq->free_count <= RX_LOW_WATERMARK) &&
-			    net_ratelimit())
+			    net_ratelimit()) {
 				IWL_CRIT(priv, "Failed to allocate SKB buffer with %s. Only %u free buffers remaining.\n",
 					 priority == GFP_ATOMIC ? "GFP_ATOMIC" : "GFP_KERNEL",
 					 rxq->free_count);
+				dump_stack();
+			}
 			/* We don't reschedule replenish work here -- we will
 			 * call the restock method and if it still needs
 			 * more buffers it will schedule replenish */
@@ -320,7 +322,7 @@ EXPORT_SYMBOL(iwl_rx_replenish);
 
 void iwl_rx_replenish_now(struct iwl_priv *priv)
 {
-	iwl_rx_allocate(priv, GFP_ATOMIC);
+	iwl_rx_allocate(priv, GFP_ATOMIC|__GFP_NOWARN);
 
 	iwl_rx_queue_restock(priv);
 }
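The reporting policy this patch is after can be sketched in plain userspace
C. This is only a simplified illustration, not the driver code: the names
demo_classify_failure(), demo_report and DEMO_RX_LOW_WATERMARK are
hypothetical, the watermark value is an assumption (the real
RX_LOW_WATERMARK is not quoted in this thread), and the two separate
net_ratelimit() calls of the real hunk are collapsed into one flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's RX_LOW_WATERMARK (assumed value). */
#define DEMO_RX_LOW_WATERMARK 8u

enum demo_report {
	DEMO_REPORT_NONE,     /* rate limited, nothing is printed */
	DEMO_REPORT_DEBUG,    /* harmless failure, debug message only */
	DEMO_REPORT_CRITICAL  /* pool nearly empty, warn and dump the stack */
};

/*
 * Decide how loudly a failed RX buffer allocation is reported.
 * free_count:   buffers still available in the RX pool
 * ratelimit_ok: outcome of a rate limiter such as net_ratelimit()
 */
static enum demo_report demo_classify_failure(unsigned int free_count,
					      bool ratelimit_ok)
{
	if (!ratelimit_ok)
		return DEMO_REPORT_NONE;
	if (free_count <= DEMO_RX_LOW_WATERMARK)
		return DEMO_REPORT_CRITICAL;
	return DEMO_REPORT_DEBUG;
}

/* Small usage example: only a nearly empty pool escalates. */
static void demo_self_check(void)
{
	assert(demo_classify_failure(64u, true) == DEMO_REPORT_DEBUG);
	assert(demo_classify_failure(DEMO_RX_LOW_WATERMARK, true) == DEMO_REPORT_CRITICAL);
	assert(demo_classify_failure(0u, true) == DEMO_REPORT_CRITICAL);
	assert(demo_classify_failure(0u, false) == DEMO_REPORT_NONE);
}
```

Under these assumptions a failed refill with plenty of buffers left stays
quiet, and the stack dump is reserved for the case where packet loss is
actually likely.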
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-14 10:30 ` Mel Gorman
@ 2009-10-14 13:10 ` Frans Pop
  2009-10-14 15:40 ` Mel Gorman
From: Frans Pop @ 2009-10-14 13:10 UTC
To: Mel Gorman
Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
    Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
    Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
    Mohamed Abbas, John W. Linville, linux-mm

[-- Attachment #1: Type: text/plain, Size: 4524 bytes --]

On Wednesday 14 October 2009, Mel Gorman wrote:
> I think this is very significant. Either that change needs to be backed
> out or more likely, __GFP_NOWARN needs to be specified and warnings
> *only* printed when the RX buffers are really low. My expectation would
> be that some GFP_ATOMIC allocations fail during refill but the fact they
> fail wakes kswapd to reclaim order-2 pages while the RX buffers in the
> pool are consumed.

Sorry I did not actually mention this, but the SKB failures I get with .32
have loads of the "Failed to allocate SKB buffer with GFP_ATOMIC. Only 0
free buffers remaining." errors. That's why I don't think your patch will
help anything.

zgrep "Only 0 free buffers remaining" /var/log/kern.log* | wc -l
84

OK, they are all GFP_ATOMIC and not GFP_KERNEL, but they also almost all
have "0 free buffers"! Next to the 84 warnings for 0 remaining I only have
one with "3 free buffers" and one with "1 free buffers".

And that does not even count the rate limiting:
Oct 12 20:15:07 aragorn kernel: __ratelimit: 45 callbacks suppressed
Oct 12 20:25:19 aragorn kernel: __ratelimit: 27 callbacks suppressed
Oct 12 20:25:20 aragorn kernel: __ratelimit: 2 callbacks suppressed

Attached is the kernel log for one test I did with .32.

> > In both cases I no longer get SKB errors, but instead (?) I get
> > firmware errors:
> >   iwlagn 0000:10:00.0: Microcode SW error detected. Restarting
> >   0x2000000.
>
> I am no wireless expert, but that looks like a separate problem to me.
> I don't see how an allocation failure could trigger errors in the
> microcode.

Yes, it is a separate problem, but it is still significant that reverting
that patch triggers them in the extreme swap situation.

> > With your patch on .32-rc4 I still get the SKB errors, so it does not
> > seem to help. The only change there may have been is that the desktop
> > was frozen longer than without the patch, but that is an impression,
> > not a hard fact.
>
> Actually, that's fairly interesting and I think justifies pushing the
> patch. Direct reclaim can stall processes in a user-visible manner which
> kswapd is meant to avoid in the majority of cases but is tricky to
> quantify without instrumenting the kernel to measure direct reclaim
> frequency and latency (I have WIP tracepoints for this but it's still a
> WIP). If you notice shorter stalls with the patch applied, it means that
> kswapd really did need to be informed of the problems.

No, I thought I saw _longer_ stalls with your patch applied...

> There still has not been a mm-change identified that makes fragmentation
> significantly worse.

My bisection shows a very clear point, even if not an individual commit,
in the 'akpm' merge where SKB errors suddenly become *much* more frequent
and easy to trigger.
I'm sorry to say this, but the fact that nothing has been identified yet is
IMO the result of a lack of effort, not because there is no such change.

> The majority of the wireless reports have been in
> this driver and I think we have the problem commit there. The only other
> is a firmware loading problem in e100 after resume that fails to make an
> atomic order-5 allocation.

Not exactly true. Bartlomiej's report was about ipw2200, so there are at
least 3 different drivers involved, two wireless and one wired. Besides
that one report is related to heavy swap, one to resume and one to driver
reload.

So it's much more likely that there is some common regression (in mm) that
affected all three than that there are three unrelated regressions.

And although both of the others did extremely high allocations, they both
started appearing in the same timeframe. And Bart's very first report
linked it to mm changes.

> It's possible that something has changed in resume
> in the 2.6.31 window there - maybe something like drivers now reload
> during resume where they didn't previously or less memory being pushed
> to swap during resume.

IMO you're sticking your head in the sand here. I'm not saying that mm is
the only issue here, but I'm convinced that there _is_ an mm change that
has contributed in a major way to these issues, even if we've not yet been
able to identify it.

> -			    net_ratelimit())
> +			    net_ratelimit()) {
>  				IWL_CRIT(priv, "Failed to allocate SKB buffer with %s. Only %u free
> buffers remaining.\n", priority == GFP_ATOMIC ? "GFP_ATOMIC" :
> "GFP_KERNEL",

Haven't you broken the test 'priority == GFP_ATOMIC' here by setting
priority to GFP_ATOMIC|__GFP_NOWARN?
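The concern about the 'priority == GFP_ATOMIC' test can be reproduced in
isolation with a small userspace sketch. The flag values and every demo_
name below are hypothetical stand-ins chosen for illustration, not the
kernel's definitions; the point is only that an exact equality test stops
matching once an extra modifier bit is OR'ed in, whereas masking the
modifier off first keeps the label correct.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int demo_gfp_t;

/* Illustrative stand-ins for gfp bits; the values are assumptions. */
#define DEMO_GFP_ATOMIC 0x20u
#define DEMO_GFP_KERNEL 0xd0u
#define DEMO_GFP_NOWARN 0x200u

/* Mirrors the test in the quoted hunk: exact equality against one value. */
static const char *demo_label_by_equality(demo_gfp_t priority)
{
	return priority == DEMO_GFP_ATOMIC ? "GFP_ATOMIC" : "GFP_KERNEL";
}

/* Alternative: strip the modifier bit before comparing. */
static const char *demo_label_by_mask(demo_gfp_t priority)
{
	return (priority & ~DEMO_GFP_NOWARN) == DEMO_GFP_ATOMIC
		? "GFP_ATOMIC" : "GFP_KERNEL";
}

/* Small usage example covering the direct refill case. */
static void demo_self_check(void)
{
	demo_gfp_t refill = DEMO_GFP_ATOMIC | DEMO_GFP_NOWARN;

	/* Without the modifier both variants agree. */
	assert(strcmp(demo_label_by_equality(DEMO_GFP_ATOMIC), "GFP_ATOMIC") == 0);
	/* With the modifier the equality test reports the wrong label. */
	assert(strcmp(demo_label_by_equality(refill), "GFP_KERNEL") == 0);
	/* Masking the modifier off restores the expected label. */
	assert(strcmp(demo_label_by_mask(refill), "GFP_ATOMIC") == 0);
	assert(strcmp(demo_label_by_mask(DEMO_GFP_KERNEL), "GFP_KERNEL") == 0);
}
```

If the real flags combine the same way, the critical message would name
GFP_KERNEL for failures that actually came from the direct GFP_ATOMIC
refill path.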

Cheers,
FJP

[-- Attachment #2: kern.log --]
[-- Type: text/x-log, Size: 114900 bytes --]

Oct 12 17:13:07 aragorn kernel: Linux version 2.6.32-rc4 (root@aragorn) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #29 SMP Mon Oct 12 13:12:50 CEST 2009
Oct 12 17:13:07 aragorn kernel: Command line: root=/dev/mapper/main-root ro vga=791 quiet
Oct 12 17:13:07 aragorn kernel: KERNEL supported cpus:
Oct 12 17:13:07 aragorn kernel: Intel GenuineIntel
Oct 12 17:13:07 aragorn kernel: AMD AuthenticAMD
Oct 12 17:13:07 aragorn kernel: Centaur CentaurHauls
Oct 12 17:13:07 aragorn kernel: BIOS-provided physical RAM map:
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 0000000000100000 - 000000007e7b0000 (usable)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 000000007e7b0000 - 000000007e7c5400 (reserved)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 000000007e7c5400 - 000000007e7e7fb8 (ACPI NVS)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 000000007e7e7fb8 - 000000007f000000 (reserved)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 00000000fed20000 - 00000000fed9a000 (reserved)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 00000000feda0000 - 00000000fedc0000 (reserved)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 00000000ffb00000 - 00000000ffc00000 (reserved)
Oct 12 17:13:07 aragorn kernel: BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
Oct 12 17:13:07 aragorn kernel: DMI 2.4 present.
Oct 12 17:13:07 aragorn kernel: last_pfn = 0x7e7b0 max_arch_pfn = 0x400000000
Oct 12 17:13:07 aragorn kernel: MTRR default type: uncachable
Oct 12 17:13:07 aragorn kernel: MTRR fixed ranges enabled:
Oct 12 17:13:07 aragorn kernel: 00000-9FFFF write-back
Oct 12 17:13:07 aragorn kernel: A0000-BFFFF uncachable
Oct 12 17:13:07 aragorn kernel: C0000-D3FFF write-protect
Oct 12 17:13:07 aragorn kernel: D4000-EFFFF uncachable
Oct 12 17:13:07 aragorn kernel: F0000-FFFFF write-protect
Oct 12 17:13:07 aragorn kernel: MTRR variable ranges enabled:
Oct 12 17:13:07 aragorn kernel: 0 base 000000000 mask F80000000 write-back
Oct 12 17:13:07 aragorn kernel: 1 base 07F000000 mask FFF000000 uncachable
Oct 12 17:13:07 aragorn kernel: 2 base 07E800000 mask FFF800000 uncachable
Oct 12 17:13:07 aragorn kernel: 3 base 0FEDA0000 mask FFFFE0000 uncachable
Oct 12 17:13:07 aragorn kernel: 4 disabled
Oct 12 17:13:07 aragorn kernel: 5 disabled
Oct 12 17:13:07 aragorn kernel: 6 disabled
Oct 12 17:13:07 aragorn kernel: 7 disabled
Oct 12 17:13:07 aragorn kernel: x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Oct 12 17:13:07 aragorn kernel: initial memory mapped : 0 - 20000000
Oct 12 17:13:07 aragorn kernel: init_memory_mapping: 0000000000000000-000000007e7b0000
Oct 12 17:13:07 aragorn kernel: 0000000000 - 007e600000 page 2M
Oct 12 17:13:07 aragorn kernel: 007e600000 - 007e7b0000 page 4k
Oct 12 17:13:07 aragorn kernel: kernel direct mapping tables up to 7e7b0000 @ 8000-c000
Oct 12 17:13:07 aragorn kernel: RAMDISK: 37a79000 - 37fef2f8
Oct 12 17:13:07 aragorn kernel: ACPI: RSDP 00000000000f7960 00024 (v02 HP )
Oct 12 17:13:07 aragorn kernel: ACPI: XSDT 000000007e7c81c8 0007C (v01 HPQOEM SLIC-MPC 00000001 HP 00000001)
Oct 12 17:13:07 aragorn kernel: ACPI: FACP 000000007e7c8084 000F4 (v04 HP 30C9 00000003 HP 00000001)
Oct 12 17:13:07 aragorn kernel: ACPI: DSDT 000000007e7c8538 13484 (v01 HP nc2500 00010000 MSFT 03000001)
Oct 12 17:13:07 aragorn kernel: ACPI: FACS 000000007e7e7d80 00040
Oct 12 17:13:07 aragorn kernel: ACPI: SLIC 000000007e7c8244 00176 (v01 HPQOEM SLIC-MPC 00000001 HP 00000001)
Oct 12 17:13:07 aragorn kernel: ACPI: HPET 000000007e7c83bc 00038 (v01 HP 30C9 00000001 HP 00000001)
Oct 12 17:13:07 aragorn kernel: ACPI: APIC 000000007e7c83f4 00068 (v01 HP 30C9 00000001 HP 00000001)
Oct 12 17:13:07 aragorn kernel: ACPI: MCFG 000000007e7c845c 0003C (v01 HP 30C9 00000001 HP 00000001)
Oct 12 17:13:07 aragorn kernel: ACPI: TCPA 000000007e7c8498 00032 (v02 HP 30C9 00000001 HP 00000001)
Oct 12 17:13:07 aragorn kernel: ACPI: ASF! 000000007e7c84cc 00069 (v16 HP CHIMAYU 00000001 HP 00000000)
Oct 12 17:13:07 aragorn kernel: ACPI: SSDT 000000007e7db9bc 002BE (v01 HP HPQPAT 00000001 MSFT 03000001)
Oct 12 17:13:07 aragorn kernel: ACPI: SSDT 000000007e7dc640 0025F (v01 HP Cpu0Tst 00003000 INTL 20060317)
Oct 12 17:13:07 aragorn kernel: ACPI: SSDT 000000007e7dc89f 000A6 (v01 HP Cpu1Tst 00003000 INTL 20060317)
Oct 12 17:13:07 aragorn kernel: ACPI: SSDT 000000007e7dc945 004D7 (v01 HP CpuPm 00003000 INTL 20060317)
Oct 12 17:13:07 aragorn kernel: ACPI: Local APIC address 0xfee00000
Oct 12 17:13:07 aragorn kernel: (7 early reservations) ==> bootmem [0000000000 - 007e7b0000]
Oct 12 17:13:07 aragorn kernel: #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
Oct 12 17:13:07 aragorn kernel: #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
Oct 12 17:13:07 aragorn kernel: #2 [0001000000 - 00015bdb30] TEXT DATA BSS ==> [0001000000 - 00015bdb30]
Oct 12 17:13:07 aragorn kernel: #3 [0037a79000 - 0037fef2f8] RAMDISK ==> [0037a79000 - 0037fef2f8]
Oct 12 17:13:07 aragorn kernel: #4 [000009fc00 - 0000100000] BIOS reserved ==> [000009fc00 - 0000100000]
Oct 12 17:13:07 aragorn kernel: #5 [00015be000 - 00015be18c] BRK ==> [00015be000 - 00015be18c]
Oct 12 17:13:07 aragorn kernel: #6 [0000008000 - 000000a000] PGTABLE ==> [0000008000 - 000000a000]
Oct 12 17:13:07 aragorn kernel: [ffffea0000000000-ffffea0001ffffff] PMD ->
[ffff880001a00000-ffff8800039fffff] on node 0 Oct 12 17:13:07 aragorn kernel: Zone PFN ranges: Oct 12 17:13:07 aragorn kernel: DMA 0x00000000 -> 0x00001000 Oct 12 17:13:07 aragorn kernel: DMA32 0x00001000 -> 0x00100000 Oct 12 17:13:07 aragorn kernel: Normal 0x00100000 -> 0x00100000 Oct 12 17:13:07 aragorn kernel: Movable zone start PFN for each node Oct 12 17:13:07 aragorn kernel: early_node_map[2] active PFN ranges Oct 12 17:13:07 aragorn kernel: 0: 0x00000000 -> 0x0000009f Oct 12 17:13:07 aragorn kernel: 0: 0x00000100 -> 0x0007e7b0 Oct 12 17:13:07 aragorn kernel: On node 0 totalpages: 517967 Oct 12 17:13:07 aragorn kernel: DMA zone: 64 pages used for memmap Oct 12 17:13:07 aragorn kernel: DMA zone: 101 pages reserved Oct 12 17:13:07 aragorn kernel: DMA zone: 3834 pages, LIFO batch:0 Oct 12 17:13:07 aragorn kernel: DMA32 zone: 8031 pages used for memmap Oct 12 17:13:07 aragorn kernel: DMA32 zone: 505937 pages, LIFO batch:31 Oct 12 17:13:07 aragorn kernel: ACPI: PM-Timer IO Port: 0x1008 Oct 12 17:13:07 aragorn kernel: ACPI: Local APIC address 0xfee00000 Oct 12 17:13:07 aragorn kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) Oct 12 17:13:07 aragorn kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled) Oct 12 17:13:07 aragorn kernel: ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) Oct 12 17:13:07 aragorn kernel: ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) Oct 12 17:13:07 aragorn kernel: ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0]) Oct 12 17:13:07 aragorn kernel: IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23 Oct 12 17:13:07 aragorn kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) Oct 12 17:13:07 aragorn kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) Oct 12 17:13:07 aragorn kernel: ACPI: IRQ0 used by override. Oct 12 17:13:07 aragorn kernel: ACPI: IRQ2 used by override. Oct 12 17:13:07 aragorn kernel: ACPI: IRQ9 used by override. 
Oct 12 17:13:07 aragorn kernel: Using ACPI (MADT) for SMP configuration information Oct 12 17:13:07 aragorn kernel: ACPI: HPET id: 0x8086a201 base: 0xfed00000 Oct 12 17:13:07 aragorn kernel: SMP: Allowing 2 CPUs, 0 hotplug CPUs Oct 12 17:13:07 aragorn kernel: nr_irqs_gsi: 24 Oct 12 17:13:07 aragorn kernel: PM: Registered nosave memory: 000000000009f000 - 00000000000a0000 Oct 12 17:13:07 aragorn kernel: PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000 Oct 12 17:13:07 aragorn kernel: PM: Registered nosave memory: 00000000000e0000 - 0000000000100000 Oct 12 17:13:07 aragorn kernel: Allocating PCI resources starting at 7f000000 (gap: 7f000000:7fc00000) Oct 12 17:13:07 aragorn kernel: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:2 nr_node_ids:1 Oct 12 17:13:07 aragorn kernel: PERCPU: Embedded 27 pages/cpu @ffff880001600000 s81752 r8192 d20648 u1048576 Oct 12 17:13:07 aragorn kernel: pcpu-alloc: s81752 r8192 d20648 u1048576 alloc=1*2097152 Oct 12 17:13:07 aragorn kernel: pcpu-alloc: [0] 0 1 Oct 12 17:13:07 aragorn kernel: Built 1 zonelists in Zone order, mobility grouping on. Total pages: 509771 Oct 12 17:13:07 aragorn kernel: Kernel command line: root=/dev/mapper/main-root ro vga=791 quiet Oct 12 17:13:07 aragorn kernel: PID hash table entries: 4096 (order: 3, 32768 bytes) Oct 12 17:13:07 aragorn kernel: Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) Oct 12 17:13:07 aragorn kernel: Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) Oct 12 17:13:07 aragorn kernel: Initializing CPU#0 Oct 12 17:13:07 aragorn kernel: Checking aperture... Oct 12 17:13:07 aragorn kernel: No AGP bridge found Oct 12 17:13:07 aragorn kernel: Memory: 2023496k/2072256k available (2758k kernel code, 388k absent, 47724k reserved, 1927k data, 508k init) Oct 12 17:13:07 aragorn kernel: SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1 Oct 12 17:13:07 aragorn kernel: Hierarchical RCU implementation. 
Oct 12 17:13:07 aragorn kernel: NR_IRQS:512 Oct 12 17:13:07 aragorn kernel: Extended CMOS year: 2000 Oct 12 17:13:07 aragorn kernel: Console: colour dummy device 80x25 Oct 12 17:13:07 aragorn kernel: console [tty0] enabled Oct 12 17:13:07 aragorn kernel: hpet clockevent registered Oct 12 17:13:07 aragorn kernel: HPET: 3 timers in total, 0 timers will be used for per-cpu timer Oct 12 17:13:07 aragorn kernel: Fast TSC calibration using PIT Oct 12 17:13:07 aragorn kernel: Detected 1330.067 MHz processor. Oct 12 17:13:07 aragorn kernel: Calibrating delay loop (skipped), value calculated using timer frequency.. 2660.13 BogoMIPS (lpj=5320268) Oct 12 17:13:07 aragorn kernel: Security Framework initialized Oct 12 17:13:07 aragorn kernel: SELinux: Disabled at boot. Oct 12 17:13:07 aragorn kernel: Mount-cache hash table entries: 256 Oct 12 17:13:07 aragorn kernel: CPU: L1 I cache: 32K, L1 D cache: 32K Oct 12 17:13:07 aragorn kernel: CPU: L2 cache: 2048K Oct 12 17:13:07 aragorn kernel: CPU: Physical Processor ID: 0 Oct 12 17:13:07 aragorn kernel: CPU: Processor Core ID: 0 Oct 12 17:13:07 aragorn kernel: mce: CPU supports 6 MCE banks Oct 12 17:13:07 aragorn kernel: CPU0: Thermal monitoring handled by SMI Oct 12 17:13:07 aragorn kernel: using mwait in idle threads. Oct 12 17:13:07 aragorn kernel: Performance Events: Core2 events, Intel PMU driver. Oct 12 17:13:07 aragorn kernel: ... version: 2 Oct 12 17:13:07 aragorn kernel: ... bit width: 40 Oct 12 17:13:07 aragorn kernel: ... generic registers: 2 Oct 12 17:13:07 aragorn kernel: ... value mask: 000000ffffffffff Oct 12 17:13:07 aragorn kernel: ... max period: 000000007fffffff Oct 12 17:13:07 aragorn kernel: ... fixed-purpose events: 3 Oct 12 17:13:07 aragorn kernel: ... 
event mask: 0000000700000003 Oct 12 17:13:07 aragorn kernel: ACPI: Core revision 20090903 Oct 12 17:13:07 aragorn kernel: ftrace: converting mcount calls to 0f 1f 44 00 00 Oct 12 17:13:07 aragorn kernel: ftrace: allocating 13965 entries in 55 pages Oct 12 17:13:07 aragorn kernel: Setting APIC routing to flat Oct 12 17:13:07 aragorn kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 Oct 12 17:13:07 aragorn kernel: CPU0: Intel(R) Core(TM)2 Duo CPU U7700 @ 1.33GHz stepping 0d Oct 12 17:13:07 aragorn kernel: Booting processor 1 APIC 0x1 ip 0x6000 Oct 12 17:13:07 aragorn kernel: Initializing CPU#1 Oct 12 17:13:07 aragorn kernel: Calibrating delay using timer specific routine.. 2659.98 BogoMIPS (lpj=5319962) Oct 12 17:13:07 aragorn kernel: CPU: L1 I cache: 32K, L1 D cache: 32K Oct 12 17:13:07 aragorn kernel: CPU: L2 cache: 2048K Oct 12 17:13:07 aragorn kernel: CPU: Physical Processor ID: 0 Oct 12 17:13:07 aragorn kernel: CPU: Processor Core ID: 1 Oct 12 17:13:07 aragorn kernel: mce: CPU supports 6 MCE banks Oct 12 17:13:07 aragorn kernel: CPU1: Thermal monitoring enabled (TM2) Oct 12 17:13:07 aragorn kernel: CPU1: Intel(R) Core(TM)2 Duo CPU U7700 @ 1.33GHz stepping 0d Oct 12 17:13:07 aragorn kernel: checking TSC synchronization [CPU#0 -> CPU#1]: passed. Oct 12 17:13:07 aragorn kernel: Brought up 2 CPUs Oct 12 17:13:07 aragorn kernel: Total of 2 processors activated (5320.11 BogoMIPS). 
Oct 12 17:13:07 aragorn kernel: CPU0 attaching sched-domain: Oct 12 17:13:07 aragorn kernel: domain 0: span 0-1 level MC Oct 12 17:13:07 aragorn kernel: groups: 0 1 Oct 12 17:13:07 aragorn kernel: CPU1 attaching sched-domain: Oct 12 17:13:07 aragorn kernel: domain 0: span 0-1 level MC Oct 12 17:13:07 aragorn kernel: groups: 1 0 Oct 12 17:13:07 aragorn kernel: NET: Registered protocol family 16 Oct 12 17:13:07 aragorn kernel: ACPI FADT declares the system doesn't support PCIe ASPM, so disable it Oct 12 17:13:07 aragorn kernel: ACPI: bus type pci registered Oct 12 17:13:07 aragorn kernel: PCI: MCFG configuration 0: base f8000000 segment 0 buses 0 - 63 Oct 12 17:13:07 aragorn kernel: PCI: Not using MMCONFIG. Oct 12 17:13:07 aragorn kernel: PCI: Using configuration type 1 for base access Oct 12 17:13:07 aragorn kernel: bio: create slab <bio-0> at 0 Oct 12 17:13:07 aragorn kernel: ACPI: EC: Look up EC in DSDT Oct 12 17:13:07 aragorn kernel: ACPI: Interpreter enabled Oct 12 17:13:07 aragorn kernel: ACPI: (supports S0 S3 S4 S5) Oct 12 17:13:07 aragorn kernel: ACPI: Using IOAPIC for interrupt routing Oct 12 17:13:07 aragorn kernel: PCI: MCFG configuration 0: base f8000000 segment 0 buses 0 - 63 Oct 12 17:13:07 aragorn kernel: PCI: MCFG area at f8000000 reserved in ACPI motherboard resources Oct 12 17:13:07 aragorn kernel: PCI: Using MMCONFIG at f8000000 - fbffffff Oct 12 17:13:07 aragorn kernel: ACPI: EC: GPE = 0x16, I/O: command/status = 0x66, data = 0x62 Oct 12 17:13:07 aragorn kernel: ACPI: Power Resource [C29F] (on) Oct 12 17:13:07 aragorn kernel: ACPI: Power Resource [C1C7] (off) Oct 12 17:13:07 aragorn kernel: ACPI: Power Resource [C3AD] (off) Oct 12 17:13:07 aragorn kernel: ACPI: Power Resource [C3B0] (off) Oct 12 17:13:07 aragorn kernel: ACPI: Power Resource [C3C3] (off) Oct 12 17:13:07 aragorn kernel: ACPI: Power Resource [C3C4] (off) Oct 12 17:13:07 aragorn kernel: ACPI: Power Resource [C3C5] (off) Oct 12 17:13:07 aragorn kernel: ACPI: Power Resource [C3C6] (off) 
Oct 12 17:13:07 aragorn kernel: ACPI: Power Resource [C3C7] (off) Oct 12 17:13:07 aragorn kernel: ACPI: No dock devices found. Oct 12 17:13:07 aragorn kernel: ACPI: PCI Root Bridge [C003] (0000:00) Oct 12 17:13:07 aragorn kernel: pci 0000:00:02.0: reg 10 64bit mmio: [0xe0400000-0xe04fffff] Oct 12 17:13:07 aragorn kernel: pci 0000:00:02.0: reg 18 64bit mmio pref: [0xd0000000-0xdfffffff] Oct 12 17:13:07 aragorn kernel: pci 0000:00:02.0: reg 20 io port: [0x2000-0x2007] Oct 12 17:13:07 aragorn kernel: pci 0000:00:02.1: reg 10 64bit mmio: [0xe0500000-0xe05fffff] Oct 12 17:13:07 aragorn kernel: pci 0000:00:03.0: reg 10 64bit mmio: [0xe0600000-0xe060000f] Oct 12 17:13:07 aragorn kernel: pci 0000:00:03.0: PME# supported from D0 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:00:03.0: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:03.2: reg 10 io port: [0x2008-0x200f] Oct 12 17:13:07 aragorn kernel: pci 0000:00:03.2: reg 14 io port: [0x2010-0x2013] Oct 12 17:13:07 aragorn kernel: pci 0000:00:03.2: reg 18 io port: [0x2018-0x201f] Oct 12 17:13:07 aragorn kernel: pci 0000:00:03.2: reg 1c io port: [0x2020-0x2023] Oct 12 17:13:07 aragorn kernel: pci 0000:00:03.2: reg 20 io port: [0x2030-0x203f] Oct 12 17:13:07 aragorn kernel: pci 0000:00:03.3: reg 10 io port: [0x2040-0x2047] Oct 12 17:13:07 aragorn kernel: pci 0000:00:03.3: reg 14 32bit mmio: [0xe0601000-0xe0601fff] Oct 12 17:13:07 aragorn kernel: pci 0000:00:19.0: reg 10 32bit mmio: [0xe0620000-0xe063ffff] Oct 12 17:13:07 aragorn kernel: pci 0000:00:19.0: reg 14 32bit mmio: [0xe0640000-0xe0640fff] Oct 12 17:13:07 aragorn kernel: pci 0000:00:19.0: reg 18 io port: [0x2060-0x207f] Oct 12 17:13:07 aragorn kernel: pci 0000:00:19.0: PME# supported from D0 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:00:19.0: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1a.0: reg 20 io port: [0x2080-0x209f] Oct 12 17:13:07 aragorn kernel: pci 0000:00:1a.1: reg 20 io port: [0x20a0-0x20bf] Oct 12 17:13:07 aragorn 
kernel: pci 0000:00:1a.7: reg 10 32bit mmio: [0xe0641000-0xe06413ff] Oct 12 17:13:07 aragorn kernel: pci 0000:00:1a.7: PME# supported from D0 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:00:1a.7: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1b.0: reg 10 64bit mmio: [0xe0644000-0xe0647fff] Oct 12 17:13:07 aragorn kernel: pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:00:1b.0: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.0: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.1: PME# supported from D0 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.1: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1d.0: reg 20 io port: [0x20c0-0x20df] Oct 12 17:13:07 aragorn kernel: pci 0000:00:1d.1: reg 20 io port: [0x20e0-0x20ff] Oct 12 17:13:07 aragorn kernel: pci 0000:00:1d.2: reg 20 io port: [0x2100-0x211f] Oct 12 17:13:07 aragorn kernel: pci 0000:00:1d.7: reg 10 32bit mmio: [0xe0648000-0xe06483ff] Oct 12 17:13:07 aragorn kernel: pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:00:1d.7: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1f.0: quirk: region 1000-107f claimed by ICH6 ACPI/GPIO/TCO Oct 12 17:13:07 aragorn kernel: pci 0000:00:1f.0: quirk: region 1100-113f claimed by ICH6 GPIO Oct 12 17:13:07 aragorn kernel: pci 0000:00:1f.0: ICH7 LPC Generic IO decode 1 PIO at 0500 (mask 007f) Oct 12 17:13:07 aragorn kernel: pci 0000:00:1f.0: ICH7 LPC Generic IO decode 4 PIO at 02e8 (mask 0007) Oct 12 17:13:07 aragorn kernel: pci 0000:00:1f.1: reg 10 io port: [0x00-0x07] Oct 12 17:13:07 aragorn kernel: pci 0000:00:1f.1: reg 14 io port: [0x00-0x03] Oct 12 17:13:07 aragorn kernel: pci 0000:00:1f.1: reg 18 io port: [0x00-0x07] Oct 12 17:13:07 aragorn kernel: pci 0000:00:1f.1: reg 1c io port: [0x00-0x03] Oct 12 17:13:07 aragorn kernel: 
pci 0000:00:1f.1: reg 20 io port: [0x2120-0x212f] Oct 12 17:13:07 aragorn kernel: pci 0000:10:00.0: reg 10 64bit mmio: [0xe0000000-0xe0001fff] Oct 12 17:13:07 aragorn kernel: pci 0000:10:00.0: PME# supported from D0 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:10:00.0: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.1: bridge 32bit mmio: [0xe0000000-0xe00fffff] Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.0: reg 10 32bit mmio: [0xe0100000-0xe0100fff] Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.0: supports D1 D2 Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.0: PME# supported from D0 D1 D2 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.0: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.1: reg 10 32bit mmio: [0xe0101000-0xe01017ff] Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.1: supports D1 D2 Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.1: PME# supported from D0 D1 D2 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.1: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.2: reg 10 32bit mmio: [0xe0102000-0xe01020ff] Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.2: supports D1 D2 Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.2: PME# supported from D0 D1 D2 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.2: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.3: reg 10 32bit mmio: [0xe0103000-0xe01030ff] Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.3: supports D1 D2 Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.3: PME# supported from D0 D1 D2 D3hot D3cold Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.3: PME# disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1e.0: transparent bridge Oct 12 17:13:07 aragorn kernel: pci 0000:00:1e.0: bridge 32bit mmio: [0xe0100000-0xe03fffff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:00: on NUMA node 0 Oct 12 17:13:07 aragorn kernel: ACPI: PCI Interrupt Routing Table [\_SB_.C003._PRT] Oct 12 17:13:07 aragorn kernel: ACPI: PCI 
Interrupt Routing Table [\_SB_.C003.C0B2._PRT] Oct 12 17:13:07 aragorn kernel: ACPI: PCI Interrupt Routing Table [\_SB_.C003.C11F._PRT] Oct 12 17:13:07 aragorn kernel: ACPI: PCI Interrupt Routing Table [\_SB_.C003.C133._PRT] Oct 12 17:13:07 aragorn kernel: ACPI: PCI Interrupt Link [C12F] (IRQs *10 11) Oct 12 17:13:07 aragorn kernel: ACPI: PCI Interrupt Link [C130] (IRQs *10 11) Oct 12 17:13:07 aragorn kernel: ACPI: PCI Interrupt Link [C131] (IRQs 10 *11) Oct 12 17:13:07 aragorn kernel: ACPI: PCI Interrupt Link [C132] (IRQs 10 11) *5 Oct 12 17:13:07 aragorn kernel: ACPI: PCI Interrupt Link [C142] (IRQs *10 11) Oct 12 17:13:07 aragorn kernel: ACPI: PCI Interrupt Link [C143] (IRQs 10 11) *0, disabled. Oct 12 17:13:07 aragorn kernel: ACPI: PCI Interrupt Link [C144] (IRQs 10 *11) Oct 12 17:13:07 aragorn kernel: ACPI Exception: AE_NOT_FOUND, Evaluating _PRS (20090903/pci_link-184) Oct 12 17:13:07 aragorn kernel: vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none Oct 12 17:13:07 aragorn kernel: vgaarb: loaded Oct 12 17:13:07 aragorn kernel: usbcore: registered new interface driver usbfs Oct 12 17:13:07 aragorn kernel: usbcore: registered new interface driver hub Oct 12 17:13:07 aragorn kernel: usbcore: registered new device driver usb Oct 12 17:13:07 aragorn kernel: PCI: Using ACPI for IRQ routing Oct 12 17:13:07 aragorn kernel: hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 Oct 12 17:13:07 aragorn kernel: hpet0: 3 comparators, 64-bit 14.318180 MHz counter Oct 12 17:13:07 aragorn kernel: Switching to clocksource tsc Oct 12 17:13:07 aragorn kernel: pnp: PnP ACPI init Oct 12 17:13:07 aragorn kernel: ACPI: bus type pnp registered Oct 12 17:13:07 aragorn kernel: pnp: PnP ACPI: found 14 devices Oct 12 17:13:07 aragorn kernel: ACPI: ACPI bus type pnp unregistered Oct 12 17:13:07 aragorn kernel: system 00:00: iomem range 0x0-0x9ffff could not be reserved Oct 12 17:13:07 aragorn kernel: (The fact that a range could not be reserved is generally harmless.) 
Oct 12 17:13:07 aragorn kernel: system 00:00: iomem range 0xe0000-0xfffff could not be reserved Oct 12 17:13:07 aragorn kernel: system 00:00: iomem range 0x100000-0x7e7fffff could not be reserved Oct 12 17:13:07 aragorn kernel: system 00:0a: ioport range 0x500-0x55f has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0a: ioport range 0x800-0x80f has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0a: iomem range 0xffb00000-0xffbfffff has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0a: iomem range 0xfff00000-0xffffffff has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0c: ioport range 0x4d0-0x4d1 has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0c: ioport range 0x2f8-0x2ff has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0c: ioport range 0x3f8-0x3ff has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0c: ioport range 0x1000-0x107f has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0c: ioport range 0x1100-0x113f has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0c: ioport range 0x1200-0x121f has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0c: iomem range 0xf8000000-0xfbffffff has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0c: iomem range 0xfec00000-0xfec000ff could not be reserved Oct 12 17:13:07 aragorn kernel: system 00:0c: iomem range 0xfed20000-0xfed3ffff has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0c: iomem range 0xfed45000-0xfed8ffff has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0c: iomem range 0xfed90000-0xfed99fff has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0d: iomem range 0xcee00-0xcffff has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0d: iomem range 0xd2000-0xd3fff has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0d: iomem range 0xfeda0000-0xfedbffff has been reserved Oct 12 17:13:07 aragorn kernel: system 00:0d: iomem range 0xfee00000-0xfee00fff has been reserved Oct 12 17:13:07 
aragorn kernel: pci 0000:00:1c.0: PCI bridge, secondary bus 0000:08 Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.0: IO window: disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.0: MEM window: disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.0: PREFETCH window: disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.1: PCI bridge, secondary bus 0000:10 Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.1: IO window: disabled Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.1: MEM window: 0xe0000000-0xe00fffff Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.1: PREFETCH window: disabled Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.0: CardBus bridge, secondary bus 0000:03 Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.0: IO window: 0x003000-0x0030ff Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.0: IO window: 0x003400-0x0034ff Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.0: PREFETCH window: 0x80000000-0x83ffffff Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.0: MEM window: 0x84000000-0x87ffffff Oct 12 17:13:07 aragorn kernel: pci 0000:00:1e.0: PCI bridge, secondary bus 0000:02 Oct 12 17:13:07 aragorn kernel: pci 0000:00:1e.0: IO window: 0x3000-0x3fff Oct 12 17:13:07 aragorn kernel: pci 0000:00:1e.0: MEM window: 0xe0100000-0xe03fffff Oct 12 17:13:07 aragorn kernel: pci 0000:00:1e.0: PREFETCH window: 0x80000000-0x83ffffff Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.0: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17 Oct 12 17:13:07 aragorn kernel: pci 0000:00:1c.1: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: pci 0000:00:1e.0: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: pci 0000:02:06.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18 Oct 12 17:13:07 aragorn kernel: pci_bus 0000:00: resource 0 io: [0x00-0xffff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:00: resource 
1 mem: [0x000000-0xffffffffffffffff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:10: resource 1 mem: [0xe0000000-0xe00fffff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:02: resource 0 io: [0x3000-0x3fff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:02: resource 1 mem: [0xe0100000-0xe03fffff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:02: resource 2 pref mem [0x80000000-0x83ffffff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:02: resource 3 io: [0x00-0xffff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:02: resource 4 mem: [0x000000-0xffffffffffffffff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:03: resource 0 io: [0x3000-0x30ff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:03: resource 1 io: [0x3400-0x34ff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:03: resource 2 pref mem [0x80000000-0x83ffffff] Oct 12 17:13:07 aragorn kernel: pci_bus 0000:03: resource 3 mem: [0x84000000-0x87ffffff] Oct 12 17:13:07 aragorn kernel: NET: Registered protocol family 2 Oct 12 17:13:07 aragorn kernel: IP route cache hash table entries: 65536 (order: 7, 524288 bytes) Oct 12 17:13:07 aragorn kernel: TCP established hash table entries: 262144 (order: 10, 4194304 bytes) Oct 12 17:13:07 aragorn kernel: TCP bind hash table entries: 65536 (order: 9, 2097152 bytes) Oct 12 17:13:07 aragorn kernel: TCP: Hash tables configured (established 262144 bind 65536) Oct 12 17:13:07 aragorn kernel: TCP reno registered Oct 12 17:13:07 aragorn kernel: NET: Registered protocol family 1 Oct 12 17:13:07 aragorn kernel: Trying to unpack rootfs image as initramfs... 
Oct 12 17:13:07 aragorn kernel: Freeing initrd memory: 5592k freed Oct 12 17:13:07 aragorn kernel: audit: initializing netlink socket (disabled) Oct 12 17:13:07 aragorn kernel: type=2000 audit(1255360345.659:1): initialized Oct 12 17:13:07 aragorn kernel: VFS: Disk quotas dquot_6.5.2 Oct 12 17:13:07 aragorn kernel: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) Oct 12 17:13:07 aragorn kernel: msgmni has been set to 3964 Oct 12 17:13:07 aragorn kernel: alg: No test for stdrng (krng) Oct 12 17:13:07 aragorn kernel: io scheduler noop registered Oct 12 17:13:07 aragorn kernel: io scheduler anticipatory registered Oct 12 17:13:07 aragorn kernel: io scheduler deadline registered Oct 12 17:13:07 aragorn kernel: io scheduler cfq registered (default) Oct 12 17:13:07 aragorn kernel: pci 0000:00:02.0: Boot video device Oct 12 17:13:07 aragorn kernel: pcieport-driver 0000:00:1c.0: irq 24 for MSI/MSI-X Oct 12 17:13:07 aragorn kernel: pcieport-driver 0000:00:1c.0: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: pcieport-driver 0000:00:1c.1: irq 25 for MSI/MSI-X Oct 12 17:13:07 aragorn kernel: pcieport-driver 0000:00:1c.1: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: vesafb: framebuffer at 0xd0000000, mapped to 0xffffc90004100000, using 3072k, total 7616k Oct 12 17:13:07 aragorn kernel: vesafb: mode is 1024x768x16, linelength=2048, pages=3 Oct 12 17:13:07 aragorn kernel: vesafb: scrolling: redraw Oct 12 17:13:07 aragorn kernel: vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0 Oct 12 17:13:07 aragorn kernel: Console: switching to colour frame buffer device 128x48 Oct 12 17:13:07 aragorn kernel: fb0: VESA VGA frame buffer device Oct 12 17:13:07 aragorn kernel: Linux agpgart interface v0.103 Oct 12 17:13:07 aragorn kernel: Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled Oct 12 17:13:07 aragorn kernel: serial 0000:00:03.3: PCI INT B -> GSI 17 (level, low) -> IRQ 17 Oct 12 17:13:07 aragorn kernel: 0000:00:03.3: ttyS0 at I/O 0x2040 (irq = 
17) is a 16550A Oct 12 17:13:07 aragorn kernel: brd: module loaded Oct 12 17:13:07 aragorn kernel: PNP: PS/2 Controller [PNP0303:C29C,PNP0f13:C29D] at 0x60,0x64 irq 1,12 Oct 12 17:13:07 aragorn kernel: i8042.c: Detected active multiplexing controller, rev 1.1. Oct 12 17:13:07 aragorn kernel: serio: i8042 KBD port at 0x60,0x64 irq 1 Oct 12 17:13:07 aragorn kernel: serio: i8042 AUX0 port at 0x60,0x64 irq 12 Oct 12 17:13:07 aragorn kernel: serio: i8042 AUX1 port at 0x60,0x64 irq 12 Oct 12 17:13:07 aragorn kernel: serio: i8042 AUX2 port at 0x60,0x64 irq 12 Oct 12 17:13:07 aragorn kernel: serio: i8042 AUX3 port at 0x60,0x64 irq 12 Oct 12 17:13:07 aragorn kernel: mice: PS/2 mouse device common for all mice Oct 12 17:13:07 aragorn kernel: Driver 'rtc_cmos' needs updating - please use bus_type methods Oct 12 17:13:07 aragorn kernel: rtc_cmos 00:06: RTC can wake from S4 Oct 12 17:13:07 aragorn kernel: rtc_cmos 00:06: rtc core: registered rtc_cmos as rtc0 Oct 12 17:13:07 aragorn kernel: rtc0: alarms up to one month, y3k, 114 bytes nvram, hpet irqs Oct 12 17:13:07 aragorn kernel: cpuidle: using governor ladder Oct 12 17:13:07 aragorn kernel: cpuidle: using governor menu Oct 12 17:13:07 aragorn kernel: TCP cubic registered Oct 12 17:13:07 aragorn kernel: NET: Registered protocol family 17 Oct 12 17:13:07 aragorn kernel: registered taskstats version 1 Oct 12 17:13:07 aragorn kernel: rtc_cmos 00:06: setting system clock to 2009-10-12 15:12:26 UTC (1255360346) Oct 12 17:13:07 aragorn kernel: Freeing unused kernel memory: 508k freed Oct 12 17:13:07 aragorn kernel: input: AT Translated Set 2 keyboard as /class/input/input0 Oct 12 17:13:07 aragorn kernel: fan PNP0C0B:00: registered as cooling_device0 Oct 12 17:13:07 aragorn kernel: ACPI: Fan [C3B1] (off) Oct 12 17:13:07 aragorn kernel: fan PNP0C0B:01: registered as cooling_device1 Oct 12 17:13:07 aragorn kernel: ACPI: Fan [C3B2] (off) Oct 12 17:13:07 aragorn kernel: fan PNP0C0B:02: registered as cooling_device2 Oct 12 17:13:07 
aragorn kernel: ACPI: Fan [C3C8] (off) Oct 12 17:13:07 aragorn kernel: fan PNP0C0B:03: registered as cooling_device3 Oct 12 17:13:07 aragorn kernel: ACPI: Fan [C3C9] (off) Oct 12 17:13:07 aragorn kernel: fan PNP0C0B:04: registered as cooling_device4 Oct 12 17:13:07 aragorn kernel: ACPI: Fan [C3CA] (off) Oct 12 17:13:07 aragorn kernel: fan PNP0C0B:05: registered as cooling_device5 Oct 12 17:13:07 aragorn kernel: ACPI: Fan [C3CB] (off) Oct 12 17:13:07 aragorn kernel: fan PNP0C0B:06: registered as cooling_device6 Oct 12 17:13:07 aragorn kernel: ACPI: Fan [C3CC] (off) Oct 12 17:13:07 aragorn kernel: thermal LNXTHERM:01: registered as thermal_zone0 Oct 12 17:13:07 aragorn kernel: ACPI: Thermal Zone [TZ6] (25 C) Oct 12 17:13:07 aragorn kernel: thermal LNXTHERM:02: registered as thermal_zone1 Oct 12 17:13:07 aragorn kernel: ACPI: Thermal Zone [TZ0] (55 C) Oct 12 17:13:07 aragorn kernel: thermal LNXTHERM:03: registered as thermal_zone2 Oct 12 17:13:07 aragorn kernel: ACPI: Thermal Zone [TZ1] (57 C) Oct 12 17:13:07 aragorn kernel: thermal LNXTHERM:04: registered as thermal_zone3 Oct 12 17:13:07 aragorn kernel: ACPI: Thermal Zone [TZ3] (45 C) Oct 12 17:13:07 aragorn kernel: thermal LNXTHERM:05: registered as thermal_zone4 Oct 12 17:13:07 aragorn kernel: ACPI: Thermal Zone [TZ4] (33 C) Oct 12 17:13:07 aragorn kernel: thermal LNXTHERM:06: registered as thermal_zone5 Oct 12 17:13:07 aragorn kernel: ACPI: Thermal Zone [TZ5] (49 C) Oct 12 17:13:07 aragorn kernel: e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2 Oct 12 17:13:07 aragorn kernel: e1000e: Copyright (c) 1999-2008 Intel Corporation. 
Oct 12 17:13:07 aragorn kernel: e1000e 0000:00:19.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22 Oct 12 17:13:07 aragorn kernel: e1000e 0000:00:19.0: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: e1000e 0000:00:19.0: irq 26 for MSI/MSI-X Oct 12 17:13:07 aragorn kernel: ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver Oct 12 17:13:07 aragorn kernel: SCSI subsystem initialized Oct 12 17:13:07 aragorn kernel: libata version 3.00 loaded. Oct 12 17:13:07 aragorn kernel: ricoh-mmc: Ricoh MMC Controller disabling driver Oct 12 17:13:07 aragorn kernel: ricoh-mmc: Copyright(c) Philip Langdale Oct 12 17:13:07 aragorn kernel: ricoh-mmc: Ricoh MMC controller found at 0000:02:06.3 [1180:0843] (rev 11) Oct 12 17:13:07 aragorn kernel: ricoh-mmc: Controller is now disabled. Oct 12 17:13:07 aragorn kernel: ohci1394 0000:02:06.1: PCI INT B -> GSI 19 (level, low) -> IRQ 19 Oct 12 17:13:07 aragorn kernel: sdhci: Secure Digital Host Controller Interface driver Oct 12 17:13:07 aragorn kernel: sdhci: Copyright(c) Pierre Ossman Oct 12 17:13:07 aragorn kernel: ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[19] MMIO=[e0101000-e01017ff] Max Packet=[2048] IR/IT contexts=[4/4] Oct 12 17:13:07 aragorn kernel: sdhci-pci 0000:02:06.2: SDHCI controller found [1180:0822] (rev 21) Oct 12 17:13:07 aragorn kernel: sdhci-pci 0000:02:06.2: PCI INT C -> GSI 20 (level, low) -> IRQ 20 Oct 12 17:13:07 aragorn kernel: Registered led device: mmc0:: Oct 12 17:13:07 aragorn kernel: mmc0: SDHCI controller on PCI [0000:02:06.2] using PIO Oct 12 17:13:07 aragorn kernel: 0000:00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:1e:68:5e:3b:04 Oct 12 17:13:07 aragorn kernel: 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection Oct 12 17:13:07 aragorn kernel: 0000:00:19.0: eth0: MAC: 6, PHY: 6, PBA No: ffffff-0ff Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18 Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1a.7: setting latency timer to 64 Oct 
12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1a.7: EHCI Host Controller Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1a.7: new USB bus registered, assigned bus number 1 Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1a.7: debug port 1 Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1a.7: cache line size of 32 is not supported Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1a.7: irq 18, io mem 0xe0641000 Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1a.7: USB 2.0 started, EHCI 1.00 Oct 12 17:13:07 aragorn kernel: usb usb1: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: hub 1-0:1.0: USB hub found Oct 12 17:13:07 aragorn kernel: hub 1-0:1.0: 4 ports detected Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 20 (level, low) -> IRQ 20 Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1d.7: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1d.7: EHCI Host Controller Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 2 Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1d.7: debug port 1 Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1d.7: cache line size of 32 is not supported Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1d.7: irq 20, io mem 0xe0648000 Oct 12 17:13:07 aragorn kernel: ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00 Oct 12 17:13:07 aragorn kernel: usb usb2: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: hub 2-0:1.0: USB hub found Oct 12 17:13:07 aragorn kernel: hub 2-0:1.0: 6 ports detected Oct 12 17:13:07 aragorn kernel: pata_acpi 0000:00:03.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18 Oct 12 17:13:07 aragorn kernel: pata_acpi 0000:00:03.2: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: pata_acpi 0000:00:03.2: PCI INT C disabled Oct 12 17:13:07 aragorn kernel: pata_acpi 0000:00:1f.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16 Oct 12 17:13:07 aragorn kernel: pata_acpi 0000:00:1f.1: 
setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: pata_acpi 0000:00:1f.1: PCI INT A disabled Oct 12 17:13:07 aragorn kernel: ata_piix 0000:00:1f.1: version 2.13 Oct 12 17:13:07 aragorn kernel: ata_piix 0000:00:1f.1: quirky BIOS, skipping spindown on poweroff and hibernation Oct 12 17:13:07 aragorn kernel: ata_piix 0000:00:1f.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16 Oct 12 17:13:07 aragorn kernel: ata_piix 0000:00:1f.1: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: uhci_hcd: USB Universal Host Controller Interface driver Oct 12 17:13:07 aragorn kernel: Uniform Multi-Platform E-IDE driver Oct 12 17:13:07 aragorn kernel: scsi0 : ata_piix Oct 12 17:13:07 aragorn kernel: scsi1 : ata_piix Oct 12 17:13:07 aragorn kernel: ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x2120 irq 14 Oct 12 17:13:07 aragorn kernel: ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x2128 irq 15 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1a.0: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1a.0: UHCI Host Controller Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 3 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1a.0: irq 16, io base 0x00002080 Oct 12 17:13:07 aragorn kernel: usb usb3: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: hub 3-0:1.0: USB hub found Oct 12 17:13:07 aragorn kernel: hub 3-0:1.0: 2 ports detected Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1a.1: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1a.1: UHCI Host Controller Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1a.1: new USB bus registered, assigned bus number 4 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1a.1: irq 17, io base 0x000020a0 Oct 
12 17:13:07 aragorn kernel: usb usb4: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: hub 4-0:1.0: USB hub found Oct 12 17:13:07 aragorn kernel: hub 4-0:1.0: 2 ports detected Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.0: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.0: UHCI Host Controller Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 5 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.0: irq 20, io base 0x000020c0 Oct 12 17:13:07 aragorn kernel: usb usb5: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: hub 5-0:1.0: USB hub found Oct 12 17:13:07 aragorn kernel: hub 5-0:1.0: 2 ports detected Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 22 (level, low) -> IRQ 22 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.1: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.1: UHCI Host Controller Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 6 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.1: irq 22, io base 0x000020e0 Oct 12 17:13:07 aragorn kernel: usb usb6: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: hub 6-0:1.0: USB hub found Oct 12 17:13:07 aragorn kernel: hub 6-0:1.0: 2 ports detected Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.2: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.2: UHCI Host Controller Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 7 Oct 12 17:13:07 aragorn kernel: uhci_hcd 0000:00:1d.2: irq 18, io base 0x00002100 Oct 12 17:13:07 aragorn kernel: usb usb7: configuration #1 chosen from 1 choice 
Oct 12 17:13:07 aragorn kernel: hub 7-0:1.0: USB hub found Oct 12 17:13:07 aragorn kernel: hub 7-0:1.0: 2 ports detected Oct 12 17:13:07 aragorn kernel: ata2: port disabled. ignoring. Oct 12 17:13:07 aragorn kernel: ata1.00: ATA-7: SAMSUNG HS122JC, GQ100-04, max UDMA/100 Oct 12 17:13:07 aragorn kernel: ata1.00: 234441648 sectors, multi 16: LBA Oct 12 17:13:07 aragorn kernel: ata1.01: ATAPI: MATSHITADVD-RAM UJ-852S, 1.02, max MWDMA2 Oct 12 17:13:07 aragorn kernel: ata1.00: configured for UDMA/100 Oct 12 17:13:07 aragorn kernel: ata1.01: configured for MWDMA2 Oct 12 17:13:07 aragorn kernel: scsi 0:0:0:0: Direct-Access ATA SAMSUNG HS122JC GQ10 PQ: 0 ANSI: 5 Oct 12 17:13:07 aragorn kernel: scsi 0:0:1:0: CD-ROM MATSHITA DVD-RAM UJ-852S 1.02 PQ: 0 ANSI: 5 Oct 12 17:13:07 aragorn kernel: sd 0:0:0:0: [sda] 234441648 512-byte logical blocks: (120 GB/111 GiB) Oct 12 17:13:07 aragorn kernel: sd 0:0:0:0: [sda] Write Protect is off Oct 12 17:13:07 aragorn kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 Oct 12 17:13:07 aragorn kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Oct 12 17:13:07 aragorn kernel: sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 > Oct 12 17:13:07 aragorn kernel: sd 0:0:0:0: [sda] Attached SCSI disk Oct 12 17:13:07 aragorn kernel: sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray Oct 12 17:13:07 aragorn kernel: Uniform CD-ROM driver Revision: 3.20 Oct 12 17:13:07 aragorn kernel: sr 0:0:1:0: Attached scsi CD-ROM sr0 Oct 12 17:13:07 aragorn kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0 Oct 12 17:13:07 aragorn kernel: sr 0:0:1:0: Attached scsi generic sg1 type 5 Oct 12 17:13:07 aragorn kernel: usb 1-2: new high speed USB device using ehci_hcd and address 3 Oct 12 17:13:07 aragorn kernel: usb 1-2: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: hub 1-2:1.0: USB hub found Oct 12 17:13:07 aragorn kernel: hub 1-2:1.0: 4 ports detected Oct 12 17:13:07 aragorn kernel: usb 
3-1: new full speed USB device using uhci_hcd and address 2 Oct 12 17:13:07 aragorn kernel: device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com Oct 12 17:13:07 aragorn kernel: usb 3-1: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: ieee1394: Host added: ID:BUS[0-00:1023] GUID[001b249929192210] Oct 12 17:13:07 aragorn kernel: usb 5-2: new full speed USB device using uhci_hcd and address 2 Oct 12 17:13:07 aragorn kernel: usb 5-2: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: usb 1-2.2: new full speed USB device using ehci_hcd and address 4 Oct 12 17:13:07 aragorn kernel: usb 1-2.2: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: usb 1-2.3: new low speed USB device using ehci_hcd and address 5 Oct 12 17:13:07 aragorn kernel: usb 1-2.3: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: usbcore: registered new interface driver hiddev Oct 12 17:13:07 aragorn kernel: input: Logitech USB Receiver as /class/input/input1 Oct 12 17:13:07 aragorn kernel: generic-usb 0003:046D:C50D.0001: input: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:1a.7-2.3/input0 Oct 12 17:13:07 aragorn kernel: usbcore: registered new interface driver usbhid Oct 12 17:13:07 aragorn kernel: usbhid: v2.6:USB HID core driver Oct 12 17:13:07 aragorn kernel: usb 1-2.4: new low speed USB device using ehci_hcd and address 6 Oct 12 17:13:07 aragorn kernel: usb 1-2.4: configuration #1 chosen from 1 choice Oct 12 17:13:07 aragorn kernel: input: USB Compliant Keyboard as /class/input/input2 Oct 12 17:13:07 aragorn kernel: generic-usb 0003:05A4:9841.0002: input: USB HID v1.10 Keyboard [USB Compliant Keyboard] on usb-0000:00:1a.7-2.4/input0 Oct 12 17:13:07 aragorn kernel: input: USB Compliant Keyboard as /class/input/input3 Oct 12 17:13:07 aragorn kernel: generic-usb 0003:05A4:9841.0003: input: USB HID v1.10 Device [USB Compliant Keyboard] on usb-0000:00:1a.7-2.4/input1 Oct 12 17:13:07 
aragorn kernel: PM: Starting manual resume from disk Oct 12 17:13:07 aragorn kernel: kjournald starting. Commit interval 5 seconds Oct 12 17:13:07 aragorn kernel: EXT3-fs: mounted filesystem with ordered data mode. Oct 12 17:13:07 aragorn kernel: udevd version 125 started Oct 12 17:13:07 aragorn kernel: input: Sleep Button as /class/input/input4 Oct 12 17:13:07 aragorn kernel: ACPI: Sleep Button [C2BF] Oct 12 17:13:07 aragorn kernel: input: Lid Switch as /class/input/input5 Oct 12 17:13:07 aragorn kernel: ACPI: Lid Switch [C155] Oct 12 17:13:07 aragorn kernel: input: Power Button as /class/input/input6 Oct 12 17:13:07 aragorn kernel: ACPI: Power Button [PWRF] Oct 12 17:13:07 aragorn kernel: ACPI: SSDT 000000007e7dbd42 0027F (v01 HP Cpu0Ist 00003000 INTL 20060317) Oct 12 17:13:07 aragorn kernel: ACPI: SSDT 000000007e7dc046 005FA (v01 HP Cpu0Cst 00003001 INTL 20060317) Oct 12 17:13:07 aragorn kernel: ACPI: AC Adapter [C23B] (on-line) Oct 12 17:13:07 aragorn kernel: ACPI: WMI: Mapper loaded Oct 12 17:13:07 aragorn kernel: Monitor-Mwait will be used to enter C-1 state Oct 12 17:13:07 aragorn kernel: Monitor-Mwait will be used to enter C-2 state Oct 12 17:13:07 aragorn kernel: Marking TSC unstable due to TSC halts in idle Oct 12 17:13:07 aragorn kernel: processor LNXCPU:00: registered as cooling_device7 Oct 12 17:13:07 aragorn kernel: ACPI: SSDT 000000007e7dbc7a 000C8 (v01 HP Cpu1Ist 00003000 INTL 20060317) Oct 12 17:13:07 aragorn kernel: ACPI: SSDT 000000007e7dbfc1 00085 (v01 HP Cpu1Cst 00003000 INTL 20060317) Oct 12 17:13:07 aragorn kernel: Switching to clocksource hpet Oct 12 17:13:07 aragorn kernel: agpgart-intel 0000:00:00.0: Intel 965GM Chipset Oct 12 17:13:07 aragorn kernel: agpgart-intel 0000:00:00.0: detected 7676K stolen memory Oct 12 17:13:07 aragorn kernel: agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xd0000000 Oct 12 17:13:07 aragorn kernel: processor LNXCPU:01: registered as cooling_device8 Oct 12 17:13:07 aragorn kernel: ACPI: Battery Slot [C23D] 
(battery present) Oct 12 17:13:07 aragorn kernel: lis3lv02d: hardware type NC2510 found Oct 12 17:13:07 aragorn kernel: lis3lv02d: 2-byte sensor found Oct 12 17:13:07 aragorn kernel: input: ST LIS3LV02DL Accelerometer as /class/input/input7 Oct 12 17:13:07 aragorn kernel: Registered led device: hp::hddprotect Oct 12 17:13:07 aragorn kernel: input: PC Speaker as /class/input/input8 Oct 12 17:13:07 aragorn kernel: cfg80211: Using static regulatory domain info Oct 12 17:13:07 aragorn kernel: cfg80211: Regulatory domain: EU Oct 12 17:13:07 aragorn kernel: ^I(start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) Oct 12 17:13:07 aragorn kernel: ^I(2402000 KHz - 2482000 KHz @ 40000 KHz), (600 mBi, 2000 mBm) Oct 12 17:13:07 aragorn kernel: ^I(5170000 KHz - 5190000 KHz @ 40000 KHz), (600 mBi, 2300 mBm) Oct 12 17:13:07 aragorn kernel: ^I(5190000 KHz - 5210000 KHz @ 40000 KHz), (600 mBi, 2300 mBm) Oct 12 17:13:07 aragorn kernel: ^I(5210000 KHz - 5230000 KHz @ 40000 KHz), (600 mBi, 2300 mBm) Oct 12 17:13:07 aragorn kernel: ^I(5230000 KHz - 5330000 KHz @ 40000 KHz), (600 mBi, 2000 mBm) Oct 12 17:13:07 aragorn kernel: ^I(5490000 KHz - 5710000 KHz @ 40000 KHz), (600 mBi, 3000 mBm) Oct 12 17:13:07 aragorn kernel: cfg80211: Calling CRDA for country: EU Oct 12 17:13:07 aragorn kernel: cfg80211: Calling CRDA for country: EU Oct 12 17:13:07 aragorn kernel: input: PS/2 Generic Mouse as /class/input/input9 Oct 12 17:13:07 aragorn kernel: usblp0: USB Bidirectional printer dev 4 if 0 alt 1 proto 2 vid 0x03F0 pid 0x3102 Oct 12 17:13:07 aragorn kernel: usbcore: registered new interface driver usblp Oct 12 17:13:07 aragorn kernel: yenta_cardbus 0000:02:06.0: CardBus bridge found [103c:30c9] Oct 12 17:13:07 aragorn kernel: yenta_cardbus 0000:02:06.0: ISA IRQ mask 0x0cb8, PCI irq 18 Oct 12 17:13:07 aragorn kernel: yenta_cardbus 0000:02:06.0: Socket status: 30000006 Oct 12 17:13:07 aragorn kernel: pci_bus 0000:02: Raising subordinate bus# of parent bus (#02) from #03 to #06 Oct 12 
17:13:07 aragorn kernel: yenta_cardbus 0000:02:06.0: pcmcia: parent PCI bridge I/O window: 0x3000 - 0x3fff Oct 12 17:13:07 aragorn kernel: yenta_cardbus 0000:02:06.0: pcmcia: parent PCI bridge Memory window: 0xe0100000 - 0xe03fffff Oct 12 17:13:07 aragorn kernel: yenta_cardbus 0000:02:06.0: pcmcia: parent PCI bridge Memory window: 0x80000000 - 0x83ffffff Oct 12 17:13:07 aragorn kernel: Synaptics Touchpad, model: 1, fw: 6.3, id: 0x1a0b1, caps: 0xa04711/0xa00000 Oct 12 17:13:07 aragorn kernel: input: SynPS/2 Synaptics TouchPad as /class/input/input10 Oct 12 17:13:07 aragorn kernel: iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, 1.3.27kd Oct 12 17:13:07 aragorn kernel: iwlagn: Copyright(c) 2003-2009 Intel Corporation Oct 12 17:13:07 aragorn kernel: iwlagn 0000:10:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17 Oct 12 17:13:07 aragorn kernel: iwlagn 0000:10:00.0: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: iwlagn 0000:10:00.0: Detected Intel Wireless WiFi Link 4965AGN REV=0x4 Oct 12 17:13:07 aragorn kernel: iwlagn 0000:10:00.0: Tunable channels: 11 802.11bg, 13 802.11a channels Oct 12 17:13:07 aragorn kernel: iwlagn 0000:10:00.0: irq 27 for MSI/MSI-X Oct 12 17:13:07 aragorn kernel: phy0: Selected rate control algorithm 'iwl-agn-rs' Oct 12 17:13:07 aragorn kernel: HDA Intel 0000:00:1b.0: power state changed by ACPI to D0 Oct 12 17:13:07 aragorn kernel: HDA Intel 0000:00:1b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17 Oct 12 17:13:07 aragorn kernel: HDA Intel 0000:00:1b.0: setting latency timer to 64 Oct 12 17:13:07 aragorn kernel: input: HDA Digital PCBeep as /class/input/input11 Oct 12 17:13:07 aragorn kernel: EXT3 FS on dm-1, internal journal Oct 12 17:13:07 aragorn kernel: loop: module loaded Oct 12 17:13:07 aragorn kernel: input: HP WMI hotkeys as /class/input/input12 Oct 12 17:13:07 aragorn kernel: kjournald starting. 
Commit interval 5 seconds Oct 12 17:13:07 aragorn kernel: EXT3 FS on dm-5, internal journal Oct 12 17:13:07 aragorn kernel: EXT3-fs: mounted filesystem with ordered data mode. Oct 12 17:13:07 aragorn kernel: kjournald starting. Commit interval 5 seconds Oct 12 17:13:07 aragorn kernel: EXT3 FS on dm-2, internal journal Oct 12 17:13:07 aragorn kernel: EXT3-fs: mounted filesystem with ordered data mode. Oct 12 17:13:07 aragorn kernel: kjournald starting. Commit interval 5 seconds Oct 12 17:13:07 aragorn kernel: EXT3 FS on dm-3, internal journal Oct 12 17:13:07 aragorn kernel: EXT3-fs: mounted filesystem with ordered data mode. Oct 12 17:13:07 aragorn kernel: kjournald starting. Commit interval 5 seconds Oct 12 17:13:07 aragorn kernel: EXT3 FS on dm-4, internal journal Oct 12 17:13:07 aragorn kernel: EXT3-fs: mounted filesystem with ordered data mode. Oct 12 17:13:07 aragorn kernel: kjournald starting. Commit interval 5 seconds Oct 12 17:13:07 aragorn kernel: EXT3 FS on dm-6, internal journal Oct 12 17:13:07 aragorn kernel: EXT3-fs: mounted filesystem with ordered data mode. Oct 12 17:13:07 aragorn kernel: kjournald starting. Commit interval 5 seconds Oct 12 17:13:07 aragorn kernel: EXT3 FS on dm-9, internal journal Oct 12 17:13:07 aragorn kernel: EXT3-fs: mounted filesystem with ordered data mode. Oct 12 17:13:07 aragorn kernel: kjournald starting. Commit interval 5 seconds Oct 12 17:13:07 aragorn kernel: EXT3 FS on dm-12, internal journal Oct 12 17:13:07 aragorn kernel: EXT3-fs: mounted filesystem with ordered data mode. Oct 12 17:13:07 aragorn kernel: kjournald starting. Commit interval 5 seconds Oct 12 17:13:07 aragorn kernel: EXT3 FS on dm-8, internal journal Oct 12 17:13:07 aragorn kernel: EXT3-fs: mounted filesystem with ordered data mode. Oct 12 17:13:07 aragorn kernel: kjournald starting. 
Commit interval 5 seconds Oct 12 17:13:07 aragorn kernel: EXT3 FS on dm-10, internal journal Oct 12 17:13:07 aragorn kernel: EXT3-fs: mounted filesystem with ordered data mode. Oct 12 17:13:07 aragorn kernel: Adding 2097144k swap on /dev/mapper/main-swap. Priority:-1 extents:1 across:2097144k Oct 12 17:13:07 aragorn kernel: iwlagn 0000:10:00.0: firmware: requesting iwlwifi-4965-2.ucode Oct 12 17:13:07 aragorn kernel: iwlagn 0000:10:00.0: loaded firmware version 228.57.2.23 Oct 12 17:13:07 aragorn kernel: Registered led device: iwl-phy0::radio Oct 12 17:13:07 aragorn kernel: Registered led device: iwl-phy0::assoc Oct 12 17:13:07 aragorn kernel: Registered led device: iwl-phy0::RX Oct 12 17:13:07 aragorn kernel: Registered led device: iwl-phy0::TX Oct 12 17:13:07 aragorn kernel: wlan0: deauthenticating from 00:14:c1:38:e5:15 by local choice (reason=3) Oct 12 17:13:07 aragorn kernel: wlan0: direct probe to AP 00:14:c1:38:e5:15 (try 1) Oct 12 17:13:07 aragorn kernel: wlan0: direct probe responded Oct 12 17:13:07 aragorn kernel: wlan0: authenticate with AP 00:14:c1:38:e5:15 (try 1) Oct 12 17:13:07 aragorn kernel: wlan0: authenticated Oct 12 17:13:07 aragorn kernel: wlan0: associate with AP 00:14:c1:38:e5:15 (try 1) Oct 12 17:13:07 aragorn kernel: wlan0: RX AssocResp from 00:14:c1:38:e5:15 (capab=0x411 status=0 aid=1) Oct 12 17:13:07 aragorn kernel: wlan0: associated Oct 12 17:13:07 aragorn kernel: RPC: Registered udp transport module. Oct 12 17:13:07 aragorn kernel: RPC: Registered tcp transport module. Oct 12 17:13:07 aragorn kernel: RPC: Registered tcp NFSv4.1 backchannel transport module. Oct 12 17:13:07 aragorn kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de). 
Oct 12 17:13:07 aragorn kernel: NET: Registered protocol family 10 Oct 12 17:13:07 aragorn kernel: lo: Disabled Privacy Extensions Oct 12 17:13:09 aragorn kernel: lp: driver loaded but no devices found Oct 12 17:13:09 aragorn kernel: ppdev: user-space parallel port driver Oct 12 17:13:15 aragorn kernel: wlan0: no IPv6 routers present Oct 12 17:13:20 aragorn kernel: [drm] Initialized drm 1.1.0 20060810 Oct 12 17:13:20 aragorn kernel: pci 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 Oct 12 17:13:20 aragorn kernel: pci 0000:00:02.0: setting latency timer to 64 Oct 12 17:13:20 aragorn kernel: pci 0000:00:02.0: irq 28 for MSI/MSI-X Oct 12 17:13:20 aragorn kernel: acpi device:00: registered as cooling_device9 Oct 12 17:13:20 aragorn kernel: input: Video Bus as /class/input/input13 Oct 12 17:13:20 aragorn kernel: ACPI: Video Device [C09A] (multi-head: yes rom: no post: no) Oct 12 17:13:20 aragorn kernel: [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0 Oct 12 17:16:11 aragorn kernel: sr0: CDROM not ready. Make sure there is a disc in the drive. Oct 12 17:16:11 aragorn kernel: sr0: CDROM not ready. Make sure there is a disc in the drive. Oct 12 17:55:15 aragorn kernel: sr0: CDROM not ready. Make sure there is a disc in the drive. Oct 12 17:55:15 aragorn kernel: sr0: CDROM not ready. Make sure there is a disc in the drive. 
Oct 12 18:10:35 aragorn kernel: e1000e 0000:00:19.0: irq 26 for MSI/MSI-X Oct 12 18:10:35 aragorn kernel: e1000e 0000:00:19.0: irq 26 for MSI/MSI-X Oct 12 18:10:35 aragorn kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready Oct 12 18:10:37 aragorn kernel: e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX Oct 12 18:10:37 aragorn kernel: 0000:00:19.0: eth0: 10/100 speed: disabling TSO Oct 12 18:10:37 aragorn kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready Oct 12 18:10:39 aragorn kernel: wlan0: deauthenticating from 00:14:c1:38:e5:15 by local choice (reason=3) Oct 12 18:10:48 aragorn kernel: eth0: no IPv6 routers present Oct 12 20:13:29 aragorn kernel: Registered led device: iwl-phy0::radio Oct 12 20:13:29 aragorn kernel: Registered led device: iwl-phy0::assoc Oct 12 20:13:29 aragorn kernel: Registered led device: iwl-phy0::RX Oct 12 20:13:29 aragorn kernel: Registered led device: iwl-phy0::TX Oct 12 20:13:29 aragorn kernel: ADDRCONF(NETDEV_UP): wlan0: link is not ready Oct 12 20:13:29 aragorn kernel: wlan0: direct probe to AP 00:14:c1:38:e5:15 (try 1) Oct 12 20:13:29 aragorn kernel: wlan0: direct probe responded Oct 12 20:13:29 aragorn kernel: wlan0: authenticate with AP 00:14:c1:38:e5:15 (try 1) Oct 12 20:13:29 aragorn kernel: wlan0: authenticated Oct 12 20:13:29 aragorn kernel: wlan0: associate with AP 00:14:c1:38:e5:15 (try 1) Oct 12 20:13:29 aragorn kernel: wlan0: RX ReassocResp from 00:14:c1:38:e5:15 (capab=0x411 status=0 aid=0) Oct 12 20:13:29 aragorn kernel: wlan0: invalid aid value 0; bits 15:14 not set Oct 12 20:13:29 aragorn kernel: wlan0: associated Oct 12 20:13:29 aragorn kernel: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready Oct 12 20:13:30 aragorn kernel: wlan0: deauthenticating from 00:14:c1:38:e5:15 by local choice (reason=3) Oct 12 20:13:31 aragorn kernel: wlan0: deauthenticating from 00:14:c1:38:e5:15 by local choice (reason=3) Oct 12 20:13:31 aragorn kernel: wlan0: direct probe to AP 00:14:c1:38:e5:15 (try 1) Oct 
12 20:13:31 aragorn kernel: wlan0: direct probe responded Oct 12 20:13:31 aragorn kernel: wlan0: authenticate with AP 00:14:c1:38:e5:15 (try 1) Oct 12 20:13:31 aragorn kernel: wlan0: authenticated Oct 12 20:13:31 aragorn kernel: wlan0: associate with AP 00:14:c1:38:e5:15 (try 1) Oct 12 20:13:31 aragorn kernel: wlan0: RX ReassocResp from 00:14:c1:38:e5:15 (capab=0x411 status=0 aid=1) Oct 12 20:13:31 aragorn kernel: wlan0: associated Oct 12 20:13:40 aragorn kernel: wlan0: no IPv6 routers present Oct 12 20:15:06 aragorn kernel: swapper: page allocation failure. order:2, mode:0x4020 Oct 12 20:15:07 aragorn kernel: Pid: 0, comm: swapper Not tainted 2.6.32-rc4 #29 Oct 12 20:15:07 aragorn kernel: Call Trace: Oct 12 20:15:07 aragorn kernel: <IRQ> [<ffffffff810a9d87>] __alloc_pages_nodemask+0x5b9/0x632 Oct 12 20:15:07 aragorn kernel: [<ffffffff810a9e17>] __get_free_pages+0x17/0x46 Oct 12 20:15:07 aragorn kernel: [<ffffffff810cecff>] __kmalloc_track_caller+0x4e/0x146 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] ? iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122d96d>] __alloc_skb+0x6b/0x161 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b814>] iwl_rx_replenish_now+0x1b/0x28 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b7243>] iwl_rx_handle+0x3ad/0x3c6 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffff8125732b>] ? ip_rcv+0x2b8/0x2ef Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b778d>] iwl_irq_tasklet_legacy+0x531/0x7a9 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0362b95>] ? 
__iwl_read32+0xaa/0xb9 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff81048abf>] tasklet_action+0x76/0xc1 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a31a>] __do_softirq+0xdd/0x197 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100ccdc>] call_softirq+0x1c/0x28 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100e81c>] do_softirq+0x38/0x70 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a164>] irq_exit+0x3b/0x7a Oct 12 20:15:07 aragorn kernel: [<ffffffff812af5d5>] do_IRQ+0xad/0xc4 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100c553>] ret_from_intr+0x0/0xa Oct 12 20:15:07 aragorn kernel: <EOI> [<ffffffffa02b1fc9>] ? acpi_idle_enter_simple+0xfe/0x12c [processor] Oct 12 20:15:07 aragorn kernel: [<ffffffffa02b1fbf>] ? acpi_idle_enter_simple+0xf4/0x12c [processor] Oct 12 20:15:07 aragorn kernel: [<ffffffff81220131>] ? cpuidle_idle_call+0x98/0xf3 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100aed0>] ? cpu_idle+0x5a/0x92 Oct 12 20:15:07 aragorn kernel: [<ffffffff812aa175>] ? start_secondary+0x17a/0x17f Oct 12 20:15:07 aragorn kernel: Mem-Info: Oct 12 20:15:07 aragorn kernel: DMA per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: DMA32 per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 186, btch: 31 usd: 187 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 186, btch: 31 usd: 159 Oct 12 20:15:07 aragorn kernel: active_anon:306195 inactive_anon:102886 isolated_anon:32 Oct 12 20:15:07 aragorn kernel: active_file:10730 inactive_file:10749 isolated_file:0 Oct 12 20:15:07 aragorn kernel: unevictable:400 dirty:0 writeback:35291 unstable:0 buffer:50 Oct 12 20:15:07 aragorn kernel: free:3047 slab_reclaimable:3967 slab_unreclaimable:11608 Oct 12 20:15:07 aragorn kernel: mapped:21571 shmem:19 pagetables:4243 bounce:0 Oct 12 20:15:07 aragorn kernel: DMA free:7928kB min:40kB low:48kB high:60kB active_anon:3880kB inactive_anon:3980kB active_file:16kB inactive_file:128kB unevictable:0kB 
isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB writeback:0kB mapped:28kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 1976 1976 1976 Oct 12 20:15:07 aragorn kernel: DMA32 free:4260kB min:5664kB low:7080kB high:8496kB active_anon:1220900kB inactive_anon:407564kB active_file:42904kB inactive_file:42868kB unevictable:1600kB isolated(anon):128kB isolated(file):0kB present:2023748kB mlocked:1600kB dirty:0kB writeback:141164kB mapped:86256kB shmem:76kB slab_reclaimable:15864kB slab_unreclaimable:46424kB kernel_stack:1440kB pagetables:16956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 0 0 0 Oct 12 20:15:07 aragorn kernel: DMA: 6*4kB 6*8kB 5*16kB 5*32kB 5*64kB 3*128kB 1*256kB 1*512kB 2*1024kB 0*2048kB 1*4096kB = 7928kB Oct 12 20:15:07 aragorn kernel: DMA32: 549*4kB 196*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4340kB Oct 12 20:15:07 aragorn kernel: 59407 total pagecache pages Oct 12 20:15:07 aragorn kernel: 37518 pages in swap cache Oct 12 20:15:07 aragorn kernel: Swap cache stats: add 205147, delete 167629, find 11222/12777 Oct 12 20:15:07 aragorn kernel: Free swap = 1361488kB Oct 12 20:15:07 aragorn kernel: Total swap = 2097144kB Oct 12 20:15:07 aragorn kernel: 518064 pages RAM Oct 12 20:15:07 aragorn kernel: 10503 pages reserved Oct 12 20:15:07 aragorn kernel: 93830 pages shared Oct 12 20:15:07 aragorn kernel: 433823 pages non-shared Oct 12 20:15:07 aragorn kernel: iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining. Oct 12 20:15:07 aragorn kernel: swapper: page allocation failure. 
order:2, mode:0x4020 Oct 12 20:15:07 aragorn kernel: Pid: 0, comm: swapper Not tainted 2.6.32-rc4 #29 Oct 12 20:15:07 aragorn kernel: Call Trace: Oct 12 20:15:07 aragorn kernel: <IRQ> [<ffffffff810a9d87>] __alloc_pages_nodemask+0x5b9/0x632 Oct 12 20:15:07 aragorn kernel: [<ffffffff810a9e17>] __get_free_pages+0x17/0x46 Oct 12 20:15:07 aragorn kernel: [<ffffffff810cecff>] __kmalloc_track_caller+0x4e/0x146 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] ? iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122d96d>] __alloc_skb+0x6b/0x161 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b814>] iwl_rx_replenish_now+0x1b/0x28 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b7243>] iwl_rx_handle+0x3ad/0x3c6 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b778d>] iwl_irq_tasklet_legacy+0x531/0x7a9 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0362b95>] ? __iwl_read32+0xaa/0xb9 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff81048abf>] tasklet_action+0x76/0xc1 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a31a>] __do_softirq+0xdd/0x197 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100ccdc>] call_softirq+0x1c/0x28 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100e81c>] do_softirq+0x38/0x70 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a164>] irq_exit+0x3b/0x7a Oct 12 20:15:07 aragorn kernel: [<ffffffff812af5d5>] do_IRQ+0xad/0xc4 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100c553>] ret_from_intr+0x0/0xa Oct 12 20:15:07 aragorn kernel: <EOI> [<ffffffffa02b1fc9>] ? acpi_idle_enter_simple+0xfe/0x12c [processor] Oct 12 20:15:07 aragorn kernel: [<ffffffffa02b1fbf>] ? acpi_idle_enter_simple+0xf4/0x12c [processor] Oct 12 20:15:07 aragorn kernel: [<ffffffff81220131>] ? cpuidle_idle_call+0x98/0xf3 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100aed0>] ? cpu_idle+0x5a/0x92 Oct 12 20:15:07 aragorn kernel: [<ffffffff8129e042>] ? 
rest_init+0x66/0x68 Oct 12 20:15:07 aragorn kernel: [<ffffffff814a9c48>] ? start_kernel+0x34d/0x358 Oct 12 20:15:07 aragorn kernel: [<ffffffff814a929a>] ? x86_64_start_reservations+0xaa/0xae Oct 12 20:15:07 aragorn kernel: [<ffffffff814a937f>] ? x86_64_start_kernel+0xe1/0xe8 Oct 12 20:15:07 aragorn kernel: Mem-Info: Oct 12 20:15:07 aragorn kernel: DMA per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: DMA32 per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 186, btch: 31 usd: 187 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 186, btch: 31 usd: 167 Oct 12 20:15:07 aragorn kernel: active_anon:306195 inactive_anon:102888 isolated_anon:32 Oct 12 20:15:07 aragorn kernel: active_file:10737 inactive_file:10749 isolated_file:0 Oct 12 20:15:07 aragorn kernel: unevictable:400 dirty:0 writeback:35575 unstable:0 buffer:50 Oct 12 20:15:07 aragorn kernel: free:2981 slab_reclaimable:3935 slab_unreclaimable:11684 Oct 12 20:15:07 aragorn kernel: mapped:21563 shmem:19 pagetables:4243 bounce:0 Oct 12 20:15:07 aragorn kernel: DMA free:7928kB min:40kB low:48kB high:60kB active_anon:3880kB inactive_anon:3980kB active_file:16kB inactive_file:128kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB writeback:0kB mapped:28kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 1976 1976 1976 Oct 12 20:15:07 aragorn kernel: DMA32 free:3996kB min:5664kB low:7080kB high:8496kB active_anon:1220900kB inactive_anon:407572kB active_file:42932kB inactive_file:42868kB unevictable:1600kB isolated(anon):128kB isolated(file):0kB present:2023748kB mlocked:1600kB dirty:0kB writeback:142300kB mapped:86224kB shmem:76kB slab_reclaimable:15736kB slab_unreclaimable:46728kB kernel_stack:1440kB pagetables:16956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 0 0 0 Oct 12 20:15:07 aragorn kernel: DMA: 6*4kB 6*8kB 5*16kB 5*32kB 5*64kB 3*128kB 1*256kB 1*512kB 2*1024kB 0*2048kB 1*4096kB = 7928kB Oct 12 20:15:07 aragorn kernel: DMA32: 609*4kB 135*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4028kB Oct 12 20:15:07 aragorn kernel: 59639 total pagecache pages Oct 12 20:15:07 aragorn kernel: 37774 pages in swap cache Oct 12 20:15:07 aragorn kernel: Swap cache stats: add 205403, delete 167629, find 11222/12777 Oct 12 20:15:07 aragorn kernel: Free swap = 1360464kB Oct 12 20:15:07 aragorn kernel: Total swap = 2097144kB Oct 12 20:15:07 aragorn kernel: 518064 pages RAM Oct 12 20:15:07 aragorn kernel: 10503 pages reserved Oct 12 20:15:07 aragorn kernel: 93766 pages shared Oct 12 20:15:07 aragorn kernel: 433957 pages non-shared Oct 12 20:15:07 aragorn kernel: iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 3 free buffers remaining. Oct 12 20:15:07 aragorn kernel: kcryptd: page allocation failure. 
order:2, mode:0x4020 Oct 12 20:15:07 aragorn kernel: Pid: 1523, comm: kcryptd Not tainted 2.6.32-rc4 #29 Oct 12 20:15:07 aragorn kernel: Call Trace: Oct 12 20:15:07 aragorn kernel: <IRQ> [<ffffffff810a9d87>] __alloc_pages_nodemask+0x5b9/0x632 Oct 12 20:15:07 aragorn kernel: [<ffffffff810a9e17>] __get_free_pages+0x17/0x46 Oct 12 20:15:07 aragorn kernel: [<ffffffff810cecff>] __kmalloc_track_caller+0x4e/0x146 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] ? iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122d96d>] __alloc_skb+0x6b/0x161 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b814>] iwl_rx_replenish_now+0x1b/0x28 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b7218>] iwl_rx_handle+0x382/0x3c6 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffff8125732b>] ? ip_rcv+0x2b8/0x2ef Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b778d>] iwl_irq_tasklet_legacy+0x531/0x7a9 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0362b95>] ? __iwl_read32+0xaa/0xb9 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff81048abf>] tasklet_action+0x76/0xc1 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a31a>] __do_softirq+0xdd/0x197 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100ccdc>] call_softirq+0x1c/0x28 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100e81c>] do_softirq+0x38/0x70 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a164>] irq_exit+0x3b/0x7a Oct 12 20:15:07 aragorn kernel: [<ffffffff812af5d5>] do_IRQ+0xad/0xc4 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100c553>] ret_from_intr+0x0/0xa Oct 12 20:15:07 aragorn kernel: <EOI> [<ffffffffa02419f4>] ? enc128+0x67f/0x80b [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa024273a>] ? aes_encrypt+0x12/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa022f2cc>] ? crypto_cbc_encrypt+0x131/0x193 [cbc] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0242728>] ? 
aes_encrypt+0x0/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffff81152601>] ? async_encrypt+0x3d/0x3f Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201bbd>] ? crypt_convert+0x1fe/0x290 [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0202077>] ? kcryptd_crypt+0x428/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff81058e77>] ? worker_thread+0x195/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201c4f>] ? kcryptd_crypt+0x0/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff8105cb19>] ? autoremove_wake_function+0x0/0x3d Oct 12 20:15:07 aragorn kernel: [<ffffffff81058ce2>] ? worker_thread+0x0/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c79b>] ? kthread+0x82/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbda>] ? child_rip+0xa/0x20 Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c719>] ? kthread+0x0/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbd0>] ? child_rip+0x0/0x20 Oct 12 20:15:07 aragorn kernel: Mem-Info: Oct 12 20:15:07 aragorn kernel: DMA per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: DMA32 per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 186, btch: 31 usd: 161 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 186, btch: 31 usd: 200 Oct 12 20:15:07 aragorn kernel: active_anon:306195 inactive_anon:102838 isolated_anon:65 Oct 12 20:15:07 aragorn kernel: active_file:10600 inactive_file:10613 isolated_file:0 Oct 12 20:15:07 aragorn kernel: unevictable:400 dirty:0 writeback:36939 unstable:0 buffer:51 Oct 12 20:15:07 aragorn kernel: free:3171 slab_reclaimable:3935 slab_unreclaimable:11957 Oct 12 20:15:07 aragorn kernel: mapped:21315 shmem:19 pagetables:4243 bounce:0 Oct 12 20:15:07 aragorn kernel: DMA free:7928kB min:40kB low:48kB high:60kB active_anon:3880kB inactive_anon:3980kB active_file:16kB inactive_file:128kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:28kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 1976 1976 1976 Oct 12 20:15:07 aragorn kernel: DMA32 free:4756kB min:5664kB low:7080kB high:8496kB active_anon:1220900kB inactive_anon:407376kB active_file:42384kB inactive_file:42324kB unevictable:1600kB isolated(anon):256kB isolated(file):0kB present:2023748kB mlocked:1600kB dirty:0kB writeback:147756kB mapped:85232kB shmem:76kB slab_reclaimable:15736kB slab_unreclaimable:47820kB kernel_stack:1440kB pagetables:16956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:131 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 0 0 0 Oct 12 20:15:07 aragorn kernel: DMA: 6*4kB 6*8kB 5*16kB 5*32kB 5*64kB 3*128kB 1*256kB 1*512kB 2*1024kB 0*2048kB 1*4096kB = 7928kB Oct 12 20:15:07 aragorn kernel: DMA32: 923*4kB 17*8kB 0*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4756kB Oct 12 20:15:07 aragorn kernel: 60755 total pagecache pages Oct 12 20:15:07 aragorn kernel: 39168 pages in swap cache Oct 12 20:15:07 aragorn kernel: Swap cache stats: add 206820, delete 167651, find 11222/12777 Oct 12 20:15:07 aragorn kernel: Free swap = 1354820kB Oct 12 20:15:07 aragorn kernel: Total swap = 2097144kB Oct 12 20:15:07 aragorn kernel: 518064 pages RAM Oct 12 20:15:07 aragorn kernel: 10503 pages reserved Oct 12 20:15:07 aragorn kernel: 93541 pages shared Oct 12 20:15:07 aragorn kernel: 433976 pages non-shared Oct 12 20:15:07 aragorn kernel: kcryptd: page allocation failure. 
order:2, mode:0x4020 Oct 12 20:15:07 aragorn kernel: Pid: 1523, comm: kcryptd Not tainted 2.6.32-rc4 #29 Oct 12 20:15:07 aragorn kernel: Call Trace: Oct 12 20:15:07 aragorn kernel: <IRQ> [<ffffffff810a9d87>] __alloc_pages_nodemask+0x5b9/0x632 Oct 12 20:15:07 aragorn kernel: [<ffffffff810a9e17>] __get_free_pages+0x17/0x46 Oct 12 20:15:07 aragorn kernel: [<ffffffff810cecff>] __kmalloc_track_caller+0x4e/0x146 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] ? iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122d96d>] __alloc_skb+0x6b/0x161 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b814>] iwl_rx_replenish_now+0x1b/0x28 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b7243>] iwl_rx_handle+0x3ad/0x3c6 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffff8125732b>] ? ip_rcv+0x2b8/0x2ef Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b778d>] iwl_irq_tasklet_legacy+0x531/0x7a9 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0362b95>] ? __iwl_read32+0xaa/0xb9 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff81048abf>] tasklet_action+0x76/0xc1 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a31a>] __do_softirq+0xdd/0x197 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100ccdc>] call_softirq+0x1c/0x28 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100e81c>] do_softirq+0x38/0x70 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a164>] irq_exit+0x3b/0x7a Oct 12 20:15:07 aragorn kernel: [<ffffffff812af5d5>] do_IRQ+0xad/0xc4 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100c553>] ret_from_intr+0x0/0xa Oct 12 20:15:07 aragorn kernel: <EOI> [<ffffffffa02419f4>] ? enc128+0x67f/0x80b [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa024273a>] ? aes_encrypt+0x12/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa022f2cc>] ? crypto_cbc_encrypt+0x131/0x193 [cbc] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0242728>] ? 
aes_encrypt+0x0/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffff81152601>] ? async_encrypt+0x3d/0x3f Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201bbd>] ? crypt_convert+0x1fe/0x290 [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0202077>] ? kcryptd_crypt+0x428/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff81058e77>] ? worker_thread+0x195/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201c4f>] ? kcryptd_crypt+0x0/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff8105cb19>] ? autoremove_wake_function+0x0/0x3d Oct 12 20:15:07 aragorn kernel: [<ffffffff81058ce2>] ? worker_thread+0x0/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c79b>] ? kthread+0x82/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbda>] ? child_rip+0xa/0x20 Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c719>] ? kthread+0x0/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbd0>] ? child_rip+0x0/0x20 Oct 12 20:15:07 aragorn kernel: Mem-Info: Oct 12 20:15:07 aragorn kernel: DMA per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: DMA32 per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 186, btch: 31 usd: 161 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 186, btch: 31 usd: 190 Oct 12 20:15:07 aragorn kernel: active_anon:306195 inactive_anon:102848 isolated_anon:66 Oct 12 20:15:07 aragorn kernel: active_file:10600 inactive_file:10613 isolated_file:0 Oct 12 20:15:07 aragorn kernel: unevictable:400 dirty:0 writeback:36970 unstable:0 buffer:51 Oct 12 20:15:07 aragorn kernel: free:3171 slab_reclaimable:3935 slab_unreclaimable:11978 Oct 12 20:15:07 aragorn kernel: mapped:21315 shmem:19 pagetables:4243 bounce:0 Oct 12 20:15:07 aragorn kernel: DMA free:7928kB min:40kB low:48kB high:60kB active_anon:3880kB inactive_anon:3980kB active_file:16kB inactive_file:128kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:28kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 1976 1976 1976 Oct 12 20:15:07 aragorn kernel: DMA32 free:4756kB min:5664kB low:7080kB high:8496kB active_anon:1220900kB inactive_anon:407412kB active_file:42384kB inactive_file:42324kB unevictable:1600kB isolated(anon):264kB isolated(file):0kB present:2023748kB mlocked:1600kB dirty:0kB writeback:147880kB mapped:85232kB shmem:76kB slab_reclaimable:15736kB slab_unreclaimable:47904kB kernel_stack:1440kB pagetables:16956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:165 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 0 0 0 Oct 12 20:15:07 aragorn kernel: DMA: 6*4kB 6*8kB 5*16kB 5*32kB 5*64kB 3*128kB 1*256kB 1*512kB 2*1024kB 0*2048kB 1*4096kB = 7928kB Oct 12 20:15:07 aragorn kernel: DMA32: 923*4kB 17*8kB 0*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4756kB Oct 12 20:15:07 aragorn kernel: 60786 total pagecache pages Oct 12 20:15:07 aragorn kernel: 39195 pages in swap cache Oct 12 20:15:07 aragorn kernel: Swap cache stats: add 206846, delete 167651, find 11222/12777 Oct 12 20:15:07 aragorn kernel: Free swap = 1354716kB Oct 12 20:15:07 aragorn kernel: Total swap = 2097144kB Oct 12 20:15:07 aragorn kernel: 518064 pages RAM Oct 12 20:15:07 aragorn kernel: 10503 pages reserved Oct 12 20:15:07 aragorn kernel: 93541 pages shared Oct 12 20:15:07 aragorn kernel: 433976 pages non-shared Oct 12 20:15:07 aragorn kernel: kcryptd: page allocation failure. 
order:2, mode:0x4020 Oct 12 20:15:07 aragorn kernel: Pid: 1523, comm: kcryptd Not tainted 2.6.32-rc4 #29 Oct 12 20:15:07 aragorn kernel: Call Trace: Oct 12 20:15:07 aragorn kernel: <IRQ> [<ffffffff810a9d87>] __alloc_pages_nodemask+0x5b9/0x632 Oct 12 20:15:07 aragorn kernel: [<ffffffff810a9e17>] __get_free_pages+0x17/0x46 Oct 12 20:15:07 aragorn kernel: [<ffffffff810cecff>] __kmalloc_track_caller+0x4e/0x146 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] ? iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122d96d>] __alloc_skb+0x6b/0x161 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b814>] iwl_rx_replenish_now+0x1b/0x28 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b7218>] iwl_rx_handle+0x382/0x3c6 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b778d>] iwl_irq_tasklet_legacy+0x531/0x7a9 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122c0e6>] ? skb_dequeue+0x60/0x6c Oct 12 20:15:07 aragorn kernel: [<ffffffff81048abf>] tasklet_action+0x76/0xc1 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a31a>] __do_softirq+0xdd/0x197 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100ccdc>] call_softirq+0x1c/0x28 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100e81c>] do_softirq+0x38/0x70 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a164>] irq_exit+0x3b/0x7a Oct 12 20:15:07 aragorn kernel: [<ffffffff812af5d5>] do_IRQ+0xad/0xc4 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100c553>] ret_from_intr+0x0/0xa Oct 12 20:15:07 aragorn kernel: <EOI> [<ffffffffa02419f4>] ? enc128+0x67f/0x80b [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa024273a>] ? aes_encrypt+0x12/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa022f2cc>] ? crypto_cbc_encrypt+0x131/0x193 [cbc] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0242728>] ? aes_encrypt+0x0/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffff81152601>] ? 
async_encrypt+0x3d/0x3f Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201bbd>] ? crypt_convert+0x1fe/0x290 [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0202077>] ? kcryptd_crypt+0x428/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff81058e77>] ? worker_thread+0x195/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201c4f>] ? kcryptd_crypt+0x0/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff8105cb19>] ? autoremove_wake_function+0x0/0x3d Oct 12 20:15:07 aragorn kernel: [<ffffffff81058ce2>] ? worker_thread+0x0/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c79b>] ? kthread+0x82/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbda>] ? child_rip+0xa/0x20 Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c719>] ? kthread+0x0/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbd0>] ? child_rip+0x0/0x20 Oct 12 20:15:07 aragorn kernel: Mem-Info: Oct 12 20:15:07 aragorn kernel: DMA per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: DMA32 per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 186, btch: 31 usd: 161 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 186, btch: 31 usd: 190 Oct 12 20:15:07 aragorn kernel: active_anon:306195 inactive_anon:102848 isolated_anon:66 Oct 12 20:15:07 aragorn kernel: active_file:10600 inactive_file:10613 isolated_file:0 Oct 12 20:15:07 aragorn kernel: unevictable:400 dirty:0 writeback:36970 unstable:0 buffer:51 Oct 12 20:15:07 aragorn kernel: free:3171 slab_reclaimable:3935 slab_unreclaimable:11978 Oct 12 20:15:07 aragorn kernel: mapped:21315 shmem:19 pagetables:4243 bounce:0 Oct 12 20:15:07 aragorn kernel: DMA free:7928kB min:40kB low:48kB high:60kB active_anon:3880kB inactive_anon:3980kB active_file:16kB inactive_file:128kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB writeback:0kB mapped:28kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:8kB 
kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 1976 1976 1976 Oct 12 20:15:07 aragorn kernel: DMA32 free:4756kB min:5664kB low:7080kB high:8496kB active_anon:1220900kB inactive_anon:407412kB active_file:42384kB inactive_file:42324kB unevictable:1600kB isolated(anon):264kB isolated(file):0kB present:2023748kB mlocked:1600kB dirty:0kB writeback:147880kB mapped:85232kB shmem:76kB slab_reclaimable:15736kB slab_unreclaimable:47904kB kernel_stack:1440kB pagetables:16956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 0 0 0 Oct 12 20:15:07 aragorn kernel: DMA: 6*4kB 6*8kB 5*16kB 5*32kB 5*64kB 3*128kB 1*256kB 1*512kB 2*1024kB 0*2048kB 1*4096kB = 7928kB Oct 12 20:15:07 aragorn kernel: DMA32: 923*4kB 17*8kB 0*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4756kB Oct 12 20:15:07 aragorn kernel: 60786 total pagecache pages Oct 12 20:15:07 aragorn kernel: 39195 pages in swap cache Oct 12 20:15:07 aragorn kernel: Swap cache stats: add 206846, delete 167651, find 11222/12777 Oct 12 20:15:07 aragorn kernel: Free swap = 1354716kB Oct 12 20:15:07 aragorn kernel: Total swap = 2097144kB Oct 12 20:15:07 aragorn kernel: 518064 pages RAM Oct 12 20:15:07 aragorn kernel: 10503 pages reserved Oct 12 20:15:07 aragorn kernel: 93541 pages shared Oct 12 20:15:07 aragorn kernel: 433976 pages non-shared Oct 12 20:15:07 aragorn kernel: kcryptd: page allocation failure. 
order:2, mode:0x4020 Oct 12 20:15:07 aragorn kernel: Pid: 1523, comm: kcryptd Not tainted 2.6.32-rc4 #29 Oct 12 20:15:07 aragorn kernel: Call Trace: Oct 12 20:15:07 aragorn kernel: <IRQ> [<ffffffff810a9d87>] __alloc_pages_nodemask+0x5b9/0x632 Oct 12 20:15:07 aragorn kernel: [<ffffffff810a9e17>] __get_free_pages+0x17/0x46 Oct 12 20:15:07 aragorn kernel: [<ffffffff810cecff>] __kmalloc_track_caller+0x4e/0x146 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] ? iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122d96d>] __alloc_skb+0x6b/0x161 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b814>] iwl_rx_replenish_now+0x1b/0x28 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b7218>] iwl_rx_handle+0x382/0x3c6 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b778d>] iwl_irq_tasklet_legacy+0x531/0x7a9 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122c0e6>] ? skb_dequeue+0x60/0x6c Oct 12 20:15:07 aragorn kernel: [<ffffffff81048abf>] tasklet_action+0x76/0xc1 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a31a>] __do_softirq+0xdd/0x197 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100ccdc>] call_softirq+0x1c/0x28 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100e81c>] do_softirq+0x38/0x70 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a164>] irq_exit+0x3b/0x7a Oct 12 20:15:07 aragorn kernel: [<ffffffff812af5d5>] do_IRQ+0xad/0xc4 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100c553>] ret_from_intr+0x0/0xa Oct 12 20:15:07 aragorn kernel: <EOI> [<ffffffffa02419f4>] ? enc128+0x67f/0x80b [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa024273a>] ? aes_encrypt+0x12/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa022f2cc>] ? crypto_cbc_encrypt+0x131/0x193 [cbc] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0242728>] ? aes_encrypt+0x0/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffff81152601>] ? 
async_encrypt+0x3d/0x3f Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201bbd>] ? crypt_convert+0x1fe/0x290 [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0202077>] ? kcryptd_crypt+0x428/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff81058e77>] ? worker_thread+0x195/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201c4f>] ? kcryptd_crypt+0x0/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff8105cb19>] ? autoremove_wake_function+0x0/0x3d Oct 12 20:15:07 aragorn kernel: [<ffffffff81058ce2>] ? worker_thread+0x0/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c79b>] ? kthread+0x82/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbda>] ? child_rip+0xa/0x20 Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c719>] ? kthread+0x0/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbd0>] ? child_rip+0x0/0x20 Oct 12 20:15:07 aragorn kernel: Mem-Info: Oct 12 20:15:07 aragorn kernel: DMA per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: DMA32 per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 186, btch: 31 usd: 161 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 186, btch: 31 usd: 190 Oct 12 20:15:07 aragorn kernel: active_anon:306195 inactive_anon:102848 isolated_anon:66 Oct 12 20:15:07 aragorn kernel: active_file:10600 inactive_file:10613 isolated_file:0 Oct 12 20:15:07 aragorn kernel: unevictable:400 dirty:0 writeback:36970 unstable:0 buffer:51 Oct 12 20:15:07 aragorn kernel: free:3171 slab_reclaimable:3935 slab_unreclaimable:11978 Oct 12 20:15:07 aragorn kernel: mapped:21315 shmem:19 pagetables:4243 bounce:0 Oct 12 20:15:07 aragorn kernel: DMA free:7928kB min:40kB low:48kB high:60kB active_anon:3880kB inactive_anon:3980kB active_file:16kB inactive_file:128kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB writeback:0kB mapped:28kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:8kB 
kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 1976 1976 1976 Oct 12 20:15:07 aragorn kernel: DMA32 free:4756kB min:5664kB low:7080kB high:8496kB active_anon:1220900kB inactive_anon:407412kB active_file:42384kB inactive_file:42324kB unevictable:1600kB isolated(anon):264kB isolated(file):0kB present:2023748kB mlocked:1600kB dirty:0kB writeback:147880kB mapped:85232kB shmem:76kB slab_reclaimable:15736kB slab_unreclaimable:47904kB kernel_stack:1440kB pagetables:16956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 0 0 0 Oct 12 20:15:07 aragorn kernel: DMA: 6*4kB 6*8kB 5*16kB 5*32kB 5*64kB 3*128kB 1*256kB 1*512kB 2*1024kB 0*2048kB 1*4096kB = 7928kB Oct 12 20:15:07 aragorn kernel: DMA32: 923*4kB 17*8kB 0*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4756kB Oct 12 20:15:07 aragorn kernel: 60786 total pagecache pages Oct 12 20:15:07 aragorn kernel: 39195 pages in swap cache Oct 12 20:15:07 aragorn kernel: Swap cache stats: add 206846, delete 167651, find 11222/12777 Oct 12 20:15:07 aragorn kernel: Free swap = 1354716kB Oct 12 20:15:07 aragorn kernel: Total swap = 2097144kB Oct 12 20:15:07 aragorn kernel: 518064 pages RAM Oct 12 20:15:07 aragorn kernel: 10503 pages reserved Oct 12 20:15:07 aragorn kernel: 93541 pages shared Oct 12 20:15:07 aragorn kernel: 433976 pages non-shared Oct 12 20:15:07 aragorn kernel: kcryptd: page allocation failure. 
order:2, mode:0x4020 Oct 12 20:15:07 aragorn kernel: Pid: 1523, comm: kcryptd Not tainted 2.6.32-rc4 #29 Oct 12 20:15:07 aragorn kernel: Call Trace: Oct 12 20:15:07 aragorn kernel: <IRQ> [<ffffffff810a9d87>] __alloc_pages_nodemask+0x5b9/0x632 Oct 12 20:15:07 aragorn kernel: [<ffffffff810a9e17>] __get_free_pages+0x17/0x46 Oct 12 20:15:07 aragorn kernel: [<ffffffff810cecff>] __kmalloc_track_caller+0x4e/0x146 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] ? iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122d96d>] __alloc_skb+0x6b/0x161 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b814>] iwl_rx_replenish_now+0x1b/0x28 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b7218>] iwl_rx_handle+0x382/0x3c6 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b778d>] iwl_irq_tasklet_legacy+0x531/0x7a9 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122c0e6>] ? skb_dequeue+0x60/0x6c Oct 12 20:15:07 aragorn kernel: [<ffffffff81048abf>] tasklet_action+0x76/0xc1 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a31a>] __do_softirq+0xdd/0x197 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100ccdc>] call_softirq+0x1c/0x28 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100e81c>] do_softirq+0x38/0x70 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a164>] irq_exit+0x3b/0x7a Oct 12 20:15:07 aragorn kernel: [<ffffffff812af5d5>] do_IRQ+0xad/0xc4 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100c553>] ret_from_intr+0x0/0xa Oct 12 20:15:07 aragorn kernel: <EOI> [<ffffffffa02419f4>] ? enc128+0x67f/0x80b [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa024273a>] ? aes_encrypt+0x12/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa022f2cc>] ? crypto_cbc_encrypt+0x131/0x193 [cbc] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0242728>] ? aes_encrypt+0x0/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffff81152601>] ? 
async_encrypt+0x3d/0x3f Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201bbd>] ? crypt_convert+0x1fe/0x290 [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0202077>] ? kcryptd_crypt+0x428/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff81058e77>] ? worker_thread+0x195/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201c4f>] ? kcryptd_crypt+0x0/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff8105cb19>] ? autoremove_wake_function+0x0/0x3d Oct 12 20:15:07 aragorn kernel: [<ffffffff81058ce2>] ? worker_thread+0x0/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c79b>] ? kthread+0x82/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbda>] ? child_rip+0xa/0x20 Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c719>] ? kthread+0x0/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbd0>] ? child_rip+0x0/0x20 Oct 12 20:15:07 aragorn kernel: Mem-Info: Oct 12 20:15:07 aragorn kernel: DMA per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: DMA32 per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 186, btch: 31 usd: 161 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 186, btch: 31 usd: 190 Oct 12 20:15:07 aragorn kernel: active_anon:306195 inactive_anon:102848 isolated_anon:66 Oct 12 20:15:07 aragorn kernel: active_file:10600 inactive_file:10613 isolated_file:0 Oct 12 20:15:07 aragorn kernel: unevictable:400 dirty:0 writeback:36970 unstable:0 buffer:51 Oct 12 20:15:07 aragorn kernel: free:3171 slab_reclaimable:3935 slab_unreclaimable:11978 Oct 12 20:15:07 aragorn kernel: mapped:21315 shmem:19 pagetables:4243 bounce:0 Oct 12 20:15:07 aragorn kernel: DMA free:7928kB min:40kB low:48kB high:60kB active_anon:3880kB inactive_anon:3980kB active_file:16kB inactive_file:128kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB writeback:0kB mapped:28kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:8kB 
kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 1976 1976 1976 Oct 12 20:15:07 aragorn kernel: DMA32 free:4756kB min:5664kB low:7080kB high:8496kB active_anon:1220900kB inactive_anon:407412kB active_file:42384kB inactive_file:42324kB unevictable:1600kB isolated(anon):264kB isolated(file):0kB present:2023748kB mlocked:1600kB dirty:0kB writeback:147880kB mapped:85232kB shmem:76kB slab_reclaimable:15736kB slab_unreclaimable:47904kB kernel_stack:1440kB pagetables:16956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 0 0 0 Oct 12 20:15:07 aragorn kernel: DMA: 6*4kB 6*8kB 5*16kB 5*32kB 5*64kB 3*128kB 1*256kB 1*512kB 2*1024kB 0*2048kB 1*4096kB = 7928kB Oct 12 20:15:07 aragorn kernel: DMA32: 923*4kB 17*8kB 0*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4756kB Oct 12 20:15:07 aragorn kernel: 60786 total pagecache pages Oct 12 20:15:07 aragorn kernel: 39195 pages in swap cache Oct 12 20:15:07 aragorn kernel: Swap cache stats: add 206846, delete 167651, find 11222/12777 Oct 12 20:15:07 aragorn kernel: Free swap = 1354716kB Oct 12 20:15:07 aragorn kernel: Total swap = 2097144kB Oct 12 20:15:07 aragorn kernel: 518064 pages RAM Oct 12 20:15:07 aragorn kernel: 10503 pages reserved Oct 12 20:15:07 aragorn kernel: 93541 pages shared Oct 12 20:15:07 aragorn kernel: 433976 pages non-shared Oct 12 20:15:07 aragorn kernel: kcryptd: page allocation failure. 
order:2, mode:0x4020 Oct 12 20:15:07 aragorn kernel: Pid: 1523, comm: kcryptd Not tainted 2.6.32-rc4 #29 Oct 12 20:15:07 aragorn kernel: Call Trace: Oct 12 20:15:07 aragorn kernel: <IRQ> [<ffffffff810a9d87>] __alloc_pages_nodemask+0x5b9/0x632 Oct 12 20:15:07 aragorn kernel: [<ffffffff810a9e17>] __get_free_pages+0x17/0x46 Oct 12 20:15:07 aragorn kernel: [<ffffffff810cecff>] __kmalloc_track_caller+0x4e/0x146 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] ? iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122d96d>] __alloc_skb+0x6b/0x161 Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b583>] iwl_rx_allocate+0x94/0x30a [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa036b814>] iwl_rx_replenish_now+0x1b/0x28 [iwlcore] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b7218>] iwl_rx_handle+0x382/0x3c6 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffffa03b778d>] iwl_irq_tasklet_legacy+0x531/0x7a9 [iwlagn] Oct 12 20:15:07 aragorn kernel: [<ffffffff8122c0e6>] ? skb_dequeue+0x60/0x6c Oct 12 20:15:07 aragorn kernel: [<ffffffff81048abf>] tasklet_action+0x76/0xc1 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a31a>] __do_softirq+0xdd/0x197 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100ccdc>] call_softirq+0x1c/0x28 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100e81c>] do_softirq+0x38/0x70 Oct 12 20:15:07 aragorn kernel: [<ffffffff8104a164>] irq_exit+0x3b/0x7a Oct 12 20:15:07 aragorn kernel: [<ffffffff812af5d5>] do_IRQ+0xad/0xc4 Oct 12 20:15:07 aragorn kernel: [<ffffffff8100c553>] ret_from_intr+0x0/0xa Oct 12 20:15:07 aragorn kernel: <EOI> [<ffffffffa02419f4>] ? enc128+0x67f/0x80b [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa024273a>] ? aes_encrypt+0x12/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffffa022f2cc>] ? crypto_cbc_encrypt+0x131/0x193 [cbc] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0242728>] ? aes_encrypt+0x0/0x14 [aes_x86_64] Oct 12 20:15:07 aragorn kernel: [<ffffffff81152601>] ? 
async_encrypt+0x3d/0x3f Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201bbd>] ? crypt_convert+0x1fe/0x290 [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffffa0202077>] ? kcryptd_crypt+0x428/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff81058e77>] ? worker_thread+0x195/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffffa0201c4f>] ? kcryptd_crypt+0x0/0x44e [dm_crypt] Oct 12 20:15:07 aragorn kernel: [<ffffffff8105cb19>] ? autoremove_wake_function+0x0/0x3d Oct 12 20:15:07 aragorn kernel: [<ffffffff81058ce2>] ? worker_thread+0x0/0x22d Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c79b>] ? kthread+0x82/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbda>] ? child_rip+0xa/0x20 Oct 12 20:15:07 aragorn kernel: [<ffffffff8105c719>] ? kthread+0x0/0x8a Oct 12 20:15:07 aragorn kernel: [<ffffffff8100cbd0>] ? child_rip+0x0/0x20 Oct 12 20:15:07 aragorn kernel: Mem-Info: Oct 12 20:15:07 aragorn kernel: DMA per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 0, btch: 1 usd: 0 Oct 12 20:15:07 aragorn kernel: DMA32 per-cpu: Oct 12 20:15:07 aragorn kernel: CPU 0: hi: 186, btch: 31 usd: 161 Oct 12 20:15:07 aragorn kernel: CPU 1: hi: 186, btch: 31 usd: 190 Oct 12 20:15:07 aragorn kernel: active_anon:306195 inactive_anon:102848 isolated_anon:66 Oct 12 20:15:07 aragorn kernel: active_file:10600 inactive_file:10613 isolated_file:0 Oct 12 20:15:07 aragorn kernel: unevictable:400 dirty:0 writeback:36970 unstable:0 buffer:51 Oct 12 20:15:07 aragorn kernel: free:3171 slab_reclaimable:3935 slab_unreclaimable:11978 Oct 12 20:15:07 aragorn kernel: mapped:21315 shmem:19 pagetables:4243 bounce:0 Oct 12 20:15:07 aragorn kernel: DMA free:7928kB min:40kB low:48kB high:60kB active_anon:3880kB inactive_anon:3980kB active_file:16kB inactive_file:128kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB writeback:0kB mapped:28kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:8kB 
kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 1976 1976 1976 Oct 12 20:15:07 aragorn kernel: DMA32 free:4756kB min:5664kB low:7080kB high:8496kB active_anon:1220900kB inactive_anon:407412kB active_file:42384kB inactive_file:42324kB unevictable:1600kB isolated(anon):264kB isolated(file):0kB present:2023748kB mlocked:1600kB dirty:0kB writeback:147880kB mapped:85232kB shmem:76kB slab_reclaimable:15736kB slab_unreclaimable:47904kB kernel_stack:1440kB pagetables:16956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 12 20:15:07 aragorn kernel: lowmem_reserve[]: 0 0 0 0 Oct 12 20:15:07 aragorn kernel: DMA: 6*4kB 6*8kB 5*16kB 5*32kB 5*64kB 3*128kB 1*256kB 1*512kB 2*1024kB 0*2048kB 1*4096kB = 7928kB Oct 12 20:15:07 aragorn kernel: DMA32: 923*4kB 17*8kB 0*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4756kB Oct 12 20:15:07 aragorn kernel: 60786 total pagecache pages Oct 12 20:15:07 aragorn kernel: 39195 pages in swap cache Oct 12 20:15:07 aragorn kernel: Swap cache stats: add 206846, delete 167651, find 11222/12777 Oct 12 20:15:07 aragorn kernel: Free swap = 1354716kB Oct 12 20:15:07 aragorn kernel: Total swap = 2097144kB Oct 12 20:15:07 aragorn kernel: 518064 pages RAM Oct 12 20:15:07 aragorn kernel: 10503 pages reserved Oct 12 20:15:07 aragorn kernel: 93541 pages shared Oct 12 20:15:07 aragorn kernel: 433976 pages non-shared Oct 12 20:15:07 aragorn kernel: __ratelimit: 45 callbacks suppressed ^ permalink raw reply [flat|nested] 384+ messages in thread
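As a sanity check on the Mem-Info dump above, the per-order free lists can be summed by hand. The following is only a userspace sketch: `block_kb`, `free_kb` and the two arrays are hypothetical names that do not exist in the kernel, and it assumes 4kB base pages, as the first column of the "DMA:" and "DMA32:" lines suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed base page size in kB (order 0 is listed as 4kB in the log). */
#define PAGE_KB 4UL

/* Size in kB of one free block of the given buddy order. */
static unsigned long block_kb(unsigned int order)
{
	return PAGE_KB << order;
}

/* Sum a per-order free list; counts[i] is the number of free order-i blocks. */
static unsigned long free_kb(const unsigned long *counts, size_t norders)
{
	unsigned long total = 0;
	size_t order;

	for (order = 0; order < norders; order++)
		total += counts[order] * block_kb((unsigned int)order);
	return total;
}

/* Counts copied from the "DMA:" and "DMA32:" lines of the log, order 0..10. */
static const unsigned long dma_counts[11]   = { 6, 6, 5, 5, 5, 3, 1, 1, 2, 0, 1 };
static const unsigned long dma32_counts[11] = { 923, 17, 0, 1, 0, 1, 1, 1, 0, 0, 0 };
```

Calling `free_kb(dma32_counts, 11)` reproduces the 4756kB total printed for DMA32, and `free_kb(dma_counts, 11)` the 7928kB printed for DMA. An order-2 request needs a 16kB block, and the DMA32 list shows none free at exactly that order (`0*16kB`), so it could only be served by splitting one of the few larger blocks. Whether that is permitted depends on the watermark checks, which this sketch does not model.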
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-14 13:10 ` Frans Pop @ 2009-10-14 15:40 ` Mel Gorman 2009-10-14 16:30 ` reinette chatre 2009-10-18 23:33 ` Frans Pop 2 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-14 15:40 UTC (permalink / raw) To: Frans Pop Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm

On Wed, Oct 14, 2009 at 03:10:08PM +0200, Frans Pop wrote:
> On Wednesday 14 October 2009, Mel Gorman wrote:
> > I think this is very significant. Either that change needs to be backed
> > out or more likely, __GFP_NOWARN needs to be specified and warnings
> > *only* printed when the RX buffers are really low. My expectation would
> > be that some GFP_ATOMIC allocations fail during refill but the fact they
> > fail wakes kswapd to reclaim order-2 pages while the RX buffers in the
> > pool are consumed.
>
> Sorry I did not actually mention this, but the SKB failures I get with .32
> have loads of the "Failed to allocate SKB buffer with GFP_ATOMIC. Only 0
> free buffers remaining." errors. That's why I don't think your patch will
> help anything.
>
> zgrep "Only 0 free buffers remaining" /var/log/kern.log* | wc -l
> 84
>
> OK, they are all GFP_ATOMIC and not GFP_KERNEL, but they also almost all
> have "0 free buffers"! Next to the 84 warnings for 0 remaining I only have
> one with "3 free buffers" and one with "1 free buffers".
>

This is fairly important. It shows that the refills are not keeping up
with the GFP_ATOMIC usage. I'm not sure what to do with this. As the
driver introduced GFP_ATOMIC usage at all, I'm tempted to say revert the
changes in the driver that make use of GFP_ATOMIC but I'm not the
maintainer. They could also consider having a GFP_ATOMIC-optimistic,
GFP_KERNEL-if-no-buffers-free-and-directly-allocating with GFP_KERNEL
refills always happening in the tasklet.

However, it might be just avoiding the MM problem on my part. It's
possible that if I figure out what went wrong in mm, the drivers' use of
GFP_ATOMIC will be swept under the carpet.

> And that does not even count the rate limiting:
> Oct 12 20:15:07 aragorn kernel: __ratelimit: 45 callbacks suppressed
> Oct 12 20:25:19 aragorn kernel: __ratelimit: 27 callbacks suppressed
> Oct 12 20:25:20 aragorn kernel: __ratelimit: 2 callbacks suppressed
>
> Attached the kernel log for one test I did with .32.
>
> > > In both cases I no longer get SKB errors, but instead (?) I get
> > > firmware errors:
> > > iwlagn 0000:10:00.0: Microcode SW error detected. Restarting
> > > 0x2000000.
> >
> > I am no wireless expert, but that looks like a separate problem to me.
> > I don't see how an allocation failure could trigger errors in the
> > microcode.
>
> Yes, it is a separate problem, but it is still significant that reverting
> that patch triggers them in the extreme swap situation.
>

True.

> > > With your patch on .32-rc4 I still get the SKB errors, so it does not
> > > seem to help. The only change there may have been is that the desktop
> > > was frozen longer than without the patch, but that is an impression,
> > > not a hard fact.
> >
> > Actually, that's fairly interesting and I think justifies pushing the
> > patch. Direct reclaim can stall processes in a user-visible manner which
> > kswapd is meant to avoid in the majority of cases but is tricky to
> > quantify without instrumenting the kernel to measure direct reclaim
> > frequency and latency (I have WIP tracepoints for this but it's still a
> > WIP). If you notice shorter stalls with the patch applied, it means that
> > kswapd really did need to be informed of the problems.
>
> No, I thought I saw _longer_ stalls with your patch applied...
>

Sorry, I misinterpreted. If the stalls are longer, it likely means that
kswapd is doing more work and causing more IO when applied as it tries
to get order-2 pages free. You said you still got SKB errors. Was there
any significant change to the number of failures or can that be told?

> > There still has not been a mm-change identified that makes fragmentation
> > significantly worse.
>
> My bisection shows a very clear point, even if not an individual commit, in
> the 'akpm' merge where SKB errors suddenly become *much* more frequent and
> easy to trigger.
> I'm sorry to say this, but the fact that nothing has been identified yet is
> IMO the result of a lack of effort, not because there is no such change.
>

I apologise if I've given that impression. I've been staring at the
commits but could not find an obvious candidate within the page allocator
itself which is why I've been looking at other areas. I put together a
hack that allocated order-2 atomics at a constant rate and order-5 atomics
at a lower rate to try to replicate the problem without drivers. I ran
some workloads but I wasn't able to get reliable figures that would have
allowed me to investigate further.

> > The majority of the wireless reports have been in
> > this driver and I think we have the problem commit there. The only other
> > is a firmware loading problem in e100 after resume that fails to make an
> > atomic order-5 fail.
>
> Not exactly true. Bartlomiej's report was about ipw2200, so there are at
> least 3 different drivers involved, two wireless and one wired. Besides
> that one report is related to heavy swap, one to resume and one to driver
> reload.
> So it's much more likely that there is some common regression (in mm) that
> affected all three than that there are three unrelated regressions.

Very very likely, I'm not denying this.

> And although both of the others did extremely high allocations, they both
> started appearing in the same timeframe. And Bart's very first report
> linked it to mm changes.
>
> > It's possible that something has changed in resume
> > in the 2.6.31 window there - maybe something like drivers now reload
> > during resume where they didn't previously or less memory being pushed
> > to swap during resume.
>
> IMO you're sticking your head in the sand here.

No. If I was sticking my head in the sand, I would have dismissed this
entirely as "GFP_ATOMIC allocations can fail boo hoo hoo deal with it".
What I'm trying to do is identify what changed that would affect
fragmentation but that is not within the page allocator itself - largely
because with the exception of the patch I gave you, I couldn't find
obvious breakage.

You highlighted the first akpm merge so let's look closer at that as I
don't think there is anything more I can do with the wireless driver
other than the suggestions made already. I looked at this already but I
felt fixing GFP_ATOMIC in wireless was the more likely fix.

Here is what you said about the merge.

====
For a good overview of the area, use 'gitk f83b1e61..517d0869'.

v2.6.30-5466-ga1dd268 mm: use alloc_pages_exact in alloc_large_system_hash 2.3 +-
v2.6.30-5478-ge9bb35d mm: setup_per_zone_inactive_ratio - fix comment and.. 2.5 +-
v2.6.30-5486-g35282a2 migration: only migrate_prep() once per move_pages() 2.6 -|+|- not quite conclusive...
v2.6.30-5492-gbce7394 page-allocator: reset wmark_min and inactive ratio.. 2.4 -|-
====

This is what I found. The following were the possible commits that might
be causing the problem.

d239171..72807a7 -- page allocator
	These are the bulk of the page-allocator changes that happened
	in the 2.6.30..2.6.31 cycle. It's also the location of the
	change to kswapd that I sent you a patch for. If there was a
	marked increase in the number of failures before and after this
	patchset, it means that I was wrong about the problem not being
	in the page allocator and I have to go back and keep looking.

	However, you report that commit e9bb35d mm:
	setup_per_zone_inactive_ratio - fix comment had relatively good
	results - relatively being that it didn't fail on the first try.
	In my head, these patches have been struck off the list of
	possibilities, which is why I've been looking in other
	subsystems.

56e49d2..f166777 -- reclaim
	I would have considered these strong candidates except again,
	the last good commit happened after this point. If other obvious
	candidates don't crop up, it might be worth double checking
	within this range, particularly commit 56e49d2 vmscan: evict
	use-once pages first as it is targeted at streaming-IO workloads
	which would include your music workload. This commit also will
	cleanly revert on mainline so is relatively easy to test.

5c87ead..e9bb35d -- inactive ratio changes
	These patches should be harmless but just in case, please
	compare the output of

	# grep inactive_ratio /proc/zoneinfo

	on 2.6.30 and 2.6.31 and make sure the ratios are the same.

e9bb35d..bce7394 -- various changes
	According to your analysis, this is the most likely location of
	the problem commit.

	Commit b70d94e altered how zonelists were selected during
	allocation. This was tested fairly heavily but if the testing
	missed something, it would mean that some allocations are not
	using the zones they should be. However, my expectation would be
	that mistakes here would have severe consequences affecting a
	large number of people. This does not revert cleanly but there
	is an untested patch below that should do the job. While it's
	hard to imagine this patch being the problem, it's the most
	likely commit within the range of commits your analysis
	identified.

	Commit bc75d33 is totally harmless but it mentions
	min_free_kbytes. I checked on my machine to make sure
	min_free_kbytes was the same on both 2.6.30 and 2.6.31. Can you
	check that this is true for your machine? If min_free_kbytes
	decreased, it could explain GFP_ATOMIC failures.

	An extremely unlikely candidate is 75927af8. For this to be a
	problem, much of your userspace would have to be calling
	madvise() with stupid parameters and depending on it to silently
	ignore the parameters.

	A vague potential candidate for swapless systems is 69c85481 but
	your machine has swap so it can't be this.

	Commit bce7394 affects min_free_kbytes but only on hotplug so it
	can't be this either for your machine.

After this point, your analysis indicates that things are already broken
but let's look at some of the candidates anyway.

Out of curiosity, was CONFIG_UNEVICTABLE_LRU unset in your .config for
2.6.30? I could only find your 2.6.31 .config. If it was, it might be
worth reverting 6837765963f1723e80ca97b1fae660f3a60d77df and unsetting it
in 2.6.31 and seeing what happens.

Commit 8cab4754d24a0f2e05920170c845bd84472814c6 keeps pages on the active
lists for longer than 2.6.30 did. It's possible the fewer reclaim
decisions are delaying lumpy reclaim.

CONFIG_NUMA is not set in your config, so the zone_reclaim() changes
around 24cf72518c79cdcda486ed26074ff8151291cf65 can be discounted.

Commit ee993b135ec75a93bd5c45e636bb210d2975159b altered how lumpy reclaim
works but it should have been harmless. It does not cleanly revert but
it's easy to manually revert.

I didn't spot any other patches that might be potential problems in the
commits.

> I'm not saying that mm is the only issue here, but I'm convinced that there
> _is_ an mm change that has contributed in a major way to these issues,
> even if we've not yet been able to identify it.
>
> > - net_ratelimit())
> > + net_ratelimit()) {
> > IWL_CRIT(priv, "Failed to allocate SKB buffer with %s. Only %u free
> > buffers remaining.\n", priority == GFP_ATOMIC ? "GFP_ATOMIC" :
> > "GFP_KERNEL",
>
> Haven't you broken the test 'priority == GFP_ATOMIC' here by setting
> priority to GFP_ATOMIC|__GFP_NOWARN?
>

Yes, I did, but as you say that this error message is showing up and
buffers are all depleted, it's not even close to being the right fix.
It'd only be relevant if that error message was showing up with buffers
remaining in the queue.

Revert commit b70d94ee

---
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 557bdad..3a94e4b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -21,8 +21,7 @@ struct vm_area_struct;
 #define __GFP_DMA	((__force gfp_t)0x01u)
 #define __GFP_HIGHMEM	((__force gfp_t)0x02u)
 #define __GFP_DMA32	((__force gfp_t)0x04u)
-#define __GFP_MOVABLE	((__force gfp_t)0x08u)  /* Page is movable */
-#define GFP_ZONEMASK	(__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE)
+
 /*
  * Action modifiers - doesn't change the zoning
  *
@@ -52,6 +51,7 @@ struct vm_area_struct;
 #define __GFP_HARDWALL   ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
 #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
 #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
+#define __GFP_MOVABLE	((__force gfp_t)0x100000u)  /* Page is movable */
 
 #ifdef CONFIG_KMEMCHECK
 #define __GFP_NOTRACK	((__force gfp_t)0x200000u)  /* Don't track with kmemcheck */
@@ -128,105 +128,24 @@ static inline int allocflags_to_migratetype(gfp_t gfp_flags)
 		((gfp_flags & __GFP_RECLAIMABLE) != 0);
 }
 
-#ifdef CONFIG_HIGHMEM
-#define OPT_ZONE_HIGHMEM ZONE_HIGHMEM
-#else
-#define OPT_ZONE_HIGHMEM ZONE_NORMAL
-#endif
-
+static inline enum zone_type gfp_zone(gfp_t flags)
+{
 #ifdef CONFIG_ZONE_DMA
-#define OPT_ZONE_DMA ZONE_DMA
-#else
-#define OPT_ZONE_DMA ZONE_NORMAL
+	if (flags & __GFP_DMA)
+		return ZONE_DMA;
 #endif
-
 #ifdef CONFIG_ZONE_DMA32
-#define OPT_ZONE_DMA32 ZONE_DMA32
-#else
-#define OPT_ZONE_DMA32 ZONE_NORMAL
-#endif
-
-/*
- * GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
- * zone to use given the lowest 4 bits of gfp_t. Entries are ZONE_SHIFT long
- * and there are 16 of them to cover all possible combinations of
- * __GFP_DMA, __GFP_DMA32, __GFP_MOVABLE and __GFP_HIGHMEM
- *
- * The zone fallback order is MOVABLE=>HIGHMEM=>NORMAL=>DMA32=>DMA.
- * But GFP_MOVABLE is not only a zone specifier but also an allocation
- * policy. Therefore __GFP_MOVABLE plus another zone selector is valid.
- * Only 1bit of the lowest 3 bit (DMA,DMA32,HIGHMEM) can be set to "1".
- *
- *       bit       result
- *       =================
- *       0x0    => NORMAL
- *       0x1    => DMA or NORMAL
- *       0x2    => HIGHMEM or NORMAL
- *       0x3    => BAD (DMA+HIGHMEM)
- *       0x4    => DMA32 or DMA or NORMAL
- *       0x5    => BAD (DMA+DMA32)
- *       0x6    => BAD (HIGHMEM+DMA32)
- *       0x7    => BAD (HIGHMEM+DMA32+DMA)
- *       0x8    => NORMAL (MOVABLE+0)
- *       0x9    => DMA or NORMAL (MOVABLE+DMA)
- *       0xa    => MOVABLE (Movable is valid only if HIGHMEM is set too)
- *       0xb    => BAD (MOVABLE+HIGHMEM+DMA)
- *       0xc    => DMA32 (MOVABLE+HIGHMEM+DMA32)
- *       0xd    => BAD (MOVABLE+DMA32+DMA)
- *       0xe    => BAD (MOVABLE+DMA32+HIGHMEM)
- *       0xf    => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
- *
- * ZONES_SHIFT must be <= 2 on 32 bit platforms.
- */
-
-#if 16 * ZONES_SHIFT > BITS_PER_LONG
-#error ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
+	if (flags & __GFP_DMA32)
+		return ZONE_DMA32;
 #endif
-
-#define GFP_ZONE_TABLE ( \
-	(ZONE_NORMAL << 0 * ZONES_SHIFT)				\
-	| (OPT_ZONE_DMA << __GFP_DMA * ZONES_SHIFT)			\
-	| (OPT_ZONE_HIGHMEM << __GFP_HIGHMEM * ZONES_SHIFT)		\
-	| (OPT_ZONE_DMA32 << __GFP_DMA32 * ZONES_SHIFT)			\
-	| (ZONE_NORMAL << __GFP_MOVABLE * ZONES_SHIFT)			\
-	| (OPT_ZONE_DMA << (__GFP_MOVABLE | __GFP_DMA) * ZONES_SHIFT)	\
-	| (ZONE_MOVABLE << (__GFP_MOVABLE | __GFP_HIGHMEM) * ZONES_SHIFT)\
-	| (OPT_ZONE_DMA32 << (__GFP_MOVABLE | __GFP_DMA32) * ZONES_SHIFT)\
-)
-
-/*
- * GFP_ZONE_BAD is a bitmap for all combination of __GFP_DMA, __GFP_DMA32
- * __GFP_HIGHMEM and __GFP_MOVABLE that are not permitted. One flag per
- * entry starting with bit 0. Bit is set if the combination is not
- * allowed.
- */
-#define GFP_ZONE_BAD ( \
-	1 << (__GFP_DMA | __GFP_HIGHMEM)				\
-	| 1 << (__GFP_DMA | __GFP_DMA32)				\
-	| 1 << (__GFP_DMA32 | __GFP_HIGHMEM)				\
-	| 1 << (__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM)		\
-	| 1 << (__GFP_MOVABLE | __GFP_HIGHMEM | __GFP_DMA)		\
-	| 1 << (__GFP_MOVABLE | __GFP_DMA32 | __GFP_DMA)		\
-	| 1 << (__GFP_MOVABLE | __GFP_DMA32 | __GFP_HIGHMEM)		\
-	| 1 << (__GFP_MOVABLE | __GFP_DMA32 | __GFP_DMA | __GFP_HIGHMEM)\
-)
-
-static inline enum zone_type gfp_zone(gfp_t flags)
-{
-	enum zone_type z;
-	int bit = flags & GFP_ZONEMASK;
-
-	z = (GFP_ZONE_TABLE >> (bit * ZONES_SHIFT)) &
-					 ((1 << ZONES_SHIFT) - 1);
-
-	if (__builtin_constant_p(bit))
-		MAYBE_BUILD_BUG_ON((GFP_ZONE_BAD >> bit) & 1);
-	else {
-#ifdef CONFIG_DEBUG_VM
-		BUG_ON((GFP_ZONE_BAD >> bit) & 1);
+	if ((flags & (__GFP_HIGHMEM | __GFP_MOVABLE)) ==
+			(__GFP_HIGHMEM | __GFP_MOVABLE))
+		return ZONE_MOVABLE;
+#ifdef CONFIG_HIGHMEM
+	if (flags & __GFP_HIGHMEM)
+		return ZONE_HIGHMEM;
 #endif
-	}
-	return z;
+	return ZONE_NORMAL;
 }
 
 /*
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply related [flat|nested] 384+ messages in thread
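The if-chain that the revert patch above restores in gfp_zone() can be tried out in userspace with a rough sketch like the one below. It is not kernel code: the `SK_`/`sk_` names are hypothetical, the flag values are copied from the patch, and it assumes that every optional zone (DMA, DMA32, HIGHMEM) is configured in, whereas the real function guards each test with the corresponding `#ifdef`.

```c
#include <assert.h>

/* Zone-selector bits, with the values shown in the patch (assumed names). */
#define SK_GFP_DMA      0x01u
#define SK_GFP_HIGHMEM  0x02u
#define SK_GFP_DMA32    0x04u
/* After the revert, the movable bit sits outside the low four bits. */
#define SK_GFP_MOVABLE  0x100000u

/* Stand-in for enum zone_type; the ordering here is illustrative only. */
enum sk_zone {
	SK_ZONE_DMA,
	SK_ZONE_DMA32,
	SK_ZONE_NORMAL,
	SK_ZONE_HIGHMEM,
	SK_ZONE_MOVABLE
};

/*
 * Mirror of the restored if-chain: DMA first, then DMA32, then MOVABLE
 * (only when HIGHMEM is requested as well), then HIGHMEM, else NORMAL.
 * Assumption: all optional zones are enabled, so no #ifdef is needed.
 */
static enum sk_zone sk_gfp_zone(unsigned int flags)
{
	if (flags & SK_GFP_DMA)
		return SK_ZONE_DMA;
	if (flags & SK_GFP_DMA32)
		return SK_ZONE_DMA32;
	if ((flags & (SK_GFP_HIGHMEM | SK_GFP_MOVABLE)) ==
	    (SK_GFP_HIGHMEM | SK_GFP_MOVABLE))
		return SK_ZONE_MOVABLE;
	if (flags & SK_GFP_HIGHMEM)
		return SK_ZONE_HIGHMEM;
	return SK_ZONE_NORMAL;
}
```

The failing allocations in the attached log report mode:0x4020, which has none of the zone-selector bits set, so `sk_gfp_zone(0x4020u)` yields `SK_ZONE_NORMAL`. The movable bit on its own also selects `SK_ZONE_NORMAL`, because the chain only returns `SK_ZONE_MOVABLE` when the HIGHMEM bit is set alongside it.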
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-14 15:40 ` Mel Gorman 0 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-14 15:40 UTC (permalink / raw) To: Frans Pop Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm On Wed, Oct 14, 2009 at 03:10:08PM +0200, Frans Pop wrote: > On Wednesday 14 October 2009, Mel Gorman wrote: > > I think this is very significant. Either that change needs to be backed > > out or more likely, __GFP_NOWARN needs to be specified and warnings > > *only* printed when the RX buffers are really low. My expectation would > > be that some GFP_ATOMIC allocations fail during refill but the fact they > > fail wakes kswapd to reclaim order-2 pages while the RX buffers in the > > pool are consumed. > > Sorry I did not actually mention this, but the SKB failures I get with .32 > have loads of the "Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 > free buffers remaining." errors. That's why I don't think your patch will > help anything. > > zgrep "Only 0 free buffers remaining" /var/log/kern.log* | wc -l > 84 > > OK, they are all GPF_ATOMIC and not GPF_KERNEL, but they also almost all > have "0 free buffers"! Next to the 84 warnings for 0 remaining I only have > one with "3 free buffers" and one with "1 free buffers". > This is fairly important. It shows that the refills are not keeping up with the GFP_ATOMIC usage. I'm not sure what to do with this. As the driver introduced GFP_ATOMIC usage at all, I'm tempted to say revert the changes in the driver that makes use of GFP_ATOMIC but I'm not the maintainer. They could also consider having a GFP_ATOMIC-optimistic, GFP_KERNEL-if-no-buffers-free-and-directly-allocating with GFP_KERNEL refills always happening in the tasklet. 
However, that might just be me avoiding the MM problem. It's possible that if I figure out what went wrong in mm, the drivers' use of GFP_ATOMIC will be swept under the carpet. > And that does not even count the rate limiting: > Oct 12 20:15:07 aragorn kernel: __ratelimit: 45 callbacks suppressed > Oct 12 20:25:19 aragorn kernel: __ratelimit: 27 callbacks suppressed > Oct 12 20:25:20 aragorn kernel: __ratelimit: 2 callbacks suppressed > > Attached the kernel log for one test I did with .32. > > > > In both cases I no longer get SKB errors, but instead (?) I get > > > firmware errors: > > > iwlagn 0000:10:00.0: Microcode SW error detected. Restarting > > > 0x2000000. > > > > I am no wireless expert, but that looks like a separate problem to me. > > I don't see how an allocation failure could trigger errors in the > > microcode. > > Yes, it is a separate problem, but it is still significant that reverting > that patch triggers them in the extreme swap situation. > True. > > > With your patch on .32-rc4 I still get the SKB errors, so it does not > > > seem to help. The only change there may have been is that the desktop > > > was frozen longer than without the patch, but that is an impression, > > > not a hard fact. > > > > Actually, that's fairly interesting and I think justifies pushing the > > patch. Direct reclaim can stall processes in a user-visible manner which > > kswapd is meant to avoid in the majority of cases but is tricky to > > quantify without instrumenting the kernel to measure direct reclaim > > frequency and latency (I have WIP tracepoints for this but it's still a > > WIP). If you notice shorter stalls with the patch applied, it means that > > kswapd really did need to be informed of the problems. > > No, I thought I saw _longer_ stalls with your patch applied... > Sorry, I misinterpreted. If the stalls are longer, it likely means that kswapd is doing more work and causing more IO when the patch is applied, as it tries to get order-2 pages free.
You said you still got SKB errors. Was there any significant change in the number of failures, if that can be told? > > There still has not been a mm-change identified that makes fragmentation > > significantly worse. > > My bisection shows a very clear point, even if not an individual commit, in > the 'akpm' merge where SKB errors suddenly become *much* more frequent and > easy to trigger. > I'm sorry to say this, but the fact that nothing has been identified yet is > IMO the result of a lack of effort, not because there is no such change. > I apologise if I've given that impression. I've been staring at the commits but could not find an obvious candidate within the page allocator itself, which is why I've been looking at other areas. I put together a hack that allocated order-2 atomics at a constant rate and order-5 atomics at a lower rate to try to replicate the problem without drivers. I ran some workloads but I wasn't able to get reliable figures that would have allowed me to investigate further. > > The majority of the wireless reports have been in > > this driver and I think we have the problem commit there. The only other > > is a firmware loading problem in e100 after resume that fails to make an > > atomic order-5 fail. > > Not exactly true. Bartlomiej's report was about ipw2200, so there are at > least 3 different drivers involved, two wireless and one wired. Besides > that one report is related to heavy swap, one to resume and one to driver > reload. > So it's much more likely that there is some common regression (in mm) that > affected all three than that there are three unrelated regressions. Very very likely, I'm not denying this. > And although both of the others did extremely high allocations, they both > started appearing in the same timeframe. And Bart's very first report > linked it to mm changes.
> > > It's possible that something has changed in resume > > in the 2.6.31 window there - maybe something like drivers now reload > > during resume where they didn't previously or less memory being pushed > > to swap during resume. > > IMO you're sticking your head in the sand here. No. If I was sticking my head in the sand, I would have dismissed this entirely as "GFP_ATOMIC allocations can fail boo hoo hoo deal with it". What I'm trying to do is identify what changed that would affect fragmentation but that is not within the page allocator itself - largely because with the exception of the patch I gave you, I couldn't find obvious breakage. You highlighted the first akpm merge so let's look closer at that as I don't think there is anything more I can do with the wireless driver other than the suggestions made already. I looked at this already but I felt fixing GFP_ATOMIC in wireless was the more likely fix. Here is what you said about the merge. ==== For a good overview of the area, use 'gitk f83b1e61..517d0869'. v2.6.30-5466-ga1dd268 mm: use alloc_pages_exact in alloc_large_system_hash 2.3 +- v2.6.30-5478-ge9bb35d mm: setup_per_zone_inactive_ratio - fix comment and.. 2.5 +- v2.6.30-5486-g35282a2 migration: only migrate_prep() once per move_pages() 2.6 -|+|- not quite conclusive... v2.6.30-5492-gbce7394 page-allocator: reset wmark_min and inactive ratio.. 2.4 -|- ==== This is what I found. The following were the possible commits that might be causing the problem. d239171..72807a7 -- page allocator These are the bulk of the page-allocator changes that happened in the 2.6.30..2.6.31 cycle. It's also the location of the change to kswapd that I sent you a patch for. If there was a marked increase in the number of failures between before and after this patchset, it means that I was wrong about the problem not being in the page allocator and I have to go back and keep looking.
However, you report that commit e9bb35d mm: setup_per_zone_inactive_ratio - fix comment had relatively good results - relatively being that it didn't fail on the first try. In my head, these patches have been struck off the list of possibilities, which is why I've been looking in other subsystems. 56e49d2..f166777 -- reclaim I would have considered these strong candidates except again, the last good commit happened after this point. If other obvious candidates don't crop up, it might be worth double checking within this range, particularly commit 56e49d2 vmscan: evict use-once pages first as it is targeted at streaming-IO workloads which would include your music workload. This commit also will cleanly revert on mainline so is relatively easy to test. 5c87ead..e9bb35d -- inactive ratio changes These patches should be harmless but just in case, please compare the output of # grep inactive_ratio /proc/zoneinfo on 2.6.30 and 2.6.31 and make sure the ratios are the same. e9bb35d..bce7394 -- various changes According to your analysis, this is the most likely location of the problem commit. Commit b70d94e altered how zonelists were selected during allocation. This was tested fairly heavily but if the testing missed something, it would mean that some allocations are not using the zones they should be. However, my expectation would be that mistakes here would have severe consequences affecting a large number of people. This does not revert cleanly but there is an untested patch below that should do the job. While it's hard to imagine this patch being the problem, it's the most likely commit within the range of commits your analysis identified. Commit bc75d33 is totally harmless but it mentions min_free_kbytes. I checked on my machine to make sure min_free_kbytes was the same on both 2.6.30 and 2.6.31. Can you check that this is true for your machine? If min_free_kbytes decreased, it could explain GFP_ATOMIC failures. An extremely unlikely candidate is 75927af8.
For this to be a problem, much of your userspace would have to be calling madvise() with stupid parameters and depending on it to silently ignore the parameters. A vague potential candidate for swapless systems is 69c85481 but your machine has swap so it can't be this. Commit bce7394 affects min_free_kbytes but only on hotplug so it can't be this either for your machine. After this point, your analysis indicates that things are already broken but let's look at some of the candidates anyway. Out of curiosity, was CONFIG_UNEVICTABLE_LRU unset in your .config for 2.6.30? I could only find your 2.6.31 .config. If it was, it might be worth reverting 6837765963f1723e80ca97b1fae660f3a60d77df and unsetting it in 2.6.31 and seeing what happens. Commit 8cab4754d24a0f2e05920170c845bd84472814c6 keeps pages on the active lists for longer than 2.6.30 did. It's possible that the fewer reclaim decisions are delaying lumpy reclaim. CONFIG_NUMA is not set in your config, so the zone_reclaim() changes around 24cf72518c79cdcda486ed26074ff8151291cf65 can be discounted. Commit ee993b135ec75a93bd5c45e636bb210d2975159b altered how lumpy reclaim works but it should have been harmless. It does not cleanly revert but it's easy to manually revert. I didn't spot any other patches that might be potential problems in the commits. > I'm not saying that mm is the only issue here, but I'm convinced that there > _is_ an mm change that has contributed in a major way to these issues, > even if we've not yet been able to identify it. > > > - net_ratelimit()) > > + net_ratelimit()) { > > IWL_CRIT(priv, "Failed to allocate SKB buffer with %s. Only %u free > > buffers remaining.\n", priority == GFP_ATOMIC ? "GFP_ATOMIC" : > > "GFP_KERNEL", > > Haven't you broken the test 'priority == GFP_ATOMIC' here by setting > priority to GFP_ATOMIC|__GFP_NOWARN? > Yes, I did, but as you say that this error message is showing up and buffers are all depleted, it's not even close to being the right fix.
It'd only be relevant if that error message was showing up with buffers remaining in the queue. Revert commit b70d94ee --- diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 557bdad..3a94e4b 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -21,8 +21,7 @@ struct vm_area_struct; #define __GFP_DMA ((__force gfp_t)0x01u) #define __GFP_HIGHMEM ((__force gfp_t)0x02u) #define __GFP_DMA32 ((__force gfp_t)0x04u) -#define __GFP_MOVABLE ((__force gfp_t)0x08u) /* Page is movable */ -#define GFP_ZONEMASK (__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE) + /* * Action modifiers - doesn't change the zoning * @@ -52,6 +51,7 @@ struct vm_area_struct; #define __GFP_HARDWALL ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */ #define __GFP_THISNODE ((__force gfp_t)0x40000u)/* No fallback, no policies */ #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */ +#define __GFP_MOVABLE ((__force gfp_t)0x100000u) /* Page is movable */ #ifdef CONFIG_KMEMCHECK #define __GFP_NOTRACK ((__force gfp_t)0x200000u) /* Don't track with kmemcheck */ @@ -128,105 +128,24 @@ static inline int allocflags_to_migratetype(gfp_t gfp_flags) ((gfp_flags & __GFP_RECLAIMABLE) != 0); } -#ifdef CONFIG_HIGHMEM -#define OPT_ZONE_HIGHMEM ZONE_HIGHMEM -#else -#define OPT_ZONE_HIGHMEM ZONE_NORMAL -#endif - +static inline enum zone_type gfp_zone(gfp_t flags) +{ #ifdef CONFIG_ZONE_DMA -#define OPT_ZONE_DMA ZONE_DMA -#else -#define OPT_ZONE_DMA ZONE_NORMAL + if (flags & __GFP_DMA) + return ZONE_DMA; #endif - #ifdef CONFIG_ZONE_DMA32 -#define OPT_ZONE_DMA32 ZONE_DMA32 -#else -#define OPT_ZONE_DMA32 ZONE_NORMAL -#endif - -/* - * GFP_ZONE_TABLE is a word size bitstring that is used for looking up the - * zone to use given the lowest 4 bits of gfp_t. 
Entries are ZONE_SHIFT long - * and there are 16 of them to cover all possible combinations of - * __GFP_DMA, __GFP_DMA32, __GFP_MOVABLE and __GFP_HIGHMEM - * - * The zone fallback order is MOVABLE=>HIGHMEM=>NORMAL=>DMA32=>DMA. - * But GFP_MOVABLE is not only a zone specifier but also an allocation - * policy. Therefore __GFP_MOVABLE plus another zone selector is valid. - * Only 1bit of the lowest 3 bit (DMA,DMA32,HIGHMEM) can be set to "1". - * - * bit result - * ================= - * 0x0 => NORMAL - * 0x1 => DMA or NORMAL - * 0x2 => HIGHMEM or NORMAL - * 0x3 => BAD (DMA+HIGHMEM) - * 0x4 => DMA32 or DMA or NORMAL - * 0x5 => BAD (DMA+DMA32) - * 0x6 => BAD (HIGHMEM+DMA32) - * 0x7 => BAD (HIGHMEM+DMA32+DMA) - * 0x8 => NORMAL (MOVABLE+0) - * 0x9 => DMA or NORMAL (MOVABLE+DMA) - * 0xa => MOVABLE (Movable is valid only if HIGHMEM is set too) - * 0xb => BAD (MOVABLE+HIGHMEM+DMA) - * 0xc => DMA32 (MOVABLE+HIGHMEM+DMA32) - * 0xd => BAD (MOVABLE+DMA32+DMA) - * 0xe => BAD (MOVABLE+DMA32+HIGHMEM) - * 0xf => BAD (MOVABLE+DMA32+HIGHMEM+DMA) - * - * ZONES_SHIFT must be <= 2 on 32 bit platforms. - */ - -#if 16 * ZONES_SHIFT > BITS_PER_LONG -#error ZONES_SHIFT too large to create GFP_ZONE_TABLE integer + if (flags & __GFP_DMA32) + return ZONE_DMA32; #endif - -#define GFP_ZONE_TABLE ( \ - (ZONE_NORMAL << 0 * ZONES_SHIFT) \ - | (OPT_ZONE_DMA << __GFP_DMA * ZONES_SHIFT) \ - | (OPT_ZONE_HIGHMEM << __GFP_HIGHMEM * ZONES_SHIFT) \ - | (OPT_ZONE_DMA32 << __GFP_DMA32 * ZONES_SHIFT) \ - | (ZONE_NORMAL << __GFP_MOVABLE * ZONES_SHIFT) \ - | (OPT_ZONE_DMA << (__GFP_MOVABLE | __GFP_DMA) * ZONES_SHIFT) \ - | (ZONE_MOVABLE << (__GFP_MOVABLE | __GFP_HIGHMEM) * ZONES_SHIFT)\ - | (OPT_ZONE_DMA32 << (__GFP_MOVABLE | __GFP_DMA32) * ZONES_SHIFT)\ -) - -/* - * GFP_ZONE_BAD is a bitmap for all combination of __GFP_DMA, __GFP_DMA32 - * __GFP_HIGHMEM and __GFP_MOVABLE that are not permitted. One flag per - * entry starting with bit 0. Bit is set if the combination is not - * allowed. 
- */ -#define GFP_ZONE_BAD ( \ - 1 << (__GFP_DMA | __GFP_HIGHMEM) \ - | 1 << (__GFP_DMA | __GFP_DMA32) \ - | 1 << (__GFP_DMA32 | __GFP_HIGHMEM) \ - | 1 << (__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM) \ - | 1 << (__GFP_MOVABLE | __GFP_HIGHMEM | __GFP_DMA) \ - | 1 << (__GFP_MOVABLE | __GFP_DMA32 | __GFP_DMA) \ - | 1 << (__GFP_MOVABLE | __GFP_DMA32 | __GFP_HIGHMEM) \ - | 1 << (__GFP_MOVABLE | __GFP_DMA32 | __GFP_DMA | __GFP_HIGHMEM)\ -) - -static inline enum zone_type gfp_zone(gfp_t flags) -{ - enum zone_type z; - int bit = flags & GFP_ZONEMASK; - - z = (GFP_ZONE_TABLE >> (bit * ZONES_SHIFT)) & - ((1 << ZONES_SHIFT) - 1); - - if (__builtin_constant_p(bit)) - MAYBE_BUILD_BUG_ON((GFP_ZONE_BAD >> bit) & 1); - else { -#ifdef CONFIG_DEBUG_VM - BUG_ON((GFP_ZONE_BAD >> bit) & 1); + if ((flags & (__GFP_HIGHMEM | __GFP_MOVABLE)) == + (__GFP_HIGHMEM | __GFP_MOVABLE)) + return ZONE_MOVABLE; +#ifdef CONFIG_HIGHMEM + if (flags & __GFP_HIGHMEM) + return ZONE_HIGHMEM; #endif - } - return z; + return ZONE_NORMAL; } /* -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-14 15:40 ` Mel Gorman @ 2009-10-14 16:13 ` Frans Pop -1 siblings, 0 replies; 384+ messages in thread From: Frans Pop @ 2009-10-14 16:13 UTC (permalink / raw) To: Mel Gorman Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm On Wednesday 14 October 2009, Mel Gorman wrote: > You highlighted the first akpm merge so lets look closer at that as I > don't think there is anything more I can do with the wireless driver > other than the suggestions made already. Thanks a lot for that analysis Mel. I'll see if I can come up with additional data based on the info you provide here. ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-14 15:40 ` Mel Gorman @ 2009-10-14 18:34 ` Frans Pop -1 siblings, 0 replies; 384+ messages in thread From: Frans Pop @ 2009-10-14 18:34 UTC (permalink / raw) To: Mel Gorman Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm Some initial results; all negative I'm afraid. On Wednesday 14 October 2009, Mel Gorman wrote: > This is what I found. The following were the possible commits that might > be causing the problem. > 56e49d2..f166777 -- reclaim > I would have considered this strong candidates except again, the > last good commit happened after this point. If other obvious > candidates don't crop up, it might be worth double checking > within this range, particularly commit 56e49d2 vmscan: evict > use-once pages first as it is targeted at streaming-IO workloads > which would include your music workload. Reverted 56e49d2 on top of .31.1; no change. > 5c87ead..e9bb35d -- inactive ratio changes > These patches should be harmless but just in case, please > compare the output of > # grep inactive_ratio /proc/zoneinfo > on 2.6.30 and 2.6.31 and make sure the ratios are the same. The same for both (and for .32). DMA: 1; DMA32: 3 > Commit b70d94e altered how zonelists were selected during > allocation. This was tested fairly heavily but if the testing > missed something, it would mean that some allocations are not > using the zones they should be. Reverted on top of .31.1; no change. > Commit bc75d33 is totally harmless but it mentions > min_free_kbytes. I checked on my machine to make sure > min_free_kbytes was the same on both 2.6.30 and 2.6.31. Can you > check that this is true for your machine? If min_free_kbytes > decreased, it could explain GFP_ATOMIC failures. Virtually identical. 
.30: 5704; .31/.32: 5711 > After this point, your analysis indicates that things are already broken > but lets look at some of the candidates anyway. Out of curiousity, > was CONFIG_UNEVICTABLE_LRU unset in your .config for 2.6.30? I could > only find your 2.6.31 .config. If it was, it might be worth reverting > 6837765963f1723e80ca97b1fae660f3a60d77df and unsetting it in 2.6.31 and > seeing what happens. CONFIG_UNEVICTABLE_LRU was set and during bisections I've always accepted the default, which was "y". > Commit ee993b135ec75a93bd5c45e636bb210d2975159b altered how lumpy > reclaim works but it should have been harmless. It does not cleanly > revert but it's easy to manually revert. Reverted on top of .31.1; no change. I'll do some more digging in the 'akpm' merge. ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-14 18:34 ` Frans Pop (?) @ 2009-10-14 23:56 ` Mel Gorman -1 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-14 23:56 UTC (permalink / raw) To: Frans Pop Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm

On Wed, Oct 14, 2009 at 08:34:56PM +0200, Frans Pop wrote:
> Some initial results; all negative I'm afraid.
>

These are highly unlikely candidates. I say highly unlikely because they are before the page allocator patches when your analysis indicated things were ok.

Commit 70ac23c readahead: sequential mmap readahead
This affects readahead for mmap() and could have an impact on the number of allocations made by the streaming IO. This might be generating more bursty network traffic in 2.6.31 than 2.6.30 and affecting the allocation pattern enough to cause problems.

Commit 2fad6f5 readahead: enforce full readahead size on async mmap readahead
Another readahead change that may affect the rate of network traffic being generated when streaming IO over the network.

Commit 10be0b3 readahead: introduce context readahead algorithm
By using readahead in more situations, it again may be affecting the burst rate of network traffic and the rate of GFP_ATOMIC arrivals.

Commit 78dc583 vmscan: low order lumpy reclaim also should use PAGEOUT_IO_SYNC
Very low probability that this is a problem, but it affects lumpy reclaim and so has to be considered. It's an awkward revert but I think the most important part is just to revert the condition that checks if congestion_wait() should be called or not.

I relooked at the page allocator patches themselves just in case.
Of the patches in there, I came up with

Commit 11e33f6 page allocator: break up the allocator entry point into fast and slow paths
This is possibly the most disruptive patch in the set. It should not have affected behaviour but the complexity of the patch is quite high. I did spot an oddity whereby a process exiting making a __GFP_NOFAIL allocation can ignore watermarks. It's unlikely this is the problem but as the journal layer uses __GFP_NOFAIL, you never know - it might be pushing things down low enough for other watermark checks to fail. Patch is below. This is also the patch that caused kswapd to wake up less. I sent a patch for that problem but I still don't know if it reduced the number of failures for you or not.

Commit f2260e6 page allocator: update NR_FREE_PAGES only as necessary
This patch affects the timing of when NR_FREE_PAGES is updated. The reclaim algorithm makes decisions based on this NR_FREE_PAGES value. Crucially, the value can determine if the anon list is force scanned or not. The window during which this can make a difference should be extremely small but maybe it's enough to make a difference.

Outside the range of commits suspected of causing problems was the following. It's extremely low probability.

Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion
This patch alters the call to congestion_wait() in the page allocator. Frankly, I don't get the change but it might be worth checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31 makes any difference.

After a lot more eyeballing, the best next candidate within mm is the following patch. Should be tested on its own and in combination with the wakeup-kswapd patch sent before.
====
From 4e8b5217f51a00caee527e4e8d8e46fe9f82b482 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mel@csn.ul.ie>
Date: Thu, 15 Oct 2009 00:17:05 +0100
Subject: [PATCH] page allocator: Direct reclaim should always obey watermarks

ALLOC_NO_WATERMARKS should be cleared when trying to allocate from the
free-lists after a direct reclaim.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/page_alloc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3694609..619933d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1920,7 +1920,8 @@ rebalance:
 	page = __alloc_pages_direct_reclaim(gfp_mask, order,
 					zonelist, high_zoneidx,
 					nodemask,
-					alloc_flags, preferred_zone,
+					alloc_flags & ~ALLOC_NO_WATERMARKS,
+					preferred_zone,
 					migratetype, &did_some_progress);
 	if (page)
 		goto got_pg;

^ permalink raw reply related [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-14 23:56 ` Mel Gorman 0 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-14 23:56 UTC (permalink / raw) To: Frans Pop Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm On Wed, Oct 14, 2009 at 08:34:56PM +0200, Frans Pop wrote: > Some initial results; all negative I'm afraid. > These are highly unlikely candidates. I say highly unlikely because they are before the page allocator patches when your analysis indicated things were ok. Commit 70ac23c readahead: sequential mmap readahead This affects readahead for mmap() and could have an impact on the number of allocations made by the streaming IO. This might be generating more bursty network traffic in 2.6.31 than 2.6.30 and affecting the allocation apttern enough to cause problems Commit 2fad6f5 readahead: enforce full readahead size on async mmap readahead Another readahead change that may affect the rate of network traffic being generated when streaming IO over the network Commit 10be0b3 readahead: introduce context readahead algorithm By using readahead in more situations, it again may be affecting the burst rate of network traffic and the rate of GFP_ATOMIC arrivals Commit 78dc583 vmscan: low order lumpy reclaim also should use PAGEOUT_IO_SYNC Very low probability that this is a problem, but it affects lumpy reclaim and so has to be considered. It's an awkward revert but I think the most important part is just to revert the condition that checks if congestion_wait() should be called or not I relooked at the page allocator patches themselves just in case. Of the patches in there, I came up with Commit 11e33f6 page allocator: break up the allocator entry point into fast and slow paths This is possibly the most disruptive patch in the set. 
It should not have affected behaviour but the complexity of the patch is quite high. I did spot an oddity whereby a process exiting making a __GFP_NOFAIL allocation can ignore watermarks. It's unlikely this is the problem but as the journal layer uses __GFP_NOFAIL, you never know - it might be pushing things down low enough for other watermark checks to fail. Patch is below. This is also the patch that cause kswapd to wake up less. I sent a patch for that problem but I still don't know if it reduced the number of failures for you or not. Commit f2260e6 page allocator: update NR_FREE_PAGES only as necessary This patch affects the timing of when NR_FREE_PAGES is updated. The reclaim algorithm makes decisions based on this NR_FREE_PAGES value. Crucially, the value can determine if the anon list is force scanned or not. The window during which this can make a difference should be extremely small but maybe it's enough to make a difference. Outside the range of commits suspected of causing problems was the following. It's extremely low probability Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion This patch alters the call to congestion_wait() in the page allocator. Frankly, I don't get the change but it might worth checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31 makes any difference After a lot more eyeballing, the best next candidate within mm is the following patch. Should be tested on it's own and in combination with the wakeup-kswapd patch sent before. ==== From 4e8b5217f51a00caee527e4e8d8e46fe9f82b482 Mon Sep 17 00:00:00 2001 From: Mel Gorman <mel@csn.ul.ie> Date: Thu, 15 Oct 2009 00:17:05 +0100 Subject: [PATCH] page allocator: Direct reclaim should always obey watermarks ALLOC_NO_WATERMARKS should be cleared when trying to allocate from the free-lists after a direct reclaim. 
Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- mm/page_alloc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3694609..619933d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1920,7 +1920,8 @@ rebalance: page = __alloc_pages_direct_reclaim(gfp_mask, order, zonelist, high_zoneidx, nodemask, - alloc_flags, preferred_zone, + alloc_flags & ~ALLOC_NO_WATERMARKS, + preferred_zone, migratetype, &did_some_progress); if (page) goto got_pg; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
@ 2009-10-14 23:56 ` Mel Gorman
  0 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-14 23:56 UTC (permalink / raw)
To: Frans Pop
Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, John W. Linville, linux-mm

On Wed, Oct 14, 2009 at 08:34:56PM +0200, Frans Pop wrote:
> Some initial results; all negative I'm afraid.
>

These are highly unlikely candidates. I say highly unlikely because they
are before the page allocator patches when your analysis indicated things
were ok.

Commit 70ac23c readahead: sequential mmap readahead
	This affects readahead for mmap() and could have an impact on the
	number of allocations made by the streaming IO. This might be
	generating more bursty network traffic in 2.6.31 than 2.6.30 and
	affecting the allocation pattern enough to cause problems.

Commit 2fad6f5 readahead: enforce full readahead size on async mmap readahead
	Another readahead change that may affect the rate of network
	traffic being generated when streaming IO over the network.

Commit 10be0b3 readahead: introduce context readahead algorithm
	By using readahead in more situations, it again may be affecting
	the burst rate of network traffic and the rate of GFP_ATOMIC
	arrivals.

Commit 78dc583 vmscan: low order lumpy reclaim also should use PAGEOUT_IO_SYNC
	Very low probability that this is a problem, but it affects lumpy
	reclaim and so has to be considered. It's an awkward revert but I
	think the most important part is just to revert the condition that
	checks if congestion_wait() should be called or not.

I relooked at the page allocator patches themselves just in case. Of the
patches in there, I came up with

Commit 11e33f6 page allocator: break up the allocator entry point into fast and slow paths
	This is possibly the most disruptive patch in the set. It should
	not have affected behaviour but the complexity of the patch is
	quite high. I did spot an oddity whereby an exiting process making
	a __GFP_NOFAIL allocation can ignore watermarks. It's unlikely
	this is the problem but as the journal layer uses __GFP_NOFAIL,
	you never know - it might be pushing things down low enough for
	other watermark checks to fail. Patch is below.

	This is also the patch that caused kswapd to wake up less. I sent
	a patch for that problem but I still don't know if it reduced the
	number of failures for you or not.

Commit f2260e6 page allocator: update NR_FREE_PAGES only as necessary
	This patch affects the timing of when NR_FREE_PAGES is updated.
	The reclaim algorithm makes decisions based on this NR_FREE_PAGES
	value. Crucially, the value can determine if the anon list is
	force scanned or not. The window during which this can make a
	difference should be extremely small but maybe it's enough to make
	a difference.

Outside the range of commits suspected of causing problems was the
following. It's extremely low probability.

Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion
	This patch alters the call to congestion_wait() in the page
	allocator. Frankly, I don't get the change but it might be worth
	checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31
	makes any difference.

After a lot more eyeballing, the best next candidate within mm is the
following patch. It should be tested on its own and in combination with
the wakeup-kswapd patch sent before.

====

^ permalink raw reply [flat|nested] 384+ messages in thread
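[Editorial sketch] The suggestion above to replace BLK_RW_ASYNC with WRITE is not a no-op if, as assumed here, congestion_wait() uses its first argument as an index into a two-entry array of wait queues. The fragment below models only that indexing in userspace. The constant values (READ 0, WRITE 1, BLK_RW_ASYNC 0, BLK_RW_SYNC 1), the array and the helper name are assumptions of this sketch, not code from the kernel tree.

```c
#include <assert.h>
#include <string.h>

/* Assumed values; stand-ins for the kernel constants of the same names. */
#define READ	0
#define WRITE	1
enum { BLK_RW_ASYNC = 0, BLK_RW_SYNC = 1 };

/* Hypothetical model: one wait queue per index, named for readability. */
static const char *const congestion_queue[2] = { "async", "sync" };

/* Hypothetical helper: which queue a caller would sleep on for a given index. */
static const char *queue_waited_on(int idx)
{
	assert(idx == 0 || idx == 1);
	return congestion_queue[idx];
}
```

Under these assumptions queue_waited_on(BLK_RW_ASYNC) selects the first queue and queue_waited_on(WRITE) selects the second, so swapping the argument changes which congestion event the page allocator waits for.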
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-14 23:56 ` Mel Gorman
@ 2009-10-15 20:15   ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-15 20:15 UTC (permalink / raw)
To: Mel Gorman
Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, John W. Linville, linux-mm

On Thursday 15 October 2009, Mel Gorman wrote:
> After a lot more eyeballing, the best next candidate within mm is the
> following patch. Should be tested on its own and in combination with
> the wakeup-kswapd patch sent before.
>
> From 4e8b5217f51a00caee527e4e8d8e46fe9f82b482 Mon Sep 17 00:00:00 2001
> From: Mel Gorman <mel@csn.ul.ie>
> Date: Thu, 15 Oct 2009 00:17:05 +0100
> Subject: [PATCH] page allocator: Direct reclaim should always obey
> watermarks
>
> ALLOC_NO_WATERMARKS should be cleared when trying to allocate from the
> free-lists after a direct reclaim.

I've tested the two patches together and this seems like a definite
improvement. I still get SKB errors on the first test, but the desktop
freezes are a lot shorter and the total time needed to load the 3rd gitk
goes down from ~2:15 to ~1:15. The counter in gitk of the number of
loaded commits goes up visibly faster and with fewer halts.

This is on top of current mainline with the RX_LOW_WATERMARK in iwlagn
at its current value (8).

Here are the allocation failures for 2 consecutive tests. Note that the
first test still shows quite a lot of failures, but the second test hardly
had any at all (I still had music skips though).

[  232.845116] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[  232.845116] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[  232.873009] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[  232.884545] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[  240.121577] __ratelimit: 26 callbacks suppressed
[  240.121634] __ratelimit: 6 callbacks suppressed
[  240.124006] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 6 free buffers remaining.
[  304.335767] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[  304.335767] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[  304.374729] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
[  309.446481] __ratelimit: 5 callbacks suppressed
[  309.450197] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.

[  525.912934] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 5 free buffers remaining.
[  525.953939] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 7 free buffers remaining.
[  536.058171] __ratelimit: 1 callbacks suppressed

Thanks,
FJP

^ permalink raw reply [flat|nested] 384+ messages in thread
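[Editorial sketch] Comparing test runs by eye from logs like the ones above is error-prone, since __ratelimit hides part of the output. The fragment below is a minimal userspace tally over kern.log lines in the format quoted in this message. The struct and function names are hypothetical, and the suppressed count covers whatever __ratelimit dropped, which is not necessarily only SKB failures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-run counters for one test. */
struct skb_tally {
	int failures;	/* "Failed to allocate SKB buffer" lines seen */
	int empty_pool;	/* failures reported with 0 free buffers remaining */
	int suppressed;	/* messages hidden by __ratelimit (any source) */
};

/* Hypothetical helper: update the tally from a single log line. */
static void tally_line(struct skb_tally *t, const char *line)
{
	const char *p;
	int n;

	assert(t != NULL && line != NULL);

	if (strstr(line, "Failed to allocate SKB buffer")) {
		t->failures++;
		p = strstr(line, "Only ");
		if (p && sscanf(p, "Only %d free buffers", &n) == 1 && n == 0)
			t->empty_pool++;
		return;
	}

	p = strstr(line, "__ratelimit: ");
	if (p && sscanf(p, "__ratelimit: %d callbacks suppressed", &n) == 1)
		t->suppressed += n;
}
```

Feeding each line of a run through tally_line() gives three numbers per run that can be compared across kernels, instead of raw log excerpts.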
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-15 20:15   ` Frans Pop
@ 2009-10-16  9:39     ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-16 9:39 UTC (permalink / raw)
To: Frans Pop
Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, John W. Linville, linux-mm

On Thu, Oct 15, 2009 at 10:15:09PM +0200, Frans Pop wrote:
> On Thursday 15 October 2009, Mel Gorman wrote:
> > After a lot more eyeballing, the best next candidate within mm is the
> > following patch. Should be tested on its own and in combination with
> > the wakeup-kswapd patch sent before.
> >
> > From 4e8b5217f51a00caee527e4e8d8e46fe9f82b482 Mon Sep 17 00:00:00 2001
> > From: Mel Gorman <mel@csn.ul.ie>
> > Date: Thu, 15 Oct 2009 00:17:05 +0100
> > Subject: [PATCH] page allocator: Direct reclaim should always obey
> > watermarks
> >
> > ALLOC_NO_WATERMARKS should be cleared when trying to allocate from the
> > free-lists after a direct reclaim.
>
> I've tested the two patches together and this seems like a definite
> improvement.

You probably don't need the mental image, but this made me do a happy
dance. Certainly helps my cold!

> I still get SKB errors on the first test, but the desktop
> freezes are a lot shorter and the total time needed to load the 3rd gitk
> goes down from ~2:15 to ~1:15. The counter in gitk of the number of
> loaded commits goes up visibly faster and with fewer halts.
>

This brings us close to the state of affairs before the akpm merge. There
might still be something missing in either the mm area or the wireless
driver but any improvement is better than none.

> This is on top of current mainline with the RX_LOW_WATERMARK in iwlagn
> at its current value (8).
>
> Here are the allocation failures for 2 consecutive tests. Note that the
> first test still shows quite a lot of failures, but the second test hardly
> had any at all (I still had music skips though).
>

So, we are still dealing with three problems.

1. GFP_ATOMIC allocations were introduced to the wireless driver in the
   2.6.30..2.6.31 timeframe. The cause has been more or less identified
   as the tasklet off-loading and the pools being depleted too easily.
   This still needs to be fixed.

2. There is also some firmware reloading problem of an unknown source.

3. There was an mm regression that made GFP_ATOMIC failures much worse.
   This is at least partially due to exiting tasks being able to go below
   the watermarks and kswapd not being woken up when it should be. This
   could be the source of the allocation failures on resume that have
   nothing to do with the iwlagn wireless driver.

I am going to put together the pair of patches against mainline with a
recommendation that they also be applied for 2.6.31.5. I'll keep looking
to see if I can spot another possible candidate for GFP_ATOMIC being worse
than it was.

> [  232.845116] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [  232.845116] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [  232.873009] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [  232.884545] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [  240.121577] __ratelimit: 26 callbacks suppressed
> [  240.121634] __ratelimit: 6 callbacks suppressed
> [  240.124006] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 6 free buffers remaining.
> [  304.335767] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [  304.335767] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [  304.374729] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> [  309.446481] __ratelimit: 5 callbacks suppressed
> [  309.450197] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
>
> [  525.912934] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 5 free buffers remaining.
> [  525.953939] iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 7 free buffers remaining.
> [  536.058171] __ratelimit: 1 callbacks suppressed
>

--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-14 13:10 ` Frans Pop
@ 2009-10-14 16:30   ` reinette chatre
  2009-10-14 16:30     ` reinette chatre
  2009-10-18 23:33   ` Frans Pop
  2 siblings, 0 replies; 384+ messages in thread
From: reinette chatre @ 2009-10-14 16:30 UTC (permalink / raw)
To: Frans Pop
Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
	John W. Linville, linux-mm, Kalle Valo

On Wed, 2009-10-14 at 06:10 -0700, Frans Pop wrote:
> On Wednesday 14 October 2009, Mel Gorman wrote:
> > The majority of the wireless reports have been in
> > this driver and I think we have the problem commit there. The only other
> > is a firmware loading problem in e100 after resume that fails an
> > atomic order-5 allocation.
>
> Not exactly true. Bartlomiej's report was about ipw2200, so there are at
> least 3 different drivers involved, two wireless and one wired. Besides
> that one report is related to heavy swap, one to resume and one to driver
> reload.

Another report arrived today. Please see
http://thread.gmane.org/gmane.linux.kernel.wireless.general/40858 - it
is an order-5 allocation failure during driver reload.

Reinette

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-14 13:10 ` Frans Pop
  2009-10-14 15:40   ` Mel Gorman
@ 2009-10-18 23:33   ` Frans Pop
  2009-10-18 23:33     ` Frans Pop
  2 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-18 23:33 UTC (permalink / raw)
To: Mel Gorman
Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, John W. Linville, linux-mm

Another long mail, sorry.

On Wednesday 14 October 2009, Frans Pop wrote:
> > There still has not been a mm-change identified that makes
> > fragmentation significantly worse.
>
> My bisection shows a very clear point, even if not an individual commit,
> in the 'akpm' merge where SKB errors suddenly become *much* more
> frequent and easy to trigger.
> I'm sorry to say this, but the fact that nothing has been identified yet
> is IMO the result of a lack of effort, not because there is no such
> change.

I was wrong. It turns out that I was creating the variations in the test
results around the akpm merge myself by tiny changes in the way I ran the
tests. It took another round of about 30 compilations and tests purely in
this range to show that, but those same tests also made me aware of other
patterns I should look at.

Until a few days ago I was concentrating on "do I see SKB allocation errors
or not". Since then I've also been looking more consciously at when they
happen, at disk access patterns and at desktop freeze patterns.

I think I did mention before that this whole issue is rather subtle :-/

So, my apologies for fingering the wrong area for so long, but it looked
solid given the info available at the time.

On Thursday 15 October 2009, Mel Gorman wrote:
> Outside the range of commits suspected of causing problems was the
> following. It's extremely low probability
>
> Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion
> 	This patch alters the call to congestion_wait() in the page
> 	allocator. Frankly, I don't get the change but it might be worth
> 	checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31
> 	makes any difference

This is the real culprit. Mel: thanks very much for looking beyond the
area I identified. Your overview of mm changes was exactly what I needed
and really helped a lot during my later tests.

This commit definitely causes most of the problems; confirmed by reverting
it on top of 2.6.31 (also requires reverting 373c0a7e, which is a later
build fix).

The rest of this mail gives details on my tests and how I reached the above
conclusion.

TEST BASELINE (2.6.30)
======================
I mentioned in an earlier mail that I run three instances of gitk for my
tests. Loading gitk seems to consist of 3 phases:
1) general initial scan of the repository (branches?)
2) reading commits: commit counter increases
3) reading references (including bisection good/bad points) and
   uncommitted changes

Below are times and comments per phase when the test is run with 2.6.30.
As my test starts after a clean boot, buffers are mostly empty.

1st instance: 'gitk v2.6.29..master' (preparation)
1) ~20 seconds; user interface is mostly blank
2) ~5 seconds to read 35.000 commits; user interface is updated and
   counter increases steadily as they are read
3) ~10 seconds; "branch"/"follows"/"precedes" info and tags are filled in;
   fairly heavy disk activity

2nd instance: 'gitk master' (preparation)
1) 0 seconds (because data is already buffered)
2) ~25 seconds to read 167500 commits; counter increases steadily
3) 1-2 seconds (because data is already buffered)

3rd instance: 'gitk master' (the actual test)
1) 0 seconds because data is already buffered
2) ~55 seconds due to swapping overhead; minor music skip around commit
   110.000; counter slower after 90.000, some short halts, but generally
   increases steadily; moderate disk activity
3) ~55-60 seconds; because buffers have been emptied data must be read
   again, with swapping; very heavy disk activity; fairly long music skip
   (15-20 seconds), but no SKB allocation errors

So, the loading of the 3rd instance takes 1.5 minutes longer than the
second because of the swapping. And phase 3) is most affected by it.

AFTER WIRELESS CHANGE
=====================
After commit 4752c93c30 ("iwlcore: Allow skb allocation from tasklet") I
start getting the SKB errors. They can be triggered reliably if the whole
test is repeated 1 or 2 times, but generally not the first time the test is
run. Or so I thought for a long time.

It turns out that I will get SKB errors during the first run if I'm
"sloppy" in the test execution. For example if I wait too long before
switching from the last gitk instance to konsole where I have a
'tail -f /var/log/kern.log' running.
Another factor is the state of the repository: do I have master checked
out, or an older branch, or am I in the middle of a bisection. This
influences how data is read from the disk and thus the test results.
A last factor may be the size of the kernel I'm using: my test/bisect
kernel is significantly smaller than my regular kernel.

If the test is run completely cleanly, I will not get SKB errors during the
first run. Also, this change does not affect the timings of the test at
all: the total load time of the 3rd instance is still ~1:55 and music skips
happen in roughly the same places. The pattern of disk activity also
remains unchanged.

If I do *not* run the test cleanly, any SKB errors during the first test
run will always be during phase 3), never during phase 2). This is what I
saw during tests in the 'akpm' range, and explains the inconsistent results
there.

After discovering this I've made a copy of the git repo so that I always
test using the exact same state and tightened my test procedure.

AFTER congestion_wait CHANGE
============================
If I test commit 9f2d8be, which is just before the congestion_wait()
change, I still get the same pattern as described above. But when I test
with 8aa7e84 ("Fix congestion_wait() sync/async vs read/write confusion"),
things change dramatically when the 3rd gitk instance is started.

During the 2nd phase I see the first SKB allocation errors with a music
skip between reading commits 95.000 and 110.000.
About commit 115.000 there is a very long pause during which the counter
does not increase, music stops and the desktop freezes completely. For the
first 30 seconds of that freeze there is only very low disk activity (which
seems strange); for the next 25 seconds there suddenly is very high disk
activity during which things gradually unfreeze and more SKB errors are
displayed. After that the commit counter runs up fairly steadily again.

Phase 2) ends at ~1:45. Phase 3) (with more SKB errors) ends at ~2:05.

So this change almost doubles the time needed for phase 2) and causes SKB
allocation errors to occur during that phase. Also, before this commit the
desktop freezes are much shorter and less severe. With this change the
desktop is completely unusable for almost a minute during phase 2), with
even the mouse pointer frozen solid.

Note that phase 3) becomes shorter, but that the total time needed to load
the 3rd instance increases by about 10-15 seconds.

Note: -rc2 and -rc3 had broken NFS, so I had to cherry-pick 3 NFS commits
from -rc4 on top of the commits I wanted to test.

WITH congestion_wait CHANGE REVERTED
====================================
I've done quite a few tests of 2.6.31 with 373c0a7e and 8aa7e847 reverted
to confirm that's really the culprit. I've done this for .31-rc3, .31-rc4,
.31-rc5, .31 and .31.1.

In all cases the huge freeze in phase 2) is gone and the general behavior
and timings are again as they were after the wireless change. During most
tests I did not get any SKB allocation errors during phase 2) or phase 3).

However with .31-rc5, .31 and .31.1 I have had some tests where I would see
a few SKB allocation errors during phase 3) (which is somewhat likely), but
also during phase 2). At this point I'm unsure whether this is just noise,
or maybe a minor influence from some change merged after .31-rc4. Looking
through the commits there are several mm/page allocation changes.
For now I suggest ignoring this though as the impact (if any) is very minor
and it is not reproducible reliably enough.

Next I'll retest Mel's patches and also test Reinette's patches.

Cheers,
FJP

^ permalink raw reply [flat|nested] 384+ messages in thread
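[Editorial sketch] The timing claims in the message above can be checked with simple arithmetic on the quoted figures: phase 2) took ~55 seconds on the baseline and ends at ~1:45 (105 seconds) after the congestion_wait() change, which is a factor of about 1.9; phase 3) then runs from ~1:45 to ~2:05, i.e. about 20 seconds instead of 55-60; and the total goes from ~1:55 to ~2:05, about 10 seconds more. The helpers below are hypothetical and exist only to make that arithmetic reproducible.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: convert an "m:ss" string to seconds, or -1 on bad input. */
static int mmss_to_seconds(const char *s)
{
	int m, sec;

	if (sscanf(s, "%d:%d", &m, &sec) != 2 || m < 0 || sec < 0 || sec > 59)
		return -1;
	return m * 60 + sec;
}

/* Hypothetical helper: seconds elapsed between two cumulative "m:ss" marks. */
static int elapsed(const char *from, const char *to)
{
	int a = mmss_to_seconds(from);
	int b = mmss_to_seconds(to);

	/* Both marks must parse and must be in chronological order. */
	assert(a >= 0 && b >= a);
	return b - a;
}
```

For example, elapsed("1:45", "2:05") gives the shortened phase 3), and elapsed("1:55", "2:05") gives the increase in total load time, which sits at the lower end of the "10-15 seconds" range given in the message.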
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-18 23:33 ` Frans Pop 0 siblings, 0 replies; 384+ messages in thread From: Frans Pop @ 2009-10-18 23:33 UTC (permalink / raw) To: Mel Gorman Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm-Bw31MaZKKs3YtjvyW6yDsg Another long mail, sorry. On Wednesday 14 October 2009, Frans Pop wrote: > > There still has not been a mm-change identified that makes > > fragmentation significantly worse. > > My bisection shows a very clear point, even if not an individual commit, > in the 'akpm' merge where SKB errors suddenly become *much* more > frequent and easy to trigger. > I'm sorry to say this, but the fact that nothing has been identified yet > is IMO the result of a lack of effort, not because there is no such > change. I was wrong. It turns out that I was creating the variations in the test results around the akpm merge myself by tiny changes in the way I ran the tests. It took another round of about 30 compilations and tests purely in this range to show that, but those same tests also made me aware of other patterns I should look at. Until a few days ago I was concentrating on "do I see SKB allocation errors or not". Since then I've also been looking more consciously at when they happen, at disk access patterns and at desktop freeze patterns. I think I did mention before that this whole issue is rather subtle :-/ So, my apologies for finguering the wrong area for so long, but it looked solid given the info available at the time. On Thursday 15 October 2009, Mel Gorman wrote: > Outside the range of commits suspected of causing problems was the > following. It's extremely low probability > > Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion > This patch alters the call to congestion_wait() in the page > allocator. 
Frankly, I don't get the change but it might worth > checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31 > makes any difference This is the real culprit. Mel: thanks very much for looking beyond the area I identified. Your overview of mm changes was exactly what I needed and really helped a lot during my later tests. This commit definitely causes most of the problems; confirmed by reverting it on top of 2.6.31 (also requires reverting 373c0a7e, which is a later build fix). The rest of this mail gives details on my tests and how I reached the above conclusion. TEST BASELINE (2.6.30) ====================== I mentioned in an earlier mail that I run three instances of gitk for my tests. Loading gitk seems to consist of 3 phases: 1) general initial scan of the repository (branches?) 2) reading commits: commit counter increases 3) reading references (including bisection good/bad points) and uncommitted changes Below times and comments per stage when the test is run with 2.6.30. As my test starts after a clean boot, buffers are mostly empty. 
1st instance: 'gitk v2.6.29..master' (preparation) 1) ~20 seconds; user interface is mostly blank 2) ~5 seconds to read 35.000 commits; user interface is updated and counter increases steadily as they are read 3) ~10 seconds; "branch"/"follows"/"precedes" info and tags are filled in; fairly heavy disk activity 2st instance: 'gitk master' (preparation) 1) 0 seconds (because data is already buffered) 2) ~25 seconds to read 167500 commits; counter increases steadily 3) 1-2 seconds (because data is already buffered) 3st instance: 'gitk master' (the actual test) 1) 0 seconds because data is already buffered 2) ~55 seconds due to swapping overhead; minor music skip around commit 110.000; counter slower after 90.000, some short halts, but generally increases steadily; moderate disk activity 3) ~55-60 seconds; because buffers have been emptied data must by read again, with swapping; very heavy disk activity; fairly long music skip (15-20 seconds), but no SKB allocation errors So, the loading of the 3rd instance takes 1.5 minutes longer than the second because of the swapping. And phase 3) is most affected by it. AFTER WIRELESS CHANGE ===================== After commit 4752c93c30 ("iwlcore: Allow skb allocation from tasklet") I start getting the SKB errors. They can be triggered reliably if the whole test is repeated 1 or 2 times, but generally not the first time the test is run. Or so I thought for a long time. It turns out that I will get SKB errors during the first run if I'm "sloppy" in the test execution. For example if I wait too long before switching from the last gitk instance to konsole where I have a 'tail -f /var/log/kern.log' running. Another factor is the state of the repository: do I have master checked out, or an older branch, or am I in the middle of a bisection. This influences how data is read from the disk and thus the test results. 
A last factor may be the size of the kernel I'm using: my test/bisect
kernel is significantly smaller than my regular kernel.

If the test is run completely cleanly, I will not get SKB errors during
the first run. Also, this change does not affect the timings of the test
at all: the total load time of the 3rd instance is still ~1:55 and music
skips happen in roughly the same places. The pattern of disk activity also
remains unchanged.

If I do *not* run the test cleanly, any SKB errors during the first test
run will always be during phase 3), never during phase 2).

This is what I saw during tests in the 'akpm' range, and explains the
inconsistent results there. After discovering this I've made a copy of the
git repo so that I always test using the exact same state and tightened my
test procedure.

AFTER congestion_wait CHANGE
============================
If I test commit 9f2d8be, which is just before the congestion_wait()
change, I still get the same pattern as described above. But when I test
with 8aa7e84 ("Fix congestion_wait() sync/async vs read/write confusion"),
things change dramatically when the 3rd gitk instance is started.

During the 2nd phase I see the first SKB allocation errors with a music
skip between reading commits 95.000 and 110.000.
About commit 115.000 there is a very long pause during which the counter
does not increase, music stops and the desktop freezes completely. The
first 30 seconds of that freeze there is only very low disk activity (which
seems strange); the next 25 seconds there suddenly is very high disk
activity during which things gradually unfreeze and more SKB errors are
displayed.
After that the commit counter runs up fairly steadily again.

Phase 2) ends at ~1:45. Phase 3) (with more SKB errors) ends at ~2:05.

So this change almost doubles the time needed for phase 2) and causes SKB
allocation errors to occur during that phase.
Also, before this commit the desktop freezes are much shorter and less
severe.
With this change the desktop is completely unusable for almost a minute
during phase 2), with even the mouse pointer frozen solid.

Note that phase 3) becomes shorter, but that the total time needed to load
the 3rd instance increases by about 10-15 seconds.

Note: -rc2 and -rc3 had broken NFS, so I had to cherry-pick 3 NFS commits
from -rc4 on top of the commits I wanted to test.

WITH congestion_wait CHANGE REVERTED
====================================
I've done quite a few tests of 2.6.31 with 373c0a7e and 8aa7e847 reverted
to confirm that's really the culprit. I've done this for .31-rc3, .31-rc4,
.31-rc5, .31 and .31.1.

In all cases the huge freeze in phase 2) is gone and the general behavior
and timings are again as they were after the wireless change. During most
tests I did not get any SKB allocation errors during phase 2) or phase 3).

However with .31-rc5, .31 and .31.1 I have had some tests where I would see
a few SKB allocation errors during phase 3) (which is somewhat likely), but
also during phase 2). At this point I'm unsure whether this is just noise,
or maybe a minor influence from some change merged after .31-rc4. Looking
through the commits there are several mm/page allocation changes.
For now I suggest ignoring this though as the impact (if any) is very minor
and it is not reproducible reliably enough.

Next I'll retest Mel's patches and also test Reinette's patches.

Cheers,
FJP
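The timings reported in this mail can be tallied with a little arithmetic. The sketch below does so in C; the struct, the function names and the choice of ~1:55 as the baseline total are illustrative assumptions that only restate the approximate figures given in the mail, not measurements of any kind.

```c
#include <stdio.h>

/* Approximate times from the mail, in seconds since the 3rd gitk
 * instance was started. Phase 1) takes ~0 s in both runs. */
struct run_timing {
    const char *label;
    int phase2_end; /* second at which phase 2) finished */
    int phase3_end; /* second at which phase 3) finished */
};

/* Baseline: phase 2) ~55 s, total load time ~1:55 (115 s). */
static const struct run_timing BASELINE = { "before 8aa7e84", 55, 115 };
/* After the congestion_wait() change: phase 2) ends ~1:45, phase 3) ~2:05. */
static const struct run_timing AFTER_CHANGE = { "after 8aa7e84", 105, 125 };

/* Convert a minutes:seconds reading to seconds. */
int mmss(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

/* Duration of phase 3) alone. */
int phase3_duration(const struct run_timing *t)
{
    return t->phase3_end - t->phase2_end;
}

/* Difference in total load time between two runs. */
int total_delta(const struct run_timing *before, const struct run_timing *after)
{
    return after->phase3_end - before->phase3_end;
}

/* Print a short comparison of the two runs. */
void print_comparison(void)
{
    printf("phase 2 slowdown: %.2fx\n",
           (double)AFTER_CHANGE.phase2_end / BASELINE.phase2_end);
    printf("phase 3: %d s -> %d s\n",
           phase3_duration(&BASELINE), phase3_duration(&AFTER_CHANGE));
    printf("total: +%d s\n", total_delta(&BASELINE, &AFTER_CHANGE));
}
```

Calling `print_comparison()` from a `main()` shows phase 2) taking a bit under twice as long, phase 3) shrinking, and the total growing by roughly ten seconds, which matches the summary in the mail.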
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-18 23:33 ` Frans Pop
@ 2009-10-19  0:36   ` Pekka Enberg
From: Pekka Enberg @ 2009-10-19  0:36 UTC
To: Frans Pop
Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
	John W. Linville, linux-mm, jens.axboe

(Adding Jens to CC.)

On Wednesday 14 October 2009, Frans Pop wrote:
> > > There still has not been a mm-change identified that makes
> > > fragmentation significantly worse.

On Mon, 2009-10-19 at 01:33 +0200, Frans Pop wrote:
> > My bisection shows a very clear point, even if not an individual commit,
> > in the 'akpm' merge where SKB errors suddenly become *much* more
> > frequent and easy to trigger.
> > I'm sorry to say this, but the fact that nothing has been identified yet
> > is IMO the result of a lack of effort, not because there is no such
> > change.
>
> I was wrong. It turns out that I was creating the variations in the test
> results around the akpm merge myself by tiny changes in the way I ran the
> tests. It took another round of about 30 compilations and tests purely in
> this range to show that, but those same tests also made me aware of other
> patterns I should look at.
>
> Until a few days ago I was concentrating on "do I see SKB allocation errors
> or not". Since then I've also been looking more consciously at when they
> happen, at disk access patterns and at desktop freeze patterns.
>
> I think I did mention before that this whole issue is rather subtle :-/
> So, my apologies for finguering the wrong area for so long, but it looked
> solid given the info available at the time.
>
> On Thursday 15 October 2009, Mel Gorman wrote:
> > Outside the range of commits suspected of causing problems was the
> > following. It's extremely low probability
> >
> > Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion
> >	This patch alters the call to congestion_wait() in the page
> >	allocator. Frankly, I don't get the change but it might worth
> >	checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31
> >	makes any difference
>
> This is the real culprit. Mel: thanks very much for looking beyond the
> area I identified. Your overview of mm changes was exactly what I needed
> and really helped a lot during my later tests.
>
> This commit definitely causes most of the problems; confirmed by reverting
> it on top of 2.6.31 (also requires reverting 373c0a7e, which is a later
> build fix).

Mel/Jens, any ideas why commit 8aa7e84 makes us run out of high order
pages? Should we be using BLK_RW_SYNC in mm/page_alloc.c instead of
BLK_RW_ASYNC?

			Pekka

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf72055..fa8380a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1727,7 +1727,7 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
 			preferred_zone, migratetype);
 
 		if (!page && gfp_mask & __GFP_NOFAIL)
-			congestion_wait(BLK_RW_ASYNC, HZ/50);
+			congestion_wait(BLK_RW_SYNC, HZ/50);
 	} while (!page && (gfp_mask & __GFP_NOFAIL));
 
 	return page;
@@ -1898,7 +1898,7 @@ rebalance:
 	pages_reclaimed += did_some_progress;
 	if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) {
 		/* Wait for some write requests to complete then retry */
-		congestion_wait(BLK_RW_ASYNC, HZ/50);
+		congestion_wait(BLK_RW_SYNC, HZ/50);
 		goto rebalance;
 	}
 
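Both hunks in the patch above pass HZ/50 as the timeout to congestion_wait(). As a rough illustration of what that value means in wall-clock time, the sketch below converts it for a few common tick rates. The helper names are hypothetical (the kernel has its own conversion routines), and the list of HZ values is an assumption about typical configurations, not something stated in the thread.

```c
#include <stdio.h>

/* Timeout argument used in the patch: HZ/50, with integer division,
 * where hz is the kernel tick rate (CONFIG_HZ) in ticks per second. */
unsigned int congestion_timeout_jiffies(unsigned int hz)
{
    return hz / 50u;
}

/* Hypothetical helper: convert a jiffies count to milliseconds. */
unsigned int jiffies_to_ms(unsigned int jiffies, unsigned int hz)
{
    return jiffies * 1000u / hz;
}

/* Print the timeout for a few tick rates (assumed common settings). */
void print_congestion_timeouts(void)
{
    static const unsigned int hz_values[] = { 100, 250, 300, 1000 };
    size_t i;

    for (i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++) {
        unsigned int hz = hz_values[i];
        unsigned int j = congestion_timeout_jiffies(hz);

        printf("HZ=%u: %u jiffies = %u ms\n", hz, j, jiffies_to_ms(j, hz));
    }
}
```

For each of these tick rates the result is 20 ms, so the patch does not alter how long the allocator is prepared to wait per retry; it only changes which congestion queue (sync or async) the wait is tied to.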
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-19  0:36 ` Pekka Enberg
@ 2009-10-19  2:44   ` Frans Pop
From: Frans Pop @ 2009-10-19  2:44 UTC
To: Pekka Enberg
Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
	John W. Linville, linux-mm, jens.axboe

On Monday 19 October 2009, Pekka Enberg wrote:
> On Wednesday 14 October 2009, Frans Pop wrote:
> > On Thursday 15 October 2009, Mel Gorman wrote:
> > > Outside the range of commits suspected of causing problems was the
> > > following. It's extremely low probability
> > >
> > > Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write
> > > confusion This patch alters the call to congestion_wait() in the
> > > page allocator. Frankly, I don't get the change but it might worth
> > > checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31 makes
> > > any difference
> >
> > This is the real culprit. Mel: thanks very much for looking beyond the
> > area I identified. Your overview of mm changes was exactly what I
> > needed and really helped a lot during my later tests.
> >
> > This commit definitely causes most of the problems; confirmed by
> > reverting it on top of 2.6.31 (also requires reverting 373c0a7e, which
> > is a later build fix).
>
> Mel/Jens, any ideas why commit 8aa7e84 makes us run out of high order
> pages? Should we be using BLK_RW_SYNC in mm/page_alloc.c instead of
> BLK_RW_ASYNC?

I'm starting to think that this commit may not be directly related to high
order allocation failures. The fact that I'm seeing SKB allocation
failures earlier because of this commit could be just a side effect.
It could be that instead the main impact of this commit is on encrypted
file system and/or encrypted swap (kcryptd).

Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
only reading from NFS that's unlikely).

Reason for thinking this is that reverting it makes no difference for
Karol [1]. It will be interesting to see if it does make a difference for
Sven Geggus [2].

/me wonders if we'll ever get to the bottom of this...

[1] http://lkml.org/lkml/2009/10/18/138
[2] http://lkml.org/lkml/2009/10/17/113
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-19  2:44 ` Frans Pop
@ 2009-10-19  9:49   ` Tobi Oetiker
From: Tobi Oetiker @ 2009-10-19  9:49 UTC
To: Frans Pop
Cc: Pekka Enberg, Mel Gorman, David Rientjes, KOSAKI Motohiro,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, John W. Linville, linux-mm, jens.axboe

Today Frans Pop wrote:

>
> I'm starting to think that this commit may not be directly related to high
> order allocation failures. The fact that I'm seeing SKB allocation
> failures earlier because of this commit could be just a side effect.
> It could be that instead the main impact of this commit is on encrypted
> file system and/or encrypted swap (kcryptd).
>
> Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> only reading from NFS that's unlikely).

I have updated a fileserver to 2.6.31 today and I see page
allocation failures from several parts of the system ... mostly nfs
though ... (it is an NFS server). So I guess the problem must be
quite generic:

Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684124]  <IRQ>  [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]

Oct 19 08:59:16 johan kernel: [30120.685647] __ratelimit: 13 callbacks suppressed [kern.warning]
Oct 19 08:59:16 johan kernel: [30120.685654] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 08:59:16 johan kernel: [30120.685660] Pid: 6071, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 08:59:16 johan kernel: [30120.685663] Call Trace: [kern.warning]
Oct 19 08:59:16 johan kernel: [30120.685666]  <IRQ>  [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]

Oct 19 09:36:31 johan kernel: [32355.708345] __ratelimit: 16 callbacks suppressed [kern.warning]
Oct 19 09:36:31 johan kernel: [32355.708352] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 09:36:31 johan kernel: [32355.708358] Pid: 6087, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 09:36:31 johan kernel: [32355.708361] Call Trace: [kern.warning]
Oct 19 09:36:31 johan kernel: [32355.708364]  <IRQ>  [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]

Oct 19 10:52:01 johan kernel: [36885.358312] __ratelimit: 31 callbacks suppressed [kern.warning]
Oct 19 10:52:01 johan kernel: [36885.358319] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 10:52:01 johan kernel: [36885.358325] Pid: 6057, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 10:52:01 johan kernel: [36885.358327] Call Trace: [kern.warning]
Oct 19 10:52:01 johan kernel: [36885.358331]  <IRQ>  [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]

Oct 19 11:12:01 johan kernel: [38085.163831] events/3: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 11:12:01 johan kernel: [38085.163840] Pid: 18, comm: events/3 Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 11:12:01 johan kernel: [38085.163843] Call Trace: [kern.warning]
Oct 19 11:12:01 johan kernel: [38085.163846]  <IRQ>  [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
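To put the "order:5, mode:0x4020" fields from the log above into concrete terms, here is a small C sketch. It rests on two assumptions that are not stated in the thread: a 4 KiB base page size, and GFP flag bit values as recalled from the 2.6.31-era include/linux/gfp.h. Both should be checked against the actual tree and architecture; the EXAMPLE_ names and helper functions are hypothetical.

```c
#include <stdio.h>

/* Assumption: 4 KiB base pages, the usual x86-64 configuration. */
#define EXAMPLE_PAGE_SIZE 4096ul

/* Assumption: flag values as recalled from 2.6.31-era gfp.h.
 * Verify against the kernel tree being debugged. */
#define EXAMPLE_GFP_WAIT 0x10u   /* allocation may sleep */
#define EXAMPLE_GFP_HIGH 0x20u   /* high priority, may use reserves */
#define EXAMPLE_GFP_COMP 0x4000u /* compound page requested */

/* An order-N request asks for 2^N physically contiguous pages. */
unsigned long order_to_pages(unsigned int order)
{
    return 1ul << order;
}

unsigned long order_to_bytes(unsigned int order)
{
    return order_to_pages(order) * EXAMPLE_PAGE_SIZE;
}

/* Non-zero if the allocation mask allows the caller to sleep. */
int mode_may_sleep(unsigned int mode)
{
    return (mode & EXAMPLE_GFP_WAIT) != 0;
}

/* Summarize one "page allocation failure" log line. */
void describe_failure(unsigned int order, unsigned int mode)
{
    printf("order:%u = %lu pages = %lu KiB contiguous, %s\n",
           order, order_to_pages(order), order_to_bytes(order) / 1024,
           mode_may_sleep(mode) ? "may sleep" : "atomic (cannot sleep)");
}
```

Under these assumptions `describe_failure(5, 0x4020)` reports 32 pages (128 KiB) requested atomically, which fits with the `<IRQ>` marker in the call traces. The order 2 failures in the bug title would correspond to 4 pages (16 KiB).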
* Re: [Bug #14141] order 2 page allocation failures (generic)
From: Pekka Enberg @ 2009-10-19 9:54 UTC
To: Tobi Oetiker
Cc: Frans Pop, Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe

On Mon, 2009-10-19 at 11:49 +0200, Tobi Oetiker wrote:
> Today Frans Pop wrote:
>
> >
> > I'm starting to think that this commit may not be directly related to high
> > order allocation failures. The fact that I'm seeing SKB allocation
> > failures earlier because of this commit could be just a side effect.
> > It could be that instead the main impact of this commit is on encrypted
> > file system and/or encrypted swap (kcryptd).
> >
> > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> > only reading from NFS that's unlikely).
>
> I have updated a fileserver to 2.6.31 today and I see page
> allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> So I guess the problem must be quite generic:

Yup, it almost certainly is. Does this patch help?

http://lkml.org/lkml/2009/10/16/89

Frans, did you ever get around to retesting with just the above patch
applied?

Pekka

> Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
>
>
> Oct 19 08:59:16 johan kernel: [30120.685647] __ratelimit: 13 callbacks suppressed [kern.warning]
> Oct 19 08:59:16 johan kernel: [30120.685654] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 08:59:16 johan kernel: [30120.685660] Pid: 6071, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 08:59:16 johan kernel: [30120.685663] Call Trace: [kern.warning]
> Oct 19 08:59:16 johan kernel: [30120.685666] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
>
> Oct 19 09:36:31 johan kernel: [32355.708345] __ratelimit: 16 callbacks suppressed [kern.warning]
> Oct 19 09:36:31 johan kernel: [32355.708352] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 09:36:31 johan kernel: [32355.708358] Pid: 6087, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 09:36:31 johan kernel: [32355.708361] Call Trace: [kern.warning]
> Oct 19 09:36:31 johan kernel: [32355.708364] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
>
> Oct 19 10:52:01 johan kernel: [36885.358312] __ratelimit: 31 callbacks suppressed [kern.warning]
> Oct 19 10:52:01 johan kernel: [36885.358319] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 10:52:01 johan kernel: [36885.358325] Pid: 6057, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 10:52:01 johan kernel: [36885.358327] Call Trace: [kern.warning]
> Oct 19 10:52:01 johan kernel: [36885.358331] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
>
> Oct 19 11:12:01 johan kernel: [38085.163831] events/3: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 11:12:01 johan kernel: [38085.163840] Pid: 18, comm: events/3 Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 11:12:01 johan kernel: [38085.163843] Call Trace: [kern.warning]
> Oct 19 11:12:01 johan kernel: [38085.163846] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
* Re: [Bug #14141] order 2 page allocation failures (generic)
From: Karol Lewandowski @ 2009-10-19 14:01 UTC
To: Pekka Enberg
Cc: Tobi Oetiker, Frans Pop, Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe

On Mon, Oct 19, 2009 at 12:54:11PM +0300, Pekka Enberg wrote:
> On Mon, 2009-10-19 at 11:49 +0200, Tobi Oetiker wrote:
> > I have updated a fileserver to 2.6.31 today and I see page
> > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> > So I guess the problem must be quite generic:
>
> Yup, it almost certainly is. Does this patch help?
>
> http://lkml.org/lkml/2009/10/16/89

This patch seems to help in some cases. Before applying this patch I
was able to trigger alloc failures on a different machine by booting
the kernel with "mem=256MB" and doing:

$ gitk on-full-tree &
# rmmod e100
... wait for few MBs in swap
# modprobe e100; ifup --force ethX

So here this patch helped -- with it, I was unable to trigger page
allocation failures (testing was short, though). However, as I said
here[1], I applied both of Mel's patches (including this one) and that
didn't help my original issue (failures after suspend).

[1] http://lkml.org/lkml/2009/10/17/109

Thanks.
* Re: [Bug #14141] order 2 page allocation failures (generic)
From: Mel Gorman @ 2009-10-19 14:06 UTC
To: Karol Lewandowski
Cc: Pekka Enberg, Tobi Oetiker, Frans Pop, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe

On Mon, Oct 19, 2009 at 04:01:45PM +0200, Karol Lewandowski wrote:
> On Mon, Oct 19, 2009 at 12:54:11PM +0300, Pekka Enberg wrote:
> > On Mon, 2009-10-19 at 11:49 +0200, Tobi Oetiker wrote:
> > > I have updated a fileserver to 2.6.31 today and I see page
> > > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> > > So I guess the problem must be quite generic:
> >
> > Yup, it almost certainly is. Does this patch help?
> >
> > http://lkml.org/lkml/2009/10/16/89
>
> This patch seems to help in some cases. Before applying this patch I
> was able to trigger alloc failures on a different machine by booting
> the kernel with "mem=256MB" and doing:
>
> $ gitk on-full-tree &
> # rmmod e100
> ... wait for few MBs in swap
> # modprobe e100; ifup --force ethX
>
> So here this patch helped -- with it, I was unable to trigger page
> allocation failures (testing was short, though). However, as I said
> here[1], I applied both of Mel's patches (including this one) and that
> didn't help my original issue (failures after suspend).
>
> [1] http://lkml.org/lkml/2009/10/17/109
>

Can you test with my kswapd patch applied and commits 373c0a7e,8aa7e847
reverted please?

--
Mel Gorman
Part-time Phd Student        Linux Technology Center
University of Limerick       IBM Dublin Software Lab
* Re: [Bug #14141] order 2 page allocation failures (generic)
From: Karol Lewandowski @ 2009-10-19 17:09 UTC
To: Mel Gorman
Cc: Karol Lewandowski, Pekka Enberg, Tobi Oetiker, Frans Pop, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe

On Mon, Oct 19, 2009 at 03:06:19PM +0100, Mel Gorman wrote:
> Can you test with my kswapd patch applied and commits 373c0a7e,8aa7e847
> reverted please?

It seems that your patch and Frans' reverts together *do* make a
difference.

With these patches I haven't been able to trigger failures so far
(in about 6 attempts). I'll continue testing and let you know if
anything changes. If nothing changes, this looks like a fix for my problem.

Thanks. Thanks a lot!
* Re: [Bug #14141] order 2 page allocation failures (generic)
From: Karol Lewandowski @ 2009-10-20 1:47 UTC
To: Karol Lewandowski
Cc: Mel Gorman, Pekka Enberg, Tobi Oetiker, Frans Pop, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe

On Mon, Oct 19, 2009 at 07:09:47PM +0200, Karol Lewandowski wrote:
> On Mon, Oct 19, 2009 at 03:06:19PM +0100, Mel Gorman wrote:
> > Can you test with my kswapd patch applied and commits 373c0a7e,8aa7e847
> > reverted please?
>
> It seems that your patch and Frans' reverts together *do* make
> difference.
>
> With these patches I haven't been able to trigger failures so far
> (in about 6 attempts). I'll continue testing and let you know if
> anything changes.

Damn it. I'm sorry to inform you that yes, I still get failures (less
often, but still).

Thanks.

e100: Intel(R) PRO/100 Network Driver, 3.5.24-k2-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
e100 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 9 (level, low) -> IRQ 9
e100 0000:00:03.0: PME# disabled
e100: eth0: e100_probe: addr 0xe8120000, irq 9, MAC addr 00:10:a4:89:e8:84
ifconfig: page allocation failure. order:5, mode:0x8020
Pid: 5151, comm: ifconfig Not tainted 2.6.31+frans2+mel-00002-g90702f9-dirty #2
Call Trace:
 [<c015c4e1>] ? __alloc_pages_nodemask+0x423/0x468
 [<c0104de7>] ? dma_generic_alloc_coherent+0x4a/0xab
 [<c0104d9d>] ? dma_generic_alloc_coherent+0x0/0xab
 [<d1614b6f>] ? e100_alloc_cbs+0xc7/0x174 [e100]
 [<d1615bfe>] ? e100_up+0x1b/0xf5 [e100]
 [<d1615cef>] ? e100_open+0x17/0x41 [e100]
 [<c02f871f>] ? dev_open+0x8f/0xc5
 [<c02f7ed9>] ? dev_change_flags+0xa2/0x155
 [<c032daa6>] ? devinet_ioctl+0x22a/0x51c
 [<c02ebabe>] ? sock_ioctl+0x0/0x1e4
 [<c02ebc7e>] ? sock_ioctl+0x1c0/0x1e4
 [<c02ebabe>] ? sock_ioctl+0x0/0x1e4
 [<c017f23a>] ? vfs_ioctl+0x16/0x4a
 [<c017fb01>] ? do_vfs_ioctl+0x48a/0x4c1
 [<c0168137>] ? handle_mm_fault+0x1e0/0x42c
 [<c0348c6b>] ? do_page_fault+0x2ce/0x2e4
 [<c017fb64>] ? sys_ioctl+0x2c/0x42
 [<c0102748>] ? sysenter_do_call+0x12/0x26
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 90, btch: 15 usd: 35
Active_anon:14778 active_file:10836 inactive_anon:22033
 inactive_file:11854 unevictable:0 dirty:6 writeback:0 unstable:0
 free:1031 slab:2083 mapped:6193 pagetables:417 bounce:0
DMA free:1096kB min:124kB low:152kB high:184kB active_anon:528kB inactive_anon:3440kB active_file:1076kB inactive_file:5580kB unevictable:0kB present:15868kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:3028kB min:1908kB low:2384kB high:2860kB active_anon:58584kB inactive_anon:84692kB active_file:42268kB inactive_file:41836kB unevictable:0kB present:243776kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 46*4kB 0*8kB 5*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1096kB
Normal: 135*4kB 213*8kB 21*16kB 4*32kB 5*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3028kB
25927 total pagecache pages
3010 pages in swap cache
Swap cache stats: add 205613, delete 202603, find 63665/79800
Free swap = 485236kB
Total swap = 514040kB
65520 pages RAM
1663 pages reserved
14633 pages shared
52919 pages non-shared
ifconfig: page allocation failure. order:5, mode:0x8020
Pid: 5151, comm: ifconfig Not tainted 2.6.31+frans2+mel-00002-g90702f9-dirty #2
Call Trace:
 [<c015c4e1>] ? __alloc_pages_nodemask+0x423/0x468
 [<c0104de7>] ? dma_generic_alloc_coherent+0x4a/0xab
 [<c0104d9d>] ? dma_generic_alloc_coherent+0x0/0xab
 [<d1614b6f>] ? e100_alloc_cbs+0xc7/0x174 [e100]
 [<d1615bfe>] ? e100_up+0x1b/0xf5 [e100]
 [<d1615cef>] ? e100_open+0x17/0x41 [e100]
 [<c02f871f>] ? dev_open+0x8f/0xc5
 [<c02f7ed9>] ? dev_change_flags+0xa2/0x155
 [<c032daa6>] ? devinet_ioctl+0x22a/0x51c
 [<c02ebabe>] ? sock_ioctl+0x0/0x1e4
 [<c02ebc7e>] ? sock_ioctl+0x1c0/0x1e4
 [<c02ebabe>] ? sock_ioctl+0x0/0x1e4
 [<c017f23a>] ? vfs_ioctl+0x16/0x4a
 [<c017fb01>] ? do_vfs_ioctl+0x48a/0x4c1
 [<c0175fd1>] ? vfs_write+0xf4/0x105
 [<c017fb64>] ? sys_ioctl+0x2c/0x42
 [<c0102748>] ? sysenter_do_call+0x12/0x26
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 90, btch: 15 usd: 67
Active_anon:14760 active_file:10798 inactive_anon:22052
 inactive_file:11862 unevictable:0 dirty:6 writeback:30 unstable:0
 free:1031 slab:2083 mapped:6187 pagetables:417 bounce:0
DMA free:1096kB min:124kB low:152kB high:184kB active_anon:528kB inactive_anon:3440kB active_file:1076kB inactive_file:5580kB unevictable:0kB present:15868kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:3028kB min:1908kB low:2384kB high:2860kB active_anon:58512kB inactive_anon:84768kB active_file:42116kB inactive_file:41868kB unevictable:0kB present:243776kB pages_scanned:100 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 46*4kB 0*8kB 5*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1096kB
Normal: 135*4kB 213*8kB 21*16kB 4*32kB 5*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3028kB
25924 total pagecache pages
3037 pages in swap cache
Swap cache stats: add 205644, delete 202607, find 63666/79802
Free swap = 485116kB
Total swap = 514040kB
65520 pages RAM
1663 pages reserved
14638 pages shared
52896 pages non-shared
e100 0000:00:03.0: firmware: requesting e100/d101s_ucode.bin
ADDRCONF(NETDEV_UP): eth0: link is not ready
e100: eth0 NIC Link is Up 100 Mbps Full Duplex
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present
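A note on reading the dump above, as a minimal sketch rather than anything posted in the thread: an order:5 request needs 2^5 = 32 physically contiguous pages, which is 128kB if pages are 4kB (an assumption, though it matches the 4kB smallest block in the free lists). The Python below parses the two per-zone free-list lines from the report and counts free blocks of at least that size. The helper names `free_blocks` and `blocks_at_least` are made up for illustration.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, matching the smallest block in the dump


def free_blocks(line):
    """Parse 'Zone: N*SkB N*SkB ... = TOTALkB' into (zone, {size_kB: count})."""
    zone, rest = line.split(":", 1)
    # Only 'count*sizekB' pairs match; the trailing '= 1096kB' total has no '*'.
    counts = {int(size): int(n) for n, size in re.findall(r"(\d+)\*(\d+)kB", rest)}
    return zone.strip(), counts


def blocks_at_least(counts, order):
    """Count free blocks big enough for one allocation of the given order."""
    need_kb = PAGE_KB << order  # order 5 -> 128 kB
    return sum(n for size, n in counts.items() if size >= need_kb)


# The two free-list lines, copied from the report above.
DUMP = [
    "DMA: 46*4kB 0*8kB 5*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB "
    "0*1024kB 0*2048kB 0*4096kB = 1096kB",
    "Normal: 135*4kB 213*8kB 21*16kB 4*32kB 5*64kB 0*128kB 0*256kB 0*512kB "
    "0*1024kB 0*2048kB 0*4096kB = 3028kB",
]

for line in DUMP:
    zone, counts = free_blocks(line)
    print(zone, "free blocks >= order 5:", blocks_at_least(counts, 5))
```

On these two lines the Normal zone has no free block of 128kB or larger, while the DMA zone has two (one 256kB, one 512kB). A large enough free block in a zone does not by itself mean the request can be served from it, since the allocator also checks per-zone watermarks and lowmem_reserve before handing pages out.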
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-19 9:49 ` Tobi Oetiker
@ 2009-10-19 13:31 ` Mel Gorman
From: Mel Gorman @ 2009-10-19 13:31 UTC
To: Tobi Oetiker
Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe

On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote:
> Today Frans Pop wrote:
>
> >
> > I'm starting to think that this commit may not be directly related to high
> > order allocation failures. The fact that I'm seeing SKB allocation
> > failures earlier because of this commit could be just a side effect.
> > It could be that instead the main impact of this commit is on encrypted
> > file system and/or encrypted swap (kcryptd).
> >
> > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> > only reading from NFS that's unlikely).
>
> I have updated a fileserver to 2.6.31 today and I see page
> allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> So I guess the problem must be quite generic:
>
>
> Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
>

What's the rest of the stack trace? I'm wondering where a large number
of order-5 GFP_ATOMIC allocations are coming from. It seems different to
the e100 problem where there is one GFP_ATOMIC allocation while the
firmware is being loaded.

Thanks

>
> Oct 19 08:59:16 johan kernel: [30120.685647] __ratelimit: 13 callbacks suppressed [kern.warning]
> Oct 19 08:59:16 johan kernel: [30120.685654] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 08:59:16 johan kernel: [30120.685660] Pid: 6071, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 08:59:16 johan kernel: [30120.685663] Call Trace: [kern.warning]
> Oct 19 08:59:16 johan kernel: [30120.685666] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
>
> Oct 19 09:36:31 johan kernel: [32355.708345] __ratelimit: 16 callbacks suppressed [kern.warning]
> Oct 19 09:36:31 johan kernel: [32355.708352] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 09:36:31 johan kernel: [32355.708358] Pid: 6087, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 09:36:31 johan kernel: [32355.708361] Call Trace: [kern.warning]
> Oct 19 09:36:31 johan kernel: [32355.708364] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
>
> Oct 19 10:52:01 johan kernel: [36885.358312] __ratelimit: 31 callbacks suppressed [kern.warning]
> Oct 19 10:52:01 johan kernel: [36885.358319] nfsd: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 10:52:01 johan kernel: [36885.358325] Pid: 6057, comm: nfsd Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 10:52:01 johan kernel: [36885.358327] Call Trace: [kern.warning]
> Oct 19 10:52:01 johan kernel: [36885.358331] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
>
> Oct 19 11:12:01 johan kernel: [38085.163831] events/3: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 11:12:01 johan kernel: [38085.163840] Pid: 18, comm: events/3 Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> Oct 19 11:12:01 johan kernel: [38085.163843] Call Trace: [kern.warning]
> Oct 19 11:12:01 johan kernel: [38085.163846] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
>
>
>
> --
> Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
> http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
>

--
Mel Gorman
Part-time Phd Student, University of Limerick
Linux Technology Center, IBM Dublin Software Lab
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-19 13:31 ` Mel Gorman
@ 2009-10-19 13:40 ` Tobias Oetiker
From: Tobias Oetiker @ 2009-10-19 13:40 UTC
To: Mel Gorman
Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe

Hi Mel,

Today Mel Gorman wrote:

> On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote:
> > Today Frans Pop wrote:
> >
> > >
> > > I'm starting to think that this commit may not be directly related to high
> > > order allocation failures. The fact that I'm seeing SKB allocation
> > > failures earlier because of this commit could be just a side effect.
> > > It could be that instead the main impact of this commit is on encrypted
> > > file system and/or encrypted swap (kcryptd).
> > >
> > > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> > > only reading from NFS that's unlikely).
> >
> > I have updated a fileserver to 2.6.31 today and I see page
> > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> > So I guess the problem must be quite generic:
> >
> >
> > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
> >
>
> What's the rest of the stack trace? I'm wondering where a large number
> of order-5 GFP_ATOMIC allocations are coming from. It seems different to
> the e100 problem where there is one GFP_ATOMIC allocation while the
> firmware is being loaded.

Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684157] [<ffffffff810da7e5>] __alloc_pages_nodemask+0x135/0x140 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684164] [<ffffffff815065b4>] ? _spin_unlock_bh+0x14/0x20 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684170] [<ffffffff8110b368>] kmalloc_large_node+0x68/0xc0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684175] [<ffffffff8110f15a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684181] [<ffffffff8140ffd2>] ? skb_copy+0x32/0xa0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684185] [<ffffffff8140d8b6>] __alloc_skb+0x76/0x180 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684205] [<ffffffff8140ffd2>] skb_copy+0x32/0xa0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684221] [<ffffffffa050f33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684227] [<ffffffff81416a6d>] dev_queue_xmit_nit+0x10d/0x170 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684231] [<ffffffff81416f79>] dev_hard_start_xmit+0x189/0x1c0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684236] [<ffffffff8142f071>] __qdisc_run+0x1a1/0x230 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684240] [<ffffffff81418a88>] dev_queue_xmit+0x238/0x310 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684246] [<ffffffff8144864b>] ip_finish_output+0x11b/0x2f0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684250] [<ffffffff814488a9>] ip_output+0x89/0xd0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684254] [<ffffffff814478c0>] ip_local_out+0x20/0x30 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684258] [<ffffffff814481ab>] ip_queue_xmit+0x22b/0x3f0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684264] [<ffffffff8145d5e5>] tcp_transmit_skb+0x345/0x4e0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684269] [<ffffffff8145eaf6>] tcp_write_xmit+0xb6/0x2e0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684273] [<ffffffff8145ed8b>] __tcp_push_pending_frames+0x2b/0xa0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684277] [<ffffffff8145b249>] tcp_rcv_established+0x459/0x6d0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684282] [<ffffffff814630bd>] tcp_v4_do_rcv+0x12d/0x140 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684285] [<ffffffff8146365e>] tcp_v4_rcv+0x58e/0x7c0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684289] [<ffffffff8144276d>] ip_local_deliver_finish+0x11d/0x2b0 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684293] [<ffffffff8144293b>] ip_local_deliver+0x3b/0x90 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684297] [<ffffffff81442ad6>] ip_rcv_finish+0x146/0x420 [kern.warning]
Oct 19 07:10:02 johan kernel: [23565.684301] [<ffffffff8144304b>] ip_rcv+0x29b/0x370 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684304] [<ffffffff81418f9a>] netif_receive_skb+0x38a/0x4d0 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684308] [<ffffffff81419268>] napi_skb_finish+0x48/0x60 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684311] [<ffffffff81419724>] napi_gro_receive+0x34/0x40 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684330] [<ffffffffa006b623>] tg3_rx+0x373/0x4b0 [tg3] [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684339] [<ffffffffa006cbf0>] tg3_poll_work+0x70/0xf0 [tg3] [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684347] [<ffffffffa006ccae>] tg3_poll+0x3e/0xe0 [tg3] [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684350] [<ffffffff814198d2>] net_rx_action+0x102/0x210 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684357] [<ffffffff81061d24>] __do_softirq+0xc4/0x1f0 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684362] [<ffffffff8101314c>] call_softirq+0x1c/0x30 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684365] [<ffffffff81014945>] do_softirq+0x55/0x90 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684369] [<ffffffff8106116b>] irq_exit+0x7b/0x90 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684372] [<ffffffff81013e93>] do_IRQ+0x73/0xe0 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684378] [<ffffffff81012993>] ret_from_intr+0x0/0x11 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684381] <EOI> [<ffffffff810318b6>] ? native_safe_halt+0x6/0x10 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684391] [<ffffffff81019cd8>] ? default_idle+0x48/0xe0 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684396] [<ffffffff8150929d>] ? __atomic_notifier_call_chain+0xd/0x10 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684400] [<ffffffff815092b1>] ? atomic_notifier_call_chain+0x11/0x20 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684404] [<ffffffff810107c8>] ? cpu_idle+0x98/0xe0 [kern.warning]
Oct 19 07:10:04 johan kernel: [23565.684410] [<ffffffff81500d95>] ? start_secondary+0x95/0xc0 [kern.warning]

if you need more, I can send you a whole bunch of them ...

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
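[Editor's note] Read from the bottom up, the trace above shows a packet arriving through tg3, TCP transmitting in response, and the failing allocation being made by skb_copy on behalf of vboxNetFltLinuxPacketHandler in the vboxnetflt module. When many such traces are on hand, the frames that belong to loadable modules can be picked out mechanically. The following is a rough Python sketch, not part of the original thread; the regular expression and the helper name frames are assumptions made for illustration, written against the syslog format shown here (a trailing "[kern.warning]" tag, module names in square brackets). Frames marked with "?" are addresses found on the stack that the unwinder could not confirm as part of the call chain, so the sketch reports them as not reliable.

```python
import re

# One frame looks like:
#   [<ffffffffa006b623>] tg3_rx+0x373/0x4b0 [tg3] [kern.warning]
# An optional "? " before the symbol marks an unverified frame, and an
# optional "[module]" after the offsets names the owning module.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?"          # address, optional '?' marker
    r"([A-Za-z_][\w.]*)"                   # symbol name
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"           # offset/size
    r"(?:\s+\[(\w+)\])?"                   # optional module name
)

def frames(text):
    """Yield (symbol, module_or_None, reliable) for each frame in text."""
    for m in FRAME.finditer(text):
        yield m.group(2), m.group(3), m.group(1) is None

# A few lines copied from the trace above (syslog prefix trimmed).
SAMPLE = """\
[23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
[23565.684181] [<ffffffff8140ffd2>] ? skb_copy+0x32/0xa0 [kern.warning]
[23565.684205] [<ffffffff8140ffd2>] skb_copy+0x32/0xa0 [kern.warning]
[23565.684221] [<ffffffffa050f33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
[23565.684330] [<ffffffffa006b623>] tg3_rx+0x373/0x4b0 [tg3] [kern.warning]
"""

for symbol, module, reliable in frames(SAMPLE):
    if module:
        print(module, symbol)
```

On the sample lines this prints "vboxnetflt vboxNetFltLinuxPacketHandler" and "tg3 tg3_rx". The "[kern.warning]" tag is not mistaken for a module name because the pattern only accepts word characters inside the brackets.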
* Re: [Bug #14141] order 2 page allocation failures (generic) @ 2009-10-19 13:40 ` Tobias Oetiker 0 siblings, 0 replies; 384+ messages in thread From: Tobias Oetiker @ 2009-10-19 13:40 UTC (permalink / raw) To: Mel Gorman Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA Hi Mel, Today Mel Gorman wrote: > On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote: > > Today Frans Pop wrote: > > > > > > > > I'm starting to think that this commit may not be directly related to high > > > order allocation failures. The fact that I'm seeing SKB allocation > > > failures earlier because of this commit could be just a side effect. > > > It could be that instead the main impact of this commit is on encrypted > > > file system and/or encrypted swap (kcryptd). > > > > > > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm > > > only reading from NFS that's unlikely). > > > > I have updated a fileserver to 2.6.31 today and I see page > > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server). > > So I guess the problem must be quite generic: > > > > > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning] > > > > What's the rest of the stack trace? I'm wondering where a large number > of order-5 GFP_ATOMIC allocations are coming from. 
It seems different to > the e100 problem where there is one GFP_ATOMIC allocation while the > firmware is being loaded. Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684157] [<ffffffff810da7e5>] __alloc_pages_nodemask+0x135/0x140 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684164] [<ffffffff815065b4>] ? _spin_unlock_bh+0x14/0x20 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684170] [<ffffffff8110b368>] kmalloc_large_node+0x68/0xc0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684175] [<ffffffff8110f15a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684181] [<ffffffff8140ffd2>] ? 
skb_copy+0x32/0xa0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684185] [<ffffffff8140d8b6>] __alloc_skb+0x76/0x180 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684205] [<ffffffff8140ffd2>] skb_copy+0x32/0xa0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684221] [<ffffffffa050f33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684227] [<ffffffff81416a6d>] dev_queue_xmit_nit+0x10d/0x170 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684231] [<ffffffff81416f79>] dev_hard_start_xmit+0x189/0x1c0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684236] [<ffffffff8142f071>] __qdisc_run+0x1a1/0x230 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684240] [<ffffffff81418a88>] dev_queue_xmit+0x238/0x310 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684246] [<ffffffff8144864b>] ip_finish_output+0x11b/0x2f0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684250] [<ffffffff814488a9>] ip_output+0x89/0xd0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684254] [<ffffffff814478c0>] ip_local_out+0x20/0x30 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684258] [<ffffffff814481ab>] ip_queue_xmit+0x22b/0x3f0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684264] [<ffffffff8145d5e5>] tcp_transmit_skb+0x345/0x4e0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684269] [<ffffffff8145eaf6>] tcp_write_xmit+0xb6/0x2e0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684273] [<ffffffff8145ed8b>] __tcp_push_pending_frames+0x2b/0xa0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684277] [<ffffffff8145b249>] tcp_rcv_established+0x459/0x6d0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684282] [<ffffffff814630bd>] tcp_v4_do_rcv+0x12d/0x140 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684285] [<ffffffff8146365e>] tcp_v4_rcv+0x58e/0x7c0 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684289] [<ffffffff8144276d>] ip_local_deliver_finish+0x11d/0x2b0 
[kern.warning] Oct 19 07:10:02 johan kernel: [23565.684293] [<ffffffff8144293b>] ip_local_deliver+0x3b/0x90 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684297] [<ffffffff81442ad6>] ip_rcv_finish+0x146/0x420 [kern.warning] Oct 19 07:10:02 johan kernel: [23565.684301] [<ffffffff8144304b>] ip_rcv+0x29b/0x370 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684304] [<ffffffff81418f9a>] netif_receive_skb+0x38a/0x4d0 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684308] [<ffffffff81419268>] napi_skb_finish+0x48/0x60 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684311] [<ffffffff81419724>] napi_gro_receive+0x34/0x40 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684330] [<ffffffffa006b623>] tg3_rx+0x373/0x4b0 [tg3] [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684339] [<ffffffffa006cbf0>] tg3_poll_work+0x70/0xf0 [tg3] [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684347] [<ffffffffa006ccae>] tg3_poll+0x3e/0xe0 [tg3] [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684350] [<ffffffff814198d2>] net_rx_action+0x102/0x210 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684357] [<ffffffff81061d24>] __do_softirq+0xc4/0x1f0 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684362] [<ffffffff8101314c>] call_softirq+0x1c/0x30 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684365] [<ffffffff81014945>] do_softirq+0x55/0x90 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684369] [<ffffffff8106116b>] irq_exit+0x7b/0x90 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684372] [<ffffffff81013e93>] do_IRQ+0x73/0xe0 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684378] [<ffffffff81012993>] ret_from_intr+0x0/0x11 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684381] <EOI> [<ffffffff810318b6>] ? native_safe_halt+0x6/0x10 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684391] [<ffffffff81019cd8>] ? default_idle+0x48/0xe0 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684396] [<ffffffff8150929d>] ? 
__atomic_notifier_call_chain+0xd/0x10 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684400] [<ffffffff815092b1>] ? atomic_notifier_call_chain+0x11/0x20 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684404] [<ffffffff810107c8>] ? cpu_idle+0x98/0xe0 [kern.warning] Oct 19 07:10:04 johan kernel: [23565.684410] [<ffffffff81500d95>] ? start_secondary+0x95/0xc0 [kern.warning] if you need more, I can send you a whole bunch of them ... cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland http://it.oetiker.ch tobi-7K0TWYW2a3pyDzI6CaY1VQ@public.gmane.org ++41 62 775 9902 / sb: -9900 ^ permalink raw reply [flat|nested] 384+ messages in thread
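A note for readers of the trace above: each frame has the form `[<address>] symbol+offset/size`, optionally followed by `[module]`, and a leading `?` marks an address found on the stack that the unwinder could not confirm as a real caller. The following is a minimal sketch of extracting the frames from such syslog text, assuming exactly this layout. `Frame`, `parse_frames` and `SAMPLE` are illustrative names, not part of any kernel tool.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    address: int
    symbol: str
    offset: int            # offset of the return address inside the function
    size: int              # total size of the function
    module: Optional[str]  # None for symbols built into the kernel image
    reliable: bool         # False when the frame was printed with a "?"


# The module group only accepts word characters and "-", so the syslog
# tag "[kern.warning]" (which contains a dot) is not mistaken for a module.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<guess>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_frames(text: str) -> List[Frame]:
    """Return every call-trace frame found in text, in order of appearance."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append(Frame(
            address=int(m.group("addr"), 16),
            symbol=m.group("symbol"),
            offset=int(m.group("offset"), 16),
            size=int(m.group("size"), 16),
            module=m.group("module"),
            reliable=m.group("guess") is None,
        ))
    return frames


# Three frames copied from the report above.
SAMPLE = (
    "[23565.684181] [<ffffffff8140ffd2>] ? skb_copy+0x32/0xa0 [kern.warning] "
    "[23565.684185] [<ffffffff8140d8b6>] __alloc_skb+0x76/0x180 [kern.warning] "
    "[23565.684221] [<ffffffffa050f33c>] "
    "vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]"
)

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        marker = " " if frame.reliable else "?"
        origin = frame.module or "kernel"
        print(f"{marker} {frame.symbol} ({origin})")
```

Filtering on `module` makes it easy to see which frames come from out-of-tree code, here the `vboxnetflt` handler that calls `skb_copy`.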
* Re: [Bug #14141] order 2 page allocation failures (generic) 2009-10-19 13:40 ` Tobias Oetiker (?) @ 2009-10-19 14:09 ` Mel Gorman -1 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-19 14:09 UTC (permalink / raw) To: Tobias Oetiker Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe On Mon, Oct 19, 2009 at 03:40:05PM +0200, Tobias Oetiker wrote: > Hi Mel, > > Today Mel Gorman wrote: > > > On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote: > > > Today Frans Pop wrote: > > > > > > > > > > > I'm starting to think that this commit may not be directly related to high > > > > order allocation failures. The fact that I'm seeing SKB allocation > > > > failures earlier because of this commit could be just a side effect. > > > > It could be that instead the main impact of this commit is on encrypted > > > > file system and/or encrypted swap (kcryptd). > > > > > > > > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm > > > > only reading from NFS that's unlikely). > > > > > > I have updated a fileserver to 2.6.31 today and I see page > > > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server). > > > So I guess the problem must be quite generic: > > > > > > > > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning] > > > > > > > What's the rest of the stack trace? 
I'm wondering where a large number > > of order-5 GFP_ATOMIC allocations are coming from. It seems different to > > the e100 problem where there is one GFP_ATOMIC allocation while the > > firmware is being loaded. > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684157] [<ffffffff810da7e5>] __alloc_pages_nodemask+0x135/0x140 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684164] [<ffffffff815065b4>] ? _spin_unlock_bh+0x14/0x20 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684170] [<ffffffff8110b368>] kmalloc_large_node+0x68/0xc0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684175] [<ffffffff8110f15a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684181] [<ffffffff8140ffd2>] ? skb_copy+0x32/0xa0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684185] [<ffffffff8140d8b6>] __alloc_skb+0x76/0x180 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684205] [<ffffffff8140ffd2>] skb_copy+0x32/0xa0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684221] [<ffffffffa050f33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning] Is the MTU set very high between the host and virtualised machine? Can you test please with the patch at http://lkml.org/lkml/2009/10/16/89 applied and with commits 373c0a7e and 8aa7e847 reverted please? 
> Oct 19 07:10:02 johan kernel: [23565.684231] [<ffffffff81416f79>] dev_hard_start_xmit+0x189/0x1c0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684236] [<ffffffff8142f071>] __qdisc_run+0x1a1/0x230 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684240] [<ffffffff81418a88>] dev_queue_xmit+0x238/0x310 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684246] [<ffffffff8144864b>] ip_finish_output+0x11b/0x2f0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684250] [<ffffffff814488a9>] ip_output+0x89/0xd0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684254] [<ffffffff814478c0>] ip_local_out+0x20/0x30 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684258] [<ffffffff814481ab>] ip_queue_xmit+0x22b/0x3f0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684264] [<ffffffff8145d5e5>] tcp_transmit_skb+0x345/0x4e0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684269] [<ffffffff8145eaf6>] tcp_write_xmit+0xb6/0x2e0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684273] [<ffffffff8145ed8b>] __tcp_push_pending_frames+0x2b/0xa0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684277] [<ffffffff8145b249>] tcp_rcv_established+0x459/0x6d0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684282] [<ffffffff814630bd>] tcp_v4_do_rcv+0x12d/0x140 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684285] [<ffffffff8146365e>] tcp_v4_rcv+0x58e/0x7c0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684289] [<ffffffff8144276d>] ip_local_deliver_finish+0x11d/0x2b0 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684293] [<ffffffff8144293b>] ip_local_deliver+0x3b/0x90 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684297] [<ffffffff81442ad6>] ip_rcv_finish+0x146/0x420 [kern.warning] > Oct 19 07:10:02 johan kernel: [23565.684301] [<ffffffff8144304b>] ip_rcv+0x29b/0x370 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684304] [<ffffffff81418f9a>] netif_receive_skb+0x38a/0x4d0 [kern.warning] > Oct 19 
07:10:04 johan kernel: [23565.684308] [<ffffffff81419268>] napi_skb_finish+0x48/0x60 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684311] [<ffffffff81419724>] napi_gro_receive+0x34/0x40 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684330] [<ffffffffa006b623>] tg3_rx+0x373/0x4b0 [tg3] [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684339] [<ffffffffa006cbf0>] tg3_poll_work+0x70/0xf0 [tg3] [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684347] [<ffffffffa006ccae>] tg3_poll+0x3e/0xe0 [tg3] [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684350] [<ffffffff814198d2>] net_rx_action+0x102/0x210 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684357] [<ffffffff81061d24>] __do_softirq+0xc4/0x1f0 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684362] [<ffffffff8101314c>] call_softirq+0x1c/0x30 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684365] [<ffffffff81014945>] do_softirq+0x55/0x90 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684369] [<ffffffff8106116b>] irq_exit+0x7b/0x90 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684372] [<ffffffff81013e93>] do_IRQ+0x73/0xe0 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684378] [<ffffffff81012993>] ret_from_intr+0x0/0x11 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684381] <EOI> [<ffffffff810318b6>] ? native_safe_halt+0x6/0x10 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684391] [<ffffffff81019cd8>] ? default_idle+0x48/0xe0 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684396] [<ffffffff8150929d>] ? __atomic_notifier_call_chain+0xd/0x10 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684400] [<ffffffff815092b1>] ? atomic_notifier_call_chain+0x11/0x20 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684404] [<ffffffff810107c8>] ? cpu_idle+0x98/0xe0 [kern.warning] > Oct 19 07:10:04 johan kernel: [23565.684410] [<ffffffff81500d95>] ? 
start_secondary+0x95/0xc0 [kern.warning] > > if you need more, I can send you a whole bunch of them ... > I'm assuming they are all more or less the same. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 384+ messages in thread
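The reading of the report line as an "order-5 GFP_ATOMIC" allocation can be reproduced with a little arithmetic. This is a minimal sketch, assuming 4 KiB pages (x86-64) and flag bit values as recalled from the 2.6.31 `include/linux/gfp.h`. The flag table is deliberately partial and should be checked against the tree actually in use, and the helper names are illustrative only.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumption: x86-64 with 4 KiB pages

# Assumed subset of the gfp flag bits in 2.6.31; verify against
# include/linux/gfp.h of the kernel being debugged.
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x200: "__GFP_NOWARN",
    0x4000: "__GFP_COMP",
}


def order_bytes(order: int) -> int:
    """Size in bytes of one physically contiguous block of the given order."""
    return PAGE_SIZE << order


def min_order(size: int) -> int:
    """Smallest order whose block can hold size bytes."""
    pages = -(-size // PAGE_SIZE)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def decode_mode(mode: int) -> Tuple[List[str], int]:
    """Split a gfp mode into known flag names and any leftover unknown bits."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    known = 0
    for bit in GFP_BITS:
        known |= bit
    return names, mode & ~known


if __name__ == "__main__":
    names, unknown = decode_mode(0x4020)
    print("order 5 =", order_bytes(5) // 1024, "KiB")
    print("mode 0x4020 =", " | ".join(names), "unknown bits:", hex(unknown))
    print("smallest order for 64 KiB + 1 byte:", min_order(64 * 1024 + 1))
```

Under these assumptions, order 5 means 32 contiguous pages (128 KiB), and `0x4020` decodes to `__GFP_HIGH | __GFP_COMP`, with `__GFP_HIGH` being what `GFP_ATOMIC` expanded to in that era, so the allocator could neither sleep nor reclaim. Any linear buffer just over 64 KiB rounds up to order 5, which fits the MTU question above, although the trace alone does not show the size that was requested.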
* Re: [Bug #14141] order 2 page allocation failures (generic) 2009-10-19 14:09 ` Mel Gorman @ 2009-10-19 14:16 ` Tobias Oetiker -1 siblings, 0 replies; 384+ messages in thread From: Tobias Oetiker @ 2009-10-19 14:16 UTC (permalink / raw) To: Mel Gorman Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe Hi Mel, Today Mel Gorman wrote: > On Mon, Oct 19, 2009 at 03:40:05PM +0200, Tobias Oetiker wrote: > > Hi Mel, > > > > Today Mel Gorman wrote: > > > > > On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote: > > > > Today Frans Pop wrote: > > > > > > > > > > > > > > I'm starting to think that this commit may not be directly related to high > > > > > order allocation failures. The fact that I'm seeing SKB allocation > > > > > failures earlier because of this commit could be just a side effect. > > > > > It could be that instead the main impact of this commit is on encrypted > > > > > file system and/or encrypted swap (kcryptd). > > > > > > > > > > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm > > > > > only reading from NFS that's unlikely). > > > > > > > > I have updated a fileserver to 2.6.31 today and I see page > > > > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server). > > > > So I guess the problem must be quite generic: > > > > > > > > > > > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. 
order:5, mode:0x4020 [kern.warning] > > > > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning] > > > > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning] > > > > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning] > > > > > > > > > > What's the rest of the stack trace? I'm wondering where a large number > > > of order-5 GFP_ATOMIC allocations are coming from. It seems different to > > > the e100 problem where there is one GFP_ATOMIC allocation while the > > > firmware is being loaded. > > > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684157] [<ffffffff810da7e5>] __alloc_pages_nodemask+0x135/0x140 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684164] [<ffffffff815065b4>] ? _spin_unlock_bh+0x14/0x20 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684170] [<ffffffff8110b368>] kmalloc_large_node+0x68/0xc0 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684175] [<ffffffff8110f15a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684181] [<ffffffff8140ffd2>] ? 
skb_copy+0x32/0xa0 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684185] [<ffffffff8140d8b6>] __alloc_skb+0x76/0x180 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684205] [<ffffffff8140ffd2>] skb_copy+0x32/0xa0 [kern.warning] > > Oct 19 07:10:02 johan kernel: [23565.684221] [<ffffffffa050f33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning] > > Is the MTU set very high between the host and virtualised machine? > > Can you test please with the patch at http://lkml.org/lkml/2009/10/16/89 > applied and with commits 373c0a7e and 8aa7e847 reverted please? if you can send me a consolidated patch which does apply to 2.6.31.4 I will be glad to try ... your patch in http://lkml.org/lkml/2009/10/16/89 seems not to be for 2.6.31 ... I assume it would be the following, but then again I don't really understand the code, so this is just pattern matching ... --- a/mm/page_alloc.c 2009-10-05 19:12:06.000000000 +0200 +++ b/mm/page_alloc.c 2009-10-19 14:52:15.000000000 +0200 @@ -1763,6 +1763,7 @@ if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE) goto nopage; +restart: wake_all_kswapd(order, zonelist, high_zoneidx); /* @@ -1772,7 +1773,6 @@ */ alloc_flags = gfp_to_alloc_flags(gfp_mask); -restart: /* This is the last chance, in general, before the goto nopage. */ page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS, cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900 ^ permalink raw reply [flat|nested] 384+ messages in thread
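For readers following the "order:5, mode:0x4020" warning quoted in this message, here is a minimal sketch of how those two numbers can be decoded. It is not kernel code: the helper names are hypothetical, the 4 KiB page size is an assumption (typical for x86-64), and the flag values are quoted from memory of 2.6.31's include/linux/gfp.h, so they should be checked against the actual tree.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, as is usual on x86-64. */
#define MODEL_PAGE_SIZE 4096UL

/*
 * Assumption: bit values as recalled from 2.6.31's include/linux/gfp.h.
 * Verify against the tree before relying on them.
 */
#define MODEL_GFP_WAIT 0x10u   /* allocation may sleep */
#define MODEL_GFP_HIGH 0x20u   /* may dip into emergency reserves */
#define MODEL_GFP_COMP 0x4000u /* compound page requested */

/* Bytes of physically contiguous memory an order-N request asks for. */
static inline unsigned long order_to_bytes(unsigned int order)
{
    assert(order < 32);
    return MODEL_PAGE_SIZE << order;
}

/* Smallest order whose block is large enough to hold `bytes`. */
static inline unsigned int min_order_for_bytes(unsigned long bytes)
{
    unsigned int order = 0;

    while (order_to_bytes(order) < bytes)
        order++;
    return order;
}

/* Non-zero when the mask lacks the "may sleep" bit. */
static inline int gfp_is_atomic(unsigned int mode)
{
    return (mode & MODEL_GFP_WAIT) == 0;
}

/* Non-zero when the mask asks for a compound page. */
static inline int gfp_is_compound(unsigned int mode)
{
    return (mode & MODEL_GFP_COMP) != 0;
}
```

Under these assumptions, order 5 means 32 contiguous pages (128 KiB), and 0x4020 has the high-priority and compound bits set but not the may-sleep bit, which matches the "order-5 GFP_ATOMIC" wording used in the thread. A buffer only needs order 5 once it exceeds 64 KiB, which is presumably why the MTU question was asked.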
* Re: [Bug #14141] order 2 page allocation failures (generic) 2009-10-19 14:16 ` Tobias Oetiker @ 2009-10-19 14:59 ` Mel Gorman -1 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-19 14:59 UTC (permalink / raw) To: Tobias Oetiker Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe On Mon, Oct 19, 2009 at 04:16:36PM +0200, Tobias Oetiker wrote: > Hi Mel, > > Today Mel Gorman wrote: > > > On Mon, Oct 19, 2009 at 03:40:05PM +0200, Tobias Oetiker wrote: > > > Hi Mel, > > > > > > Today Mel Gorman wrote: > > > > > > > On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote: > > > > > Today Frans Pop wrote: > > > > > > > > > > > > > > > > > I'm starting to think that this commit may not be directly related to high > > > > > > order allocation failures. The fact that I'm seeing SKB allocation > > > > > > failures earlier because of this commit could be just a side effect. > > > > > > It could be that instead the main impact of this commit is on encrypted > > > > > > file system and/or encrypted swap (kcryptd). > > > > > > > > > > > > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm > > > > > > only reading from NFS that's unlikely). > > > > > > > > > > I have updated a fileserver to 2.6.31 today and I see page > > > > > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server). > > > > > So I guess the problem must be quite generic: > > > > > > > > > > > > > > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. 
order:5, mode:0x4020 [kern.warning] > > > > > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning] > > > > > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning] > > > > > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning] > > > > > > > > > > > > > What's the rest of the stack trace? I'm wondering where a large number > > > > of order-5 GFP_ATOMIC allocations are coming from. It seems different to > > > > the e100 problem where there is one GFP_ATOMIC allocation while the > > > > firmware is being loaded. > > > > > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684124] <IRQ> [<ffffffff810da5a2>] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684157] [<ffffffff810da7e5>] __alloc_pages_nodemask+0x135/0x140 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684164] [<ffffffff815065b4>] ? _spin_unlock_bh+0x14/0x20 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684170] [<ffffffff8110b368>] kmalloc_large_node+0x68/0xc0 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684175] [<ffffffff8110f15a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684181] [<ffffffff8140ffd2>] ? 
skb_copy+0x32/0xa0 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684185] [<ffffffff8140d8b6>] __alloc_skb+0x76/0x180 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684205] [<ffffffff8140ffd2>] skb_copy+0x32/0xa0 [kern.warning] > > > Oct 19 07:10:02 johan kernel: [23565.684221] [<ffffffffa050f33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning] > > > > Is the MTU set very high between the host and virtualised machine? > > > > Can you test please with the patch at http://lkml.org/lkml/2009/10/16/89 > > applied and with commits 373c0a7e and 8aa7e847 reverted please? > > if you can send me a consolidated patch which does apply to > 2.6.31.4 I will be glad to try ... > Sure ==== CUT HERE ==== From 6c0215af3b7c39ef7b8083ea38ca3ad93cd3f51f Mon Sep 17 00:00:00 2001 From: Mel Gorman <mel@csn.ul.ie> Date: Mon, 19 Oct 2009 15:40:43 +0100 Subject: [PATCH] Kick off kswapd after direct reclaim and revert congestion changes The following patch is http://lkml.org/lkml/2009/10/16/89 on top of 2.6.31.4 as well as patches 373c0a7e and 8aa7e847 reverted.
--- arch/x86/lib/usercopy_32.c | 2 +- drivers/block/pktcdvd.c | 10 ++++------ drivers/md/dm-crypt.c | 2 +- fs/fat/file.c | 2 +- fs/fuse/dev.c | 8 ++++---- fs/nfs/write.c | 8 +++----- fs/reiserfs/journal.c | 2 +- fs/xfs/linux-2.6/kmem.c | 4 ++-- fs/xfs/linux-2.6/xfs_buf.c | 2 +- include/linux/backing-dev.h | 11 +++-------- include/linux/blkdev.h | 13 +++++++++---- mm/backing-dev.c | 7 ++++--- mm/memcontrol.c | 2 +- mm/page-writeback.c | 8 ++++---- mm/page_alloc.c | 15 ++++++++------- mm/vmscan.c | 8 ++++---- 16 files changed, 51 insertions(+), 53 deletions(-) diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c index 1f118d4..7c8ca91 100644 --- a/arch/x86/lib/usercopy_32.c +++ b/arch/x86/lib/usercopy_32.c @@ -751,7 +751,7 @@ survive: if (retval == -ENOMEM && is_global_init(current)) { up_read(&current->mm->mmap_sem); - congestion_wait(BLK_RW_ASYNC, HZ/50); + congestion_wait(WRITE, HZ/50); goto survive; } diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c index 99a506f..83650e0 100644 --- a/drivers/block/pktcdvd.c +++ b/drivers/block/pktcdvd.c @@ -1372,10 +1372,8 @@ try_next_bio: wakeup = (pd->write_congestion_on > 0 && pd->bio_queue_size <= pd->write_congestion_off); spin_unlock(&pd->lock); - if (wakeup) { - clear_bdi_congested(&pd->disk->queue->backing_dev_info, - BLK_RW_ASYNC); - } + if (wakeup) + clear_bdi_congested(&pd->disk->queue->backing_dev_info, WRITE); pkt->sleep_time = max(PACKET_WAIT_TIME, 1); pkt_set_state(pkt, PACKET_WAITING_STATE); @@ -2594,10 +2592,10 @@ static int pkt_make_request(struct request_queue *q, struct bio *bio) spin_lock(&pd->lock); if (pd->write_congestion_on > 0 && pd->bio_queue_size >= pd->write_congestion_on) { - set_bdi_congested(&q->backing_dev_info, BLK_RW_ASYNC); + set_bdi_congested(&q->backing_dev_info, WRITE); do { spin_unlock(&pd->lock); - congestion_wait(BLK_RW_ASYNC, HZ); + congestion_wait(WRITE, HZ); spin_lock(&pd->lock); } while(pd->bio_queue_size > pd->write_congestion_off); } diff --git
a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index ed10381..c72a8dd 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -776,7 +776,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io) * But don't wait if split was due to the io size restriction */ if (unlikely(out_of_pages)) - congestion_wait(BLK_RW_ASYNC, HZ/100); + congestion_wait(WRITE, HZ/100); /* * With async crypto it is unsafe to share the crypto context diff --git a/fs/fat/file.c b/fs/fat/file.c index f042b96..b28ea64 100644 --- a/fs/fat/file.c +++ b/fs/fat/file.c @@ -134,7 +134,7 @@ static int fat_file_release(struct inode *inode, struct file *filp) if ((filp->f_mode & FMODE_WRITE) && MSDOS_SB(inode->i_sb)->options.flush) { fat_flush_inodes(inode->i_sb, inode, NULL); - congestion_wait(BLK_RW_ASYNC, HZ/10); + congestion_wait(WRITE, HZ/10); } return 0; } diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 6484eb7..f58ecbc 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -286,8 +286,8 @@ __releases(&fc->lock) } if (fc->num_background == FUSE_CONGESTION_THRESHOLD && fc->connected && fc->bdi_initialized) { - clear_bdi_congested(&fc->bdi, BLK_RW_SYNC); - clear_bdi_congested(&fc->bdi, BLK_RW_ASYNC); + clear_bdi_congested(&fc->bdi, READ); + clear_bdi_congested(&fc->bdi, WRITE); } fc->num_background--; fc->active_background--; @@ -414,8 +414,8 @@ static void fuse_request_send_nowait_locked(struct fuse_conn *fc, fc->blocked = 1; if (fc->num_background == FUSE_CONGESTION_THRESHOLD && fc->bdi_initialized) { - set_bdi_congested(&fc->bdi, BLK_RW_SYNC); - set_bdi_congested(&fc->bdi, BLK_RW_ASYNC); + set_bdi_congested(&fc->bdi, READ); + set_bdi_congested(&fc->bdi, WRITE); } list_add_tail(&req->list, &fc->bg_queue); flush_bg_queue(fc); diff --git a/fs/nfs/write.c b/fs/nfs/write.c index a34fae2..5693fcd 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -200,10 +200,8 @@ static int nfs_set_page_writeback(struct page *page) struct nfs_server *nfss = NFS_SERVER(inode); if 
(atomic_long_inc_return(&nfss->writeback) > - NFS_CONGESTION_ON_THRESH) { - set_bdi_congested(&nfss->backing_dev_info, - BLK_RW_ASYNC); - } + NFS_CONGESTION_ON_THRESH) + set_bdi_congested(&nfss->backing_dev_info, WRITE); } return ret; } @@ -215,7 +213,7 @@ static void nfs_end_page_writeback(struct page *page) end_page_writeback(page); if (atomic_long_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH) - clear_bdi_congested(&nfss->backing_dev_info, BLK_RW_ASYNC); + clear_bdi_congested(&nfss->backing_dev_info, WRITE); } /* diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c index 9062220..77f5bb7 100644 --- a/fs/reiserfs/journal.c +++ b/fs/reiserfs/journal.c @@ -997,7 +997,7 @@ static int reiserfs_async_progress_wait(struct super_block *s) DEFINE_WAIT(wait); struct reiserfs_journal *j = SB_JOURNAL(s); if (atomic_read(&j->j_async_throttle)) - congestion_wait(BLK_RW_ASYNC, HZ / 10); + congestion_wait(WRITE, HZ / 10); return 0; } diff --git a/fs/xfs/linux-2.6/kmem.c b/fs/xfs/linux-2.6/kmem.c index 2d3f90a..1cd3b55 100644 --- a/fs/xfs/linux-2.6/kmem.c +++ b/fs/xfs/linux-2.6/kmem.c @@ -53,7 +53,7 @@ kmem_alloc(size_t size, unsigned int __nocast flags) printk(KERN_ERR "XFS: possible memory allocation " "deadlock in %s (mode:0x%x)\n", __func__, lflags); - congestion_wait(BLK_RW_ASYNC, HZ/50); + congestion_wait(WRITE, HZ/50); } while (1); } @@ -130,7 +130,7 @@ kmem_zone_alloc(kmem_zone_t *zone, unsigned int __nocast flags) printk(KERN_ERR "XFS: possible memory allocation " "deadlock in %s (mode:0x%x)\n", __func__, lflags); - congestion_wait(BLK_RW_ASYNC, HZ/50); + congestion_wait(WRITE, HZ/50); } while (1); } diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c index 965df12..178c20c 100644 --- a/fs/xfs/linux-2.6/xfs_buf.c +++ b/fs/xfs/linux-2.6/xfs_buf.c @@ -412,7 +412,7 @@ _xfs_buf_lookup_pages( XFS_STATS_INC(xb_page_retries); xfsbufd_wakeup(0, gfp_mask); - congestion_wait(BLK_RW_ASYNC, HZ/50); + congestion_wait(WRITE, HZ/50); goto retry; } 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 1d52425..0ec2c59 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -229,14 +229,9 @@ static inline int bdi_rw_congested(struct backing_dev_info *bdi) (1 << BDI_async_congested)); } -enum { - BLK_RW_ASYNC = 0, - BLK_RW_SYNC = 1, -}; - -void clear_bdi_congested(struct backing_dev_info *bdi, int sync); -void set_bdi_congested(struct backing_dev_info *bdi, int sync); -long congestion_wait(int sync, long timeout); +void clear_bdi_congested(struct backing_dev_info *bdi, int rw); +void set_bdi_congested(struct backing_dev_info *bdi, int rw); +long congestion_wait(int rw, long timeout); static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 69103e0..998c8e0 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -70,6 +70,11 @@ enum rq_cmd_type_bits { REQ_TYPE_ATA_PC, }; +enum { + BLK_RW_ASYNC = 0, + BLK_RW_SYNC = 1, +}; + /* * For request of type REQ_TYPE_LINUX_BLOCK, rq->cmd[0] is the opcode being * sent down (similar to how REQ_TYPE_BLOCK_PC means that ->cmd[] holds a @@ -775,18 +780,18 @@ extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t, * congested queues, and wake up anyone who was waiting for requests to be * put back. */ -static inline void blk_clear_queue_congested(struct request_queue *q, int sync) +static inline void blk_clear_queue_congested(struct request_queue *q, int rw) { - clear_bdi_congested(&q->backing_dev_info, sync); + clear_bdi_congested(&q->backing_dev_info, rw); } /* * A queue has just entered congestion. Flag that in the queue's VM-visible * state flags and increment the global gounter of congested queues. 
*/ -static inline void blk_set_queue_congested(struct request_queue *q, int sync) +static inline void blk_set_queue_congested(struct request_queue *q, int rw) { - set_bdi_congested(&q->backing_dev_info, sync); + set_bdi_congested(&q->backing_dev_info, rw); } extern void blk_start_queue(struct request_queue *q); diff --git a/mm/backing-dev.c b/mm/backing-dev.c index c86edd2..493b468 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -283,6 +283,7 @@ static wait_queue_head_t congestion_wqh[2] = { __WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[1]) }; + void clear_bdi_congested(struct backing_dev_info *bdi, int sync) { enum bdi_state bit; @@ -307,18 +308,18 @@ EXPORT_SYMBOL(set_bdi_congested); /** * congestion_wait - wait for a backing_dev to become uncongested - * @sync: SYNC or ASYNC IO + * @rw: READ or WRITE * @timeout: timeout in jiffies * * Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit * write congestion. If no backing_devs are congested then just wait for the * next write to be completed. 
*/ -long congestion_wait(int sync, long timeout) +long congestion_wait(int rw, long timeout) { long ret; DEFINE_WAIT(wait); - wait_queue_head_t *wqh = &congestion_wqh[sync]; + wait_queue_head_t *wqh = &congestion_wqh[rw]; prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); ret = io_schedule_timeout(timeout); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index fd4529d..834509f 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1990,7 +1990,7 @@ try_to_free: if (!progress) { nr_retries--; /* maybe some writeback is necessary */ - congestion_wait(BLK_RW_ASYNC, HZ/10); + congestion_wait(WRITE, HZ/10); } } diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 81627eb..7687879 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -575,7 +575,7 @@ static void balance_dirty_pages(struct address_space *mapping) if (pages_written >= write_chunk) break; /* We've done our duty */ - congestion_wait(BLK_RW_ASYNC, HZ/10); + congestion_wait(WRITE, HZ/10); } if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh && @@ -669,7 +669,7 @@ void throttle_vm_writeout(gfp_t gfp_mask) if (global_page_state(NR_UNSTABLE_NFS) + global_page_state(NR_WRITEBACK) <= dirty_thresh) break; - congestion_wait(BLK_RW_ASYNC, HZ/10); + congestion_wait(WRITE, HZ/10); /* * The caller might hold locks which can prevent IO completion @@ -715,7 +715,7 @@ static void background_writeout(unsigned long _min_pages) if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) { /* Wrote less than expected */ if (wbc.encountered_congestion || wbc.more_io) - congestion_wait(BLK_RW_ASYNC, HZ/10); + congestion_wait(WRITE, HZ/10); else break; } @@ -787,7 +787,7 @@ static void wb_kupdate(unsigned long arg) writeback_inodes(&wbc); if (wbc.nr_to_write > 0) { if (wbc.encountered_congestion || wbc.more_io) - congestion_wait(BLK_RW_ASYNC, HZ/10); + congestion_wait(WRITE, HZ/10); else break; /* All the old data is written */ } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0b3c6cb..489a187 100644 --- 
a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1673,7 +1673,7 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order, preferred_zone, migratetype); if (!page && gfp_mask & __GFP_NOFAIL) - congestion_wait(BLK_RW_ASYNC, HZ/50); + congestion_wait(WRITE, HZ/50); } while (!page && (gfp_mask & __GFP_NOFAIL)); return page; @@ -1763,16 +1763,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE) goto nopage; - wake_all_kswapd(order, zonelist, high_zoneidx); - /* - * OK, we're below the kswapd watermark and have kicked background - * reclaim. Now things get more complex, so set up alloc_flags according - * to how we want to proceed. + * OK, we're below the kswapd watermark and now things get more + * complex, so set up alloc_flags according to how we want to + * proceed. */ alloc_flags = gfp_to_alloc_flags(gfp_mask); restart: + /* Kick background reclaim */ + wake_all_kswapd(order, zonelist, high_zoneidx); + /* This is the last chance, in general, before the goto nopage. 
*/ page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS, @@ -1844,7 +1845,7 @@ rebalance: pages_reclaimed += did_some_progress; if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) { /* Wait for some write requests to complete then retry */ - congestion_wait(BLK_RW_ASYNC, HZ/50); + congestion_wait(WRITE, HZ/50); goto rebalance; } diff --git a/mm/vmscan.c b/mm/vmscan.c index 94e86dd..9219beb 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1109,7 +1109,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan, */ if (nr_freed < nr_taken && !current_is_kswapd() && lumpy_reclaim) { - congestion_wait(BLK_RW_ASYNC, HZ/10); + congestion_wait(WRITE, HZ/10); /* * The attempt at page out may have made some @@ -1726,7 +1726,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist, /* Take a nap, wait for some writeback to complete */ if (sc->nr_scanned && priority < DEF_PRIORITY - 2) - congestion_wait(BLK_RW_ASYNC, HZ/10); + congestion_wait(WRITE, HZ/10); } /* top priority shrink_zones still had more to do? don't OOM, then */ if (!sc->all_unreclaimable && scanning_global_lru(sc)) @@ -1965,7 +1965,7 @@ loop_again: * another pass across the zones. */ if (total_scanned && priority < DEF_PRIORITY - 2) - congestion_wait(BLK_RW_ASYNC, HZ/10); + congestion_wait(WRITE, HZ/10); /* * We do this so kswapd doesn't build up large priorities for @@ -2238,7 +2238,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages) goto out; if (sc.nr_scanned && prio < DEF_PRIORITY - 2) - congestion_wait(BLK_RW_ASYNC, HZ / 10); + congestion_wait(WRITE, HZ / 10); } } ^ permalink raw reply related [flat|nested] 384+ messages in thread
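The mm/page_alloc.c hunk in the patch above moves the wake_all_kswapd() call from before the restart label to after it. A rough userspace model of what that reordering changes is sketched here. The struct, the function, and the idea of counting wakeups are all hypothetical illustration; the real slow path has more exits and a separate rebalance loop that this model ignores.

```c
#include <assert.h>

/* Hypothetical bookkeeping for the model; nothing like this exists in the kernel. */
struct slowpath_model {
    int kswapd_wakeups; /* times the modelled wake_all_kswapd() ran */
    int attempts;       /* passes through the code after the label */
};

/*
 * Model one trip through the slow path that jumps back to the restart
 * label `restarts` times.
 *
 * wake_after_label == 0: ordering in 2.6.31.4 (wake once, before the label)
 * wake_after_label != 0: ordering after the patch (wake on every pass)
 */
static inline struct slowpath_model run_slowpath(int restarts, int wake_after_label)
{
    struct slowpath_model m = { 0, 0 };
    int remaining = restarts;

    assert(restarts >= 0);

    if (!wake_after_label)
        m.kswapd_wakeups++; /* single wakeup ahead of the label */

restart:
    if (wake_after_label)
        m.kswapd_wakeups++; /* wakeup repeated each time we come back */

    m.attempts++;           /* stands in for the freelist retry */
    if (remaining-- > 0)
        goto restart;

    return m;
}
```

In this model, three restarts give one wakeup with the original ordering and four with the patched ordering, while the number of allocation attempts stays the same. The hand-edited diff posted earlier in the thread gets a similar effect by moving the label instead of the call, at the cost of also recomputing alloc_flags on each pass.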
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-19 14:59 ` Mel Gorman
@ 2009-10-19 20:12 ` Tobias Oetiker
  -1 siblings, 0 replies; 384+ messages in thread
From: Tobias Oetiker @ 2009-10-19 20:12 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
      Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
      John W. Linville, linux-mm, jens.axboe

Hi Mel,

Today Mel Gorman wrote:

> >
> > if you can send me a consolidated patch which does apply to
> > 2.6.31.4 I will be glad to try ...
> >
>
> Sure
>
> ==== CUT HERE ====
>
> From 6c0215af3b7c39ef7b8083ea38ca3ad93cd3f51f Mon Sep 17 00:00:00 2001
> From: Mel Gorman <mel@csn.ul.ie>
> Date: Mon, 19 Oct 2009 15:40:43 +0100
> Subject: [PATCH] Kick off kswapd after direct reclaim and revert congestion changes
>
> The following patch is http://lkml.org/lkml/2009/10/16/89 on top of
> 2.6.31.4 as well as patches 373c0a7e and 8aa7e847 reverted.

it seems to help ... the server has been running for 3 hours now
without incident, but then again it is not as active as during the
day, ... will report tomorrow.

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-19 20:12 ` Tobias Oetiker
@ 2009-10-19 20:17 ` Tobias Oetiker
  -1 siblings, 0 replies; 384+ messages in thread
From: Tobias Oetiker @ 2009-10-19 20:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
      Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
      John W. Linville, linux-mm, jens.axboe

Hi Mel,

Today Tobias Oetiker wrote:

> Hi Mel,
>
> Today Mel Gorman wrote:
>
> > >
> > > if you can send me a consolidated patch which does apply to
> > > 2.6.31.4 I will be glad to try ...
> > >
> >
> > Sure
> >
> > ==== CUT HERE ====
> >
> > From 6c0215af3b7c39ef7b8083ea38ca3ad93cd3f51f Mon Sep 17 00:00:00 2001
> > From: Mel Gorman <mel@csn.ul.ie>
> > Date: Mon, 19 Oct 2009 15:40:43 +0100
> > Subject: [PATCH] Kick off kswapd after direct reclaim and revert congestion changes
> >
> > The following patch is http://lkml.org/lkml/2009/10/16/89 on top of
> > 2.6.31.4 as well as patches 373c0a7e and 8aa7e847 reverted.
>
> it seems to help ... the server has been running for 3 hours now
> without incident, but then again it is not as active as during the
> day, ... will report tomorrow.

while I was writing, the system found that the patch does not really
help:

Oct 19 22:09:52 johan kernel: [11157.121506] smtpd: page allocation failure. order:5, mode:0x4020 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121514] Pid: 19324, comm: smtpd Tainted: G D 2.6.31.4-oep #1 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121518] Call Trace: [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121521] <IRQ> [<ffffffff810cb599>] __alloc_pages_nodemask+0x549/0x650 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121563] [<ffffffffa02bde3b>] ? __nf_ct_refresh_acct+0xab/0x110 [nf_conntrack] [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121572] [<ffffffffa02a8337>] ? ipt_do_table+0x2f7/0x610 [ip_tables] [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121580] [<ffffffff810fac18>] kmalloc_large_node+0x68/0xc0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121585] [<ffffffff810fe90a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121592] [<ffffffff813ebd42>] ? skb_copy+0x32/0xa0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121596] [<ffffffff813e9606>] __alloc_skb+0x76/0x180 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121600] [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121615] [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121620] [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121632] [<ffffffff8140a2c1>] __qdisc_run+0x1a1/0x230 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121637] [<ffffffff813f41e0>] dev_queue_xmit+0x2b0/0x3a0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121642] [<ffffffff8142349b>] ip_finish_output+0x11b/0x2f0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121646] [<ffffffff814236f9>] ip_output+0x89/0xd0 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121650] [<ffffffff81422710>] ip_local_out+0x20/0x30 [kern.warning]
Oct 19 22:09:52 johan kernel: [11157.121654] [<ffffffff81422ffb>] ip_queue_xmit+0x22b/0x3f0 [kern.warning]

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-19 20:17 ` Tobias Oetiker
@ 2009-10-20 10:57 ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-20 10:57 UTC (permalink / raw)
  To: Tobias Oetiker
  Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
      Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
      John W. Linville, linux-mm, jens.axboe

On Mon, Oct 19, 2009 at 10:17:06PM +0200, Tobias Oetiker wrote:
> Hi Mel,
>
> Today Tobias Oetiker wrote:
>
> > Hi Mel,
> >
> > Today Mel Gorman wrote:
> >
> > > >
> > > > if you can send me a consolidated patch which does apply to
> > > > 2.6.31.4 I will be glad to try ...
> > > >
> > >
> > > Sure
> > >
> > > ==== CUT HERE ====
> > >
> > > From 6c0215af3b7c39ef7b8083ea38ca3ad93cd3f51f Mon Sep 17 00:00:00 2001
> > > From: Mel Gorman <mel@csn.ul.ie>
> > > Date: Mon, 19 Oct 2009 15:40:43 +0100
> > > Subject: [PATCH] Kick off kswapd after direct reclaim and revert congestion changes
> > >
> > > The following patch is http://lkml.org/lkml/2009/10/16/89 on top of
> > > 2.6.31.4 as well as patches 373c0a7e and 8aa7e847 reverted.
> >
> > it seems to help ... the server has been running for 3 hours now
> > without incident, but then again it is not as active as during the
> > day, ... will report tomorrow.
>
> while I was writing, the system found that the patch does not really
> help:
>
> Oct 19 22:09:52 johan kernel: [11157.121506] smtpd: page allocation failure. order:5, mode:0x4020 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121514] Pid: 19324, comm: smtpd Tainted: G D 2.6.31.4-oep #1 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121518] Call Trace: [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121521] <IRQ> [<ffffffff810cb599>] __alloc_pages_nodemask+0x549/0x650 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121563] [<ffffffffa02bde3b>] ? __nf_ct_refresh_acct+0xab/0x110 [nf_conntrack] [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121572] [<ffffffffa02a8337>] ? ipt_do_table+0x2f7/0x610 [ip_tables] [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121580] [<ffffffff810fac18>] kmalloc_large_node+0x68/0xc0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121585] [<ffffffff810fe90a>] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121592] [<ffffffff813ebd42>] ? skb_copy+0x32/0xa0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121596] [<ffffffff813e9606>] __alloc_skb+0x76/0x180 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121600] [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121615] [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121620] [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]

Is the number of failures at least reduced or are they occurring at the
same rate? Also, what was the last kernel that worked for you with this
configuration?

Thanks

> Oct 19 22:09:52 johan kernel: [11157.121632] [<ffffffff8140a2c1>] __qdisc_run+0x1a1/0x230 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121637] [<ffffffff813f41e0>] dev_queue_xmit+0x2b0/0x3a0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121642] [<ffffffff8142349b>] ip_finish_output+0x11b/0x2f0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121646] [<ffffffff814236f9>] ip_output+0x89/0xd0 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121650] [<ffffffff81422710>] ip_local_out+0x20/0x30 [kern.warning]
> Oct 19 22:09:52 johan kernel: [11157.121654] [<ffffffff81422ffb>] ip_queue_xmit+0x22b/0x3f0 [kern.warning]
>

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-20 10:57 ` Mel Gorman
@ 2009-10-20 11:44 ` Tobias Oetiker
  -1 siblings, 0 replies; 384+ messages in thread
From: Tobias Oetiker @ 2009-10-20 11:44 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
      Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
      John W. Linville, linux-mm, jens.axboe

Hi Mel,

Today Mel Gorman wrote:

> On Mon, Oct 19, 2009 at 10:17:06PM +0200, Tobias Oetiker wrote:
> > Oct 19 22:09:52 johan kernel: [11157.121600] [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121615] [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121620] [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]
>
> Is the number of failures at least reduced or are they occurring at the
> same rate?

not that it would have any statistical significance, but I had 5
failure (clusters) yesterday morning and 5 this morning ...

the failures often show up in groups; I saved one at
http://tobi.oetiker.ch/cluster-2009-10-20-08-31.txt

> Also, what was the last kernel that worked for you with this
> configuration?

that would be 2.6.24 ... I have not upgraded in quite some time.
But since the io performance of 2.6.31 is about double in my tests
I thought it would be a good thing to do ...

cheers
tobi

> Thanks
>
> > Oct 19 22:09:52 johan kernel: [11157.121632] [<ffffffff8140a2c1>] __qdisc_run+0x1a1/0x230 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121637] [<ffffffff813f41e0>] dev_queue_xmit+0x2b0/0x3a0 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121642] [<ffffffff8142349b>] ip_finish_output+0x11b/0x2f0 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121646] [<ffffffff814236f9>] ip_output+0x89/0xd0 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121650] [<ffffffff81422710>] ip_local_out+0x20/0x30 [kern.warning]
> > Oct 19 22:09:52 johan kernel: [11157.121654] [<ffffffff81422ffb>] ip_queue_xmit+0x22b/0x3f0 [kern.warning]
> >
>
>

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

^ permalink raw reply [flat|nested] 384+ messages in thread
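The before/after comparison of failure counts discussed above can be done mechanically from the syslog output. The following is a minimal Python sketch and not something used in this thread: the regular expression is fitted only to the log lines quoted here, and the names `parse_failures` and `count_clusters`, as well as the 60-second grouping gap, are assumptions chosen for illustration. The bracketed timestamps are seconds since boot, so logs from different boots should be processed separately.

```python
import re
from collections import Counter

# Fitted to lines such as:
#   Oct 19 22:09:52 johan kernel: [11157.121506] smtpd: page allocation failure. order:5, mode:0x4020
FAILURE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def parse_failures(lines):
    """Return (uptime_seconds, comm, order, mode) for each failure line."""
    events = []
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            events.append(
                (float(m["ts"]), m["comm"], int(m["order"]), int(m["mode"], 16))
            )
    return events


def count_clusters(events, gap_seconds=60.0):
    """Count groups of failures separated by more than gap_seconds.

    gap_seconds is an arbitrary illustrative threshold, not a kernel value.
    """
    clusters = 0
    last_ts = None
    for ts, *_ in sorted(events):
        if last_ts is None or ts - last_ts > gap_seconds:
            clusters += 1
        last_ts = ts
    return clusters


# Sample lines copied from the reports in this thread.
sample = [
    "Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]",
    "Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]",
    "Oct 19 22:09:52 johan kernel: [11157.121506] smtpd: page allocation failure. order:5, mode:0x4020 [kern.warning]",
]

events = parse_failures(sample)
by_order = Counter(order for _, _, order, _ in events)
print("failures per order:", dict(by_order))
print("clusters:", count_clusters(events))
```

Running the same script over the logs of each kernel under test would give per-order counts that can be compared directly.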
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-20 11:44 ` Tobias Oetiker
@ 2009-10-20 12:51 ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-20 12:51 UTC (permalink / raw)
  To: Tobias Oetiker
  Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
      Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
      John W. Linville, linux-mm, jens.axboe

On Tue, Oct 20, 2009 at 01:44:50PM +0200, Tobias Oetiker wrote:
> Hi Mel,
>
> Today Mel Gorman wrote:
>
> > On Mon, Oct 19, 2009 at 10:17:06PM +0200, Tobias Oetiker wrote:
>
> > > Oct 19 22:09:52 johan kernel: [11157.121600] [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
> > > Oct 19 22:09:52 johan kernel: [11157.121615] [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
> > > Oct 19 22:09:52 johan kernel: [11157.121620] [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]
> >
> > Is the number of failures at least reduced or are they occurring at the
> > same rate?
>
> not that it would have any statistical significance, but I had 5
> failure (clusters) yesterday morning and 5 this morning ...
>

Before the patches were applied, how many failures were you seeing in
the morning?

> the failures often show up in groups; I saved one at
> http://tobi.oetiker.ch/cluster-2009-10-20-08-31.txt
>
> > Also, what was the last kernel that worked for you with this
> > configuration?
>
> that would be 2.6.24 ... I have not upgraded in quite some time.
> But since the io performance of 2.6.31 is about double in my tests
> I thought it would be a good thing to do ...
>

That significant a difference in performance may explain differences in timing
as well, i.e. the allocator is being put under more pressure now than it
was previously as more processes make forward progress.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-20 12:51 ` Mel Gorman
@ 2009-10-20 12:58 ` Tobias Oetiker
  -1 siblings, 0 replies; 384+ messages in thread
From: Tobias Oetiker @ 2009-10-20 12:58 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
      Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
      John W. Linville, linux-mm, jens.axboe

Hi Mel,

Today Mel Gorman wrote:

> On Tue, Oct 20, 2009 at 01:44:50PM +0200, Tobias Oetiker wrote:
> > Hi Mel,
> >
> > Today Mel Gorman wrote:
> >
> > > On Mon, Oct 19, 2009 at 10:17:06PM +0200, Tobias Oetiker wrote:
> >
> > > > Oct 19 22:09:52 johan kernel: [11157.121600] [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
> > > > Oct 19 22:09:52 johan kernel: [11157.121615] [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
> > > > Oct 19 22:09:52 johan kernel: [11157.121620] [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]
> > >
> > > Is the number of failures at least reduced or are they occurring at the
> > > same rate?
> >
> > not that it would have any statistical significance, but I had 5
> > failure (clusters) yesterday morning and 5 this morning ...
> >
>
> Before the patches were applied, how many failures were you seeing in
> the morning?

5 as well ... before and after ...

> > the failures often show up in groups; I saved one at
> > http://tobi.oetiker.ch/cluster-2009-10-20-08-31.txt
> >
> > > Also, what was the last kernel that worked for you with this
> > > configuration?
> >
> > that would be 2.6.24 ... I have not upgraded in quite some time.
> > But since the io performance of 2.6.31 is about double in my tests
> > I thought it would be a good thing to do ...
> >
>
> That significant a difference in performance may explain differences in timing
> as well, i.e. the allocator is being put under more pressure now than it
> was previously as more processes make forward progress.

you are saying that the problem might be even older?

we do have 8GB ram and 16 GB swap, so it should not fail to
allocate all that often

top - 14:58:34 up 19:54,  6 users,  load average: 2.09, 1.94, 1.97
Tasks: 451 total,   1 running, 449 sleeping,   0 stopped,   1 zombie
Cpu(s):  3.5%us, 15.5%sy,  2.0%ni, 72.2%id,  6.5%wa,  0.1%hi,  0.3%si,  0.0%st
Mem:   8198504k total,  7599132k used,   599372k free,  1212636k buffers
Swap: 16777208k total,    83568k used, 16693640k free,   610136k cached

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

^ permalink raw reply [flat|nested] 384+ messages in thread
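The totals reported by top do not by themselves show whether an order-5 request can be satisfied: such a request needs physically contiguous pages, and the failing allocations in this thread are described as GFP_ATOMIC and appear under <IRQ> in the traces, so they cannot wait for memory to be reclaimed. A rough sketch of the sizes involved, assuming 4 KiB base pages as on typical x86-64 systems (the helper name `order_to_bytes` is hypothetical):

```python
PAGE_SIZE = 4096  # bytes; assumption: 4 KiB base pages


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size of a buddy-allocator request of the given order (2**order pages)."""
    return (1 << order) * page_size


# Order 2 is the case named in the bug title; order 5 is what these logs show.
for order in (2, 5):
    pages = 1 << order
    kib = order_to_bytes(order) // 1024
    print(f"order {order}: {pages} contiguous pages = {kib} KiB")
```

On a running system, /proc/buddyinfo lists the number of free blocks of each order per zone, which would show whether any order-5 blocks remain even when plenty of memory is free in total.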
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-20 12:58 ` Tobias Oetiker
@ 2009-10-20 13:39   ` Mel Gorman
From: Mel Gorman @ 2009-10-20 13:39 UTC
To: Tobias Oetiker
Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
	Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
	John W. Linville, linux-mm, jens.axboe

On Tue, Oct 20, 2009 at 02:58:53PM +0200, Tobias Oetiker wrote:
> Hi Mel,
>
> Today Mel Gorman wrote:
>
> > On Tue, Oct 20, 2009 at 01:44:50PM +0200, Tobias Oetiker wrote:
> > > Hi Mel,
> > >
> > > Today Mel Gorman wrote:
> > >
> > > > On Mon, Oct 19, 2009 at 10:17:06PM +0200, Tobias Oetiker wrote:
> > >
> > > > > Oct 19 22:09:52 johan kernel: [11157.121600]  [<ffffffff813ebd42>] skb_copy+0x32/0xa0 [kern.warning]
> > > > > Oct 19 22:09:52 johan kernel: [11157.121615]  [<ffffffffa07dd33c>] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
> > > > > Oct 19 22:09:52 johan kernel: [11157.121620]  [<ffffffff813f2512>] dev_hard_start_xmit+0x142/0x320 [kern.warning]
> > > >
> > > > Are the number of failures at least reduced or are they occurring at the
> > > > same rate?
> > >
> > > not that it would have any statistical significance, but I had 5
> > > failure (clusters) yesterday morning and 5 this morning ...
> > >
> >
> > Before the patches were applied, how many failures were you seeing in
> > the morning?
>
> 5 as well ... before and after ...
>
> > > the failures often show up in groups I saved one on
> > > http://tobi.oetiker.ch/cluster-2009-10-20-08-31.txt
> > >
> > > > Also, what was the last kernel that worked for you with this
> > > > configuration?
> > >
> > > that would be 2.6.24 ... I have not upgraded in quite some time.
> > > But since the io performance of 2.6.31 is about double in my tests
> > > I thought it would be a good thing to do ...
> > >
> >
> > That significant a difference in performance may explain differences in timing
> > as well. i.e. the allocator is being put under more pressure now than it
> > was previously as more processes make forward progress.
>
> you are saying that the problem might be even older?
>
> we do have 8GB ram and 16 GB swap, so it should not fail to allocate all that
> often
>
> top - 14:58:34 up 19:54,  6 users,  load average: 2.09, 1.94, 1.97
> Tasks: 451 total,   1 running, 449 sleeping,   0 stopped,   1 zombie
> Cpu(s):  3.5%us, 15.5%sy,  2.0%ni, 72.2%id,  6.5%wa,  0.1%hi,  0.3%si,  0.0%st
> Mem:   8198504k total,  7599132k used,   599372k free,  1212636k buffers
> Swap: 16777208k total,    83568k used, 16693640k free,   610136k cached
>

High-order atomic allocations of the type you are trying at that frequency
were always a very long shot. The most likely outcome is that something
has changed that means a burst of allocations triggers an allocation failure
whereas before processes would delay long enough for the system not to notice.

1. Have MTU settings changed?
2. As order-5 allocations are required to succeed, I'm surprised in a
   sense that there are only 5 failures because it implies the machine is
   actually recovering and continuing on as normal. Can you think of what
   happens in the morning that causes a burst of allocations to occur?
3. Other than the failures, have you noticed any other problems with the
   machine or does it continue along happily?
4. Does the following patch help by any chance?

Thanks

==== CUT HERE ====
vmscan: Force kswapd to take notice faster when high-order watermarks are being hit

When a high-order allocation fails, kswapd is kicked so that it reclaims
at a higher-order to avoid direct reclaimers stall and to help GFP_ATOMIC
allocations. Something has changed in recent kernels that affects the timing
where high-order GFP_ATOMIC allocations are now failing with more frequency,
particularly under pressure. This patch forces kswapd to notice sooner that
high-order allocations are occurring by checking when watermarks are hit early
and by having kswapd restart quickly when the reclaim order is increased.

Not-signed-off-by-because-this-is-a-hatchet-job: Mel Gorman <mel@csn.ul.ie>
---
 mm/page_alloc.c |   14 ++++++++++++--
 mm/vmscan.c     |    9 +++++++++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2fd7b20..fdbf8c9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1907,6 +1906,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 			zonelist, high_zoneidx, nodemask,
 			preferred_zone, migratetype);
 
+	/*
+	 * If after a high-order allocation we are now below watermarks,
+	 * pre-emptively kick kswapd rather than having the next allocation
+	 * fail and have to wake up kswapd, potentially failing GFP_ATOMIC
+	 * allocations or entering direct reclaim
+	 */
+	if (unlikely(order) && page && !zone_watermark_ok(preferred_zone, order,
+				preferred_zone->watermark[ALLOC_WMARK_LOW],
+				zone_idx(preferred_zone), ALLOC_WMARK_LOW))
+		wake_all_kswapd(order, zonelist, high_zoneidx);
+
 	return page;
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9219beb..0e66a6b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1925,6 +1925,15 @@ loop_again:
 			    priority != DEF_PRIORITY)
 				continue;
 
+			/*
+			 * Exit quickly to restart if it has been indicated
+			 * that higher orders are required
+			 */
+			if (pgdat->kswapd_max_order > order) {
+				all_zones_ok = 1;
+				goto out;
+			}
+
 			if (!zone_watermark_ok(zone, order,
 					high_wmark_pages(zone), end_zone, 0))
 				all_zones_ok = 0;
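[Editorial sketch, not part of the original mail.] The patch above relies on
zone_watermark_ok() to decide whether a zone still has enough free memory at
a given order. A minimal user-space model of that idea may help show why a
machine with roughly 600 MB free can still fail an order-2 request: assuming
4 KiB pages, an order-2 allocation needs 4 physically contiguous pages
(16 KiB), so free pages that sit in smaller blocks do not count. The names
zone_model, zone_free_pages and watermark_ok_model are hypothetical, and the
model deliberately leaves out lowmem_reserve and the ALLOC_HIGH/ALLOC_HARDER
adjustments that the real kernel function applies.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 11 /* assumption: buddy orders 0..10 */

/* Hypothetical, stripped-down stand-in for the kernel's struct zone. */
struct zone_model {
    unsigned long nr_free[MAX_ORDER]; /* number of free blocks per order */
};

/* Total free pages, counting a block of order o as 2^o pages. */
static unsigned long zone_free_pages(const struct zone_model *z)
{
    unsigned long total = 0;

    for (int o = 0; o < MAX_ORDER; o++)
        total += z->nr_free[o] << o;
    return total;
}

/*
 * Simplified model of a per-order watermark check: after taking the
 * requested block, the zone must stay above `mark`, and at each lower
 * order the pages that are too small to be useful are discounted while
 * the required reserve is halved.
 */
static bool watermark_ok_model(const struct zone_model *z, int order, long mark)
{
    assert(order >= 0 && order < MAX_ORDER);

    long min = mark;
    long free_pages = (long)zone_free_pages(z) - (1L << order) + 1;

    if (free_pages <= min)
        return false;

    for (int o = 0; o < order; o++) {
        /* Blocks of order o cannot satisfy a request of this order. */
        free_pages -= (long)(z->nr_free[o] << o);
        /* Demand a smaller reserve at each higher order. */
        min >>= 1;
        if (free_pages <= min)
            return false;
    }
    return true;
}
```

In this model, a zone whose free memory consists only of single pages passes
the check for order 0 but fails it for order 2, however many single pages
are free. That is the situation in which the patch wakes kswapd early.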
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-20 13:39 ` Mel Gorman
@ 2009-10-20 13:50   ` Tobias Oetiker
From: Tobias Oetiker @ 2009-10-20 13:50 UTC
To: Mel Gorman
Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
	Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
	John W. Linville, linux-mm, jens.axboe

Hi Mel,

Today Mel Gorman wrote:

> On Tue, Oct 20, 2009 at 02:58:53PM +0200, Tobias Oetiker wrote:
> > you are saying that the problem might be even older?
> >
> > we do have 8GB ram and 16 GB swap, so it should not fail to allocate all that
> > often
> >
> > top - 14:58:34 up 19:54,  6 users,  load average: 2.09, 1.94, 1.97
> > Tasks: 451 total,   1 running, 449 sleeping,   0 stopped,   1 zombie
> > Cpu(s):  3.5%us, 15.5%sy,  2.0%ni, 72.2%id,  6.5%wa,  0.1%hi,  0.3%si,  0.0%st
> > Mem:   8198504k total,  7599132k used,   599372k free,  1212636k buffers
> > Swap: 16777208k total,    83568k used, 16693640k free,   610136k cached
> >
>
> High-order atomic allocations of the type you are trying at that frequency
> were always a very long shot. The most likely outcome is that something
> has changed that means a burst of allocations triggers an allocation failure
> whereas before processes would delay long enough for the system not to notice.
>
> 1. Have MTU settings changed?

no not to my knowledge

> 2. As order-5 allocations are required to succeed, I'm surprised in a
>    sense that there are only 5 failures because it implies the machine is
>    actually recovering and continuing on as normal. Can you think of what
>    happens in the morning that causes a burst of allocations to occur?

the bursts occur all day while the machine is in use ... it's just
that I was writing this at noon so only the morning had passed. So
I compared things to the day before ...

> 3. Other than the failures, have you noticed any other problems with the
>    machine or does it continue along happily?

The machine seems to be fine.

> 4. Does the following patch help by any chance?

should I try this on vanilla 2.6.31.4 or on top of your previous
patch?

we are running virtualbox 3.0.8 on this machine, virtualbox is using
the physical network interface in bridge mode to access the network.
Could this have something to do with the problem?

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-20 13:50 ` Tobias Oetiker
@ 2009-10-20 14:14   ` Mel Gorman
From: Mel Gorman @ 2009-10-20 14:14 UTC
To: Tobias Oetiker
Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
	Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
	John W. Linville, linux-mm, jens.axboe

On Tue, Oct 20, 2009 at 03:50:12PM +0200, Tobias Oetiker wrote:
> Hi Mel,
>
> Today Mel Gorman wrote:
>
> > On Tue, Oct 20, 2009 at 02:58:53PM +0200, Tobias Oetiker wrote:
> > > you are saying that the problem might be even older?
> > >
> > > we do have 8GB ram and 16 GB swap, so it should not fail to allocate all that
> > > often
> > >
> > > top - 14:58:34 up 19:54,  6 users,  load average: 2.09, 1.94, 1.97
> > > Tasks: 451 total,   1 running, 449 sleeping,   0 stopped,   1 zombie
> > > Cpu(s):  3.5%us, 15.5%sy,  2.0%ni, 72.2%id,  6.5%wa,  0.1%hi,  0.3%si,  0.0%st
> > > Mem:   8198504k total,  7599132k used,   599372k free,  1212636k buffers
> > > Swap: 16777208k total,    83568k used, 16693640k free,   610136k cached
> > >
> >
> > High-order atomic allocations of the type you are trying at that frequency
> > were always a very long shot. The most likely outcome is that something
> > has changed that means a burst of allocations triggers an allocation failure
> > whereas before processes would delay long enough for the system not to notice.
> >
> > 1. Have MTU settings changed?
>
> no not to my knowledge
>
> > 2. As order-5 allocations are required to succeed, I'm surprised in a
> >    sense that there are only 5 failures because it implies the machine is
> >    actually recovering and continuing on as normal. Can you think of what
> >    happens in the morning that causes a burst of allocations to occur?
>
> the bursts occur all day while the machine is in use ... it's just
> that I was writing this at noon so only the morning had passed. So
> I compared things to the day before ...
>

Over the course of a day, how many would you see? By and large, it seems
that the problems you and Frans are seeing are similar, except his is a lot
more severe.

> > 3. Other than the failures, have you noticed any other problems with the
> >    machine or does it continue along happily?
>
> The machine seems to be fine.
>
> > 4. Does the following patch help by any chance?
>
> should I try this on vanilla 2.6.31.4 or on top of your previous
> patch?
>

Try on top of vanilla 2.6.31.4 first please and if failures still occur,
then on top of the previous patch.

> we are running virtualbox 3.0.8 on this machine, virtualbox is using
> the physical network interface in bridge mode to access the network.
> Could this have something to do with the problem?
>

I do not know for sure. I'm assuming the configuration is the same on
both kernels so it's unlikely to be the issue.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-20 14:14 ` Mel Gorman
@ 2009-10-20 14:20   ` Tobias Oetiker
From: Tobias Oetiker @ 2009-10-20 14:20 UTC
To: Mel Gorman
Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
	Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
	John W. Linville, linux-mm, jens.axboe

Hi Mel,

Today Mel Gorman wrote:

>
> Over the course of a day, how many would you see? By and large, it seems
> that the problems you and Frans are seeing are similar, except his is a lot
> more severe.

yesterday it was 19 for 24 hours, today it is 9 for 16 hours (day is
not done yet).

> Try on top of vanilla 2.6.31.4 first please and if failures still occur,
> then on top of the previous patch.

ok

> > we are running virtualbox 3.0.8 on this machine, virtualbox is using
> > the physical network interface in bridge mode to access the network.
> > Could this have something to do with the problem?
> >
>
> I do not know for sure. I'm assuming the configuration is the same on
> both kernels so it's unlikely to be the issue.

just to be on the safe side I created a ticket with the virtualbox
people ... http://www.virtualbox.org/ticket/5260

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
* Re: [Bug #14141] order 2 page allocation failures (generic)
  2009-10-20 13:39 ` Mel Gorman
@ 2009-10-22 10:27   ` Tobias Oetiker
From: Tobias Oetiker @ 2009-10-22 10:27 UTC
To: Mel Gorman
Cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
	Rafael J. Wysocki, Linux Kernel Mailing List, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
	John W. Linville, linux-mm, jens.axboe

Hi Mel,

Tuesday Mel Gorman wrote:

> 4. Does the following patch help by any chance?
>
> Thanks
>
> ==== CUT HERE ====
> vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
>
> When a high-order allocation fails, kswapd is kicked so that it reclaims
> at a higher-order to avoid direct reclaimers stall and to help GFP_ATOMIC
> allocations. Something has changed in recent kernels that affects the timing
> where high-order GFP_ATOMIC allocations are now failing with more frequency,
> particularly under pressure. This patch forces kswapd to notice sooner that
> high-order allocations are occurring by checking when watermarks are hit early
> and by having kswapd restart quickly when the reclaim order is increased.
>
> Not-signed-off-by-because-this-is-a-hatchet-job: Mel Gorman <mel@csn.ul.ie>
> ---

it does seem to help ... I have been running it from 6am to 12am on
our server now and have not yet seen any issues ... will shout if I do ...

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-19 0:36 ` Pekka Enberg @ 2009-10-19 2:52 ` Jens Axboe -1 siblings, 0 replies; 384+ messages in thread From: Jens Axboe @ 2009-10-19 2:52 UTC (permalink / raw) To: Pekka Enberg Cc: Frans Pop, Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, John W. Linville, linux-mm On Mon, Oct 19 2009, Pekka Enberg wrote: > (Adding Jens to CC.) > > On Wednesday 14 October 2009, Frans Pop wrote: > > > > There still has not been a mm-change identified that makes > > > > fragmentation significantly worse. > > On Mon, 2009-10-19 at 01:33 +0200, Frans Pop wrote: > > > My bisection shows a very clear point, even if not an individual commit, > > > in the 'akpm' merge where SKB errors suddenly become *much* more > > > frequent and easy to trigger. > > > I'm sorry to say this, but the fact that nothing has been identified yet > > > is IMO the result of a lack of effort, not because there is no such > > > change. > > > > I was wrong. It turns out that I was creating the variations in the test > > results around the akpm merge myself by tiny changes in the way I ran the > > tests. It took another round of about 30 compilations and tests purely in > > this range to show that, but those same tests also made me aware of other > > patterns I should look at. > > > > Until a few days ago I was concentrating on "do I see SKB allocation errors > > or not". Since then I've also been looking more consciously at when they > > happen, at disk access patterns and at desktop freeze patterns. > > > > I think I did mention before that this whole issue is rather subtle :-/ > > So, my apologies for finguering the wrong area for so long, but it looked > > solid given the info available at the time. 
> > > > On Thursday 15 October 2009, Mel Gorman wrote: > > > Outside the range of commits suspected of causing problems was the > > > following. It's extremely low probability > > > > > > Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion > > > This patch alters the call to congestion_wait() in the page > > > allocator. Frankly, I don't get the change but it might be worth > > > checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31 > > > makes any difference > > > > This is the real culprit. Mel: thanks very much for looking beyond the > > area I identified. Your overview of mm changes was exactly what I needed > > and really helped a lot during my later tests. > > > > This commit definitely causes most of the problems; confirmed by reverting > > it on top of 2.6.31 (also requires reverting 373c0a7e, which is a later > > build fix). > > Mel/Jens, any ideas why commit 8aa7e84 makes us run out of high order > pages? Should we be using BLK_RW_SYNC in mm/page_alloc.c instead of > BLK_RW_ASYNC? No, I think that is definitely broken since the page freeing should be using async writes. If the commit in question is making the difference and the below does indeed fix it, I think that's primarily due to timing issues and the general brokenness of the congestion bits. With the below change, you are essentially guaranteed to be waiting 20ms every time and it's quite likely that that is enough to change the picture. So I'd look elsewhere for the real problem, it's not likely to be caused by the sync vs async bits themselves. -- Jens Axboe ^ permalink raw reply [flat|nested] 384+ messages in thread
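[Editorial sketch] The remark above about waiting 20ms every time can be pictured with a toy timing model. This is not kernel code. It assumes, as in 2.6.31, that congestion_wait() sleeps on one of two wait queues indexed by BLK_RW_ASYNC (0) and BLK_RW_SYNC (1), that WRITE has the numeric value 1, and that HZ/50 corresponds to 20 ms. The function name modelled_wait_ms and the wake-up times in the scenario are hypothetical, chosen only for illustration.

```python
# Toy model of a timed wait on one of two congestion wait queues.
# Not kernel code: the function name and the scenario are assumptions.

BLK_RW_ASYNC = 0   # index of the async queue (value as in 2.6.31)
BLK_RW_SYNC = 1    # index of the sync queue; WRITE is also 1
TIMEOUT_MS = 20    # HZ/50, the 20ms mentioned in the mail


def modelled_wait_ms(queue, wakeups_ms, timeout_ms=TIMEOUT_MS):
    """Return how long a waiter sleeps on `queue`.

    wakeups_ms maps a queue index to the time (in ms) of the first
    wake-up on that queue, or None if nobody wakes it.
    """
    wake = wakeups_ms.get(queue)
    if wake is None or wake >= timeout_ms:
        return timeout_ms   # nobody woke us in time: full timeout
    return wake             # woken early


# Hypothetical scenario: the async queue is woken after 3 ms,
# nothing ever wakes the sync queue.
scenario = {BLK_RW_ASYNC: 3, BLK_RW_SYNC: None}

print(modelled_wait_ms(BLK_RW_ASYNC, scenario))
print(modelled_wait_ms(BLK_RW_SYNC, scenario))
```

In this model a caller passing WRITE (or BLK_RW_SYNC) always sleeps for the whole timeout, while a caller passing BLK_RW_ASYNC may retry much sooner. That is a pure timing difference between the two calls, which matches the suspicion voiced in the mail that timing, rather than the sync vs async distinction itself, changes the picture.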
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-18 23:33 ` Frans Pop (?) @ 2009-10-19 14:01 ` Mel Gorman -1 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-19 14:01 UTC (permalink / raw) To: Frans Pop Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm On Mon, Oct 19, 2009 at 01:33:29AM +0200, Frans Pop wrote: > Another long mail, sorry. > > On Wednesday 14 October 2009, Frans Pop wrote: > > > There still has not been a mm-change identified that makes > > > fragmentation significantly worse. > > > > My bisection shows a very clear point, even if not an individual commit, > > in the 'akpm' merge where SKB errors suddenly become *much* more > > frequent and easy to trigger. > > I'm sorry to say this, but the fact that nothing has been identified yet > > is IMO the result of a lack of effort, not because there is no such > > change. > > I was wrong. It turns out that I was creating the variations in the test > results around the akpm merge myself by tiny changes in the way I ran the > tests. It took another round of about 30 compilations and tests purely in > this range to show that, but those same tests also made me aware of other > patterns I should look at. > Once again, thanks for persisting with this for so long. That many tests and searching is a miserable undertaking. > Until a few days ago I was concentrating on "do I see SKB allocation errors > or not". Since then I've also been looking more consciously at when they > happen, at disk access patterns and at desktop freeze patterns. > > I think I did mention before that this whole issue is rather subtle :-/ Indeed > So, my apologies for fingering the wrong area for so long, but it looked > solid given the info available at the time. 
> > On Thursday 15 October 2009, Mel Gorman wrote: > > Outside the range of commits suspected of causing problems was the > > following. It's extremely low probability > > > > Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion > > This patch alters the call to congestion_wait() in the page > > allocator. Frankly, I don't get the change but it might be worth > > checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31 > > makes any difference > > This is the real culprit. Mel: thanks very much for looking beyond the > area I identified. Your overview of mm changes was exactly what I needed > and really helped a lot during my later tests. > I'm surprised this made such a big difference which is why I described it as "extremely low probability". It implies that the real problem isn't fragmentation per-se but the timing of when pages get consumed. Maybe what has really changed is how long direct reclaimers wait before trying to allocate again. After the commit, if direct reclaimers are waiting longer between direct reclaim attempts, it might mean that the GFP_KERNEL reclaimers of high-order pages are doing less work before and hurting parallel GFP_ATOMIC users. Jens, does this sound plausible? > This commit definitely causes most of the problems; confirmed by reverting > it on top of 2.6.31 (also requires reverting 373c0a7e, which is a later > build fix). > > The rest of this mail gives details on my tests and how I reached the above > conclusion. > > TEST BASELINE (2.6.30) > ====================== > I mentioned in an earlier mail that I run three instances of gitk for my > tests. Loading gitk seems to consist of 3 phases: > 1) general initial scan of the repository (branches?) > 2) reading commits: commit counter increases > 3) reading references (including bisection good/bad points) and > uncommitted changes > > Below times and comments per stage when the test is run with 2.6.30. 
As my > test starts after a clean boot, buffers are mostly empty. > > 1st instance: 'gitk v2.6.29..master' (preparation) > 1) ~20 seconds; user interface is mostly blank > 2) ~5 seconds to read 35.000 commits; user interface is updated and counter > increases steadily as they are read > 3) ~10 seconds; "branch"/"follows"/"precedes" info and tags are filled > in; fairly heavy disk activity > > 2nd instance: 'gitk master' (preparation) > 1) 0 seconds (because data is already buffered) > 2) ~25 seconds to read 167500 commits; counter increases steadily > 3) 1-2 seconds (because data is already buffered) > > 3rd instance: 'gitk master' (the actual test) > 1) 0 seconds because data is already buffered > 2) ~55 seconds due to swapping overhead; minor music skip around commit > 110.000; counter slower after 90.000, some short halts, but generally > increases steadily; moderate disk activity > 3) ~55-60 seconds; because buffers have been emptied data must be read > again, with swapping; very heavy disk activity; fairly long music > skip (15-20 seconds), but no SKB allocation errors > > So, the loading of the 3rd instance takes 1.5 minutes longer than the > second because of the swapping. And phase 3) is most affected by it. > > AFTER WIRELESS CHANGE > ===================== > After commit 4752c93c30 ("iwlcore: Allow skb allocation from tasklet") I > start getting the SKB errors. They can be triggered reliably if the whole > test is repeated 1 or 2 times, but generally not the first time the test > is run. It's up to the wireless driver maintainer what to do here, but it seems like that patch needs to be reverted and thought about some more before trying again. > > Or so I thought for a long time. > It turns out that I will get SKB errors during the first run if I'm > "sloppy" in the test execution. For example if I wait too long before > switching from the last gitk instance to konsole where I have > a 'tail -f /var/log/kern.log' running. 
So the timing is critical of when the high-order atomic allocations start kicking in. > Another factor is the state of the repository: do I have master checked > out, or an older branch, or am I in the middle of a bisection. This > influences how data is read from the disk and thus the test results. > A last factor may be the size of the kernel I'm using: my test/bisect > kernel is significantly smaller than my regular kernel. > > If the test is run completely cleanly, I will not get SKB errors during the > first run. Also, this change does not affect the timings of the test at > all: the total load time of the 3rd instance is still ~1:55 and music > skips happen in roughly the same places. The pattern of disk activity also > remains unchanged. > > If I do *not* run the test cleanly, any SKB errors during the first test > run will always be during phase 3), never during phase 2). This is what I > saw during tests in the 'akpm' range, and explains the inconsistent > results there. > > After discovering this I've made a copy of the git repo so that I always > test using the exact same state and tightened my test procedure. > > AFTER congestion_wait CHANGE > ============================ > If I test commit 9f2d8be, which is just before the congestion_wait() > change, I still get the same pattern as described above. But when I test > with 8aa7e84 ("Fix congestion_wait() sync/async vs read/write confusion"), > things change dramatically when the 3rd gitk instance is started. > So, assuming this is a timing problem, this commit affects the timing of when pages are consumed by processes doing direct reclaim. > During the 2nd phase I see the first SKB allocation errors with a music > skip between reading commits 95.000 and 110.000. > About commit 115.000 there is a very long pause during which the counter > does not increase, music stops and the desktop freezes completely. 
The > first 30 seconds of that freeze there is only very low disk activity (which > seems strange); I'm just going to have to depend on Jens here. Jens, the congestion_wait() is on BLK_RW_ASYNC after the commit. Reclaim usually writes pages asynchronously but lumpy reclaim actually waits for pages to be written out synchronously so it's not always async. Either way, reclaim is usually worried about writing pages but it would appear after this change that a lot of read activity can also stall a process in direct reclaim. What might be happening in Frans's particular case is that the tasklet that allocates high-order pages for the RX buffers is getting stalled by congestion caused by other processes doing reads from the filesystem. While it makes sense from a congestion point of view to halt the IO, the reclaim operations from direct reclaimers are getting delayed for long enough to cause problems for GFP_ATOMIC. Does this sound plausible to you? If so, what's the best way of addressing this? Changing congestion_wait back to WRITE (assuming that works for Frans)? Changing it to SYNC (again, assuming it actually works) or a revert? > the next 25 seconds there suddenly is very high disk > activity during which things gradually unfreeze and more SKB errors are > displayed. After that the commit counter runs up fairly steadily again. > > Phase 2) ends at ~1:45. Phase 3) (with more SKB errors) ends at ~2:05. > > So this change almost doubles the time needed for phase 2) and causes SKB > allocation errors to occur during that phase. Also, before this commit the > desktop freezes are much shorter and less severe. With this change the > desktop is completely unusable for almost a minute during phase 2), with > even the mouse pointer frozen solid. > Note that phase 3) becomes shorter, but that the total time needed to load > the 3rd instance increases by about 10-15 seconds. 
> > Note: -rc2 and -rc3 had broken NFS, so I had to cherry-pick 3 NFS commits > from -rc4 on top of the commits I wanted to test. > > WITH congestion_wait CHANGE REVERTED > ==================================== > I've done quite a few tests of 2.6.31 with 373c0a7e and 8aa7e847 reverted > to confirm that's really the culprit. I've done this for .31-rc3, .31-rc4, > .31-rc5, .31 and .31.1. > > In all cases the huge freeze in phase 2) is gone and the general behavior > and timings are again as they were after the wireless change. During most > tests I did not get any SKB allocation errors during phase 2) or phase 3). > > However with .31-rc5, .31 and .31.1 I have had some tests where I would see > a few SKB allocation errors during phase 3) (which is somewhat likely), > but also during phase 2). At this point I'm unsure whether this is just > noise, or maybe a minor influence from some change merged after .31-rc4. > Looking through the commits there are several mm/page allocation changes. > It could still be kswapd not being woken up often enough after direct reclaimers. I took a look through the commits but none of the mm or allocator changes struck me as likely candidates for making fragmentation worse or altering the timing. > For now I suggest ignoring this though as the impact (if any) is very minor > and it is not reproducible reliably enough. > > Next I'll retest Mel's patches and also test Reinette's patches. > Of the two patches, only the kswapd one should have any significance. As David pointed out, the second patch is essentially a no-op as it should not have been possible to enter direct reclaim with ALLOC_NO_WATERMARKS set. -- Mel Gorman Part-time PhD Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 384+ messages in thread
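[Editorial sketch] For readers unfamiliar with the term in the subject line: an order-2 allocation asks for 2**2 = 4 physically contiguous pages. The toy check below shows why the number of free pages alone does not decide whether such a request can succeed. It is a simplification and not how the kernel is implemented: the real page allocator keeps per-order free lists and applies watermark checks, and the helper name has_free_block and the 16-frame layouts are hypothetical.

```python
# Toy illustration of why an order-2 request can fail although
# plenty of pages are free. Simplified model, not kernel code.

def has_free_block(free, order):
    """True if `free` (one bool per page frame) contains a naturally
    aligned run of 2**order free page frames."""
    size = 1 << order
    for start in range(0, len(free) - size + 1, size):
        if all(free[start:start + size]):
            return True
    return False


# 16 page frames, 8 of them free, but every other frame is in use.
fragmented = [i % 2 == 0 for i in range(16)]
# The same number of free frames, grouped together.
compact = [i < 8 for i in range(16)]

print(sum(fragmented), has_free_block(fragmented, 2))  # 8 False
print(sum(compact), has_free_block(compact, 2))        # 8 True
```

Both layouts have 8 free frames, yet only the second can satisfy an order-2 request. A GFP_ATOMIC caller cannot sleep and reclaim memory itself, so whether a suitable block exists at the moment of the request depends on what reclaim has done beforehand, which is why the timing of kswapd and of direct reclaimers matters in the discussion above.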
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-19 14:01 ` Mel Gorman 0 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-19 14:01 UTC (permalink / raw) To: Frans Pop Cc: David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm On Mon, Oct 19, 2009 at 01:33:29AM +0200, Frans Pop wrote: > Another long mail, sorry. > > On Wednesday 14 October 2009, Frans Pop wrote: > > > There still has not been a mm-change identified that makes > > > fragmentation significantly worse. > > > > My bisection shows a very clear point, even if not an individual commit, > > in the 'akpm' merge where SKB errors suddenly become *much* more > > frequent and easy to trigger. > > I'm sorry to say this, but the fact that nothing has been identified yet > > is IMO the result of a lack of effort, not because there is no such > > change. > > I was wrong. It turns out that I was creating the variations in the test > results around the akpm merge myself by tiny changes in the way I ran the > tests. It took another round of about 30 compilations and tests purely in > this range to show that, but those same tests also made me aware of other > patterns I should look at. > Once again, thanks for persisting with this for so long. That many tests and searching is a miserable undertaking. > Until a few days ago I was concentrating on "do I see SKB allocation errors > or not". Since then I've also been looking more consciously at when they > happen, at disk access patterns and at desktop freeze patterns. > > I think I did mention before that this whole issue is rather subtle :-/ Indeed > So, my apologies for fingering the wrong area for so long, but it looked > solid given the info available at the time. 
> > On Thursday 15 October 2009, Mel Gorman wrote: > > Outside the range of commits suspected of causing problems was the > > following. It's extremely low probability > > > > Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion > > This patch alters the call to congestion_wait() in the page > > allocator. Frankly, I don't get the change but it might be worth > > checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31 > > makes any difference > > This is the real culprit. Mel: thanks very much for looking beyond the > area I identified. Your overview of mm changes was exactly what I needed > and really helped a lot during my later tests. > I'm surprised this made such a big difference which is why I described it as "extremely low probability". It implies that the real problem isn't fragmentation per-se but the timing of when pages get consumed. Maybe what has really changed is how long direct reclaimers wait before trying to allocate again. After the commit, if direct reclaimers are waiting longer between direct reclaim attempts, it might mean that the GFP_KERNEL reclaimers of high-order pages are doing less work before and hurting parallel GFP_ATOMIC users. Jens, does this sound plausible? > This commit definitely causes most of the problems; confirmed by reverting > it on top of 2.6.31 (also requires reverting 373c0a7e, which is a later > build fix). > > The rest of this mail gives details on my tests and how I reached the above > conclusion. > > TEST BASELINE (2.6.30) > ====================== > I mentioned in an earlier mail that I run three instances of gitk for my > tests. Loading gitk seems to consist of 3 phases: > 1) general initial scan of the repository (branches?) > 2) reading commits: commit counter increases > 3) reading references (including bisection good/bad points) and > uncommitted changes > > Below times and comments per stage when the test is run with 2.6.30. 
As my > test starts after a clean boot, buffers are mostly empty. > > 1st instance: 'gitk v2.6.29..master' (preparation) > 1) ~20 seconds; user interface is mostly blank > 2) ~5 seconds to read 35.000 commits; user interface is updated and counter > increases steadily as they are read > 3) ~10 seconds; "branch"/"follows"/"precedes" info and tags are filled > in; fairly heavy disk activity > > 2st instance: 'gitk master' (preparation) > 1) 0 seconds (because data is already buffered) > 2) ~25 seconds to read 167500 commits; counter increases steadily > 3) 1-2 seconds (because data is already buffered) > > 3st instance: 'gitk master' (the actual test) > 1) 0 seconds because data is already buffered > 2) ~55 seconds due to swapping overhead; minor music skip around commit > 110.000; counter slower after 90.000, some short halts, but generally > increases steadily; moderate disk activity > 3) ~55-60 seconds; because buffers have been emptied data must by read > again, with swapping; very heavy disk activity; fairly long music > skip (15-20 seconds), but no SKB allocation errors > > So, the loading of the 3rd instance takes 1.5 minutes longer than the > second because of the swapping. And phase 3) is most affected by it. > > AFTER WIRELESS CHANGE > ===================== > After commit 4752c93c30 ("iwlcore: Allow skb allocation from tasklet") I > start getting the SKB errors. They can be triggered reliably if the whole > test is repeated 1 or 2 times, but generally not the first time the test > is run. It's up to the wireless driver maintainer what to do here, but it seems like that patch needs to be reverted and thought about some more before trying again. > > Or so I thought for a long time. > It turns out that I will get SKB errors during the first run if I'm > "sloppy" in the test execution. For example if I wait too long before > switching from the last gitk instance to konsole where I have > a 'tail -f /var/log/kern.log' running. 
So the timing is critical of when the high-order atomic allocations start kicking in. > Another factor is the state of the repository: do I have master checked > out, or an older branch, or am I in the middle of a bisection. This > influences how data is read from the disk and thus the test results. > A last factor may be the size of the kernel I'm using: my test/bisect > kernel is significantly smaller than my regular kernel. > > If the test is run completely cleanly, I will not get SKB errors during the > first run. Also, this change does not affect the timings of the test at > all: the total load time of the 3rd instance is still ~1:55 and music > skips happen in roughly the same places. The pattern of disk activity also > remains unchanged. > > If I do *not* run the test cleanly, any SKB errors during the first test > run will always be during phase 3), never during phase 2). This is what I > saw during tests in the 'akpm' range, and explains the inconsistent > results there. > > After discovering this I've made a copy of the git repo so that I always > test using the exact same state and tightened my test procedure. > > AFTER congestion_wait CHANGE > ============================ > If I test commit 9f2d8be, which is just before the congestion_wait() > change, I still get the same pattern as described above. But when I test > with 8aa7e84 ("Fix congestion_wait() sync/async vs read/write confusion"), > things change dramatically when the 3rd gitk instance is started. > So, assuming this is a timing problem, this commit affects the timing of when pages are consumed by processes doing direct reclaim. > During the 2nd phase I see the first SKB allocation errors with a music > skip between reading commits 95.000 and 110.000. > About commit 115.000 there is a very long pause during which the counter > does not increase, music stops and the desktop freezes completely. 
The > first 30 seconds of that freeze there is only very low disk activity (which > seems strange); I'm just going to have to depend on Jens here. Jens, the congestion_wait() is on BLK_RW_ASYNC after the commit. Reclaim usually writes pages asynchronously but lumpy reclaim actually waits of pages to write out synchronously so it's not always async. Either way, reclaim is usually worried about writing pages but it would appear after this change that a lot of read activity can also stall a process in direct reclaim. What might be happening in Frans's particular case is that the tasklet that allocates high-order pages for the RX buffers is getting stalled by congestion caused by other processes doing reads from the filesystem. While it makes sense from a congestion point of view to halt the IO, the reclaim operations from direct reclaimers is getting delayed for long enough to cause problems for GFP_ATOMIC. Does this sound plausible to you? If so, what's the best way of addressing this? Changing congestion_wait back to WRITE (assuming that works for Frans)? Changing it to SYNC (again, assuming it actually works) or a revert? > the next 25 seconds there suddenly is very high disk > activity during which things gradually unfreeze and more SKB errors are > displayed. After that the commit counter runs up fairly steadily again. > > Phase 2) ends at ~1:45. Phase 3) (with more SKB errors) ends at ~2:05. > > So this change almost doubles the time needed for phase 2) and causes SKB > allocation errors to occur during that phase. Also, before this commit the > desktop freezes are much shorter and less severe. With this change the > desktop is completely unusable for almost a minute during phase 2), with > even the mouse pointer frozen solid. > Note that phase 3) becomes shorter, but that the total time needed to load > the 3rd instance increases by about 10-15 seconds. 
> > Note: -rc2 and -rc3 had broken NFS, so I had to cherry-pick 3 NFS commits > from -rc4 on top of the commits I wanted to test. > > WITH congestion_wait CHANGE REVERTED > ==================================== > I've done quite a few tests of 2.6.31 with 373c0a7e and 8aa7e847 reverted > to confirm that's really the culprit. I've done this for .31-rc3, .31-rc4, > .31-rc5, .31 and .31.1. > > In all cases the huge freeze in phase 2) is gone and the general behavior > and timings are again as it was after the wireless change. During most > tests I did not get any SKB allocation errors during phase 2) or phase 3). > > However with .31-rc5, .31 and .31.1 I have had some tests where I would see > a few SKB allocation errors during phase 3) (which is somewhat likely), > but also during phase 2). At this point I'm unsure whether this is just > noise, or maybe a minor influence from some change merged after .31-rc4. > Looking through the commits there are several mm/page allocation changes. > It could still be kswapd not being woken up often enough after direct reclaimers. I took a look through the commits but none of the mm or allocator changes struck me as likely candidates for making fragmentation worse or altering the timing. > For now I suggest ignoring this though as the impact (if any) is very minor > and it is not reproducible reliably enough. > > Next I'll retest Mel's patches and also test Reinette's patches. > Of the two patches, only the kswapd one should have any significance. As David pointed out, the second patch is essentially a no-op as it should not have been possible to enter direct reclaim with ALLOC_NO_WATERMARKS set. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-19 14:01 ` Mel Gorman @ 2009-10-19 16:18 ` Chris Mason -1 siblings, 0 replies; 384+ messages in thread From: Chris Mason @ 2009-10-19 16:18 UTC (permalink / raw) To: Mel Gorman Cc: Frans Pop, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm On Mon, Oct 19, 2009 at 03:01:52PM +0100, Mel Gorman wrote: > > > During the 2nd phase I see the first SKB allocation errors with a music > > skip between reading commits 95.000 and 110.000. > > About commit 115.000 there is a very long pause during which the counter > > does not increase, music stops and the desktop freezes completely. The > > first 30 seconds of that freeze there is only very low disk activity (which > > seems strange); > > I'm just going to have to depend on Jens here. Jens, the congestion_wait() is > on BLK_RW_ASYNC after the commit. Reclaim usually writes pages asynchronously > but lumpy reclaim actually waits of pages to write out synchronously so > it's not always async. Waiting doesn't make it synchronous from the elevator point of view ;) If you're using WB_SYNC_NONE, it's a async write. WB_SYNC_ALL makes it a sync write. I only see WB_SYNC_NONE in vmscan.c, so we should be using the async congestion wait. (the exception is xfs which always does async writes). But I'm honestly not 100% sure. Looking back through the emails, the test case is doing IO on top of a whole lot of things on top of dm-crypt? I just tried to figure out if dm-crypt is turning the async IO into sync IOs, but didn't quite make sense of it. Could you also please include which filesystems were being abused during the test and how? Reading through the emails, I think you've got: gitk being run 3 times on some FS (NFS?) 
streaming reads on NFS swap on dm-crypt If other filesystems are being used, please correct me. Also please include if they are on crypto or straight block device. > > Either way, reclaim is usually worried about writing pages but it would appear > after this change that a lot of read activity can also stall a process in > direct reclaim. What might be happening in Frans's particular case is that the > tasklet that allocates high-order pages for the RX buffers is getting stalled > by congestion caused by other processes doing reads from the filesystem. > While it makes sense from a congestion point of view to halt the IO, the > reclaim operations from direct reclaimers is getting delayed for long enough > to cause problems for GFP_ATOMIC. The congestion_wait code either waits for congestion to clear or for a given timeout. The part that isn't clear is if before the patch we waited a very short time (congestion cleared quickly) or a very long time (we hit the timeout or congestion cleared slowly). The easiest way to tell is to just replace the congestion_wait() calls in direct reclaim with schedule_timeout_interruptible(10), test, then schedule_timeout_interruptible(HZ/20), then test again. > > Does this sound plausible to you? If so, what's the best way of > addressing this? Changing congestion_wait back to WRITE (assuming that > works for Frans)? Changing it to SYNC (again, assuming it actually > works) or a revert? I don't think changing it to SYNC is a good plan unless we're actually doing sync io. It would be better to just wait on one of the pages that you've sent down (or its hashed waitqueue since the page can go away). -chris ^ permalink raw reply [flat|nested] 384+ messages in thread
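As a rough illustration of the two cases distinguished above (congestion clearing quickly versus the timeout expiring) and of the proposed fixed-sleep experiment, here is a minimal userspace model in C. It is a sketch under stated assumptions, not kernel code: model_congestion_wait(), model_fixed_sleep() and jiffies_to_ms() are hypothetical names, signals are ignored, and the tick rate is a parameter because the thread does not state the tester's HZ.

```c
/* Userspace model only; none of these functions exist in the kernel. */

struct wait_result {
	long waited;     /* jiffies spent sleeping */
	long remaining;  /* jiffies left of the timeout, 0 if it expired */
};

/*
 * Model of a wait that ends when congestion clears or when the timeout
 * expires, whichever comes first. clear_after < 0 means that congestion
 * never clears during the wait.
 */
struct wait_result model_congestion_wait(long timeout, long clear_after)
{
	struct wait_result r;

	if (clear_after >= 0 && clear_after < timeout) {
		r.waited = clear_after;
		r.remaining = timeout - clear_after;
	} else {
		r.waited = timeout;
		r.remaining = 0;
	}
	return r;
}

/* Model of the proposed experiment: sleep the full interval every time. */
struct wait_result model_fixed_sleep(long timeout)
{
	struct wait_result r = { timeout, 0 };

	return r;
}

/* Convert jiffies to milliseconds for an assumed tick rate hz. */
long jiffies_to_ms(long jiffies, long hz)
{
	return jiffies * 1000 / hz;
}
```

With an assumed HZ of 250, a fixed sleep of 10 jiffies is 40 ms and HZ/20 (12 jiffies after integer division) is 48 ms; with an assumed HZ of 1000 the same two calls sleep 10 ms and 50 ms. So how the two steps of the experiment compare depends on the kernel configuration.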
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-19 16:18 ` Chris Mason (?) @ 2009-10-19 17:01 ` Christoph Hellwig -1 siblings, 0 replies; 384+ messages in thread From: Christoph Hellwig @ 2009-10-19 17:01 UTC (permalink / raw) To: Chris Mason, Mel Gorman, Frans Pop, David Rientjes, KOSAKI Motohiro On Tue, Oct 20, 2009 at 01:18:15AM +0900, Chris Mason wrote: > Waiting doesn't make it synchronous from the elevator point of view ;) > If you're using WB_SYNC_NONE, it's a async write. WB_SYNC_ALL makes it > a sync write. I only see WB_SYNC_NONE in vmscan.c, so we should be > using the async congestion wait. (the exception is xfs which always > does async writes). That's only because those people who did the global sweep did not bother to convert it or even tell the list about it. I have a patch in my QA queue to change it.. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-19 17:01 ` Christoph Hellwig (?) @ 2009-10-19 21:57 ` Chris Mason -1 siblings, 0 replies; 384+ messages in thread From: Chris Mason @ 2009-10-19 21:57 UTC (permalink / raw) To: Christoph Hellwig Cc: Mel Gorman, Frans Pop, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm On Mon, Oct 19, 2009 at 01:01:15PM -0400, Christoph Hellwig wrote: > On Tue, Oct 20, 2009 at 01:18:15AM +0900, Chris Mason wrote: > > Waiting doesn't make it synchronous from the elevator point of view ;) > > If you're using WB_SYNC_NONE, it's a async write. WB_SYNC_ALL makes it > > a sync write. I only see WB_SYNC_NONE in vmscan.c, so we should be > > using the async congestion wait. (the exception is xfs which always > > does async writes). > > That's only because those people who did the global sweep did not bother > to convert it or even tell the list about it. I have a patch in my > QA queue to change it.. Yes, we just didn't realize XFS was missed. Sorry. I wasn't trying to blame xfs for being behind, just mentioning that we've got about 10 different variables here and I'm having a hard time figuring out which ones to push on. -chris ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn 2009-10-19 16:18 ` Chris Mason (?) @ 2009-10-20 10:48 ` Mel Gorman -1 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-20 10:48 UTC (permalink / raw) To: Chris Mason, Frans Pop, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm On Tue, Oct 20, 2009 at 01:18:15AM +0900, Chris Mason wrote: > On Mon, Oct 19, 2009 at 03:01:52PM +0100, Mel Gorman wrote: > > > > > During the 2nd phase I see the first SKB allocation errors with a music > > > skip between reading commits 95.000 and 110.000. > > > About commit 115.000 there is a very long pause during which the counter > > > does not increase, music stops and the desktop freezes completely. The > > > first 30 seconds of that freeze there is only very low disk activity (which > > > seems strange); > > > > I'm just going to have to depend on Jens here. Jens, the congestion_wait() is > > on BLK_RW_ASYNC after the commit. Reclaim usually writes pages asynchronously > > but lumpy reclaim actually waits of pages to write out synchronously so > > it's not always async. > > Waiting doesn't make it synchronous from the elevator point of view ;) > If you're using WB_SYNC_NONE, it's a async write. WB_SYNC_ALL makes it > a sync write. I only see WB_SYNC_NONE in vmscan.c, so we should be > using the async congestion wait. (the exception is xfs which always > does async writes). > Right, reclaim always queues the pages for async IO but for lumpy reclaim, it calls wait_on_page_writeback() but as you say, from an elevator point of view, it's still async. > But I'm honestly not 100% sure. Looking back through the emails, the > test case is doing IO on top of a whole lot of things on top of > dm-crypt? 
I just tried to figure out if dm-crypt is turning the async > IO into sync IOs, but didn't quite make sense of it. > I'm not overly sure either. > Could you also please include which filesystems were being abused during > the test and how? Reading through the emails, I think you've got: > > gitk being run 3 times on some FS (NFS?) > streaming reads on NFS > swap on dm-crypt > > If other filesystems are being used, please correct me. Also please > include if they are on crypto or straight block device. > I've attached a patch below that should allow us to cheat. When it's applied, it outputs who called congestion_wait(), how long the timeout was and how long it waited for. By comparing before and after sleep times, we should be able to see which of the callers has significantly changed and if it's something easily addressable. > > Either way, reclaim is usually worried about writing pages but it would appear > > after this change that a lot of read activity can also stall a process in > > direct reclaim. What might be happening in Frans's particular case is that the > > tasklet that allocates high-order pages for the RX buffers is getting stalled > > by congestion caused by other processes doing reads from the filesystem. > > While it makes sense from a congestion point of view to halt the IO, the > > reclaim operations from direct reclaimers is getting delayed for long enough > > to cause problems for GFP_ATOMIC. > > The congestion_wait code either waits for congestion to clear or for > a given timeout. The part that isn't clear is if before the patch > we waited a very short time (congestion cleared quickly) or a very long > time (we hit the timeout or congestion cleared slowly). 
>

Using the instrumentation patch, I found with a very basic test that we
are waiting for short periods of time more often with the patch applied

      1                      congestion_wait rw=1   delay 6  timeout 25 :: before commit
      7 kswapd               congestion_wait rw=1   delay 0  timeout 25 :: before commit
     32 kswapd               congestion_wait sync=0 delay 0  timeout 25 :: after commit
     61 kswapd               congestion_wait rw=1   delay 1  timeout 25 :: before commit
    133 kswapd               congestion_wait sync=0 delay 1  timeout 25 :: after commit
     16 kswapd               congestion_wait rw=1   delay 2  timeout 25 :: before commit
     70 kswapd               congestion_wait sync=0 delay 2  timeout 25 :: after commit
      1 try_to_free_pages    congestion_wait sync=0 delay 2  timeout 25 :: after commit
     17 kswapd               congestion_wait rw=1   delay 3  timeout 25 :: before commit
     28 kswapd               congestion_wait sync=0 delay 3  timeout 25 :: after commit
      1 try_to_free_pages    congestion_wait sync=0 delay 3  timeout 25 :: after commit
     23 kswapd               congestion_wait rw=1   delay 4  timeout 25 :: before commit
     16 kswapd               congestion_wait sync=0 delay 4  timeout 25 :: after commit
      5 try_to_free_pages    congestion_wait sync=0 delay 4  timeout 25 :: after commit
     20 kswapd               congestion_wait rw=1   delay 5  timeout 25 :: before commit
     18 kswapd               congestion_wait sync=0 delay 5  timeout 25 :: after commit
      3 try_to_free_pages    congestion_wait sync=0 delay 5  timeout 25 :: after commit
     21 kswapd               congestion_wait rw=1   delay 6  timeout 25 :: before commit
      8 kswapd               congestion_wait sync=0 delay 6  timeout 25 :: after commit
      2 try_to_free_pages    congestion_wait sync=0 delay 6  timeout 25 :: after commit
     13 kswapd               congestion_wait rw=1   delay 7  timeout 25 :: before commit
     12 kswapd               congestion_wait sync=0 delay 7  timeout 25 :: after commit
      2 try_to_free_pages    congestion_wait sync=0 delay 7  timeout 25 :: after commit
      8 kswapd               congestion_wait rw=1   delay 8  timeout 25 :: before commit
      7 kswapd               congestion_wait sync=0 delay 8  timeout 25 :: after commit
      9 kswapd               congestion_wait rw=1   delay 9  timeout 25 :: before commit
      5 kswapd               congestion_wait sync=0 delay 9  timeout 25 :: after commit
      2 try_to_free_pages    congestion_wait sync=0 delay 9  timeout 25 :: after commit
      4 kswapd               congestion_wait rw=1   delay 10 timeout 25 :: before commit
      5 kswapd               congestion_wait sync=0 delay 10 timeout 25 :: after commit
      1 try_to_free_pages    congestion_wait sync=0 delay 10 timeout 25 :: after commit
[... remaining output snipped ...]

The before and after commit are really 2.6.31 and 2.6.31-patch-reverted.
The first column is how many times we delayed for that length of time.
To generate the output, I just took the console log from both kernels with
a basic test, put the congestion_wait lines into two separate files and

    cat congestion-*-sorted | sort -n -k5 | uniq -c

to give a count of how many times we delayed for a particular caller.

> The easiest way to tell is to just replace the congestion_wait() calls
> in direct reclaim with schedule_timeout_interruptible(10), test, then
> schedule_timeout_interruptible(HZ/20), then test again.
>

Reclaim can also call congestion_wait() and maybe the problem isn't
within the page allocator at all but that it's indirectly affected by
timing.

> >
> > Does this sound plausible to you? If so, what's the best way of
> > addressing this? Changing congestion_wait back to WRITE (assuming that
> > works for Frans)? Changing it to SYNC (again, assuming it actually
> > works) or a revert?
>
> I don't think changing it to SYNC is a good plan unless we're actually
> doing sync io. It would be better to just wait on one of the pages that
> you've sent down (or its hashed waitqueue since the page can go away).
>

Frans, is there any chance you could apply the following patch and get
the console logs for a vanilla kernel and with the congestion patches
reverted? I'm hoping it'll be able to tell us which of the callers has
significantly changed in timing. If there is one caller that has
significantly changed, it might be enough to address just that caller.
=====
From 757999066dc41f2e053d59589c673052fc7c1a65 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mel@csn.ul.ie>
Date: Tue, 20 Oct 2009 11:01:57 +0100
Subject: [PATCH] Instrument congestion_wait

This patch instruments how long congestion_wait() really waited for a
given caller.

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 3d3accb..fc945e0 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/writeback.h>
 #include <linux/device.h>
+#include <linux/kallsyms.h>
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
 {
@@ -729,6 +730,11 @@ EXPORT_SYMBOL(set_bdi_congested);
  */
 long congestion_wait(int sync, long timeout)
 {
+	unsigned long jiffies_start = jiffies;
+	char *module;
+	char buf[128];
+	const char *symbol;
+	unsigned long offset, symbolsize;
 	long ret;
 	DEFINE_WAIT(wait);
 	wait_queue_head_t *wqh = &congestion_wqh[sync];
@@ -736,6 +742,13 @@ long congestion_wait(int sync, long timeout)
 	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
 	ret = io_schedule_timeout(timeout);
 	finish_wait(wqh, &wait);
+
+	symbol = kallsyms_lookup(_RET_IP_, &symbolsize, &offset, &module, buf),
+	printk(KERN_INFO "%-20s congestion_wait sync=%d delay %lu timeout %ld\n",
+		symbol,
+		sync,
+		jiffies - jiffies_start,
+		timeout);
 	return ret;
 }
 EXPORT_SYMBOL(congestion_wait);

^ permalink raw reply related [flat|nested] 384+ messages in thread
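The tallying step described in this message (counting how often each delay was seen) can also be sketched in C, for cases where the console log needs more processing than sort and uniq give. This is a minimal sketch under assumptions: cw_parse() and cw_tally() are hypothetical names, each input line is assumed to already have any syslog or timestamp prefix stripped, and lines that lack a caller symbol or do not match the printk format are simply skipped.

```c
#include <stddef.h>
#include <stdio.h>

/* One parsed line of the instrumentation output. */
struct cw_sample {
	char caller[64];      /* symbol that called congestion_wait() */
	int flag;             /* value printed after "sync=" or "rw=" */
	unsigned long delay;  /* jiffies actually waited */
	long timeout;         /* jiffies requested */
};

/* Returns 1 if the line matches the instrumented format, 0 otherwise. */
int cw_parse(const char *line, struct cw_sample *out)
{
	/* %*[a-z] accepts either "sync" or "rw" and discards it. */
	int n = sscanf(line,
		       "%63s congestion_wait %*[a-z]=%d delay %lu timeout %ld",
		       out->caller, &out->flag, &out->delay, &out->timeout);

	return n == 4;
}

/*
 * Count samples per delay value. hist_len must be at least 1; delays of
 * hist_len - 1 or more all land in the last bucket. Returns the number of
 * lines that parsed.
 */
size_t cw_tally(const char *const lines[], size_t nlines,
		unsigned long hist[], size_t hist_len)
{
	size_t parsed = 0;
	size_t i;

	for (i = 0; i < hist_len; i++)
		hist[i] = 0;

	for (i = 0; i < nlines; i++) {
		struct cw_sample s;
		size_t bucket;

		if (!cw_parse(lines[i], &s))
			continue;
		bucket = s.delay < hist_len ? (size_t)s.delay : hist_len - 1;
		hist[bucket]++;
		parsed++;
	}
	return parsed;
}
```

Running cw_tally() once per kernel log and comparing the two histograms gives the same kind of before/after view as the table in the message; splitting the histogram by the caller field would be the natural next step for finding which caller changed.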
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-20 10:48 ` Mel Gorman
  (?)
  (?)
@ 2009-10-26 21:06 ` Frans Pop
  2009-10-27 14:54   ` Mel Gorman
  2009-11-05 20:14   ` Frans Pop
  -1 siblings, 2 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-26 21:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Chris Mason, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm

[-- Attachment #1: Type: text/plain, Size: 2324 bytes --]

On Tuesday 20 October 2009, Mel Gorman wrote:
> I've attached a patch below that should allow us to cheat. When it's
> applied, it outputs who called congestion_wait(), how long the timeout
> was and how long it waited for. By comparing before and after sleep
> times, we should be able to see which of the callers has significantly
> changed and if it's something easily addressable.

The results from this look fairly interesting (although I may be a bad
judge as I don't really know what I'm looking at ;-).

I've tested with two kernels:
1) 2.6.31.1: 1 test run
2) 2.6.31.1 + congestion_wait() reverts: 2 test runs

The 1st kernel had the expected "freeze" while reading commits in gitk;
reading commits with the 2nd kernel was more fluent.
I did 2 runs with the 2nd kernel as the first run had a fairly long music
skip and more SKB errors than expected. The second run was fairly normal
with no music skips at all even though it had a few SKB errors.

Data for the tests:
                          1st kernel   2nd kernel 1   2nd kernel 2
end reading commits       1:15         1:00           0:55
"freeze"                  yes          no             no
branch data shown         1:55         1:15           1:10
system quiet              2:25         1:50           1:45
# SKB allocation errors   10           53             5

Note that the test is substantially faster with the 2nd kernel and that the
SKB errors don't really affect the duration of the test.

Attached a tarball with the kernel logs, both the full logs and a stripped
version with only the lines generated during the actual test.
Something like this will extract the debug data from the logs:
$ grep "delay " <file> | sed "s/^.*\] //"

Also attached an ODF spreadsheet with a summary of the data for all 3 tests.
I've dropped the congestion_wait and sync/rw= columns as they were always
the same (rw=1 for 1st kernel and sync=0 for 2nd kernel).
I've added a column "weighed delay" and totals for that column and the
count column.

My layman's observations are:
- without the revert 'background_writeout' is called a lot less frequently,
  but when it's called it gets long delays
- without the revert you have 'wb_kupdate', which is relatively expensive
- with the revert 'shrink_list' is relatively expensive, although not
  really in absolute terms

You people may want to look at exactly what happens directly around the
SKB allocation errors. I've only looked at totals.

Cheers,
FJP

[-- Attachment #2: logs.tgz --]
[-- Type: application/x-tgz, Size: 151463 bytes --]

[-- Attachment #3: results.ods --]
[-- Type: application/vnd.oasis.opendocument.spreadsheet, Size: 20051 bytes --]

^ permalink raw reply [flat|nested] 384+ messages in thread
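[Editorial sketch, not part of the original mail: the spreadsheet itself is not reproduced here, so the exact definition of the "weighed delay" column is unknown. Assuming it means count multiplied by delay (in jiffies), per-caller totals could be derived from `uniq -c` style rows roughly as below. The helper name weighted_delay is hypothetical, and the three sample rows are illustrative rows in the tally format used in this thread.]

```shell
#!/bin/sh
# Illustrative rows in "count caller congestion_wait sync=N delay D timeout T"
# form, as produced by piping the instrumented log through uniq -c.
rows='145 wb_kupdate congestion_wait sync=0 delay 25 timeout 25
8 wb_kupdate congestion_wait sync=0 delay 26 timeout 25
24 background_writeout congestion_wait sync=0 delay 25 timeout 25'

# weighted_delay (hypothetical helper): for each caller print
#   caller  total-number-of-waits  sum(count * delay)
# ASSUMPTION: "weighed delay" is taken to mean count x delay in jiffies.
weighted_delay() {
    awk '{
        count[$2]    += $1        # how often this caller waited
        weighted[$2] += $1 * $6   # count x delay, in jiffies
    }
    END {
        for (c in count)
            printf "%s %d %d\n", c, count[c], weighted[c]
    }' | sort                     # awk array order is unspecified
}

printf '%s\n' "$rows" | weighted_delay
```

Summing count times delay per caller gives a rough measure of total time spent sleeping in congestion_wait(), which is more telling than the raw call count when one caller sleeps rarely but for long periods.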
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-26 21:06 ` Frans Pop
  2009-10-27 14:54   ` Mel Gorman
@ 2009-10-27 14:54   ` Mel Gorman
  1 sibling, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-27 14:54 UTC (permalink / raw)
  To: Frans Pop
  Cc: Chris Mason, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm

On Mon, Oct 26, 2009 at 10:06:09PM +0100, Frans Pop wrote:
> On Tuesday 20 October 2009, Mel Gorman wrote:
> > I've attached a patch below that should allow us to cheat. When it's
> > applied, it outputs who called congestion_wait(), how long the timeout
> > was and how long it waited for. By comparing before and after sleep
> > times, we should be able to see which of the callers has significantly
> > changed and if it's something easily addressable.
>
> The results from this look fairly interesting (although I may be a bad
> judge as I don't really know what I'm looking at ;-).
>
> I've tested with two kernels:
> 1) 2.6.31.1: 1 test run
> 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs
>
> The 1st kernel had the expected "freeze" while reading commits in gitk;
> reading commits with the 2nd kernel was more fluent.
> I did 2 runs with the 2nd kernel as the first run had a fairly long music
> skip and more SKB errors than expected. The second run was fairly normal
> with no music skips at all even though it had a few SKB errors.
>
> Data for the tests:
>                           1st kernel   2nd kernel 1   2nd kernel 2
> end reading commits       1:15         1:00           0:55
> "freeze"                  yes          no             no
> branch data shown         1:55         1:15           1:10
> system quiet              2:25         1:50           1:45
> # SKB allocation errors   10           53             5
>
> Note that the test is substantially faster with the 2nd kernel and that the
> SKB errors don't really affect the duration of the test.
>

Ok. I think that despite expectations, the writeback changes have
changed the timing significantly enough to be worth examining closer.

>
> - without the revert 'background_writeout' is called a lot less frequently,
>   but when it's called it gets long delays
> - without the revert you have 'wb_kupdate', which is relatively expensive
> - with the revert 'shrink_list' is relatively expensive, although not
>   really in absolute terms
>

Let's look at the callers that waited in congestion_wait() for at
least 25 jiffies.

2.6.31.1-async-sync-congestion-wait i.e. vanilla kernel
generated with: cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
     24  background_writeout  congestion_wait sync=0 delay 25 timeout 25
    203  kswapd               congestion_wait sync=0 delay 25 timeout 25
      5  shrink_list          congestion_wait sync=0 delay 25 timeout 25
    155  try_to_free_pages    congestion_wait sync=0 delay 25 timeout 25
    145  wb_kupdate           congestion_wait sync=0 delay 25 timeout 25
      2  kswapd               congestion_wait sync=0 delay 26 timeout 25
      8  wb_kupdate           congestion_wait sync=0 delay 26 timeout 25
      1  try_to_free_pages    congestion_wait sync=0 delay 54 timeout 25

2.6.31.1-write-congestion-wait i.e. kernel with patch reverted
generated with: cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
      2  background_writeout  congestion_wait rw=1 delay 25 timeout 25
    188  kswapd               congestion_wait rw=1 delay 25 timeout 25
     14  shrink_list          congestion_wait rw=1 delay 25 timeout 25
    181  try_to_free_pages    congestion_wait rw=1 delay 25 timeout 25
      5  kswapd               congestion_wait rw=1 delay 26 timeout 25
     10  try_to_free_pages    congestion_wait rw=1 delay 26 timeout 25
      3  try_to_free_pages    congestion_wait rw=1 delay 27 timeout 25
      1  kswapd               congestion_wait rw=1 delay 29 timeout 25
      1  __alloc_pages_nodemask congestion_wait rw=1 delay 30 timeout 5
      1  try_to_free_pages    congestion_wait rw=1 delay 31 timeout 25
      1  try_to_free_pages    congestion_wait rw=1 delay 35 timeout 25
      1  kswapd               congestion_wait rw=1 delay 51 timeout 25
      1  try_to_free_pages    congestion_wait rw=1 delay 56 timeout 25

So, wb_kupdate and background_writeout are the big movers in terms of waiting,
not the direct reclaimers which is what we were expecting. Of those big
movers, wb_kupdate is the most interesting because compare the following

$ cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
[ no output ]
$
$ cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
      1  wb_kupdate           congestion_wait sync=0 delay 15 timeout 25
      1  wb_kupdate           congestion_wait sync=0 delay 23 timeout 25
    145  wb_kupdate           congestion_wait sync=0 delay 25 timeout 25
      8  wb_kupdate           congestion_wait sync=0 delay 26 timeout 25

The vanilla kernel is not waiting in wb_kupdate at all.

Jens, before the congestion_wait() changes, wb_kupdate was waiting on
congestion and afterwards it's not. Furthermore, look at the number of pages
that are queued for writeback in the two page allocation failure reports.

without-revert: writeback:65653
with-revert:    writeback:21713

So, after the move to async/sync, a lot more pages are getting queued
for writeback - more than three times the number of pages are queued for
writeback with the vanilla kernel. This amount of congestion might be why
direct reclaimers and kswapd's timings have changed so much.

Chris Mason hinted at this but I didn't quite "get it" at the time but is it
possible that writeback_inodes() is converting what is expected to be async
IO into sync IO? One way of checking this is if Frans could test the patch
below that makes wb_kupdate wait on sync instead of async.

If this makes a difference, I think the three main areas of trouble we
are now seeing are

1. page allocator regressions - mostly fixed hopefully
2. page writeback change in timing - theory yet to be confirmed
3. drivers using more atomics - iwlagn specific, being dealt with

Of course, the big problem is if the changes are due to major timing
differences in page writeback, then mainline is a totally different
shape of problem as pdflush has been replaced there.

====
Have wb_kupdate wait on sync IO congestion instead of async

wb_kupdate is expected to only have queued up pages for async IO.
However, something screwy is happening because it never appears to go to
sleep. Frans, can you test with this patch instead of the revert please?
Preferably, keep the verbose-congestion_wait patch applied so we can
still get an idea who is going to sleep and for how long when calling
congestion_wait.

thanks

Not-signed-off-hacket-job: Mel Gorman <mel@csn.ul.ie>
---
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 81627eb..cb646dd 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -787,7 +787,7 @@ static void wb_kupdate(unsigned long arg)
 		writeback_inodes(&wbc);
 		if (wbc.nr_to_write > 0) {
 			if (wbc.encountered_congestion || wbc.more_io)
-				congestion_wait(BLK_RW_ASYNC, HZ/10);
+				congestion_wait(BLK_RW_SYNC, HZ/10);
 			else
 				break;	/* All the old data is written */
 		}

^ permalink raw reply related [flat|nested] 384+ messages in thread
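[Editorial sketch, not part of the original mail: a rough way to pick out the waits that ran to (or past) their timeout and express them in milliseconds. The helper name full_timeout_waits is hypothetical. HZ=250 is an inference rather than something stated in the thread: the wb_kupdate call in the patch above passes HZ/10 and the wb_kupdate rows log "timeout 25", which is consistent with HZ=250, i.e. 4 ms per jiffy. The sample rows are copied from the tallies in this message.]

```shell
#!/bin/sh
# ASSUMPTION: HZ=250, inferred from "timeout 25" for an HZ/10 caller.
HZ=250

# Rows copied from the tallies quoted in this message
# (count caller congestion_wait sync|rw=N delay D timeout T).
rows='1 wb_kupdate congestion_wait sync=0 delay 15 timeout 25
145 wb_kupdate congestion_wait sync=0 delay 25 timeout 25
8 wb_kupdate congestion_wait sync=0 delay 26 timeout 25
1 __alloc_pages_nodemask congestion_wait rw=1 delay 30 timeout 5'

# full_timeout_waits (hypothetical helper): keep rows whose delay
# (field 6) reached the timeout (field 8) and print
#   caller  count  delay-in-milliseconds
full_timeout_waits() {
    awk -v hz="$HZ" '$6 >= $8 {
        printf "%s %d %d\n", $2, $1, $6 * 1000 / hz
    }'
}

printf '%s\n' "$rows" | full_timeout_waits
```

Comparing delay against each row's own timeout, instead of a fixed 25, also catches callers such as __alloc_pages_nodemask that pass a shorter timeout.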
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-27 14:54 ` Mel Gorman 0 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-27 14:54 UTC (permalink / raw) To: Frans Pop Cc: Chris Mason, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm-Bw31MaZKKs3YtjvyW6yDsg On Mon, Oct 26, 2009 at 10:06:09PM +0100, Frans Pop wrote: > On Tuesday 20 October 2009, Mel Gorman wrote: > > I've attached a patch below that should allow us to cheat. When it's > > applied, it outputs who called congestion_wait(), how long the timeout > > was and how long it waited for. By comparing before and after sleep > > times, we should be able to see which of the callers has significantly > > changed and if it's something easily addressable. > > The results from this look fairly interesting (although I may be a bad > judge as I don't really know what I'm looking at ;-). > > I've tested with two kernels: > 1) 2.6.31.1: 1 test run > 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs > > The 1st kernel had the expected "freeze" while reading commits in gitk; > reading commits with the 2nd kernel was more fluent. > I did 2 runs with the 2nd kernel as the first run had a fairly long music > skip and more SKB errors than expected. The second run was fairly normal > with no music skips at all even though it had a few SKB errors. > > Data for the tests: > 1st kernel 2nd kernel 1 2nd kernel 2 > end reading commits 1:15 1:00 0:55 > "freeze" yes no no > branch data shown 1:55 1:15 1:10 > system quiet 2:25 1:50 1:45 > # SKB allocation errors 10 53 5 > > Note that the test is substantially faster with the 2nd kernel and that the > SKB errors don't really affect the duration of the test. > Ok. 
I think that despite expectations, the writeback changes have changed the timing significantly enough to be worth examining closer. > > - without the revert 'background_writeout' is called a lot less frequently, > but when it's called it gets long delays > - without the revert you have 'wb_kupdate', which is relatively expensive > - with the revert 'shrink_list' is relatively expensive, although not > really in absolute terms > Lets look at the callers that waited in congestion_wait() for at least 25 jiffies. 2.6.31.1-async-sync-congestion-wait i.e. vanilla kernel generated with: cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c 24 background_writeout congestion_wait sync=0 delay 25 timeout 25 203 kswapd congestion_wait sync=0 delay 25 timeout 25 5 shrink_list congestion_wait sync=0 delay 25 timeout 25 155 try_to_free_pages congestion_wait sync=0 delay 25 timeout 25 145 wb_kupdate congestion_wait sync=0 delay 25 timeout 25 2 kswapd congestion_wait sync=0 delay 26 timeout 25 8 wb_kupdate congestion_wait sync=0 delay 26 timeout 25 1 try_to_free_pages congestion_wait sync=0 delay 54 timeout 25 2.6.31.1-write-congestion-wait i.e. 
kernel with patch reverted
generated with: cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
      2  background_writeout congestion_wait rw=1 delay 25 timeout 25
    188  kswapd congestion_wait rw=1 delay 25 timeout 25
     14  shrink_list congestion_wait rw=1 delay 25 timeout 25
    181  try_to_free_pages congestion_wait rw=1 delay 25 timeout 25
      5  kswapd congestion_wait rw=1 delay 26 timeout 25
     10  try_to_free_pages congestion_wait rw=1 delay 26 timeout 25
      3  try_to_free_pages congestion_wait rw=1 delay 27 timeout 25
      1  kswapd congestion_wait rw=1 delay 29 timeout 25
      1  __alloc_pages_nodemask congestion_wait rw=1 delay 30 timeout 5
      1  try_to_free_pages congestion_wait rw=1 delay 31 timeout 25
      1  try_to_free_pages congestion_wait rw=1 delay 35 timeout 25
      1  kswapd congestion_wait rw=1 delay 51 timeout 25
      1  try_to_free_pages congestion_wait rw=1 delay 56 timeout 25

So, wb_kupdate and background_writeout are the big movers in terms of waiting,
not the direct reclaimers we were expecting. Of those big movers, wb_kupdate
is the most interesting; compare the following:

$ cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
[ no output ]
$
$ cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
      1  wb_kupdate congestion_wait sync=0 delay 15 timeout 25
      1  wb_kupdate congestion_wait sync=0 delay 23 timeout 25
    145  wb_kupdate congestion_wait sync=0 delay 25 timeout 25
      8  wb_kupdate congestion_wait sync=0 delay 26 timeout 25

The vanilla kernel is not waiting in wb_kupdate at all.

Jens, before the congestion_wait() changes, wb_kupdate was waiting on
congestion and afterwards it's not. Furthermore, look at the number of pages
that are queued for writeback in the two page allocation failure reports.

without-revert:	writeback:65653
with-revert:	writeback:21713

So, after the move to async/sync, a lot more pages are getting queued
for writeback - more than three times the number of pages are queued for
writeback with the vanilla kernel. This amount of congestion might be why
direct reclaimers and kswapd's timings have changed so much.

Chris Mason hinted at this, but I didn't quite "get it" at the time. Is it
possible that writeback_inodes() is converting what is expected to be async
IO into sync IO? One way of checking this is if Frans could test the patch
below that makes wb_kupdate wait on sync instead of async.

If this makes a difference, I think the three main areas of trouble we
are now seeing are:

	1. page allocator regressions - mostly fixed hopefully
	2. page writeback change in timing - theory yet to be confirmed
	3. drivers using more atomics - iwlagn specific, being dealt with

Of course, the big problem is that if the changes are due to major timing
differences in page writeback, then mainline is a totally different
shape of problem, as pdflush has been replaced there.

====
Have wb_kupdate wait on sync IO congestion instead of async

wb_kupdate is expected to only have queued up pages for async IO.
However, something screwy is happening because it never appears to go to
sleep. Frans, can you test with this patch instead of the revert please?
Preferably, keep the verbose-congestion_wait patch applied so we can
still get an idea who is going to sleep and for how long when calling
congestion_wait. Thanks.

Not-signed-off-hacket-job: Mel Gorman <mel@csn.ul.ie>
---

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 81627eb..cb646dd 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -787,7 +787,7 @@ static void wb_kupdate(unsigned long arg)
 		writeback_inodes(&wbc);
 		if (wbc.nr_to_write > 0) {
 			if (wbc.encountered_congestion || wbc.more_io)
-				congestion_wait(BLK_RW_ASYNC, HZ/10);
+				congestion_wait(BLK_RW_SYNC, HZ/10);
 			else
 				break;	/* All the old data is written */
 		}

^ permalink raw reply related	[flat|nested] 384+ messages in thread
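For readers who want to reproduce the per-caller counts, here is a minimal sketch of what the pipeline in the message does. The three log lines are made up for illustration, and the timestamp prefix is an assumption about what the verbose congestion_wait patch writes to kern.log; only the pipeline itself is taken from the message. The function names sample_log and summarize are hypothetical.

```shell
#!/bin/sh
# Hypothetical kern.log excerpt. The prefix before ']' is an assumption;
# only the text after ']' has the shape quoted in the thread.
sample_log() {
cat <<'EOF'
Oct 26 21:00:01 host kernel: [  100.000001] kswapd congestion_wait sync=0 delay 25 timeout 25
Oct 26 21:00:02 host kernel: [  100.250001] wb_kupdate congestion_wait sync=0 delay 26 timeout 25
Oct 26 21:00:03 host kernel: [  100.500001] kswapd congestion_wait sync=0 delay 25 timeout 25
EOF
}

# Same three steps as in the message:
#   awk -F ']'    split each line at ']' and keep the text after the timestamp
#   sort -k 5 -n  sort numerically on the key starting at field 5 (the delay)
#   uniq -c       collapse identical adjacent lines, prefixing each with a count
summarize() {
    awk -F ']' '{print $2}' | sort -k 5 -n | uniq -c
}

sample_log | summarize
```

With this input the two identical kswapd lines collapse into one entry with a count of 2, followed by a single wb_kupdate entry, which is the same shape as the tables quoted in the message.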
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-27 14:54 ` Mel Gorman
@ 2009-10-27 15:16 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 384+ messages in thread
From: KOSAKI Motohiro @ 2009-10-27 15:16 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Chris Mason, David Rientjes, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
      Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm

2009/10/27 Mel Gorman <mel@csn.ul.ie>:
> On Mon, Oct 26, 2009 at 10:06:09PM +0100, Frans Pop wrote:
>> On Tuesday 20 October 2009, Mel Gorman wrote:
>> > I've attached a patch below that should allow us to cheat. When it's
>> > applied, it outputs who called congestion_wait(), how long the timeout
>> > was and how long it waited for. By comparing before and after sleep
>> > times, we should be able to see which of the callers has significantly
>> > changed and if it's something easily addressable.
>>
>> The results from this look fairly interesting (although I may be a bad
>> judge as I don't really know what I'm looking at ;-).
>>
>> I've tested with two kernels:
>> 1) 2.6.31.1: 1 test run
>> 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs
>>
>> The 1st kernel had the expected "freeze" while reading commits in gitk;
>> reading commits with the 2nd kernel was more fluent.
>> I did 2 runs with the 2nd kernel as the first run had a fairly long music
>> skip and more SKB errors than expected. The second run was fairly normal
>> with no music skips at all even though it had a few SKB errors.
>>
>> Data for the tests:
>>                          1st kernel   2nd kernel 1   2nd kernel 2
>> end reading commits      1:15         1:00           0:55
>> "freeze"                 yes          no             no
>> branch data shown        1:55         1:15           1:10
>> system quiet             2:25         1:50           1:45
>> # SKB allocation errors  10           53             5
>>
>> Note that the test is substantially faster with the 2nd kernel and that the
>> SKB errors don't really affect the duration of the test.
>>
>
> Ok. I think that despite expectations, the writeback changes have
> changed the timing significantly enough to be worth examining closer.
>
>>
>> - without the revert 'background_writeout' is called a lot less frequently,
>>   but when it's called it gets long delays
>> - without the revert you have 'wb_kupdate', which is relatively expensive
>> - with the revert 'shrink_list' is relatively expensive, although not
>>   really in absolute terms
>>
>
> Let's look at the callers that waited in congestion_wait() for at least
> 25 jiffies.
>
> 2.6.31.1-async-sync-congestion-wait i.e. vanilla kernel
> generated with: cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
>      24  background_writeout congestion_wait sync=0 delay 25 timeout 25
>     203  kswapd congestion_wait sync=0 delay 25 timeout 25
>       5  shrink_list congestion_wait sync=0 delay 25 timeout 25
>     155  try_to_free_pages congestion_wait sync=0 delay 25 timeout 25
>     145  wb_kupdate congestion_wait sync=0 delay 25 timeout 25
>       2  kswapd congestion_wait sync=0 delay 26 timeout 25
>       8  wb_kupdate congestion_wait sync=0 delay 26 timeout 25
>       1  try_to_free_pages congestion_wait sync=0 delay 54 timeout 25
>
> 2.6.31.1-write-congestion-wait i.e. kernel with patch reverted
> generated with: cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
>       2  background_writeout congestion_wait rw=1 delay 25 timeout 25
>     188  kswapd congestion_wait rw=1 delay 25 timeout 25
>      14  shrink_list congestion_wait rw=1 delay 25 timeout 25
>     181  try_to_free_pages congestion_wait rw=1 delay 25 timeout 25
>       5  kswapd congestion_wait rw=1 delay 26 timeout 25
>      10  try_to_free_pages congestion_wait rw=1 delay 26 timeout 25
>       3  try_to_free_pages congestion_wait rw=1 delay 27 timeout 25
>       1  kswapd congestion_wait rw=1 delay 29 timeout 25
>       1  __alloc_pages_nodemask congestion_wait rw=1 delay 30 timeout 5
>       1  try_to_free_pages congestion_wait rw=1 delay 31 timeout 25
>       1  try_to_free_pages congestion_wait rw=1 delay 35 timeout 25
>       1  kswapd congestion_wait rw=1 delay 51 timeout 25
>       1  try_to_free_pages congestion_wait rw=1 delay 56 timeout 25
>
> So, wb_kupdate and background_writeout are the big movers in terms of waiting,
> not the direct reclaimers we were expecting. Of those big movers, wb_kupdate
> is the most interesting; compare the following:
>
> $ cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
> [ no output ]
> $
> $ cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
>       1  wb_kupdate congestion_wait sync=0 delay 15 timeout 25
>       1  wb_kupdate congestion_wait sync=0 delay 23 timeout 25
>     145  wb_kupdate congestion_wait sync=0 delay 25 timeout 25
>       8  wb_kupdate congestion_wait sync=0 delay 26 timeout 25
>
> The vanilla kernel is not waiting in wb_kupdate at all.
>
> Jens, before the congestion_wait() changes, wb_kupdate was waiting on
> congestion and afterwards it's not. Furthermore, look at the number of pages
> that are queued for writeback in the two page allocation failure reports.
>
> without-revert: writeback:65653
> with-revert:    writeback:21713
>
> So, after the move to async/sync, a lot more pages are getting queued
> for writeback - more than three times the number of pages are queued for
> writeback with the vanilla kernel. This amount of congestion might be why
> direct reclaimers and kswapd's timings have changed so much.
>
> Chris Mason hinted at this, but I didn't quite "get it" at the time. Is it
> possible that writeback_inodes() is converting what is expected to be async
> IO into sync IO? One way of checking this is if Frans could test the patch
> below that makes wb_kupdate wait on sync instead of async.
>
> If this makes a difference, I think the three main areas of trouble we
> are now seeing are:
>
>        1. page allocator regressions - mostly fixed hopefully
>        2. page writeback change in timing - theory yet to be confirmed
>        3. drivers using more atomics - iwlagn specific, being dealt with
>
> Of course, the big problem is that if the changes are due to major timing
> differences in page writeback, then mainline is a totally different
> shape of problem, as pdflush has been replaced there.
>
> ====
> Have wb_kupdate wait on sync IO congestion instead of async
>
> wb_kupdate is expected to only have queued up pages for async IO.
> However, something screwy is happening because it never appears to go to
> sleep. Frans, can you test with this patch instead of the revert please?
> Preferably, keep the verbose-congestion_wait patch applied so we can
> still get an idea who is going to sleep and for how long when calling
> congestion_wait. Thanks.
>
> Not-signed-off-hacket-job: Mel Gorman <mel@csn.ul.ie>
> ---
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 81627eb..cb646dd 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -787,7 +787,7 @@ static void wb_kupdate(unsigned long arg)
>                writeback_inodes(&wbc);
>                if (wbc.nr_to_write > 0) {
>                        if (wbc.encountered_congestion || wbc.more_io)
> -                               congestion_wait(BLK_RW_ASYNC, HZ/10);
> +                               congestion_wait(BLK_RW_SYNC, HZ/10);
>                        else
>                                break;  /* All the old data is written */
>                }

Hmm, this doesn't look correct to me.

BLK_RW_ASYNC means async write.
BLK_RW_SYNC means read and sync write.

wb_kupdate uses WB_SYNC_NONE, so it's an async write.

^ permalink raw reply	[flat|nested] 384+ messages in thread
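One way to cross-check this point against the logs: every vanilla-kernel entry quoted in this thread carries sync=0. Assuming that field echoes the first argument of congestion_wait(), and assuming BLK_RW_ASYNC is 0 and BLK_RW_SYNC is 1 (an assumption about the 2.6.31 headers, not something stated in this thread), a per-caller tally of the field could be sketched as below. The input lines have the shape produced by the awk step of the pipeline; the function names entries and tally_sync are hypothetical.

```shell
#!/bin/sh
# Entries in the shape left after the timestamp has been cut off.
entries() {
cat <<'EOF'
 wb_kupdate congestion_wait sync=0 delay 25 timeout 25
 kswapd congestion_wait sync=0 delay 25 timeout 25
 wb_kupdate congestion_wait sync=0 delay 26 timeout 25
EOF
}

# Reduce each entry to "caller sync=N" and count how often each pair occurs.
tally_sync() {
    awk '{print $1, $3}' | sort | uniq -c
}

entries | tally_sync
```

If a caller were ever waiting on the sync queue, it would show up here as a separate "caller sync=1" row under the stated assumptions.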
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-27 15:16 ` KOSAKI Motohiro
  (?)
@ 2009-10-27 15:21 ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-27 15:21 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Frans Pop, Chris Mason, David Rientjes, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
      Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm

On Wed, Oct 28, 2009 at 12:16:30AM +0900, KOSAKI Motohiro wrote:
> 2009/10/27 Mel Gorman <mel@csn.ul.ie>:
> > On Mon, Oct 26, 2009 at 10:06:09PM +0100, Frans Pop wrote:
> >> On Tuesday 20 October 2009, Mel Gorman wrote:
> >> > I've attached a patch below that should allow us to cheat. When it's
> >> > applied, it outputs who called congestion_wait(), how long the timeout
> >> > was and how long it waited for. By comparing before and after sleep
> >> > times, we should be able to see which of the callers has significantly
> >> > changed and if it's something easily addressable.
> >>
> >> The results from this look fairly interesting (although I may be a bad
> >> judge as I don't really know what I'm looking at ;-).
> >>
> >> I've tested with two kernels:
> >> 1) 2.6.31.1: 1 test run
> >> 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs
> >>
> >> The 1st kernel had the expected "freeze" while reading commits in gitk;
> >> reading commits with the 2nd kernel was more fluent.
> >> I did 2 runs with the 2nd kernel as the first run had a fairly long music
> >> skip and more SKB errors than expected. The second run was fairly normal
> >> with no music skips at all even though it had a few SKB errors.
> >>
> >> Data for the tests:
> >>                          1st kernel   2nd kernel 1   2nd kernel 2
> >> end reading commits      1:15         1:00           0:55
> >> "freeze"                 yes          no             no
> >> branch data shown        1:55         1:15           1:10
> >> system quiet             2:25         1:50           1:45
> >> # SKB allocation errors  10           53             5
> >>
> >> Note that the test is substantially faster with the 2nd kernel and that the
> >> SKB errors don't really affect the duration of the test.
> >>
> >
> > Ok. I think that despite expectations, the writeback changes have
> > changed the timing significantly enough to be worth examining closer.
> >
> >>
> >> - without the revert 'background_writeout' is called a lot less frequently,
> >>   but when it's called it gets long delays
> >> - without the revert you have 'wb_kupdate', which is relatively expensive
> >> - with the revert 'shrink_list' is relatively expensive, although not
> >>   really in absolute terms
> >>
> >
> > Let's look at the callers that waited in congestion_wait() for at least
> > 25 jiffies.
> >
> > 2.6.31.1-async-sync-congestion-wait i.e. vanilla kernel
> > generated with: cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
> >      24  background_writeout congestion_wait sync=0 delay 25 timeout 25
> >     203  kswapd congestion_wait sync=0 delay 25 timeout 25
> >       5  shrink_list congestion_wait sync=0 delay 25 timeout 25
> >     155  try_to_free_pages congestion_wait sync=0 delay 25 timeout 25
> >     145  wb_kupdate congestion_wait sync=0 delay 25 timeout 25
> >       2  kswapd congestion_wait sync=0 delay 26 timeout 25
> >       8  wb_kupdate congestion_wait sync=0 delay 26 timeout 25
> >       1  try_to_free_pages congestion_wait sync=0 delay 54 timeout 25
> >
> > 2.6.31.1-write-congestion-wait i.e. kernel with patch reverted
> > generated with: cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
> >       2  background_writeout congestion_wait rw=1 delay 25 timeout 25
> >     188  kswapd congestion_wait rw=1 delay 25 timeout 25
> >      14  shrink_list congestion_wait rw=1 delay 25 timeout 25
> >     181  try_to_free_pages congestion_wait rw=1 delay 25 timeout 25
> >       5  kswapd congestion_wait rw=1 delay 26 timeout 25
> >      10  try_to_free_pages congestion_wait rw=1 delay 26 timeout 25
> >       3  try_to_free_pages congestion_wait rw=1 delay 27 timeout 25
> >       1  kswapd congestion_wait rw=1 delay 29 timeout 25
> >       1  __alloc_pages_nodemask congestion_wait rw=1 delay 30 timeout 5
> >       1  try_to_free_pages congestion_wait rw=1 delay 31 timeout 25
> >       1  try_to_free_pages congestion_wait rw=1 delay 35 timeout 25
> >       1  kswapd congestion_wait rw=1 delay 51 timeout 25
> >       1  try_to_free_pages congestion_wait rw=1 delay 56 timeout 25
> >
> > So, wb_kupdate and background_writeout are the big movers in terms of waiting,
> > not the direct reclaimers we were expecting. Of those big movers, wb_kupdate
> > is the most interesting; compare the following:
> >
> > $ cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
> > [ no output ]
> > $
> > $ cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
> >       1  wb_kupdate congestion_wait sync=0 delay 15 timeout 25
> >       1  wb_kupdate congestion_wait sync=0 delay 23 timeout 25
> >     145  wb_kupdate congestion_wait sync=0 delay 25 timeout 25
> >       8  wb_kupdate congestion_wait sync=0 delay 26 timeout 25
> >
> > The vanilla kernel is not waiting in wb_kupdate at all.
> >
> > Jens, before the congestion_wait() changes, wb_kupdate was waiting on
> > congestion and afterwards it's not. Furthermore, look at the number of pages
> > that are queued for writeback in the two page allocation failure reports.
> >
> > without-revert: writeback:65653
> > with-revert:    writeback:21713
> >
> > So, after the move to async/sync, a lot more pages are getting queued
> > for writeback - more than three times the number of pages are queued for
> > writeback with the vanilla kernel. This amount of congestion might be why
> > direct reclaimers and kswapd's timings have changed so much.
> >
> > Chris Mason hinted at this, but I didn't quite "get it" at the time. Is it
> > possible that writeback_inodes() is converting what is expected to be async
> > IO into sync IO? One way of checking this is if Frans could test the patch
> > below that makes wb_kupdate wait on sync instead of async.
> >
> > If this makes a difference, I think the three main areas of trouble we
> > are now seeing are:
> >
> >        1. page allocator regressions - mostly fixed hopefully
> >        2. page writeback change in timing - theory yet to be confirmed
> >        3. drivers using more atomics - iwlagn specific, being dealt with
> >
> > Of course, the big problem is that if the changes are due to major timing
> > differences in page writeback, then mainline is a totally different
> > shape of problem, as pdflush has been replaced there.
> >
> > ====
> > Have wb_kupdate wait on sync IO congestion instead of async
> >
> > wb_kupdate is expected to only have queued up pages for async IO.
> > However, something screwy is happening because it never appears to go to
> > sleep. Frans, can you test with this patch instead of the revert please?
> > Preferably, keep the verbose-congestion_wait patch applied so we can
> > still get an idea who is going to sleep and for how long when calling
> > congestion_wait. Thanks.
> >
> > Not-signed-off-hacket-job: Mel Gorman <mel@csn.ul.ie>
> > ---
> >
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 81627eb..cb646dd 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -787,7 +787,7 @@ static void wb_kupdate(unsigned long arg)
> >                writeback_inodes(&wbc);
> >                if (wbc.nr_to_write > 0) {
> >                        if (wbc.encountered_congestion || wbc.more_io)
> > -                               congestion_wait(BLK_RW_ASYNC, HZ/10);
> > +                               congestion_wait(BLK_RW_SYNC, HZ/10);
> >                        else
> >                                break;  /* All the old data is written */
> >                }
>
> Hmm, this doesn't look correct to me.
>
> BLK_RW_ASYNC means async write.
> BLK_RW_SYNC means read and sync write.
>
> wb_kupdate uses WB_SYNC_NONE, so it's an async write.
>

I don't think it's correct either, which is why I described it as "something
screwy is happening because it never appears to go to sleep". This is
despite there being a whole lot of pages queued for writeback according
to the page allocation failure reports.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 384+ messages in thread
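For scale, the two writeback figures quoted in this exchange can be put side by side with a quick calculation. This is only a sanity check of the "more than three times" remark; the variable and function names are made up for illustration.

```shell
#!/bin/sh
# Writeback page counts as quoted from the two allocation failure reports.
without_revert=65653
with_revert=21713

# awk does the floating-point division; plain sh arithmetic is integer only.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio "$without_revert" "$with_revert"   # prints 3.02
```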
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-27 14:54 ` Mel Gorman
  (?)
@ 2009-10-27 15:52 ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-27 15:52 UTC (permalink / raw)
  To: Frans Pop
  Cc: Chris Mason, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
      Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm

On Tue, Oct 27, 2009 at 02:54:35PM +0000, Mel Gorman wrote:
> On Mon, Oct 26, 2009 at 10:06:09PM +0100, Frans Pop wrote:
> > On Tuesday 20 October 2009, Mel Gorman wrote:
> > > I've attached a patch below that should allow us to cheat. When it's
> > > applied, it outputs who called congestion_wait(), how long the timeout
> > > was and how long it waited for. By comparing before and after sleep
> > > times, we should be able to see which of the callers has significantly
> > > changed and if it's something easily addressable.
> >
> > The results from this look fairly interesting (although I may be a bad
> > judge as I don't really know what I'm looking at ;-).
> >
> > I've tested with two kernels:
> > 1) 2.6.31.1: 1 test run
> > 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs
> >
> > The 1st kernel had the expected "freeze" while reading commits in gitk;
> > reading commits with the 2nd kernel was more fluent.
> > I did 2 runs with the 2nd kernel as the first run had a fairly long music
> > skip and more SKB errors than expected. The second run was fairly normal
> > with no music skips at all even though it had a few SKB errors.
> >
> > Data for the tests:
> >                             1st kernel   2nd kernel 1   2nd kernel 2
> > end reading commits         1:15         1:00           0:55
> > "freeze"                    yes          no             no
> > branch data shown           1:55         1:15           1:10
> > system quiet                2:25         1:50           1:45
> > # SKB allocation errors     10           53             5
> >
> > Note that the test is substantially faster with the 2nd kernel and that the
> > SKB errors don't really affect the duration of the test.
> >
>
> Ok. I think that despite expectations, the writeback changes have
> changed the timing significantly enough to be worth examining closer.
>
> >
> > - without the revert 'background_writeout' is called a lot less frequently,
> >   but when it's called it gets long delays
> > - without the revert you have 'wb_kupdate', which is relatively expensive
> > - with the revert 'shrink_list' is relatively expensive, although not
> >   really in absolute terms
> >
>
> Lets look at the callers that waited in congestion_wait() for at least
> 25 jiffies.
>
> 2.6.31.1-async-sync-congestion-wait i.e. vanilla kernel
> generated with: cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
>      24 background_writeout congestion_wait sync=0 delay 25 timeout 25
>     203 kswapd congestion_wait sync=0 delay 25 timeout 25
>       5 shrink_list congestion_wait sync=0 delay 25 timeout 25
>     155 try_to_free_pages congestion_wait sync=0 delay 25 timeout 25
>     145 wb_kupdate congestion_wait sync=0 delay 25 timeout 25
>       2 kswapd congestion_wait sync=0 delay 26 timeout 25
>       8 wb_kupdate congestion_wait sync=0 delay 26 timeout 25
>       1 try_to_free_pages congestion_wait sync=0 delay 54 timeout 25
>
> 2.6.31.1-write-congestion-wait i.e. kernel with patch reverted
> generated with: cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
>       2 background_writeout congestion_wait rw=1 delay 25 timeout 25
>     188 kswapd congestion_wait rw=1 delay 25 timeout 25
>      14 shrink_list congestion_wait rw=1 delay 25 timeout 25
>     181 try_to_free_pages congestion_wait rw=1 delay 25 timeout 25
>       5 kswapd congestion_wait rw=1 delay 26 timeout 25
>      10 try_to_free_pages congestion_wait rw=1 delay 26 timeout 25
>       3 try_to_free_pages congestion_wait rw=1 delay 27 timeout 25
>       1 kswapd congestion_wait rw=1 delay 29 timeout 25
>       1 __alloc_pages_nodemask congestion_wait rw=1 delay 30 timeout 5
>       1 try_to_free_pages congestion_wait rw=1 delay 31 timeout 25
>       1 try_to_free_pages congestion_wait rw=1 delay 35 timeout 25
>       1 kswapd congestion_wait rw=1 delay 51 timeout 25
>       1 try_to_free_pages congestion_wait rw=1 delay 56 timeout 25
>
> So, wb_kupdate and background_writeout are the big movers in terms of waiting,
> not the direct reclaimers which is what we were expecting. Of those big
> movers, wb_kupdate is the most interested because compare the following
>

Bah, this part is right, but I got the next section the wrong way around.
I should have renamed the damn things instead of remember what was 1 and
what was 2.

1 == vanilla
2 == with-revert

> $ cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
> [ no output ]
> $ $ cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
>       1 wb_kupdate congestion_wait sync=0 delay 15 timeout 25
>       1 wb_kupdate congestion_wait sync=0 delay 23 timeout 25
>     145 wb_kupdate congestion_wait sync=0 delay 25 timeout 25
>       8 wb_kupdate congestion_wait sync=0 delay 26 timeout 25
>
> The vanilla kernel is not waiting in wb_kupdate at all.
>

The vanilla kernel *is* waiting. The reverted kernel is not. If my patch
makes any difference, it's not for the right reasons.

> Jens, before the congestion_wait() changes, wb_kupdate was waiting on
> congestion and afterwards it's not. Furthermore, look at the number of pages
> that are queued for writeback in the two page allocation failure reports.
>
> without-revert: writeback:65653
> with-revert: writeback:21713
>

and got it back right again.

kernel 1 == vanilla kernel == without-revert writeback:65653
kernel 2 == revert kernel  == with-revert    writeback:21713

> So, after the move to async/sync, a lot more pages are getting queued
> for writeback - more than three times the number of pages are queued for
> writeback with the vanilla kernel. This amount of congestion might be why
> direct reclaimers and kswapd's timings have changed so much.
>

Or more accurately, the vanilla kernel has queued up a lot more pages for
IO than when the patch is reverted. I'm not seeing yet why this is.

> Chris Mason hinted at this but I didn't quite "get it" at the time but is it
> possible that writeback_inodes() is converting what is expected to be async
> IO into sync IO? One way of checking this is if Frans could test the patch
> below that makes wb_kupdate wait on sync instead of async.
>

This reasoning is rubbish. If the patch makes any difference, it's
because it changes timing. It's probably more important to figure out if

a) if the different number of pages for writeback is relevant and if so
b) why has it changed.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply [flat|nested] 384+ messages in thread
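
[ Illustration, not part of the original message. ] The per-caller tally
produced by the awk/sort/uniq pipeline quoted in the message above can also be
sketched in Python, which makes it easier to keep the two kernels labelled and
avoid the 1-versus-2 mix-up. This is only a sketch under assumptions: the
"[ timestamp]" prefix on each raw log line and the sample lines themselves are
made up for the example; only the "caller congestion_wait sync=N delay N
timeout N" tail follows the format shown in the thread.

```python
import re
from collections import Counter

# Matches the tail printed by the debugging patch, e.g.
# "wb_kupdate congestion_wait sync=0 delay 25 timeout 25".
# The reverted kernel prints "rw=1" instead of "sync=0", so accept both.
LINE_RE = re.compile(
    r"(?P<caller>\S+) congestion_wait (?:sync|rw)=\d+ "
    r"delay (?P<delay>\d+) timeout (?P<timeout>\d+)"
)


def tally_waits(lines, min_delay=0):
    """Count log lines per (caller, delay), keeping delays >= min_delay."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a congestion_wait report
        delay = int(match.group("delay"))
        if delay >= min_delay:
            counts[(match.group("caller"), delay)] += 1
    return counts


def callers_only_in(first, second):
    """Callers that appear in the first tally but not in the second."""
    first_callers = {caller for caller, _ in first}
    second_callers = {caller for caller, _ in second}
    return sorted(first_callers - second_callers)


# Hypothetical sample lines; the timestamp prefix is an assumption.
SAMPLE_VANILLA = [
    "[  101.000000] wb_kupdate congestion_wait sync=0 delay 25 timeout 25",
    "[  102.000000] wb_kupdate congestion_wait sync=0 delay 25 timeout 25",
    "[  103.000000] wb_kupdate congestion_wait sync=0 delay 15 timeout 25",
    "[  104.000000] kswapd congestion_wait sync=0 delay 26 timeout 25",
]
SAMPLE_REVERT = [
    "[  201.000000] kswapd congestion_wait rw=1 delay 25 timeout 25",
    "[  202.000000] try_to_free_pages congestion_wait rw=1 delay 27 timeout 25",
]

vanilla = tally_waits(SAMPLE_VANILLA, min_delay=25)
revert = tally_waits(SAMPLE_REVERT, min_delay=25)

for (caller, delay), count in sorted(vanilla.items()):
    print(f"{count:7d} {caller} delay {delay}")
print("waited only on vanilla:", callers_only_in(vanilla, revert))
```

Keying the tallies by name ("vanilla", "revert") rather than by log file number
is the point of the design here; the counting itself is the same as uniq -c.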
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-27 15:52 ` Mel Gorman 0 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-27 15:52 UTC (permalink / raw) To: Frans Pop Cc: Chris Mason, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm-Bw31MaZKKs3YtjvyW6yDsg On Tue, Oct 27, 2009 at 02:54:35PM +0000, Mel Gorman wrote: > On Mon, Oct 26, 2009 at 10:06:09PM +0100, Frans Pop wrote: > > On Tuesday 20 October 2009, Mel Gorman wrote: > > > I've attached a patch below that should allow us to cheat. When it's > > > applied, it outputs who called congestion_wait(), how long the timeout > > > was and how long it waited for. By comparing before and after sleep > > > times, we should be able to see which of the callers has significantly > > > changed and if it's something easily addressable. > > > > The results from this look fairly interesting (although I may be a bad > > judge as I don't really know what I'm looking at ;-). > > > > I've tested with two kernels: > > 1) 2.6.31.1: 1 test run > > 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs > > > > The 1st kernel had the expected "freeze" while reading commits in gitk; > > reading commits with the 2nd kernel was more fluent. > > I did 2 runs with the 2nd kernel as the first run had a fairly long music > > skip and more SKB errors than expected. The second run was fairly normal > > with no music skips at all even though it had a few SKB errors. 
> > > > Data for the tests: > > 1st kernel 2nd kernel 1 2nd kernel 2 > > end reading commits 1:15 1:00 0:55 > > "freeze" yes no no > > branch data shown 1:55 1:15 1:10 > > system quiet 2:25 1:50 1:45 > > # SKB allocation errors 10 53 5 > > > > Note that the test is substantially faster with the 2nd kernel and that the > > SKB errors don't really affect the duration of the test. > > > > Ok. I think that despite expectations, the writeback changes have > changed the timing significantly enough to be worth examining closer. > > > > > - without the revert 'background_writeout' is called a lot less frequently, > > but when it's called it gets long delays > > - without the revert you have 'wb_kupdate', which is relatively expensive > > - with the revert 'shrink_list' is relatively expensive, although not > > really in absolute terms > > > > Lets look at the callers that waited in congestion_wait() for at least > 25 jiffies. > > 2.6.31.1-async-sync-congestion-wait i.e. vanilla kernel > generated with: cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c > 24 background_writeout congestion_wait sync=0 delay 25 timeout 25 > 203 kswapd congestion_wait sync=0 delay 25 timeout 25 > 5 shrink_list congestion_wait sync=0 delay 25 timeout 25 > 155 try_to_free_pages congestion_wait sync=0 delay 25 timeout 25 > 145 wb_kupdate congestion_wait sync=0 delay 25 timeout 25 > 2 kswapd congestion_wait sync=0 delay 26 timeout 25 > 8 wb_kupdate congestion_wait sync=0 delay 26 timeout 25 > 1 try_to_free_pages congestion_wait sync=0 delay 54 timeout 25 > > 2.6.31.1-write-congestion-wait i.e. 
kernel with patch reverted > generated with: cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c > 2 background_writeout congestion_wait rw=1 delay 25 timeout 25 > 188 kswapd congestion_wait rw=1 delay 25 timeout 25 > 14 shrink_list congestion_wait rw=1 delay 25 timeout 25 > 181 try_to_free_pages congestion_wait rw=1 delay 25 timeout 25 > 5 kswapd congestion_wait rw=1 delay 26 timeout 25 > 10 try_to_free_pages congestion_wait rw=1 delay 26 timeout 25 > 3 try_to_free_pages congestion_wait rw=1 delay 27 timeout 25 > 1 kswapd congestion_wait rw=1 delay 29 timeout 25 > 1 __alloc_pages_nodemask congestion_wait rw=1 delay 30 timeout 5 > 1 try_to_free_pages congestion_wait rw=1 delay 31 timeout 25 > 1 try_to_free_pages congestion_wait rw=1 delay 35 timeout 25 > 1 kswapd congestion_wait rw=1 delay 51 timeout 25 > 1 try_to_free_pages congestion_wait rw=1 delay 56 timeout 25 > > So, wb_kupdate and background_writeout are the big movers in terms of waiting, > not the direct reclaimers which is what we were expecting. Of those big > movers, wb_kupdate is the most interested because compare the following > Bah, this part is right, but I got the next section the wrong way around. I should have renamed the damn things instead of remember what was 1 and what was 2. 1 == vanilla 2 == with-revert > $ cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup > [ no output ] > $ $ cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup > 1 wb_kupdate congestion_wait sync=0 delay 15 timeout 25 > 1 wb_kupdate congestion_wait sync=0 delay 23 timeout 25 > 145 wb_kupdate congestion_wait sync=0 delay 25 timeout 25 > 8 wb_kupdate congestion_wait sync=0 delay 26 timeout 25 > > The vanilla kernel is not waiting in wb_kupdate at all. > The vanilla kernel *is* waiting. The reverted kernel is not. If my patch makes any difference, it's not for the right reasons. 
> Jens, before the congestion_wait() changes, wb_kupdate was waiting on > congestion and afterwards it's not. Furthermore, look at the number of pages > that are queued for writeback in the two page allocation failure reports. > > without-revert: writeback:65653 > with-revert: writeback:21713 > and got it back right again. kernel 1 == vanilla kernel == without-revert writeback:65653 kernel 2 == revert kernel == with-revert writeback:21713 > So, after the move to async/sync, a lot more pages are getting queued > for writeback - more than three times the number of pages are queued for > writeback with the vanilla kernel. This amount of congestion might be why > direct reclaimers and kswapd's timings have changed so much. > Or more accurately, the vanilla kernel has queued up a lot more pages for IO than when the patch is reverted. I'm not seeing yet why this is. > Chris Mason hinted at this but I didn't quite "get it" at the time but is it > possible that writeback_inodes() is converting what is expected to be async > IO into sync IO? One way of checking this is if Frans could test the patch > below that makes wb_kupdate wait on sync instead of async. > This reasoning is rubbish. If the patch makes any difference, it's because it changes timing. It's probably more important to figure out if a) if the different number of pages for writeback is relevant and if so b) why has it changed. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-27 15:52 ` Mel Gorman 0 siblings, 0 replies; 384+ messages in thread From: Mel Gorman @ 2009-10-27 15:52 UTC (permalink / raw) To: Frans Pop Cc: Chris Mason, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm On Tue, Oct 27, 2009 at 02:54:35PM +0000, Mel Gorman wrote: > On Mon, Oct 26, 2009 at 10:06:09PM +0100, Frans Pop wrote: > > On Tuesday 20 October 2009, Mel Gorman wrote: > > > I've attached a patch below that should allow us to cheat. When it's > > > applied, it outputs who called congestion_wait(), how long the timeout > > > was and how long it waited for. By comparing before and after sleep > > > times, we should be able to see which of the callers has significantly > > > changed and if it's something easily addressable. > > > > The results from this look fairly interesting (although I may be a bad > > judge as I don't really know what I'm looking at ;-). > > > > I've tested with two kernels: > > 1) 2.6.31.1: 1 test run > > 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs > > > > The 1st kernel had the expected "freeze" while reading commits in gitk; > > reading commits with the 2nd kernel was more fluent. > > I did 2 runs with the 2nd kernel as the first run had a fairly long music > > skip and more SKB errors than expected. The second run was fairly normal > > with no music skips at all even though it had a few SKB errors. > > > > Data for the tests: > > 1st kernel 2nd kernel 1 2nd kernel 2 > > end reading commits 1:15 1:00 0:55 > > "freeze" yes no no > > branch data shown 1:55 1:15 1:10 > > system quiet 2:25 1:50 1:45 > > # SKB allocation errors 10 53 5 > > > > Note that the test is substantially faster with the 2nd kernel and that the > > SKB errors don't really affect the duration of the test. 
> >
>
> Ok. I think that despite expectations, the writeback changes have
> changed the timing significantly enough to be worth examining closer.
>
> >
> > - without the revert 'background_writeout' is called a lot less frequently,
> >   but when it's called it gets long delays
> > - without the revert you have 'wb_kupdate', which is relatively expensive
> > - with the revert 'shrink_list' is relatively expensive, although not
> >   really in absolute terms
> >
>
> Let's look at the callers that waited in congestion_wait() for at least
> 25 jiffies.
>
> 2.6.31.1-async-sync-congestion-wait i.e. vanilla kernel
> generated with: cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
>      24 background_writeout congestion_wait sync=0 delay 25 timeout 25
>     203 kswapd congestion_wait sync=0 delay 25 timeout 25
>       5 shrink_list congestion_wait sync=0 delay 25 timeout 25
>     155 try_to_free_pages congestion_wait sync=0 delay 25 timeout 25
>     145 wb_kupdate congestion_wait sync=0 delay 25 timeout 25
>       2 kswapd congestion_wait sync=0 delay 26 timeout 25
>       8 wb_kupdate congestion_wait sync=0 delay 26 timeout 25
>       1 try_to_free_pages congestion_wait sync=0 delay 54 timeout 25
>
> 2.6.31.1-write-congestion-wait i.e. kernel with patch reverted
> generated with: cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c
>       2 background_writeout congestion_wait rw=1 delay 25 timeout 25
>     188 kswapd congestion_wait rw=1 delay 25 timeout 25
>      14 shrink_list congestion_wait rw=1 delay 25 timeout 25
>     181 try_to_free_pages congestion_wait rw=1 delay 25 timeout 25
>       5 kswapd congestion_wait rw=1 delay 26 timeout 25
>      10 try_to_free_pages congestion_wait rw=1 delay 26 timeout 25
>       3 try_to_free_pages congestion_wait rw=1 delay 27 timeout 25
>       1 kswapd congestion_wait rw=1 delay 29 timeout 25
>       1 __alloc_pages_nodemask congestion_wait rw=1 delay 30 timeout 5
>       1 try_to_free_pages congestion_wait rw=1 delay 31 timeout 25
>       1 try_to_free_pages congestion_wait rw=1 delay 35 timeout 25
>       1 kswapd congestion_wait rw=1 delay 51 timeout 25
>       1 try_to_free_pages congestion_wait rw=1 delay 56 timeout 25
>
> So, wb_kupdate and background_writeout are the big movers in terms of waiting,
> not the direct reclaimers which is what we were expecting. Of those big
> movers, wb_kupdate is the most interesting because compare the following
>

Bah, this part is right, but I got the next section the wrong way around.
I should have renamed the damn things instead of remembering what was 1 and
what was 2.

1 == vanilla
2 == with-revert

> $ cat kern.log_2.1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
> [ no output ]
> $ cat kern.log_1_test | awk -F ] '{print $2}' | sort -k 5 -n | uniq -c | grep wb_kup
>       1 wb_kupdate congestion_wait sync=0 delay 15 timeout 25
>       1 wb_kupdate congestion_wait sync=0 delay 23 timeout 25
>     145 wb_kupdate congestion_wait sync=0 delay 25 timeout 25
>       8 wb_kupdate congestion_wait sync=0 delay 26 timeout 25
>
> The vanilla kernel is not waiting in wb_kupdate at all.
>

The vanilla kernel *is* waiting. The reverted kernel is not. If my patch
makes any difference, it's not for the right reasons.

> Jens, before the congestion_wait() changes, wb_kupdate was waiting on
> congestion and afterwards it's not. Furthermore, look at the number of pages
> that are queued for writeback in the two page allocation failure reports.
>
> without-revert: writeback:65653
> with-revert:    writeback:21713
>

and got it back right again.

kernel 1 == vanilla kernel == without-revert writeback:65653
kernel 2 == revert kernel  == with-revert    writeback:21713

> So, after the move to async/sync, a lot more pages are getting queued
> for writeback - more than three times the number of pages are queued for
> writeback with the vanilla kernel. This amount of congestion might be why
> direct reclaimers and kswapd's timings have changed so much.
>

Or more accurately, the vanilla kernel has queued up a lot more pages for
IO than when the patch is reverted. I'm not seeing yet why this is.

> Chris Mason hinted at this but I didn't quite "get it" at the time but is it
> possible that writeback_inodes() is converting what is expected to be async
> IO into sync IO? One way of checking this is if Frans could test the patch
> below that makes wb_kupdate wait on sync instead of async.
>

This reasoning is rubbish. If the patch makes any difference, it's
because it changes timing. It's probably more important to figure out
a) if the different number of pages for writeback is relevant and if so
b) why it has changed.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 384+ messages in thread
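[ Editorial sketch, not part of the thread. ] The per-caller tallies quoted in the message above came from an awk | sort | uniq -c pipeline over the debug patch's log output. A rough Python equivalent might look like the following. The raw log line shape is an assumption inferred from the quoted output (a bracketed timestamp followed by "<caller> congestion_wait <mode> delay <n> timeout <n>"); the function names tally and per_caller are hypothetical helpers, not anything from the kernel tree or the debug patch.

```python
import re
from collections import Counter

# Assumed line shape (inferred from the quoted awk output, not confirmed):
#   "[ 1234.567890] wb_kupdate congestion_wait sync=0 delay 25 timeout 25"
LINE_RE = re.compile(
    r"\]\s*(?P<caller>\S+) congestion_wait (?P<mode>\S+) "
    r"delay (?P<delay>\d+) timeout (?P<timeout>\d+)"
)


def tally(lines, min_delay=0):
    """Count log lines per (caller, delay, timeout), like `sort | uniq -c`."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not congestion_wait records
        delay = int(match.group("delay"))
        if delay >= min_delay:
            key = (match.group("caller"), delay, int(match.group("timeout")))
            counts[key] += 1
    return counts


def per_caller(counts):
    """Collapse the (caller, delay, timeout) tally down to totals per caller."""
    totals = Counter()
    for (caller, _delay, _timeout), n in counts.items():
        totals[caller] += n
    return totals


if __name__ == "__main__":
    # Made-up sample lines in the assumed format, for illustration only.
    sample = [
        "[  100.000001] wb_kupdate congestion_wait sync=0 delay 25 timeout 25",
        "[  100.100000] wb_kupdate congestion_wait sync=0 delay 25 timeout 25",
        "[  100.200000] kswapd congestion_wait sync=0 delay 26 timeout 25",
        "[  100.300000] wb_kupdate congestion_wait sync=0 delay 15 timeout 25",
        "some unrelated kernel message",
    ]
    for caller, n in sorted(per_caller(tally(sample, min_delay=25)).items()):
        print(f"{n:7d} {caller}")
```

Collapsing to per-caller totals makes it harder to mix up which log belongs to which kernel, since a caller such as wb_kupdate either appears in a log's totals or does not.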
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-27 15:52 ` Mel Gorman
@ 2009-10-27 16:03 ` Chris Mason
  -1 siblings, 0 replies; 384+ messages in thread
From: Chris Mason @ 2009-10-27 16:03 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm

On Tue, Oct 27, 2009 at 03:52:24PM +0000, Mel Gorman wrote:
>
> > So, after the move to async/sync, a lot more pages are getting queued
> > for writeback - more than three times the number of pages are queued for
> > writeback with the vanilla kernel. This amount of congestion might be why
> > direct reclaimers and kswapd's timings have changed so much.
> >
>
> Or more accurately, the vanilla kernel has queued up a lot more pages for
> IO than when the patch is reverted. I'm not seeing yet why this is.

[ sympathies over confusion about congestion...lots of variables here ]

If wb_kupdate has been able to queue more writes it is because the
congestion logic isn't stopping it. We have congestion_wait(), but
before calling that in the writeback paths it says: are you congested?
and then backs off if the answer is yes.

Ideally, direct reclaim will never do writeback. We want it to be able
to find clean pages that kupdate and friends have already processed.

Waiting for congestion is a funny thing, it only tells us the device has
managed to finish some IO or that a timeout has passed. Neither event
has any relation to figuring out if the IO for reclaimable pages has
finished.

One option is to have the VM remember the hashed waitqueue for one of
the pages it direct reclaims and then wait on it.

-chris

^ permalink raw reply	[flat|nested] 384+ messages in thread
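[ Editorial sketch, not part of the thread. ] The point made above, that a congestion-style wait wakes on *any* IO completion or on timeout while a per-page wait wakes only when that page's IO is done, can be illustrated with a small deterministic toy model. This is not kernel code: toy_congestion_wait and toy_wait_on_page are hypothetical names with made-up signatures, and the times are arbitrary units standing in for jiffies.

```python
def toy_congestion_wait(now, completion_times, timeout):
    """Toy model: wake at the first IO completion after `now`,
    or at now + timeout, whichever comes first."""
    upcoming = [t for t in completion_times if t > now]
    return min(upcoming + [now + timeout])


def toy_wait_on_page(now, page_completion_time):
    """Toy model: wake only once the IO for one specific page has finished."""
    return max(now, page_completion_time)


if __name__ == "__main__":
    # Hypothetical device: three writes complete at t=3, t=8 and t=40.
    # The page the reclaimer actually cares about is the one finishing at t=40.
    completions = [3, 8, 40]
    page_done_at = 40

    woke = toy_congestion_wait(0, completions, timeout=25)
    print("congestion-style wait wakes at", woke)          # 3
    print("page clean at that point?", woke >= page_done_at)  # False

    print("per-page wait wakes at", toy_wait_on_page(0, page_done_at))  # 40
```

In this model the congestion-style waiter wakes at t=3 even though the page it wants is not clean until t=40, which is the mismatch the message describes. Waiting on the page's own waitqueue, as suggested above, would correspond to the second function.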
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-27 16:03 ` Chris Mason
  (?)
@ 2009-10-27 17:21 ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-27 17:21 UTC (permalink / raw)
  To: Chris Mason, Mel Gorman, David Rientjes, KOSAKI Motohiro,
	Rafael J. Wysocki

On Tuesday 27 October 2009, Chris Mason wrote:
> On Tue, Oct 27, 2009 at 03:52:24PM +0000, Mel Gorman wrote:
> > > So, after the move to async/sync, a lot more pages are getting
> > > queued for writeback - more than three times the number of pages are
> > > queued for writeback with the vanilla kernel. This amount of
> > > congestion might be why direct reclaimers and kswapd's timings have
> > > changed so much.
> >
> > Or more accurately, the vanilla kernel has queued up a lot more pages
> > for IO than when the patch is reverted. I'm not seeing yet why this
> > is.
>
> [ sympathies over confusion about congestion...lots of variables here ]
>
> If wb_kupdate has been able to queue more writes it is because the
> congestion logic isn't stopping it. We have congestion_wait(), but
> before calling that in the writeback paths it says: are you congested?
> and then backs off if the answer is yes.
>
> Ideally, direct reclaim will never do writeback. We want it to be able
> to find clean pages that kupdate and friends have already processed.
>
> Waiting for congestion is a funny thing, it only tells us the device has
> managed to finish some IO or that a timeout has passed. Neither event
> has any relation to figuring out if the IO for reclaimable pages has
> finished.
>
> One option is to have the VM remember the hashed waitqueue for one of
> the pages it direct reclaims and then wait on it.

What people should be aware of is the behavior of the system I see at this
point. I've already mentioned this in other mails, but it's probably good
to repeat it here.

While gitk is reading commits with vanilla .31 and .32 kernels there is at
some point a fairly long period (10-20 seconds) where I see:
- a completely frozen desktop, including frozen mouse cursor
- really very little disk activity (HD LED flashes very briefly less than
  once per second)
- reading commits stops completely during this period
- no music.

After that there is a period (another 5-15 seconds) with a huge amount of
disk activity during which the system gradually becomes responsive again
and in gitk the count of commits that have been read starts increasing
again (without a jump in the counter, which confirms that no commits were
read during the freeze).

I cannot really tell what the system is doing during those freezes. Because
of the frozen desktop I cannot for example see CPU usage. I suspect that,
as there is hardly any disk activity, the system must be reorganizing RAM
or something. But it seems quite bad that that gets "bunched up" instead of
happening more gradually.

With the congestion_wait() change reverted I never see these freezes, only
much more normal minor latencies (< 2 seconds; mostly < 0.5 seconds), which
are probably unavoidable during heavy swapping.

Hth,
FJP

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-26 21:06 ` Frans Pop
  2009-10-27 14:54   ` Mel Gorman
@ 2009-11-05 20:14   ` Frans Pop
  1 sibling, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-11-05 20:14 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Chris Mason, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm

On Monday 26 October 2009, Frans Pop wrote:
> On Tuesday 20 October 2009, Mel Gorman wrote:
> > I've attached a patch below that should allow us to cheat. When it's
> > applied, it outputs who called congestion_wait(), how long the timeout
> > was and how long it waited for. By comparing before and after sleep
> > times, we should be able to see which of the callers has significantly
> > changed and if it's something easily addressable.
>
> The results from this look fairly interesting (although I may be a bad
> judge as I don't really know what I'm looking at ;-).
>
> I've tested with two kernels:
> 1) 2.6.31.1: 1 test run
> 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs

I've taken another look at the data from this debug patch, resulting in
these graphs: http://people.debian.org/~fjp/tmp/kernel/congestion.pdf

I think the graph may show the reason for the congestion_wait() regression.
Horizontal axis shows time, vertical axis shows number of logged
congestion_wait calls per type. The top chart is without the revert, the
bottom one after the revert.

Note how before the revert the graph shows distinct steps: first you get
almost exclusively kswapd, followed by almost exclusively alloc_pages and
try_to_free. I suspect the periods where kswapd is almost horizontal
correspond to the freezes.

With the revert the lines for the different functions are almost straight
and everything happens much better interspersed.

Cheers,
FJP

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-11-05 20:14 ` Frans Pop
  (?)
@ 2009-11-06  9:51 ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-11-06 9:51 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Chris Mason, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm

On Thursday 05 November 2009, Frans Pop wrote:
> On Monday 26 October 2009, Frans Pop wrote:
> > On Tuesday 20 October 2009, Mel Gorman wrote:
> > > I've attached a patch below that should allow us to cheat. When it's
> > > applied, it outputs who called congestion_wait(), how long the
> > > timeout was and how long it waited for. By comparing before and
> > > after sleep times, we should be able to see which of the callers has
> > > significantly changed and if it's something easily addressable.
> >
> > The results from this look fairly interesting (although I may be a bad
> > judge as I don't really know what I'm looking at ;-).
> >
> > I've tested with two kernels:
> > 1) 2.6.31.1: 1 test run
> > 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs
>
> I've taken another look at the data from this debug patch, resulting in
> these graphs: http://people.debian.org/~fjp/tmp/kernel/congestion.pdf
>
> I think the graph may show the reason for the congestion_wait()
> regression. Horizontal axis shows time, vertical axis shows number of
> logged congestion_wait calls per type.

I'm sorry. My initial version had a skewed time axis (showed occurrences
instead of actual time). I've now uploaded a corrected version:
http://people.debian.org/~fjp/tmp/kernel/congestion.pdf

I've also uploaded a second version that shows cumulative delay per type,
which probably gives a better insight:
http://people.debian.org/~fjp/tmp/kernel/congestion2.pdf

For both the top chart is without the revert, the bottom one after the
revert.

^ permalink raw reply	[flat|nested] 384+ messages in thread
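[ Editorial sketch, not part of the thread. ] The two charts described above plot, per caller, a running count of logged congestion_wait calls and a running sum of delay against actual time rather than occurrence number. How the original graphs were produced is not stated in the thread; the following is only one plausible way to build such series. It assumes the log has already been parsed into (timestamp, caller, delay) tuples, and cumulative_series is a hypothetical helper name.

```python
from collections import defaultdict


def cumulative_series(events):
    """Build per-caller cumulative series from (timestamp, caller, delay) events.

    Returns {caller: [(timestamp, cumulative_count, cumulative_delay), ...]},
    ordered by timestamp, so the x axis is real time and not occurrence index.
    """
    by_caller = defaultdict(list)
    for timestamp, caller, delay in sorted(events):
        by_caller[caller].append((timestamp, delay))

    series = {}
    for caller, points in by_caller.items():
        running_delay = 0
        rows = []
        for count, (timestamp, delay) in enumerate(points, start=1):
            running_delay += delay
            rows.append((timestamp, count, running_delay))
        series[caller] = rows
    return series


if __name__ == "__main__":
    # Made-up events for illustration: (seconds, caller, delay in jiffies).
    events = [
        (1.0, "kswapd", 25),
        (2.0, "try_to_free_pages", 25),
        (1.5, "kswapd", 26),
    ]
    for caller, rows in sorted(cumulative_series(events).items()):
        print(caller, rows)
```

On such a series, a caller whose cumulative count stays flat over a stretch of time logged no congestion_wait calls during that stretch, which is the kind of step pattern pointed out for the kernel without the revert.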
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-11-06  9:51 ` Frans Pop
@ 2009-11-09 19:00 ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-11-09 19:00 UTC (permalink / raw)
  To: Frans Pop
  Cc: Chris Mason, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm

On Fri, Nov 06, 2009 at 10:51:37AM +0100, Frans Pop wrote:
> On Thursday 05 November 2009, Frans Pop wrote:
> > On Monday 26 October 2009, Frans Pop wrote:
> > > On Tuesday 20 October 2009, Mel Gorman wrote:
> > > > I've attached a patch below that should allow us to cheat. When it's
> > > > applied, it outputs who called congestion_wait(), how long the
> > > > timeout was and how long it waited for. By comparing before and
> > > > after sleep times, we should be able to see which of the callers has
> > > > significantly changed and if it's something easily addressable.
> > >
> > > The results from this look fairly interesting (although I may be a bad
> > > judge as I don't really know what I'm looking at ;-).
> > >
> > > I've tested with two kernels:
> > > 1) 2.6.31.1: 1 test run
> > > 2) 2.6.31.1 + congestion_wait() reverts: 2 test runs
> >
> > I've taken another look at the data from this debug patch, resulting in
> > these graphs: http://people.debian.org/~fjp/tmp/kernel/congestion.pdf
> >
> > I think the graph may show the reason for the congestion_wait()
> > regression. Horizontal axis shows time, vertical axis shows number of
> > logged congestion_wait calls per type.
>
> I'm sorry. My initial version had a skewed time axis (showed occurrences
> instead of actual time). I've now uploaded a corrected version:
> http://people.debian.org/~fjp/tmp/kernel/congestion.pdf
>
> I've also uploaded a second version that shows cumulative delay per type,
> which probably gives a better insight:
> http://people.debian.org/~fjp/tmp/kernel/congestion2.pdf
>
> For both the top chart is without the revert, the bottom one after the
> revert.
>

I'm looking into this at the moment. There are some definite differences
not only in the length congestion_wait() is waiting but in what the
callers are doing. I've more or less reproduced your results locally and
am slowly plodding through each caller to see what has changed of
significance. No patches yet though.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-19 16:18 ` Chris Mason
                   ` (3 preceding siblings ...)
  (?)
@ 2009-10-20 10:48 ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-20 10:48 UTC (permalink / raw)
  To: Chris Mason, Frans Pop, David Rientjes, KOSAKI Motohiro,
	Rafael J. Wysocki

On Tue, Oct 20, 2009 at 01:18:15AM +0900, Chris Mason wrote:
> On Mon, Oct 19, 2009 at 03:01:52PM +0100, Mel Gorman wrote:
> >
> > > During the 2nd phase I see the first SKB allocation errors with a music
> > > skip between reading commits 95.000 and 110.000.
> > > About commit 115.000 there is a very long pause during which the counter
> > > does not increase, music stops and the desktop freezes completely. The
> > > first 30 seconds of that freeze there is only very low disk activity (which
> > > seems strange);
> >
> > I'm just going to have to depend on Jens here. Jens, the congestion_wait() is
> > on BLK_RW_ASYNC after the commit. Reclaim usually writes pages asynchronously
> > but lumpy reclaim actually waits of pages to write out synchronously so
> > it's not always async.
>
> Waiting doesn't make it synchronous from the elevator point of view ;)
> If you're using WB_SYNC_NONE, it's a async write. WB_SYNC_ALL makes it
> a sync write. I only see WB_SYNC_NONE in vmscan.c, so we should be
> using the async congestion wait. (the exception is xfs which always
> does async writes).
>

Right, reclaim always queues the pages for async IO but for lumpy reclaim,
it calls wait_on_page_writeback() but as you say, from an elevator point
of view, it's still async.

> But I'm honestly not 100% sure. Looking back through the emails, the
> test case is doing IO on top of a whole lot of things on top of
> dm-crypt? I just tried to figure out if dm-crypt is turning the async
> IO into sync IOs, but didn't quite make sense of it.
>

I'm not overly sure either.

> Could you also please include which filesystems were being abused during
> the test and how? Reading through the emails, I think you've got:
>
> gitk being run 3 times on some FS (NFS?)
> streaming reads on NFS
> swap on dm-crypt
>
> If other filesystems are being used, please correct me. Also please
> include if they are on crypto or straight block device.
>

I've attached a patch below that should allow us to cheat. When it's
applied, it outputs who called congestion_wait(), how long the timeout
was and how long it waited for. By comparing before and after sleep
times, we should be able to see which of the callers has significantly
changed and if it's something easily addressable.

> > Either way, reclaim is usually worried about writing pages but it would appear
> > after this change that a lot of read activity can also stall a process in
> > direct reclaim. What might be happening in Frans's particular case is that the
> > tasklet that allocates high-order pages for the RX buffers is getting stalled
> > by congestion caused by other processes doing reads from the filesystem.
> > While it makes sense from a congestion point of view to halt the IO, the
> > reclaim operations from direct reclaimers is getting delayed for long enough
> > to cause problems for GFP_ATOMIC.
>
> The congestion_wait code either waits for congestion to clear or for
> a given timeout. The part that isn't clear is if before the patch
> we waited a very short time (congestion cleared quickly) or a very long
> time (we hit the timeout or congestion cleared slowly).
>

Using the instrumentation patch, I found with a very basic test that we
are waiting for short periods of time more often with the patch applied

      1                      congestion_wait rw=1 delay 6 timeout 25 :: before commit
      7 kswapd               congestion_wait rw=1 delay 0 timeout 25 :: before commit
     32 kswapd               congestion_wait sync=0 delay 0 timeout 25 :: after commit
     61 kswapd               congestion_wait rw=1 delay 1 timeout 25 :: before commit
    133 kswapd               congestion_wait sync=0 delay 1 timeout 25 :: after commit
     16 kswapd               congestion_wait rw=1 delay 2 timeout 25 :: before commit
     70 kswapd               congestion_wait sync=0 delay 2 timeout 25 :: after commit
      1 try_to_free_pages    congestion_wait sync=0 delay 2 timeout 25 :: after commit
     17 kswapd               congestion_wait rw=1 delay 3 timeout 25 :: before commit
     28 kswapd               congestion_wait sync=0 delay 3 timeout 25 :: after commit
      1 try_to_free_pages    congestion_wait sync=0 delay 3 timeout 25 :: after commit
     23 kswapd               congestion_wait rw=1 delay 4 timeout 25 :: before commit
     16 kswapd               congestion_wait sync=0 delay 4 timeout 25 :: after commit
      5 try_to_free_pages    congestion_wait sync=0 delay 4 timeout 25 :: after commit
     20 kswapd               congestion_wait rw=1 delay 5 timeout 25 :: before commit
     18 kswapd               congestion_wait sync=0 delay 5 timeout 25 :: after commit
      3 try_to_free_pages    congestion_wait sync=0 delay 5 timeout 25 :: after commit
     21 kswapd               congestion_wait rw=1 delay 6 timeout 25 :: before commit
      8 kswapd               congestion_wait sync=0 delay 6 timeout 25 :: after commit
      2 try_to_free_pages    congestion_wait sync=0 delay 6 timeout 25 :: after commit
     13 kswapd               congestion_wait rw=1 delay 7 timeout 25 :: before commit
     12 kswapd               congestion_wait sync=0 delay 7 timeout 25 :: after commit
      2 try_to_free_pages    congestion_wait sync=0 delay 7 timeout 25 :: after commit
      8 kswapd               congestion_wait rw=1 delay 8 timeout 25 :: before commit
      7 kswapd               congestion_wait sync=0 delay 8 timeout 25 :: after commit
      9 kswapd               congestion_wait rw=1 delay 9 timeout 25 :: before commit
      5 kswapd               congestion_wait sync=0 delay 9 timeout 25 :: after commit
      2 try_to_free_pages    congestion_wait sync=0 delay 9 timeout 25 :: after commit
      4 kswapd               congestion_wait rw=1 delay 10 timeout 25 :: before commit
      5 kswapd               congestion_wait sync=0 delay 10 timeout 25 :: after commit
      1 try_to_free_pages    congestion_wait sync=0 delay 10 timeout 25 :: after commit
[... remaining output snipped ...]

The before and after commit are really 2.6.31 and 2.6.31-patch-reverted.
The first column is how many times we delayed for that length of time.
To generate the output, I just took the console log from both kernels with
a basic test, put the congestion_wait lines into two separate files and

cat congestion-*-sorted | sort -n -k5 | uniq -c

to give a count of how many times we delayed for a particular caller.

> The easiest way to tell is to just replace the congestion_wait() calls
> in direct reclaim with schedule_timeout_interruptible(10), test, then
> schedule_timeout_interruptible(HZ/20), then test again.
>

Reclaim can also call congestion_wait() and maybe the problem isn't
within the page allocator at all but that it's indirectly affected by
timing.

> >
> > Does this sound plausible to you? If so, what's the best way of
> > addressing this? Changing congestion_wait back to WRITE (assuming that
> > works for Frans)? Changing it to SYNC (again, assuming it actually
> > works) or a revert?
>
> I don't think changing it to SYNC is a good plan unless we're actually
> doing sync io. It would be better to just wait on one of the pages that
> you've sent down (or its hashed waitqueue since the page can go away).
>

Frans, is there any chance you could apply the following patch and get
the console logs for a vanilla kernel and with the congestion patches
reverted? I'm hoping it'll be able to tell us which of the callers has
significantly changed in timing. If there is one caller that has
significantly changed, it might be enough to address just that caller.
=====
>From 757999066dc41f2e053d59589c673052fc7c1a65 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mel@csn.ul.ie>
Date: Tue, 20 Oct 2009 11:01:57 +0100
Subject: [PATCH] Instrument congestion_wait

This patch instruments how long congestion_wait() really waited for a
given caller.

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 3d3accb..fc945e0 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/writeback.h>
 #include <linux/device.h>
+#include <linux/kallsyms.h>
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
 {
@@ -729,6 +730,11 @@ EXPORT_SYMBOL(set_bdi_congested);
  */
 long congestion_wait(int sync, long timeout)
 {
+	unsigned long jiffies_start = jiffies;
+	char *module;
+	char buf[128];
+	const char *symbol;
+	unsigned long offset, symbolsize;
 	long ret;
 	DEFINE_WAIT(wait);
 	wait_queue_head_t *wqh = &congestion_wqh[sync];
@@ -736,6 +742,13 @@ long congestion_wait(int sync, long timeout)
 	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
 	ret = io_schedule_timeout(timeout);
 	finish_wait(wqh, &wait);
+
+	symbol = kallsyms_lookup(_RET_IP_, &symbolsize, &offset, &module, buf),
+	printk(KERN_INFO "%-20s congestion_wait sync=%d delay %lu timeout %ld\n",
+		symbol,
+		sync,
+		jiffies - jiffies_start,
+		timeout);
 	return ret;
 }
 EXPORT_SYMBOL(congestion_wait);

^ permalink raw reply related	[flat|nested] 384+ messages in thread
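[Editor's sketch, not part of the thread] The counting step above was done with sort and uniq on the console log. For anyone who would rather post-process the log in a program, here is a minimal userspace C sketch of the same idea. It assumes the line layout produced by the printk() in the patch (caller, then "congestion_wait sync=N delay D timeout T") and also accepts the "rw=" spelling seen in the other kernel's output. The names cw_sample, cw_parse, cw_histogram and CW_MAX_DELAY are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of the instrumentation output (hypothetical type). */
struct cw_sample {
	char caller[64];	/* symbol that called congestion_wait() */
	int queue;		/* value after "sync=" (or "rw=") */
	unsigned long delay;	/* jiffies actually slept */
	long timeout;		/* jiffies requested */
};

/*
 * Parse "<caller> congestion_wait sync=<n> delay <d> timeout <t>".
 * The "rw=" spelling is accepted as well. Returns false for lines that
 * do not match, including lines whose caller column is empty.
 */
static bool cw_parse(const char *line, struct cw_sample *out)
{
	if (sscanf(line, "%63s congestion_wait sync=%d delay %lu timeout %ld",
		   out->caller, &out->queue, &out->delay, &out->timeout) == 4)
		return true;
	return sscanf(line, "%63s congestion_wait rw=%d delay %lu timeout %ld",
		      out->caller, &out->queue, &out->delay, &out->timeout) == 4;
}

#define CW_MAX_DELAY 64	/* arbitrary cap chosen for this sketch */

/* Count, per delay value, how often the given caller slept that long. */
static void cw_histogram(const char *const *lines, size_t n,
			 const char *caller, unsigned int hist[CW_MAX_DELAY])
{
	struct cw_sample s;
	size_t i;

	memset(hist, 0, CW_MAX_DELAY * sizeof(hist[0]));
	for (i = 0; i < n; i++)
		if (cw_parse(lines[i], &s) && !strcmp(s.caller, caller) &&
		    s.delay < CW_MAX_DELAY)
			hist[s.delay]++;
}
```

Calling cw_histogram() once per kernel and per caller gives the same per-delay counts as the first column of the table, which makes it easy to diff the two kernels caller by caller.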
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-19 16:18 ` Chris Mason
  (?)
@ 2009-10-25 18:54 ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-25 18:54 UTC (permalink / raw)
  To: Chris Mason
  Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Reinette Chatre, Bartlomiej Zolnierkiewicz, Karol Lewandowski,
	Mohamed Abbas, Jens Axboe, John W. Linville, linux-mm

Sorry for the delayed reply.

On Monday 19 October 2009, Chris Mason wrote:
> On Mon, Oct 19, 2009 at 03:01:52PM +0100, Mel Gorman wrote:
> > > During the 2nd phase I see the first SKB allocation errors with a
> > > music skip between reading commits 95.000 and 110.000.
> > > About commit 115.000 there is a very long pause during which the
> > > counter does not increase, music stops and the desktop freezes
> > > completely. The first 30 seconds of that freeze there is only very
> > > low disk activity (which seems strange);
> >
> > I'm just going to have to depend on Jens here. Jens, the
> > congestion_wait() is on BLK_RW_ASYNC after the commit. Reclaim usually
> > writes pages asynchronously but lumpy reclaim actually waits of pages
> > to write out synchronously so it's not always async.
>
> Waiting doesn't make it synchronous from the elevator point of view ;)
> If you're using WB_SYNC_NONE, it's a async write. WB_SYNC_ALL makes it
> a sync write. I only see WB_SYNC_NONE in vmscan.c, so we should be
> using the async congestion wait. (the exception is xfs which always
> does async writes).
>
> But I'm honestly not 100% sure. Looking back through the emails, the
> test case is doing IO on top of a whole lot of things on top of
> dm-crypt? I just tried to figure out if dm-crypt is turning the async
> IO into sync IOs, but didn't quite make sense of it.
>
> Could you also please include which filesystems were being abused during
> the test and how? Reading through the emails, I think you've got:
>
> gitk being run 3 times on some FS (NFS?)

gitk is run on an ext3 logical volume in a volume group that's on a LUKS
encrypted partition of the local hard disk. So it's:
SATA harddisk -> dm-crypt (dmsetup) -> LVM (lvm2) -> ext3

> streaming reads on NFS

Correct. My music share is a remote (nfs4) read-only mounted ext3
partition.

> swap on dm-crypt

Correct. Swap is another logical volume in the same volume group as
mentioned above. So kcrypt gets to (de)encrypt both the gitk data *and*
any swapping caused by that [1].

> If other filesystems are being used, please correct me. Also please
> include if they are on crypto or straight block device.

All my file systems are ext3. Nothing newfangled or exotic ;-)
There are some bind mounts involved, but I expect that's transparent.

Cheers,
FJP

[1] I've plans to move some of my data outside the encrypted volume, but
currently everything except /boot is in the encrypted VG.

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-14 10:30 ` Mel Gorman
@ 2009-10-14 16:28 ` reinette chatre
  -1 siblings, 0 replies; 384+ messages in thread
From: reinette chatre @ 2009-10-14 16:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
	John W. Linville, linux-mm

Hi Mel,

On Wed, 2009-10-14 at 03:30 -0700, Mel Gorman wrote:
> From 5fb9f897117bf2701f9fdebe4d008dbe34358ab9 Mon Sep 17 00:00:00 2001
> From: Mel Gorman <mel@csn.ul.ie>
> Date: Wed, 14 Oct 2009 11:19:57 +0100
> Subject: [PATCH] iwlwifi: Suppress warnings related to GFP_ATOMIC allocations that do not matter
>
> iwlwifi refills RX buffers in two ways - a direct method using GFP_ATOMIC
> and a tasklet method using GFP_KERNEL. There are a number of RX buffers and
> there are only serious issues when there are no RX buffers left. The driver
> explicitly warns when refills are failing and the buffers are low but it
> always warns when a GFP_ATOMIC allocation fails even when there is no
> packet loss as a result.

No, it does not always warn when a GFP_ATOMIC allocation fails. Please
check earlier in iwl_rx_allocate() we have:

	if (rxq->free_count > RX_LOW_WATERMARK)
		priority |= __GFP_NOWARN;

So it will suppress warnings as long as we have buffers available.

We do want to see warnings if memory is below watermark and allocation
fails - your patch prevents these warnings from appearing.

> This patch specifies __GFP_NOWARN for the direct refill method that uses
> GFP_ATOMIC. To help identify where allocation failures might be coming
> from, the stack is dumped when the RX queue is dangerously low.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  drivers/net/wireless/iwlwifi/iwl-rx.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl b/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl
> old mode 100644
> new mode 100755
> diff --git a/drivers/net/wireless/iwlwifi/iwl-rx.c b/drivers/net/wireless/iwlwifi/iwl-rx.c
> index 8e1bb53..f91a108 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-rx.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-rx.c
> @@ -260,10 +260,12 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority)
>  			if (net_ratelimit())
>  				IWL_DEBUG_INFO(priv, "Failed to allocate SKB buffer.\n");
>  			if ((rxq->free_count <= RX_LOW_WATERMARK) &&
> -			    net_ratelimit())
> +			    net_ratelimit()) {
>  				IWL_CRIT(priv, "Failed to allocate SKB buffer with %s. Only %u free buffers remaining.\n",
>  					 priority == GFP_ATOMIC ? "GFP_ATOMIC" : "GFP_KERNEL",
>  					 rxq->free_count);
> +				dump_stack();
> +			}
>  			/* We don't reschedule replenish work here -- we will
>  			 * call the restock method and if it still needs
>  			 * more buffers it will schedule replenish */
> @@ -320,7 +322,7 @@ EXPORT_SYMBOL(iwl_rx_replenish);
>
>  void iwl_rx_replenish_now(struct iwl_priv *priv)
>  {
> -	iwl_rx_allocate(priv, GFP_ATOMIC);
> +	iwl_rx_allocate(priv, GFP_ATOMIC|__GFP_NOWARN);
>
>  	iwl_rx_queue_restock(priv);
>  }

^ permalink raw reply	[flat|nested] 384+ messages in thread
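[Editor's sketch, not part of the thread] The disagreement above is about when the page allocator is allowed to print a failure warning. The following userspace C model restates the check Reinette quotes from iwl_rx_allocate() and shows why always passing __GFP_NOWARN from the caller also silences the low-watermark case. Everything prefixed SKETCH_ or sketch_ is a hypothetical stand-in: the flag bit values are arbitrary (the real ones live in kernel headers), and the watermark of 8 is assumed from the figure mentioned elsewhere in this thread.

```c
#include <stdbool.h>

/* Illustrative stand-ins only; not the kernel's real flag values. */
#define SKETCH_GFP_ATOMIC		0x1u
#define SKETCH_GFP_KERNEL		0x2u
#define SKETCH_GFP_NOWARN		0x4u
#define SKETCH_RX_LOW_WATERMARK		8u	/* assumed from the thread */

/*
 * Mirrors the quoted check: the allocator is silenced only while more
 * than the low watermark of free RX buffers remains.
 */
static unsigned int sketch_rx_alloc_flags(unsigned int priority,
					  unsigned int free_count)
{
	if (free_count > SKETCH_RX_LOW_WATERMARK)
		priority |= SKETCH_GFP_NOWARN;
	return priority;
}

/* True if an allocation failure with these flags would be reported. */
static bool sketch_allocator_may_warn(unsigned int flags)
{
	return !(flags & SKETCH_GFP_NOWARN);
}
```

With the existing driver logic, a caller passing plain GFP_ATOMIC still gets warnings once free_count drops to the watermark or below. If the caller ORs in the no-warn flag up front, as the proposed patch does in iwl_rx_replenish_now(), sketch_allocator_may_warn() is false for every free_count, which is the loss of information being objected to.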
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-14 16:28 ` reinette chatre
@ 2009-10-14 16:50 ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-14 16:50 UTC (permalink / raw)
  To: reinette chatre
  Cc: Frans Pop, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
	John W. Linville, linux-mm

On Wed, Oct 14, 2009 at 09:28:00AM -0700, reinette chatre wrote:
> Hi Mel,
>
> On Wed, 2009-10-14 at 03:30 -0700, Mel Gorman wrote:
> > From 5fb9f897117bf2701f9fdebe4d008dbe34358ab9 Mon Sep 17 00:00:00 2001
> > From: Mel Gorman <mel@csn.ul.ie>
> > Date: Wed, 14 Oct 2009 11:19:57 +0100
> > Subject: [PATCH] iwlwifi: Suppress warnings related to GFP_ATOMIC allocations that do not matter
> >
> > iwlwifi refills RX buffers in two ways - a direct method using GFP_ATOMIC
> > and a tasklet method using GFP_KERNEL. There are a number of RX buffers and
> > there are only serious issues when there are no RX buffers left. The driver
> > explicitly warns when refills are failing and the buffers are low but it
> > always warns when a GFP_ATOMIC allocation fails even when there is no
> > packet loss as a result.
>
>
> No, it does not always warn when a GFP_ATOMIC allocation fails. Please
> check earlier in iwl_rx_allocate() we have:
>
> 	if (rxq->free_count > RX_LOW_WATERMARK)
> 		priority |= __GFP_NOWARN;
>
> So it will suppress warnings as long as we have buffers available.
>
> We do want to see warnings if memory is below watermark and allocation
> fails - your patch prevents these warnings from appearing.
>

Yeah, the patch is balls and is not the way forward.

What is your take on GFP_ATOMIC-direct deleting the pool before the tasklet
can refill it with GFP_KERNEL? Should direct allocation be falling back to
calling with GFP_KERNEL when the pool has been depleted instead of failing?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-14 16:50 ` Mel Gorman
@ 2009-10-14 20:41 ` reinette chatre
  -1 siblings, 0 replies; 384+ messages in thread
From: reinette chatre @ 2009-10-14 20:41 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
      John W. Linville, linux-mm

On Wed, 2009-10-14 at 09:50 -0700, Mel Gorman wrote:

> What is your take on GFP_ATOMIC-direct deleting the pool before the tasklet
> can refill it with GFP_KERNEL?

I am not sure I understand your question. We attempt to reclaim a
received buffer on every receive, and with a queue size of 256 + 64 we
assume to have a pretty big buffer to deal with cases when allocations
fail. So, technically, for us to get into this situation where we start
seeing these allocation failures there would have been more than 200
times in which GFP_ATOMIC allocations failed that we did _not_ see since
we only see those warnings when there are less than 8 free buffers
remaining. More on this below ...

> Should direct allocation be falling back to
> calling with GFP_KERNEL when the pool has been depleted instead of failing?

This is the intention of the current implementation. In the tasklet we
run iwl_rx_replenish_now(), which attempts the GFP_ATOMIC allocations
first by calling iwl_rx_allocate() with the GFP_ATOMIC flag. No
particular action is taken when this fails (apart from the error
message), but if the buffers are running low then iwl_rx_queue_restock()
(which is also called from iwl_rx_replenish_now()) will queue work that
will do the allocation with GFP_KERNEL.

We do queue the GFP_KERNEL allocations when there are only a few buffers
remaining in the queue (8 right now) ... maybe we can make this higher?

I am not sure if this will help in what you are trying to figure out
here, but would it help to play with the numbers here? That is, in
iwl_rx_queue_restock() we have:

	if (rxq->free_count <= RX_LOW_WATERMARK)
		queue_work(priv->workqueue, &priv->rx_replenish);

Would it help here to make that value higher? Maybe queue the GFP_KERNEL
allocation when there are, for example, 50 or 100 free buffers
remaining?

Reinette

^ permalink raw reply [flat|nested] 384+ messages in thread
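The two threshold checks discussed in this exchange can be put side by side in a small standalone C model. This is only a sketch under stated assumptions, not the driver's code: `struct rx_queue_model` and both helper names are hypothetical, and the only things taken from the thread are the value 8 for RX_LOW_WATERMARK and the two comparisons quoted from iwl_rx_allocate() and iwl_rx_queue_restock().

```c
#include <stdbool.h>

/* Value quoted in the thread for the driver's low watermark. */
#define RX_LOW_WATERMARK 8

/* Hypothetical stand-in for the RX queue; only the counter matters here. */
struct rx_queue_model {
	unsigned int free_count;
};

/*
 * Models the check quoted from iwl_rx_allocate(): while more than
 * RX_LOW_WATERMARK buffers are free, __GFP_NOWARN is set, so a failed
 * GFP_ATOMIC allocation is not reported.
 */
static bool atomic_failure_is_silent(const struct rx_queue_model *rxq)
{
	return rxq->free_count > RX_LOW_WATERMARK;
}

/*
 * Models the check quoted from iwl_rx_queue_restock(): the GFP_KERNEL
 * refill work is queued only at or below the watermark.
 */
static bool should_queue_background_refill(const struct rx_queue_model *rxq)
{
	return rxq->free_count <= RX_LOW_WATERMARK;
}
```

Written this way, the two conditions are exact complements: with a single shared watermark, the first reported failure and the first queued GFP_KERNEL refill happen at the same free count.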
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-14 20:41 ` reinette chatre
@ 2009-10-14 21:33 ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-14 21:33 UTC (permalink / raw)
  To: reinette chatre
  Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
      John W. Linville, linux-mm

On Wednesday 14 October 2009, reinette chatre wrote:
> We do queue the GFP_KERNEL allocations when there are only a few buffers
> remaining in the queue (8 right now) ...

Are you sure of this? I have zero messages in my logs about allocation
failures with GFP_KERNEL, but I do have plenty with "Only 0 free buffers
remaining" with GFP_ATOMIC.

Does that indicate a bug or could they fall under the ratelimit somehow?
Or do I misunderstand the logic?

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-14 21:33 ` Frans Pop
@ 2009-10-14 21:55 ` reinette chatre
  -1 siblings, 0 replies; 384+ messages in thread
From: reinette chatre @ 2009-10-14 21:55 UTC (permalink / raw)
  To: Frans Pop
  Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
      John W. Linville, linux-mm

On Wed, 2009-10-14 at 14:33 -0700, Frans Pop wrote:
> On Wednesday 14 October 2009, reinette chatre wrote:
> > We do queue the GFP_KERNEL allocations when there are only a few buffers
> > remaining in the queue (8 right now) ...
>
> Are you sure of this? I have zero messages in my logs about allocation
> failures with GFP_KERNEL, but I do have plenty with "Only 0 free buffers
> remaining" with GFP_ATOMIC.

That does make sense to me. We do not expect allocations with GFP_KERNEL
to fail. Considering how I understand how things work I am considering
the following scenario:

* start with system low on available memory
* now introduce incoming traffic (causing the RX code to run)
* upon receipt of frame we attempt an allocation (to reclaim the buffer)
  with GFP_ATOMIC (state: num RX buffer free > watermark)
* this fails since memory is not available
* num RX buffer free reduces
* does _not_ queue replenishment of buffers with GFP_KERNEL
* repeat above until we hit the watermark (currently 8)
* upon receipt of frame we attempt an allocation (to reclaim the buffer)
  with GFP_ATOMIC (state: num RX buffer free <= watermark)
* this fails (now user sees big warning)
* queue replenishment of buffers with GFP_KERNEL

Essentially what I suspect could happen is that we do attempt to
replenish the buffers with GFP_KERNEL after several failures with
GFP_ATOMIC, but at that point we have already run out completely. One
way to test this theory is to queue the GFP_KERNEL allocation earlier
(when we still have a significant number of RX buffers available), 8 may
turn out to be too small.

> Does that indicate a bug or could they fall under the ratelimit somehow?

In your kernel log I do see that the driver's error messages related to
GFP_ATOMIC are rate limited (we see many more "order-2 allocation
failure" messages than the "Failed to allocate" messages). All of these
allocation failures are from the "replenish_now" code though, which is
GFP_ATOMIC. So even though we do not see the "Failed to allocate" errors
(which are rate limited) it seems that all allocation failures are from
that (the GFP_ATOMIC) code.

Reinette

^ permalink raw reply [flat|nested] 384+ messages in thread
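The scenario in the message above can be walked through with a small standalone C model that counts how many failed atomic refills pass unreported before the watermark is reached. This is a rough sketch, not driver code: `struct drain_result` and `simulate_drain()` are hypothetical names, and the model assumes that every received frame consumes exactly one buffer and that every GFP_ATOMIC refill fails, which is the worst case of the suspected scenario rather than a measured behaviour.

```c
/* Outcome of draining the pool while every atomic refill fails. */
struct drain_result {
	unsigned int silent_failures;     /* failures that logged nothing */
	unsigned int free_at_first_queue; /* free buffers when the GFP_KERNEL
					   * refill would first be queued */
};

/*
 * Assumption: one buffer is consumed per received frame and the
 * GFP_ATOMIC refill that follows always fails. While the free count is
 * still above the watermark the failure is silent and no background
 * refill is queued; the first failure at or below the watermark is the
 * point where the warning appears and the refill work is queued.
 */
static struct drain_result simulate_drain(unsigned int start_free,
					  unsigned int watermark)
{
	struct drain_result r = { 0, 0 };
	unsigned int free_count = start_free;

	while (free_count > 0) {
		free_count--;			/* frame received */
		if (free_count > watermark) {
			r.silent_failures++;	/* no warning, no queued work */
			continue;
		}
		r.free_at_first_queue = free_count;
		break;
	}
	return r;
}
```

Under these assumptions, starting from 64 free buffers (an illustrative figure borrowed from RX_FREE_BUFFERS), a watermark of 8 lets 55 failures go unreported and leaves 8 buffers in hand when the refill is queued, while a watermark of 50 lets 13 go unreported and leaves 50. How fast the remaining buffers are consumed before the queued work actually runs is not captured by this model.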
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-14 20:41 ` reinette chatre
@ 2009-10-15  2:02 ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-15 2:02 UTC (permalink / raw)
  To: reinette chatre
  Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
      John W. Linville, linux-mm

On Wednesday 14 October 2009, reinette chatre wrote:
> We do queue the GFP_KERNEL allocations when there are only a few buffers
> remaining in the queue (8 right now) ... maybe we can make this higher?

I've tried increasing it to 50. Here's the result for a single test:
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 25 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 48 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 48 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 48 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
__ratelimit: 1 callbacks suppressed
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
__ratelimit: 97 callbacks suppressed
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 44 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.

This is with current mainline (v2.6.32-rc4-149-ga3ccf63).
The log file timestamps don't tell much as the logging gets delayed,
so they all end up at the same time. Maybe I should enable the kernel
timestamps so we can see how far apart these failures are.

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-15  2:02 ` Frans Pop
@ 2009-10-15 15:29 ` reinette chatre
  -1 siblings, 0 replies; 384+ messages in thread
From: reinette chatre @ 2009-10-15 15:29 UTC (permalink / raw)
  To: Frans Pop
  Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
      John W. Linville, linux-mm

On Wed, 2009-10-14 at 19:02 -0700, Frans Pop wrote:
> On Wednesday 14 October 2009, reinette chatre wrote:
> > We do queue the GFP_KERNEL allocations when there are only a few buffers
> > remaining in the queue (8 right now) ... maybe we can make this higher?
>
> I've tried increasing it to 50. Here's the result for a single test:
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 25 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 48 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 48 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 48 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> __ratelimit: 1 callbacks suppressed
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> __ratelimit: 97 callbacks suppressed
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 44 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
> iwlagn 0000:10:00.0: Failed to allocate SKB buffer with GFP_ATOMIC. Only 0 free buffers remaining.
>
> This is with current mainline (v2.6.32-rc4-149-ga3ccf63).
> The log file timestamps don't tell much as the logging gets delayed,
> so they all end up at the same time. Maybe I should enable the kernel
> timestamps so we can see how far apart these failures are.

If you can get accurate timing it will be very useful. I am interested
to see how quickly it goes from "48 free buffers" to "0 free buffers".

Thank you

Reinette

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-15 15:29 ` reinette chatre
  (?)
@ 2009-10-15 19:41 ` Frans Pop
  2009-10-16 17:21 ` reinette chatre
  2009-10-17  5:42 ` reinette chatre
  -1 siblings, 2 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-15 19:41 UTC (permalink / raw)
  To: reinette chatre
  Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
      John W. Linville, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

On Thursday 15 October 2009, reinette chatre wrote:
> > The log file timestamps don't tell much as the logging gets delayed,
> > so they all end up at the same time. Maybe I should enable the kernel
> > timestamps so we can see how far apart these failures are.
>
> If you can get accurate timing it will be very useful. I am interested
> to see how quickly it goes from "48 free buffers" to "0 free buffers".

Attached the dmesg for three consecutive test runs (i.e. without
rebooting). Note that the 2nd one includes only "0 free buffers" messages,
even though the behavior (point where desktop freezes and music stops)
looked similar.

Not sure if you can tell all that much from the data.

N.B. You may want to clean this up in iwlwifi code:
iwl-dev.h:#include "iwl-fh.h"
iwl-dev.h:#define RX_LOW_WATERMARK 8
iwl-fh.h:#define RX_LOW_WATERMARK 8

I.e: RX_LOW_WATERMARK is defined in iwl-dev.h even though that includes
iwl-fh.h where it's also defined. The same may be true for other defines.

I think this gave me an incorrect result the first time I increased the
limit as I only changed one of the two files (iwl-dev.h IIRC).

Cheers,
FJP

[-- Attachment #2: dmesg.tgz --]
[-- Type: application/x-tgz, Size: 44980 bytes --]

^ permalink raw reply [flat|nested] 384+ messages in thread
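Why a duplicated define is fragile can be illustrated with a small standalone C fragment. This is a sketch only, with hypothetical function names, and it does not reproduce what happened in the driver build: in the real headers there is no `#undef`, and redefining a macro with a different value is something the C standard requires a compiler to diagnose. The `#undef` here is just a way to model, within one file, two pieces of code that each see a different copy of the constant.

```c
/* Stand-in for the copy of the constant in one header. */
#define RX_LOW_WATERMARK 8

/* Code that only ever sees the first copy. */
static int watermark_seen_via_first_copy(void)
{
	return RX_LOW_WATERMARK;
}

/*
 * Stand-in for a second, edited copy in another header. The #undef is
 * only here so this model compiles without a redefinition diagnostic.
 */
#undef RX_LOW_WATERMARK
#define RX_LOW_WATERMARK 50

/* Code that sees the edited copy. */
static int watermark_seen_via_second_copy(void)
{
	return RX_LOW_WATERMARK;
}
```

Because macros are substituted at the point of use, the first function keeps returning 8 and the second returns 50. Keeping a single definition in one header, as suggested in the message above, removes the possibility of such a split.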
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-15 19:41 ` Frans Pop
  2009-10-16 17:21 ` reinette chatre
@ 2009-10-16 17:21 ` reinette chatre
  1 sibling, 0 replies; 384+ messages in thread
From: reinette chatre @ 2009-10-16 17:21 UTC (permalink / raw)
  To: Frans Pop
  Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
      John W. Linville, linux-mm

On Thu, 2009-10-15 at 12:41 -0700, Frans Pop wrote:
> On Thursday 15 October 2009, reinette chatre wrote:
> > > The log file timestamps don't tell much as the logging gets delayed,
> > > so they all end up at the same time. Maybe I should enable the kernel
> > > timestamps so we can see how far apart these failures are.
> >
> > If you can get accurate timing it will be very useful. I am interested
> > to see how quickly it goes from "48 free buffers" to "0 free buffers".
>
> Attached the dmesg for three consecutive test runs (i.e. without
> rebooting). Note that the 2nd one includes only "0 free buffers" messages,
> even though the behavior (point where desktop freezes and music stops)
> looked similar.

Thank you very much. I am studying it.

> Not sure if you can tell all that much from the data.
>
> N.B. You may want to clean this up in iwlwifi code:
> iwl-dev.h:#include "iwl-fh.h"
> iwl-dev.h:#define RX_LOW_WATERMARK 8
> iwl-fh.h:#define RX_LOW_WATERMARK 8
>
> I.e: RX_LOW_WATERMARK is defined in iwl-dev.h even though that includes
> iwl-fh.h where it's also defined. The same may be true for other defines.

Sorry about that. The patch below will fix that. I will send it
separately to wireless list.

From 7cc8e6482b359eef5ce099457037a237d355b5b1 Mon Sep 17 00:00:00 2001
From: Reinette Chatre <reinette.chatre@intel.com>
Date: Fri, 16 Oct 2009 10:11:10 -0700
Subject: [PATCH] iwlwifi: remove duplicate defines

RX_FREE_BUFFERS and RX_LOW_WATERMARK are currently defined in four places.
Based on how files are included we only need the definition in iwl-fh.h

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reported-by: Frans Pop <elendil@planet.nl>
---
 drivers/net/wireless/iwlwifi/iwl-3945-hw.h |    6 ------
 drivers/net/wireless/iwlwifi/iwl-3945.h    |    6 ------
 drivers/net/wireless/iwlwifi/iwl-dev.h     |    6 ------
 3 files changed, 0 insertions(+), 18 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-3945-hw.h b/drivers/net/wireless/iwlwifi/iwl-3945-hw.h
index ccdac69..6fd10d4 100644
--- a/drivers/net/wireless/iwlwifi/iwl-3945-hw.h
+++ b/drivers/net/wireless/iwlwifi/iwl-3945-hw.h
@@ -248,12 +248,6 @@ struct iwl3945_eeprom {
 #define TFD_CTL_PAD_SET(n) (n << 28)
 #define TFD_CTL_PAD_GET(ctl) (ctl >> 28)
 
-/*
- * RX related structures and functions
- */
-#define RX_FREE_BUFFERS 64
-#define RX_LOW_WATERMARK 8
-
 /* Sizes and addresses for instruction and data memory (SRAM) in
  * 3945's embedded processor. Driver access is via HBUS_TARG_MEM_* regs. */
 #define IWL39_RTC_INST_LOWER_BOUND (0x000000)
diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.h b/drivers/net/wireless/iwlwifi/iwl-3945.h
index f3907c1..84fa0d7 100644
--- a/drivers/net/wireless/iwlwifi/iwl-3945.h
+++ b/drivers/net/wireless/iwlwifi/iwl-3945.h
@@ -130,12 +130,6 @@ struct iwl3945_frame {
 #define SN_TO_SEQ(ssn) (((ssn) << 4) & IEEE80211_SCTL_SEQ)
 #define MAX_SN ((IEEE80211_SCTL_SEQ) >> 4)
 
-/*
- * RX related structures and functions
- */
-#define RX_FREE_BUFFERS 64
-#define RX_LOW_WATERMARK 8
-
 #define SUP_RATE_11A_MAX_NUM_CHANNELS 8
 #define SUP_RATE_11B_MAX_NUM_CHANNELS 4
 #define SUP_RATE_11G_MAX_NUM_CHANNELS 12
diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h
index 1378654..0fa0cf5 100644
--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
@@ -406,12 +406,6 @@ struct iwl_host_cmd {
 	u8 id;
 };
 
-/*
- * RX related structures and functions
- */
-#define RX_FREE_BUFFERS 64
-#define RX_LOW_WATERMARK 8
-
 #define SUP_RATE_11A_MAX_NUM_CHANNELS 8
 #define SUP_RATE_11B_MAX_NUM_CHANNELS 4
 #define SUP_RATE_11G_MAX_NUM_CHANNELS 12
-- 
1.5.6.3

^ permalink raw reply related [flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn @ 2009-10-16 17:21 ` reinette chatre 0 siblings, 0 replies; 384+ messages in thread From: reinette chatre @ 2009-10-16 17:21 UTC (permalink / raw) To: Frans Pop Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed, John W. Linville, linux-mm On Thu, 2009-10-15 at 12:41 -0700, Frans Pop wrote: > On Thursday 15 October 2009, reinette chatre wrote: > > > The log file timestamps don't tell much as the logging gets delayed, > > > so they all end up at the same time. Maybe I should enable the kernel > > > timestamps so we can see how far apart these failures are. > > > > If you can get accurate timing it will be very useful. I am interested > > to see how quickly it goes from "48 free buffers" to "0 free buffers". > > Attached the dmesg for three consecutive test runs (i.e. without > rebooting). Not that the 2nd one includes only "0 free buffers" messages, > even though the behavior (point where desktop freezes and music stops) > looked similar. Thank you very much. I am studying it. > Not sure if you can tell all that much from the data. > > N.B. You may want to clean this up in iwlwifi code: > iwl-dev.h:#include "iwl-fh.h" > iwl-dev.h:#define RX_LOW_WATERMARK 8 > iwl-fh.h:#define RX_LOW_WATERMARK 8 > > I.e: RX_LOW_WATERMARK is defined in iwl-dev.h even though that includes > iwl-fh.h where it's also defined. The same may be true for other defines. Sorry about that. The patch below will fix that. I will send it separately to wireless list. From 7cc8e6482b359eef5ce099457037a237d355b5b1 Mon Sep 17 00:00:00 2001 From: Reinette Chatre <reinette.chatre@intel.com> Date: Fri, 16 Oct 2009 10:11:10 -0700 Subject: [PATCH] iwlwifi: remove duplicate defines RX_FREE_BUFFERS and RX_LOW_WATERMARK are currently defined in four places. 
Based on how files are included we only need the definition in iwl-fh.h

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reported-by: Frans Pop <elendil@planet.nl>
---
 drivers/net/wireless/iwlwifi/iwl-3945-hw.h |    6 ------
 drivers/net/wireless/iwlwifi/iwl-3945.h    |    6 ------
 drivers/net/wireless/iwlwifi/iwl-dev.h     |    6 ------
 3 files changed, 0 insertions(+), 18 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-3945-hw.h b/drivers/net/wireless/iwlwifi/iwl-3945-hw.h
index ccdac69..6fd10d4 100644
--- a/drivers/net/wireless/iwlwifi/iwl-3945-hw.h
+++ b/drivers/net/wireless/iwlwifi/iwl-3945-hw.h
@@ -248,12 +248,6 @@ struct iwl3945_eeprom {
 #define TFD_CTL_PAD_SET(n) (n << 28)
 #define TFD_CTL_PAD_GET(ctl) (ctl >> 28)
 
-/*
- * RX related structures and functions
- */
-#define RX_FREE_BUFFERS 64
-#define RX_LOW_WATERMARK 8
-
 /* Sizes and addresses for instruction and data memory (SRAM) in
  * 3945's embedded processor. Driver access is via HBUS_TARG_MEM_* regs. */
 #define IWL39_RTC_INST_LOWER_BOUND (0x000000)
diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.h b/drivers/net/wireless/iwlwifi/iwl-3945.h
index f3907c1..84fa0d7 100644
--- a/drivers/net/wireless/iwlwifi/iwl-3945.h
+++ b/drivers/net/wireless/iwlwifi/iwl-3945.h
@@ -130,12 +130,6 @@ struct iwl3945_frame {
 #define SN_TO_SEQ(ssn) (((ssn) << 4) & IEEE80211_SCTL_SEQ)
 #define MAX_SN ((IEEE80211_SCTL_SEQ) >> 4)
 
-/*
- * RX related structures and functions
- */
-#define RX_FREE_BUFFERS 64
-#define RX_LOW_WATERMARK 8
-
 #define SUP_RATE_11A_MAX_NUM_CHANNELS 8
 #define SUP_RATE_11B_MAX_NUM_CHANNELS 4
 #define SUP_RATE_11G_MAX_NUM_CHANNELS 12
diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h
index 1378654..0fa0cf5 100644
--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
@@ -406,12 +406,6 @@ struct iwl_host_cmd {
 	u8 id;
 };
 
-/*
- * RX related structures and functions
- */
-#define RX_FREE_BUFFERS 64
-#define RX_LOW_WATERMARK 8
-
 #define SUP_RATE_11A_MAX_NUM_CHANNELS 8
 #define SUP_RATE_11B_MAX_NUM_CHANNELS 4
 #define SUP_RATE_11G_MAX_NUM_CHANNELS 12
--
1.5.6.3

^ permalink raw reply related [flat|nested] 384+ messages in thread
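A note on why the duplicated defines never produced a build warning: C permits an object-like macro to be redefined as long as the replacement list is identical, so a header that repeats a definition from a header it includes compiles silently. The sketch below illustrates that behaviour. The two "header" sections are inlined stand-ins, not the real iwlwifi files, and `rx_below_watermark()` is a hypothetical helper added only for illustration.

```c
#include <stdbool.h>

/* Stand-in for iwl-fh.h (inlined here; not the real header). */
#define RX_FREE_BUFFERS 64
#define RX_LOW_WATERMARK 8

/* Stand-in for iwl-dev.h, which includes iwl-fh.h and then repeats the
 * same defines. Identical redefinitions are accepted without a diagnostic. */
#define RX_FREE_BUFFERS 64
#define RX_LOW_WATERMARK 8

/* A redefinition with a different value would be diagnosed by the
 * compiler, for example:
 *   #define RX_LOW_WATERMARK 16
 */

/* Hypothetical helper (not in the driver): true once the number of free
 * Rx buffers has dropped to the low watermark or below. */
static bool rx_below_watermark(unsigned int free_count)
{
	return free_count <= RX_LOW_WATERMARK;
}
```

With the patch applied only the iwl-fh.h definition remains, so a later change to either value cannot end up differing between headers.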
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-15 19:41 ` Frans Pop
  2009-10-16 17:21 ` reinette chatre
@ 2009-10-17 5:42 ` reinette chatre
  1 sibling, 0 replies; 384+ messages in thread
From: reinette chatre @ 2009-10-17 5:42 UTC (permalink / raw)
  To: Frans Pop
  Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed, John W. Linville, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1102 bytes --]

Hi Frans,

On Thu, 2009-10-15 at 12:41 -0700, Frans Pop wrote:
> On Thursday 15 October 2009, reinette chatre wrote:
> > > The log file timestamps don't tell much as the logging gets delayed,
> > > so they all end up at the same time. Maybe I should enable the kernel
> > > timestamps so we can see how far apart these failures are.
> >
> > If you can get accurate timing it will be very useful. I am interested
> > to see how quickly it goes from "48 free buffers" to "0 free buffers".
>
> Attached the dmesg for three consecutive test runs (i.e. without
> rebooting). Note that the 2nd one includes only "0 free buffers" messages,
> even though the behavior (point where desktop freezes and music stops)
> looked similar.
>
> Not sure if you can tell all that much from the data.
>

Prompted by this thread we are in the process of moving allocation to paged skb. This will definitely reduce the allocation size (from order 2 to order 1) and hopefully help with this problem also.

Could you please try with the attached two patches? They are based on 2.6.32-rc4.
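The order arithmetic behind "from order 2 to order 1" can be sketched in userspace C. This is an illustration only: it assumes 4 KiB pages, `order_for_size()` is a hypothetical stand-in for the kernel's `get_order()`, and `SHINFO_BYTES` is a placeholder for the size of `struct skb_shared_info`, whose real value depends on kernel version and architecture.

```c
#include <stddef.h>

#define PAGE_BYTES   4096UL  /* assumption: 4 KiB pages */
#define RX_BUF_BYTES 8192UL  /* 8K Rx buffer requested by iwlagn */
#define ALIGN_PAD    256UL   /* the "+ 256" in the old alloc_skb() call */
#define SHINFO_BYTES 320UL   /* placeholder, not the real sizeof(struct skb_shared_info) */

/* Smallest order such that (PAGE_BYTES << order) >= size. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;

	while ((PAGE_BYTES << order) < size)
		order++;
	return order;
}

/* Linear skb: buffer, alignment padding and trailing shared info must
 * all fit in one physically contiguous allocation. */
static unsigned int linear_skb_order(void)
{
	return order_for_size(RX_BUF_BYTES + ALIGN_PAD + SHINFO_BYTES);
}

/* Paged Rx: only the receive buffer itself comes from alloc_pages(). */
static unsigned int paged_rx_order(void)
{
	return order_for_size(RX_BUF_BYTES);
}
```

Under these assumptions `linear_skb_order()` evaluates to 2 (16K) and `paged_rx_order()` to 1 (8K). Any nonzero trailer is enough to push the linear case past the 8K boundary, so the exact placeholder value does not change the result.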
Thank you very much

Reinette

[-- Attachment #2: 0001-iwlwifi-use-paged-Rx.patch --]
[-- Type: text/x-patch, Size: 50735 bytes --]

From d94fc37fb25aacec8a41e3d14ec333fa5a8f681e Mon Sep 17 00:00:00 2001
From: Zhu Yi <yi.zhu@intel.com>
Date: Fri, 9 Oct 2009 17:19:45 +0800
Subject: [PATCH 1/2] iwlwifi: use paged Rx

This switches the iwlwifi driver to use paged skb from linear skb for Rx buffer. So that it relieves some Rx buffer allocation pressure for the memory subsystem. Currently iwlwifi (4K for 3945) requests 8K bytes for Rx buffer. Due to the trailing skb_shared_info in the skb->data, alloc_skb() will do the next order allocation, which is 16K bytes. This is suboptimal and more likely to fail when the system is under memory usage pressure. Switching to paged Rx skb lets us allocate the RXB directly by alloc_pages(), so that only order 1 allocation is required.

It also adjusts the area spin_lock (with IRQ disabled) protected in the tasklet because tasklet guarantees to run only on one CPU and the new unprotected code can be preempted by the IRQ handler. This saves us from spawning another workqueue to make skb_linearize/__pskb_pull_tail happy (which cannot be called in hard irq context).

Finally, mac80211 doesn't support paged Rx yet. So we linearize the skb for all the management frames and software decryption or defragmentation required data frames before handed to mac80211. For all the other frames, we __pskb_pull_tail 64 bytes in the linear area of the skb for mac80211 to handle them properly.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W.
Linville <linville@tuxdriver.com> --- drivers/net/wireless/iwlwifi/iwl-3945.c | 67 ++++++++++----- drivers/net/wireless/iwlwifi/iwl-4965.c | 2 +- drivers/net/wireless/iwlwifi/iwl-5000.c | 4 +- drivers/net/wireless/iwlwifi/iwl-agn.c | 42 ++++----- drivers/net/wireless/iwlwifi/iwl-commands.h | 10 ++ drivers/net/wireless/iwlwifi/iwl-core.c | 13 ++-- drivers/net/wireless/iwlwifi/iwl-core.h | 2 +- drivers/net/wireless/iwlwifi/iwl-dev.h | 27 ++++-- drivers/net/wireless/iwlwifi/iwl-hcmd.c | 21 ++---- drivers/net/wireless/iwlwifi/iwl-rx.c | 122 +++++++++++++++++---------- drivers/net/wireless/iwlwifi/iwl-scan.c | 20 ++-- drivers/net/wireless/iwlwifi/iwl-spectrum.c | 2 +- drivers/net/wireless/iwlwifi/iwl-sta.c | 62 +++++-------- drivers/net/wireless/iwlwifi/iwl-tx.c | 10 +- drivers/net/wireless/iwlwifi/iwl3945-base.c | 120 +++++++++++++------------- 15 files changed, 284 insertions(+), 240 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.c b/drivers/net/wireless/iwlwifi/iwl-3945.c index f059b49..7d5962d 100644 --- a/drivers/net/wireless/iwlwifi/iwl-3945.c +++ b/drivers/net/wireless/iwlwifi/iwl-3945.c @@ -293,7 +293,7 @@ static void iwl3945_tx_queue_reclaim(struct iwl_priv *priv, static void iwl3945_rx_reply_tx(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u16 sequence = le16_to_cpu(pkt->hdr.sequence); int txq_id = SEQ_TO_QUEUE(sequence); int index = SEQ_TO_INDEX(sequence); @@ -353,7 +353,7 @@ static void iwl3945_rx_reply_tx(struct iwl_priv *priv, void iwl3945_hw_rx_statistics(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); IWL_DEBUG_RX(priv, "Statistics notification received (%d vs %d).\n", (int)sizeof(struct iwl3945_notif_statistics), le32_to_cpu(pkt->len_n_flags) & FH_RSCSR_FRAME_SIZE_MSK); @@ -545,14 +545,17 @@ static void 
iwl3945_pass_packet_to_mac80211(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb, struct ieee80211_rx_status *stats) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)IWL_RX_DATA(pkt); struct iwl3945_rx_frame_hdr *rx_hdr = IWL_RX_HDR(pkt); struct iwl3945_rx_frame_end *rx_end = IWL_RX_END(pkt); - short len = le16_to_cpu(rx_hdr->len); + u16 len = le16_to_cpu(rx_hdr->len); + struct sk_buff *skb; + int ret; /* We received data from the HW, so stop the watchdog */ - if (unlikely((len + IWL39_RX_FRAME_SIZE) > skb_tailroom(rxb->skb))) { + if (unlikely(len + IWL39_RX_FRAME_SIZE > + PAGE_SIZE << priv->hw_params.rx_page_order)) { IWL_DEBUG_DROP(priv, "Corruption detected!\n"); return; } @@ -564,24 +567,49 @@ static void iwl3945_pass_packet_to_mac80211(struct iwl_priv *priv, return; } - skb_reserve(rxb->skb, (void *)rx_hdr->payload - (void *)pkt); - /* Set the size of the skb to the size of the frame */ - skb_put(rxb->skb, le16_to_cpu(rx_hdr->len)); + skb = alloc_skb(IWL_LINK_HDR_MAX, GFP_ATOMIC); + if (!skb) { + IWL_ERR(priv, "alloc_skb failed\n"); + return; + } if (!iwl3945_mod_params.sw_crypto) iwl_set_decrypted_flag(priv, - (struct ieee80211_hdr *)rxb->skb->data, + (struct ieee80211_hdr *)rxb_addr(rxb), le32_to_cpu(rx_end->status), stats); + skb_add_rx_frag(skb, 0, rxb->page, + (void *)rx_hdr->payload - (void *)pkt, len); + + /* mac80211 currently doesn't support paged SKB. Convert it to + * linear SKB for management frame and data frame requires + * software decryption or software defragementation. */ + if (ieee80211_is_mgmt(hdr->frame_control) || + ieee80211_has_protected(hdr->frame_control) || + ieee80211_has_morefrags(hdr->frame_control) || + le16_to_cpu(hdr->seq_ctrl) & IEEE80211_SCTL_FRAG) + ret = skb_linearize(skb); + else + ret = __pskb_pull_tail(skb, min_t(u16, IWL_LINK_HDR_MAX, len)) ? 
+ 0 : -ENOMEM; + + if (ret) { + kfree_skb(skb); + goto out; + } + #ifdef CONFIG_IWLWIFI_LEDS if (ieee80211_is_data(hdr->frame_control)) priv->rxtxpackets += len; #endif iwl_update_stats(priv, false, hdr->frame_control, len); - memcpy(IEEE80211_SKB_RXCB(rxb->skb), stats, sizeof(*stats)); - ieee80211_rx_irqsafe(priv->hw, rxb->skb); - rxb->skb = NULL; + memcpy(IEEE80211_SKB_RXCB(skb), stats, sizeof(*stats)); + ieee80211_rx(priv->hw, skb); + + out: + priv->alloc_rxb_page--; + rxb->page = NULL; } #define IWL_DELAY_NEXT_SCAN_AFTER_ASSOC (HZ*6) @@ -591,7 +619,7 @@ static void iwl3945_rx_reply_rx(struct iwl_priv *priv, { struct ieee80211_hdr *header; struct ieee80211_rx_status rx_status; - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl3945_rx_frame_stats *rx_stats = IWL_RX_STATS(pkt); struct iwl3945_rx_frame_hdr *rx_hdr = IWL_RX_HDR(pkt); struct iwl3945_rx_frame_end *rx_end = IWL_RX_END(pkt); @@ -1858,7 +1886,7 @@ int iwl3945_hw_reg_set_txpower(struct iwl_priv *priv, s8 power) static int iwl3945_send_rxon_assoc(struct iwl_priv *priv) { int rc = 0; - struct iwl_rx_packet *res = NULL; + struct iwl_rx_packet *pkt; struct iwl3945_rxon_assoc_cmd rxon_assoc; struct iwl_host_cmd cmd = { .id = REPLY_RXON_ASSOC, @@ -1887,14 +1915,14 @@ static int iwl3945_send_rxon_assoc(struct iwl_priv *priv) if (rc) return rc; - res = (struct iwl_rx_packet *)cmd.reply_skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + pkt = (struct iwl_rx_packet *)cmd.reply_page; + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_RXON_ASSOC command\n"); rc = -EIO; } - priv->alloc_rxb_skb--; - dev_kfree_skb_any(cmd.reply_skb); + priv->alloc_rxb_page--; + free_pages(cmd.reply_page, priv->hw_params.rx_page_order); return rc; } @@ -2560,8 +2588,7 @@ int iwl3945_hw_set_hw_params(struct iwl_priv *priv) priv->hw_params.max_txq_num = IWL39_NUM_QUEUES; priv->hw_params.tfd_size = sizeof(struct iwl3945_tfd); - 
priv->hw_params.rx_buf_size = IWL_RX_BUF_SIZE_3K; - priv->hw_params.max_pkt_size = 2342; + priv->hw_params.rx_page_order = get_order(IWL_RX_BUF_SIZE_3K); priv->hw_params.max_rxq_size = RX_QUEUE_SIZE; priv->hw_params.max_rxq_log = RX_QUEUE_SIZE_LOG; priv->hw_params.max_stations = IWL3945_STATION_COUNT; diff --git a/drivers/net/wireless/iwlwifi/iwl-4965.c b/drivers/net/wireless/iwlwifi/iwl-4965.c index 6f703a0..e7c67d8 100644 --- a/drivers/net/wireless/iwlwifi/iwl-4965.c +++ b/drivers/net/wireless/iwlwifi/iwl-4965.c @@ -2078,7 +2078,7 @@ static int iwl4965_tx_status_reply_tx(struct iwl_priv *priv, static void iwl4965_rx_reply_tx(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u16 sequence = le16_to_cpu(pkt->hdr.sequence); int txq_id = SEQ_TO_QUEUE(sequence); int index = SEQ_TO_INDEX(sequence); diff --git a/drivers/net/wireless/iwlwifi/iwl-5000.c b/drivers/net/wireless/iwlwifi/iwl-5000.c index 6e6f516..29dfe27 100644 --- a/drivers/net/wireless/iwlwifi/iwl-5000.c +++ b/drivers/net/wireless/iwlwifi/iwl-5000.c @@ -493,7 +493,7 @@ static int iwl5000_send_calib_cfg(struct iwl_priv *priv) static void iwl5000_rx_calib_result(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_calib_hdr *hdr = (struct iwl_calib_hdr *)pkt->u.raw; int len = le32_to_cpu(pkt->len_n_flags) & FH_RSCSR_FRAME_SIZE_MSK; int index; @@ -1218,7 +1218,7 @@ static int iwl5000_tx_status_reply_tx(struct iwl_priv *priv, static void iwl5000_rx_reply_tx(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u16 sequence = le16_to_cpu(pkt->hdr.sequence); int txq_id = SEQ_TO_QUEUE(sequence); int index = SEQ_TO_INDEX(sequence); diff --git 
a/drivers/net/wireless/iwlwifi/iwl-agn.c b/drivers/net/wireless/iwlwifi/iwl-agn.c index eaafae0..c5ff7c0 100644 --- a/drivers/net/wireless/iwlwifi/iwl-agn.c +++ b/drivers/net/wireless/iwlwifi/iwl-agn.c @@ -521,7 +521,7 @@ int iwl_hw_tx_queue_init(struct iwl_priv *priv, static void iwl_rx_reply_alive(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_alive_resp *palive; struct delayed_work *pwork; @@ -607,7 +607,7 @@ static void iwl_rx_beacon_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl4965_beacon_notif *beacon = (struct iwl4965_beacon_notif *)pkt->u.raw; u8 rate = iwl_hw_get_rate(beacon->beacon_notify_hdr.rate_n_flags); @@ -631,7 +631,7 @@ static void iwl_rx_beacon_notif(struct iwl_priv *priv, static void iwl_rx_card_state_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u32 flags = le32_to_cpu(pkt->u.card_state_notif.flags); unsigned long status = priv->status; @@ -783,10 +783,10 @@ void iwl_rx_handle(struct iwl_priv *priv) rxq->queue[i] = NULL; - pci_unmap_single(priv->pci_dev, rxb->real_dma_addr, - priv->hw_params.rx_buf_size + 256, - PCI_DMA_FROMDEVICE); - pkt = (struct iwl_rx_packet *)rxb->skb->data; + pci_unmap_page(priv->pci_dev, rxb->page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + pkt = rxb_addr(rxb); /* Reclaim a command buffer only if this packet is a response * to a (driver-originated) command. 
@@ -819,10 +819,10 @@ void iwl_rx_handle(struct iwl_priv *priv) } if (reclaim) { - /* Invoke any callbacks, transfer the skb to caller, and - * fire off the (possibly) blocking iwl_send_cmd() + /* Invoke any callbacks, transfer the buffer to caller, + * and fire off the (possibly) blocking iwl_send_cmd() * as we reclaim the driver command queue */ - if (rxb && rxb->skb) + if (rxb && rxb->page) iwl_tx_cmd_complete(priv, rxb); else IWL_WARN(priv, "Claim null rxb?\n"); @@ -831,10 +831,10 @@ void iwl_rx_handle(struct iwl_priv *priv) /* For now we just don't re-use anything. We can tweak this * later to try and re-use notification packets and SKBs that * fail to Rx correctly */ - if (rxb->skb != NULL) { - priv->alloc_rxb_skb--; - dev_kfree_skb_any(rxb->skb); - rxb->skb = NULL; + if (rxb->page != NULL) { + priv->alloc_rxb_page--; + __free_pages(rxb->page, priv->hw_params.rx_page_order); + rxb->page = NULL; } spin_lock_irqsave(&rxq->lock, flags); @@ -901,6 +901,8 @@ static void iwl_irq_tasklet_legacy(struct iwl_priv *priv) } #endif + spin_unlock_irqrestore(&priv->lock, flags); + /* Since CSR_INT and CSR_FH_INT_STATUS reads and clears are not * atomic, make sure that inta covers all the interrupts that * we've discovered, even if FH interrupt came in just after @@ -922,8 +924,6 @@ static void iwl_irq_tasklet_legacy(struct iwl_priv *priv) handled |= CSR_INT_BIT_HW_ERR; - spin_unlock_irqrestore(&priv->lock, flags); - return; } @@ -1050,7 +1050,6 @@ static void iwl_irq_tasklet_legacy(struct iwl_priv *priv) "flags 0x%08lx\n", inta, inta_mask, inta_fh, flags); } #endif - spin_unlock_irqrestore(&priv->lock, flags); } /* tasklet for iwlagn interrupt */ @@ -1080,6 +1079,9 @@ static void iwl_irq_tasklet(struct iwl_priv *priv) inta, inta_mask); } #endif + + spin_unlock_irqrestore(&priv->lock, flags); + /* saved interrupt in inta variable now we can reset priv->inta */ priv->inta = 0; @@ -1095,8 +1097,6 @@ static void iwl_irq_tasklet(struct iwl_priv *priv) handled |= 
CSR_INT_BIT_HW_ERR; - spin_unlock_irqrestore(&priv->lock, flags); - return; } @@ -1236,14 +1236,10 @@ static void iwl_irq_tasklet(struct iwl_priv *priv) inta & ~priv->inta_mask); } - /* Re-enable all interrupts */ /* only Re-enable if diabled by irq */ if (test_bit(STATUS_INT_ENABLED, &priv->status)) iwl_enable_interrupts(priv); - - spin_unlock_irqrestore(&priv->lock, flags); - } diff --git a/drivers/net/wireless/iwlwifi/iwl-commands.h b/drivers/net/wireless/iwlwifi/iwl-commands.h index 4afaf77..dd54bf2 100644 --- a/drivers/net/wireless/iwlwifi/iwl-commands.h +++ b/drivers/net/wireless/iwlwifi/iwl-commands.h @@ -3495,6 +3495,16 @@ struct iwl_wimax_coex_cmd { *****************************************************************************/ struct iwl_rx_packet { + /* + * The first 4 bytes of the RX frame header contain both the RX frame + * size and some flags. + * Bit fields: + * 31: flag flush RB request + * 30: flag ignore TC (terminal counter) request + * 29: flag fast IRQ request + * 28-14: Reserved + * 13-00: RX frame size + */ __le32 len_n_flags; struct iwl_cmd_header hdr; union { diff --git a/drivers/net/wireless/iwlwifi/iwl-core.c b/drivers/net/wireless/iwlwifi/iwl-core.c index 2dc9287..bb9ff29 100644 --- a/drivers/net/wireless/iwlwifi/iwl-core.c +++ b/drivers/net/wireless/iwlwifi/iwl-core.c @@ -1281,7 +1281,7 @@ static void iwl_set_rate(struct iwl_priv *priv) void iwl_rx_csa(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_rxon_cmd *rxon = (void *)&priv->active_rxon; struct iwl_csa_notification *csa = &(pkt->u.csa_notif); IWL_DEBUG_11H(priv, "CSA notif: channel %d, status %d\n", @@ -1456,10 +1456,9 @@ int iwl_set_hw_params(struct iwl_priv *priv) priv->hw_params.max_rxq_size = RX_QUEUE_SIZE; priv->hw_params.max_rxq_log = RX_QUEUE_SIZE_LOG; if (priv->cfg->mod_params->amsdu_size_8K) - priv->hw_params.rx_buf_size = 
IWL_RX_BUF_SIZE_8K; + priv->hw_params.rx_page_order = get_order(IWL_RX_BUF_SIZE_8K); else - priv->hw_params.rx_buf_size = IWL_RX_BUF_SIZE_4K; - priv->hw_params.max_pkt_size = priv->hw_params.rx_buf_size - 256; + priv->hw_params.rx_page_order = get_order(IWL_RX_BUF_SIZE_4K); priv->hw_params.max_beacon_itrvl = IWL_MAX_UCODE_BEACON_INTERVAL; @@ -2143,7 +2142,7 @@ void iwl_rx_pm_sleep_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_sleep_notification *sleep = &(pkt->u.sleep_notif); IWL_DEBUG_RX(priv, "sleep mode: %d, src: %d\n", sleep->pm_sleep_mode, sleep->pm_wakeup_src); @@ -2154,7 +2153,7 @@ EXPORT_SYMBOL(iwl_rx_pm_sleep_notif); void iwl_rx_pm_debug_statistics_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u32 len = le32_to_cpu(pkt->len_n_flags) & FH_RSCSR_FRAME_SIZE_MSK; IWL_DEBUG_RADIO(priv, "Dumping %d bytes of unhandled " "notification for %s:\n", len, @@ -2166,7 +2165,7 @@ EXPORT_SYMBOL(iwl_rx_pm_debug_statistics_notif); void iwl_rx_reply_error(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); IWL_ERR(priv, "Error Reply type 0x%08X cmd %s (0x%02X) " "seq 0x%04X ser 0x%08X\n", diff --git a/drivers/net/wireless/iwlwifi/iwl-core.h b/drivers/net/wireless/iwlwifi/iwl-core.h index e50103a..d95674e 100644 --- a/drivers/net/wireless/iwlwifi/iwl-core.h +++ b/drivers/net/wireless/iwlwifi/iwl-core.h @@ -509,7 +509,7 @@ int iwl_send_cmd_pdu_async(struct iwl_priv *priv, u8 id, u16 len, const void *data, void (*callback)(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb)); + struct iwl_rx_packet *pkt)); int iwl_enqueue_hcmd(struct iwl_priv 
*priv, struct iwl_host_cmd *cmd); diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h index 028d505..7fb1688 100644 --- a/drivers/net/wireless/iwlwifi/iwl-dev.h +++ b/drivers/net/wireless/iwlwifi/iwl-dev.h @@ -144,12 +144,13 @@ extern void iwl5000_temperature(struct iwl_priv *priv); #define DEFAULT_LONG_RETRY_LIMIT 4U struct iwl_rx_mem_buffer { - dma_addr_t real_dma_addr; - dma_addr_t aligned_dma_addr; - struct sk_buff *skb; + dma_addr_t page_dma; + struct page *page; struct list_head list; }; +#define rxb_addr(r) page_address(r->page) + /* defined below */ struct iwl_device_cmd; @@ -165,7 +166,7 @@ struct iwl_cmd_meta { */ void (*callback)(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb); + struct iwl_rx_packet *pkt); /* The CMD_SIZE_HUGE flag bit indicates that the command * structure is stored at the end of the shared queue memory. */ @@ -358,6 +359,13 @@ enum { #define IWL_CMD_MAX_PAYLOAD 320 +/* + * IWL_LINK_HDR_MAX should include ieee80211_hdr, radiotap header, + * SNAP header and alignment. It should also be big enough for 802.11 + * control frames. 
+ */ +#define IWL_LINK_HDR_MAX 64 + /** * struct iwl_device_cmd * @@ -382,10 +390,10 @@ struct iwl_device_cmd { struct iwl_host_cmd { const void *data; - struct sk_buff *reply_skb; + unsigned long reply_page; void (*callback)(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb); + struct iwl_rx_packet *pkt); u32 flags; u16 len; u8 id; @@ -639,7 +647,7 @@ struct iwl_sensitivity_ranges { * @valid_tx/rx_ant: usable antennas * @max_rxq_size: Max # Rx frames in Rx queue (must be power-of-2) * @max_rxq_log: Log-base-2 of max_rxq_size - * @rx_buf_size: Rx buffer size + * @rx_page_order: Rx buffer page order * @rx_wrt_ptr_reg: FH{39}_RSCSR_CHNL0_WPTR * @max_stations: * @bcast_sta_id: @@ -662,9 +670,8 @@ struct iwl_hw_params { u8 valid_rx_ant; u16 max_rxq_size; u16 max_rxq_log; - u32 rx_buf_size; + u32 rx_page_order; u32 rx_wrt_ptr_reg; - u32 max_pkt_size; u8 max_stations; u8 bcast_sta_id; u8 ht40_channel; @@ -976,7 +983,7 @@ struct iwl_priv { int frames_count; enum ieee80211_band band; - int alloc_rxb_skb; + int alloc_rxb_page; void (*rx_handlers[REPLY_MAX])(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb); diff --git a/drivers/net/wireless/iwlwifi/iwl-hcmd.c b/drivers/net/wireless/iwlwifi/iwl-hcmd.c index a6856da..1bf17d2 100644 --- a/drivers/net/wireless/iwlwifi/iwl-hcmd.c +++ b/drivers/net/wireless/iwlwifi/iwl-hcmd.c @@ -104,17 +104,8 @@ EXPORT_SYMBOL(get_cmd_string); static void iwl_generic_cmd_callback(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb) + struct iwl_rx_packet *pkt) { - struct iwl_rx_packet *pkt = NULL; - - if (!skb) { - IWL_ERR(priv, "Error: Response NULL in %s.\n", - get_cmd_string(cmd->hdr.cmd)); - return; - } - - pkt = (struct iwl_rx_packet *)skb->data; if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from %s (0x%08X)\n", get_cmd_string(cmd->hdr.cmd), pkt->hdr.flags); @@ -216,7 +207,7 @@ int iwl_send_cmd_sync(struct iwl_priv *priv, struct iwl_host_cmd *cmd) ret = -EIO; goto fail; } 
- if ((cmd->flags & CMD_WANT_SKB) && !cmd->reply_skb) { + if ((cmd->flags & CMD_WANT_SKB) && !cmd->reply_page) { IWL_ERR(priv, "Error: Response NULL in '%s'\n", get_cmd_string(cmd->id)); ret = -EIO; @@ -238,9 +229,9 @@ cancel: ~CMD_WANT_SKB; } fail: - if (cmd->reply_skb) { - dev_kfree_skb_any(cmd->reply_skb); - cmd->reply_skb = NULL; + if (cmd->reply_page) { + free_pages(cmd->reply_page, priv->hw_params.rx_page_order); + cmd->reply_page = 0; } out: clear_bit(STATUS_HCMD_SYNC_ACTIVE, &priv->status); @@ -273,7 +264,7 @@ int iwl_send_cmd_pdu_async(struct iwl_priv *priv, u8 id, u16 len, const void *data, void (*callback)(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb)) + struct iwl_rx_packet *pkt)) { struct iwl_host_cmd cmd = { .id = id, diff --git a/drivers/net/wireless/iwlwifi/iwl-rx.c b/drivers/net/wireless/iwlwifi/iwl-rx.c index 493626b..5e56857 100644 --- a/drivers/net/wireless/iwlwifi/iwl-rx.c +++ b/drivers/net/wireless/iwlwifi/iwl-rx.c @@ -200,7 +200,7 @@ int iwl_rx_queue_restock(struct iwl_priv *priv) list_del(element); /* Point to Rx buffer via next RBD in circular buffer */ - rxq->bd[rxq->write] = iwl_dma_addr2rbd_ptr(priv, rxb->aligned_dma_addr); + rxq->bd[rxq->write] = iwl_dma_addr2rbd_ptr(priv, rxb->page_dma); rxq->queue[rxq->write] = rxb; rxq->write = (rxq->write + 1) & RX_QUEUE_MASK; rxq->free_count--; @@ -239,7 +239,7 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority) struct iwl_rx_queue *rxq = &priv->rxq; struct list_head *element; struct iwl_rx_mem_buffer *rxb; - struct sk_buff *skb; + struct page *page; unsigned long flags; while (1) { @@ -252,29 +252,34 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority) if (rxq->free_count > RX_LOW_WATERMARK) priority |= __GFP_NOWARN; - /* Alloc a new receive buffer */ - skb = alloc_skb(priv->hw_params.rx_buf_size + 256, - priority); - if (!skb) { + if (priv->hw_params.rx_page_order > 0) + priority |= __GFP_COMP; + + /* Alloc a new receive buffer */ + page = 
alloc_pages(priority, priv->hw_params.rx_page_order); + if (!page) { if (net_ratelimit()) - IWL_DEBUG_INFO(priv, "Failed to allocate SKB buffer.\n"); + IWL_DEBUG_INFO(priv, "alloc_pages failed, " + "order: %d\n", + priv->hw_params.rx_page_order); + if ((rxq->free_count <= RX_LOW_WATERMARK) && net_ratelimit()) - IWL_CRIT(priv, "Failed to allocate SKB buffer with %s. Only %u free buffers remaining.\n", + IWL_CRIT(priv, "Failed to alloc_pages with %s. Only %u free buffers remaining.\n", priority == GFP_ATOMIC ? "GFP_ATOMIC" : "GFP_KERNEL", rxq->free_count); /* We don't reschedule replenish work here -- we will * call the restock method and if it still needs * more buffers it will schedule replenish */ - break; + return; } spin_lock_irqsave(&rxq->lock, flags); if (list_empty(&rxq->rx_used)) { spin_unlock_irqrestore(&rxq->lock, flags); - dev_kfree_skb_any(skb); + __free_pages(page, priv->hw_params.rx_page_order); return; } element = rxq->rx_used.next; @@ -283,24 +288,21 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority) spin_unlock_irqrestore(&rxq->lock, flags); - rxb->skb = skb; - /* Get physical address of RB/SKB */ - rxb->real_dma_addr = pci_map_single( - priv->pci_dev, - rxb->skb->data, - priv->hw_params.rx_buf_size + 256, - PCI_DMA_FROMDEVICE); + rxb->page = page; + /* Get physical address of the RB */ + rxb->page_dma = pci_map_page(priv->pci_dev, page, 0, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); /* dma address must be no more than 36 bits */ - BUG_ON(rxb->real_dma_addr & ~DMA_BIT_MASK(36)); + BUG_ON(rxb->page_dma & ~DMA_BIT_MASK(36)); /* and also 256 byte aligned! 
*/ - rxb->aligned_dma_addr = ALIGN(rxb->real_dma_addr, 256); - skb_reserve(rxb->skb, rxb->aligned_dma_addr - rxb->real_dma_addr); + BUG_ON(rxb->page_dma & DMA_BIT_MASK(8)); spin_lock_irqsave(&rxq->lock, flags); list_add_tail(&rxb->list, &rxq->rx_free); rxq->free_count++; - priv->alloc_rxb_skb++; + priv->alloc_rxb_page++; spin_unlock_irqrestore(&rxq->lock, flags); } @@ -336,12 +338,14 @@ void iwl_rx_queue_free(struct iwl_priv *priv, struct iwl_rx_queue *rxq) { int i; for (i = 0; i < RX_QUEUE_SIZE + RX_FREE_BUFFERS; i++) { - if (rxq->pool[i].skb != NULL) { - pci_unmap_single(priv->pci_dev, - rxq->pool[i].real_dma_addr, - priv->hw_params.rx_buf_size + 256, - PCI_DMA_FROMDEVICE); - dev_kfree_skb(rxq->pool[i].skb); + if (rxq->pool[i].page != NULL) { + pci_unmap_page(priv->pci_dev, rxq->pool[i].page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + __free_pages(rxq->pool[i].page, + priv->hw_params.rx_page_order); + rxq->pool[i].page = NULL; + priv->alloc_rxb_page--; } } @@ -405,14 +409,14 @@ void iwl_rx_queue_reset(struct iwl_priv *priv, struct iwl_rx_queue *rxq) for (i = 0; i < RX_FREE_BUFFERS + RX_QUEUE_SIZE; i++) { /* In the reset function, these buffers may have been allocated * to an SKB, so we need to unmap and free potential storage */ - if (rxq->pool[i].skb != NULL) { - pci_unmap_single(priv->pci_dev, - rxq->pool[i].real_dma_addr, - priv->hw_params.rx_buf_size + 256, - PCI_DMA_FROMDEVICE); - priv->alloc_rxb_skb--; - dev_kfree_skb(rxq->pool[i].skb); - rxq->pool[i].skb = NULL; + if (rxq->pool[i].page != NULL) { + pci_unmap_page(priv->pci_dev, rxq->pool[i].page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + priv->alloc_rxb_page--; + __free_pages(rxq->pool[i].page, + priv->hw_params.rx_page_order); + rxq->pool[i].page = NULL; } list_add_tail(&rxq->pool[i].list, &rxq->rx_used); } @@ -491,7 +495,7 @@ void iwl_rx_missed_beacon_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt 
= (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_missed_beacon_notif *missed_beacon; missed_beacon = &pkt->u.missed_beacon; @@ -554,7 +558,7 @@ void iwl_rx_statistics(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { int change; - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); IWL_DEBUG_RX(priv, "Statistics notification received (%d vs %d).\n", (int)sizeof(priv->statistics), @@ -878,6 +882,9 @@ static void iwl_pass_packet_to_mac80211(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb, struct ieee80211_rx_status *stats) { + struct sk_buff *skb; + int ret = 0; + /* We only process data packets if the interface is open */ if (unlikely(!priv->is_open)) { IWL_DEBUG_DROP_LIMIT(priv, @@ -890,15 +897,38 @@ static void iwl_pass_packet_to_mac80211(struct iwl_priv *priv, iwl_set_decrypted_flag(priv, hdr, ampdu_status, stats)) return; - /* Resize SKB from mac header to end of packet */ - skb_reserve(rxb->skb, (void *)hdr - (void *)rxb->skb->data); - skb_put(rxb->skb, len); + skb = alloc_skb(IWL_LINK_HDR_MAX, GFP_ATOMIC); + if (!skb) { + IWL_ERR(priv, "alloc_skb failed\n"); + return; + } + + skb_add_rx_frag(skb, 0, rxb->page, (void *)hdr - rxb_addr(rxb), len); + + /* mac80211 currently doesn't support paged SKB. Convert it to + * linear SKB for management frame and data frame requires + * software decryption or software defragementation. */ + if (ieee80211_is_mgmt(hdr->frame_control) || + ieee80211_has_protected(hdr->frame_control) || + ieee80211_has_morefrags(hdr->frame_control) || + le16_to_cpu(hdr->seq_ctrl) & IEEE80211_SCTL_FRAG) + ret = skb_linearize(skb); + else + ret = __pskb_pull_tail(skb, min_t(u16, IWL_LINK_HDR_MAX, len)) ? 
+ 0 : -ENOMEM; + + if (ret) { + kfree_skb(skb); + goto out; + } iwl_update_stats(priv, false, hdr->frame_control, len); - memcpy(IEEE80211_SKB_RXCB(rxb->skb), stats, sizeof(*stats)); - ieee80211_rx_irqsafe(priv->hw, rxb->skb); - priv->alloc_rxb_skb--; - rxb->skb = NULL; + memcpy(IEEE80211_SKB_RXCB(skb), stats, sizeof(*stats)); + + ieee80211_rx(priv->hw, skb); + out: + priv->alloc_rxb_page--; + rxb->page = NULL; } /* This is necessary only for a number of statistics, see the caller. */ @@ -926,7 +956,7 @@ void iwl_rx_reply_rx(struct iwl_priv *priv, { struct ieee80211_hdr *header; struct ieee80211_rx_status rx_status; - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_rx_phy_res *phy_res; __le32 rx_pkt_status; struct iwl4965_rx_mpdu_res_start *amsdu; @@ -1087,7 +1117,7 @@ EXPORT_SYMBOL(iwl_rx_reply_rx); void iwl_rx_reply_rx_phy(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); priv->last_phy_res[0] = 1; memcpy(&priv->last_phy_res[1], &(pkt->u.raw[0]), sizeof(struct iwl_rx_phy_res)); diff --git a/drivers/net/wireless/iwlwifi/iwl-scan.c b/drivers/net/wireless/iwlwifi/iwl-scan.c index 4f3a108..bcccc6f 100644 --- a/drivers/net/wireless/iwlwifi/iwl-scan.c +++ b/drivers/net/wireless/iwlwifi/iwl-scan.c @@ -112,7 +112,7 @@ EXPORT_SYMBOL(iwl_scan_cancel_timeout); static int iwl_send_scan_abort(struct iwl_priv *priv) { int ret = 0; - struct iwl_rx_packet *res; + struct iwl_rx_packet *pkt; struct iwl_host_cmd cmd = { .id = REPLY_SCAN_ABORT_CMD, .flags = CMD_WANT_SKB, @@ -132,21 +132,21 @@ static int iwl_send_scan_abort(struct iwl_priv *priv) return ret; } - res = (struct iwl_rx_packet *)cmd.reply_skb->data; - if (res->u.status != CAN_ABORT_STATUS) { + pkt = (struct iwl_rx_packet *)cmd.reply_page; + if (pkt->u.status != CAN_ABORT_STATUS) { /* The scan abort will return 1 for success or 
* 2 for "failure". A failure condition can be * due to simply not being in an active scan which * can occur if we send the scan abort before we * the microcode has notified us that a scan is * completed. */ - IWL_DEBUG_INFO(priv, "SCAN_ABORT returned %d.\n", res->u.status); + IWL_DEBUG_INFO(priv, "SCAN_ABORT returned %d.\n", pkt->u.status); clear_bit(STATUS_SCAN_ABORTING, &priv->status); clear_bit(STATUS_SCAN_HW, &priv->status); } - priv->alloc_rxb_skb--; - dev_kfree_skb_any(cmd.reply_skb); + priv->alloc_rxb_page--; + free_pages(cmd.reply_page, priv->hw_params.rx_page_order); return ret; } @@ -156,7 +156,7 @@ static void iwl_rx_reply_scan(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_scanreq_notification *notif = (struct iwl_scanreq_notification *)pkt->u.raw; @@ -168,7 +168,7 @@ static void iwl_rx_reply_scan(struct iwl_priv *priv, static void iwl_rx_scan_start_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_scanstart_notification *notif = (struct iwl_scanstart_notification *)pkt->u.raw; priv->scan_start_tsf = le32_to_cpu(notif->tsf_low); @@ -187,7 +187,7 @@ static void iwl_rx_scan_results_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_scanresults_notification *notif = (struct iwl_scanresults_notification *)pkt->u.raw; @@ -214,7 +214,7 @@ static void iwl_rx_scan_complete_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct 
iwl_scancomplete_notification *scan_notif = (void *)pkt->u.raw; IWL_DEBUG_SCAN(priv, "Scan complete: %d channels (TSF 0x%08X:%08X) - %d\n", diff --git a/drivers/net/wireless/iwlwifi/iwl-spectrum.c b/drivers/net/wireless/iwlwifi/iwl-spectrum.c index 022bcf1..1ea5cd3 100644 --- a/drivers/net/wireless/iwlwifi/iwl-spectrum.c +++ b/drivers/net/wireless/iwlwifi/iwl-spectrum.c @@ -177,7 +177,7 @@ static int iwl_get_measurement(struct iwl_priv *priv, static void iwl_rx_spectrum_measure_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_spectrum_notification *report = &(pkt->u.spectrum_notif); if (!report->state) { diff --git a/drivers/net/wireless/iwlwifi/iwl-sta.c b/drivers/net/wireless/iwlwifi/iwl-sta.c index c6633fe..dc74c16 100644 --- a/drivers/net/wireless/iwlwifi/iwl-sta.c +++ b/drivers/net/wireless/iwlwifi/iwl-sta.c @@ -99,32 +99,25 @@ static void iwl_sta_ucode_activate(struct iwl_priv *priv, u8 sta_id) static void iwl_add_sta_callback(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb) + struct iwl_rx_packet *pkt) { - struct iwl_rx_packet *res = NULL; struct iwl_addsta_cmd *addsta = (struct iwl_addsta_cmd *)cmd->cmd.payload; u8 sta_id = addsta->sta.sta_id; - if (!skb) { - IWL_ERR(priv, "Error: Response NULL in REPLY_ADD_STA.\n"); - return; - } - - res = (struct iwl_rx_packet *)skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_ADD_STA (0x%08X)\n", - res->hdr.flags); + pkt->hdr.flags); return; } - switch (res->u.add_sta.status) { + switch (pkt->u.add_sta.status) { case ADD_STA_SUCCESS_MSK: iwl_sta_ucode_activate(priv, sta_id); /* fall through */ default: IWL_DEBUG_HC(priv, "Received REPLY_ADD_STA:(0x%08X)\n", - res->u.add_sta.status); + pkt->u.add_sta.status); break; } } @@ -132,7 +125,7 @@ static void 
iwl_add_sta_callback(struct iwl_priv *priv, int iwl_send_add_sta(struct iwl_priv *priv, struct iwl_addsta_cmd *sta, u8 flags) { - struct iwl_rx_packet *res = NULL; + struct iwl_rx_packet *pkt = NULL; int ret = 0; u8 data[sizeof(*sta)]; struct iwl_host_cmd cmd = { @@ -152,15 +145,15 @@ int iwl_send_add_sta(struct iwl_priv *priv, if (ret || (flags & CMD_ASYNC)) return ret; - res = (struct iwl_rx_packet *)cmd.reply_skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + pkt = (struct iwl_rx_packet *)cmd.reply_page; + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_ADD_STA (0x%08X)\n", - res->hdr.flags); + pkt->hdr.flags); ret = -EIO; } if (ret == 0) { - switch (res->u.add_sta.status) { + switch (pkt->u.add_sta.status) { case ADD_STA_SUCCESS_MSK: iwl_sta_ucode_activate(priv, sta->sta.sta_id); IWL_DEBUG_INFO(priv, "REPLY_ADD_STA PASSED\n"); @@ -172,8 +165,8 @@ int iwl_send_add_sta(struct iwl_priv *priv, } } - priv->alloc_rxb_skb--; - dev_kfree_skb_any(cmd.reply_skb); + priv->alloc_rxb_page--; + free_pages(cmd.reply_page, priv->hw_params.rx_page_order); return ret; } @@ -324,26 +317,19 @@ static void iwl_sta_ucode_deactivate(struct iwl_priv *priv, const char *addr) static void iwl_remove_sta_callback(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb) + struct iwl_rx_packet *pkt) { - struct iwl_rx_packet *res = NULL; struct iwl_rem_sta_cmd *rm_sta = - (struct iwl_rem_sta_cmd *)cmd->cmd.payload; + (struct iwl_rem_sta_cmd *)cmd->cmd.payload; const char *addr = rm_sta->addr; - if (!skb) { - IWL_ERR(priv, "Error: Response NULL in REPLY_REMOVE_STA.\n"); - return; - } - - res = (struct iwl_rx_packet *)skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_REMOVE_STA (0x%08X)\n", - res->hdr.flags); + pkt->hdr.flags); return; } - switch (res->u.rem_sta.status) { + switch (pkt->u.rem_sta.status) { case REM_STA_SUCCESS_MSK: 
iwl_sta_ucode_deactivate(priv, addr); break; @@ -356,7 +342,7 @@ static void iwl_remove_sta_callback(struct iwl_priv *priv, static int iwl_send_remove_station(struct iwl_priv *priv, const u8 *addr, u8 flags) { - struct iwl_rx_packet *res = NULL; + struct iwl_rx_packet *pkt; int ret; struct iwl_rem_sta_cmd rm_sta_cmd; @@ -381,15 +367,15 @@ static int iwl_send_remove_station(struct iwl_priv *priv, const u8 *addr, if (ret || (flags & CMD_ASYNC)) return ret; - res = (struct iwl_rx_packet *)cmd.reply_skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + pkt = (struct iwl_rx_packet *)cmd.reply_page; + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_REMOVE_STA (0x%08X)\n", - res->hdr.flags); + pkt->hdr.flags); ret = -EIO; } if (!ret) { - switch (res->u.rem_sta.status) { + switch (pkt->u.rem_sta.status) { case REM_STA_SUCCESS_MSK: iwl_sta_ucode_deactivate(priv, addr); IWL_DEBUG_ASSOC(priv, "REPLY_REMOVE_STA PASSED\n"); @@ -401,8 +387,8 @@ static int iwl_send_remove_station(struct iwl_priv *priv, const u8 *addr, } } - priv->alloc_rxb_skb--; - dev_kfree_skb_any(cmd.reply_skb); + priv->alloc_rxb_page--; + free_pages(cmd.reply_page, priv->hw_params.rx_page_order); return ret; } diff --git a/drivers/net/wireless/iwlwifi/iwl-tx.c b/drivers/net/wireless/iwlwifi/iwl-tx.c index fb9bcfa..a98d60d 100644 --- a/drivers/net/wireless/iwlwifi/iwl-tx.c +++ b/drivers/net/wireless/iwlwifi/iwl-tx.c @@ -1132,7 +1132,7 @@ static void iwl_hcmd_queue_reclaim(struct iwl_priv *priv, int txq_id, */ void iwl_tx_cmd_complete(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u16 sequence = le16_to_cpu(pkt->hdr.sequence); int txq_id = SEQ_TO_QUEUE(sequence); int index = SEQ_TO_INDEX(sequence); @@ -1159,10 +1159,10 @@ void iwl_tx_cmd_complete(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) /* Input error checking is done when commands are added 
to queue. */ if (meta->flags & CMD_WANT_SKB) { - meta->source->reply_skb = rxb->skb; - rxb->skb = NULL; + meta->source->reply_page = (unsigned long)rxb_addr(rxb); + rxb->page = NULL; } else if (meta->callback) - meta->callback(priv, cmd, rxb->skb); + meta->callback(priv, cmd, pkt); iwl_hcmd_queue_reclaim(priv, txq_id, index, cmd_index); @@ -1421,7 +1421,7 @@ static int iwl_tx_status_reply_compressed_ba(struct iwl_priv *priv, void iwl_rx_reply_compressed_ba(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_compressed_ba_resp *ba_resp = &pkt->u.compressed_ba; struct iwl_tx_queue *txq = NULL; struct iwl_ht_agg *agg; diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c index d00a803..e20690d 100644 --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c @@ -758,7 +758,7 @@ static int iwl3945_get_measurement(struct iwl_priv *priv, u8 type) { struct iwl_spectrum_cmd spectrum; - struct iwl_rx_packet *res; + struct iwl_rx_packet *pkt; struct iwl_host_cmd cmd = { .id = REPLY_SPECTRUM_MEASUREMENT_CMD, .data = (void *)&spectrum, @@ -803,18 +803,18 @@ static int iwl3945_get_measurement(struct iwl_priv *priv, if (rc) return rc; - res = (struct iwl_rx_packet *)cmd.reply_skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + pkt = (struct iwl_rx_packet *)cmd.reply_page; + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_RX_ON_ASSOC command\n"); rc = -EIO; } - spectrum_resp_status = le16_to_cpu(res->u.spectrum.status); + spectrum_resp_status = le16_to_cpu(pkt->u.spectrum.status); switch (spectrum_resp_status) { case 0: /* Command will be handled */ - if (res->u.spectrum.id != 0xff) { + if (pkt->u.spectrum.id != 0xff) { IWL_DEBUG_INFO(priv, "Replaced existing measurement: %d\n", - res->u.spectrum.id); + pkt->u.spectrum.id); 
priv->measurement_status &= ~MEASUREMENT_READY; } priv->measurement_status |= MEASUREMENT_ACTIVE; @@ -826,7 +826,7 @@ static int iwl3945_get_measurement(struct iwl_priv *priv, break; } - dev_kfree_skb_any(cmd.reply_skb); + free_pages(cmd.reply_page, priv->hw_params.rx_page_order); return rc; } @@ -835,7 +835,7 @@ static int iwl3945_get_measurement(struct iwl_priv *priv, static void iwl3945_rx_reply_alive(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_alive_resp *palive; struct delayed_work *pwork; @@ -872,7 +872,7 @@ static void iwl3945_rx_reply_add_sta(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); #endif IWL_DEBUG_RX(priv, "Received REPLY_ADD_STA: 0x%02X\n", pkt->u.status); @@ -908,7 +908,7 @@ static void iwl3945_rx_beacon_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl3945_beacon_notif *beacon = &(pkt->u.beacon_status); u8 rate = beacon->beacon_notify_hdr.rate; @@ -931,7 +931,7 @@ static void iwl3945_rx_beacon_notif(struct iwl_priv *priv, static void iwl3945_rx_card_state_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u32 flags = le32_to_cpu(pkt->u.card_state_notif.flags); unsigned long status = priv->status; @@ -1095,7 +1095,7 @@ static int iwl3945_rx_queue_restock(struct iwl_priv *priv) list_del(element); /* Point to Rx buffer via next RBD in circular buffer */ - rxq->bd[rxq->write] = iwl3945_dma_addr2rbd_ptr(priv, rxb->real_dma_addr); + rxq->bd[rxq->write] = iwl3945_dma_addr2rbd_ptr(priv, rxb->page_dma); rxq->queue[rxq->write] = rxb; rxq->write = 
(rxq->write + 1) & RX_QUEUE_MASK; rxq->free_count--; @@ -1135,7 +1135,7 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) struct iwl_rx_queue *rxq = &priv->rxq; struct list_head *element; struct iwl_rx_mem_buffer *rxb; - struct sk_buff *skb; + struct page *page; unsigned long flags; while (1) { @@ -1149,9 +1149,13 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) if (rxq->free_count > RX_LOW_WATERMARK) priority |= __GFP_NOWARN; + + if (priv->hw_params.rx_page_order > 0) + priority |= __GFP_COMP; + /* Alloc a new receive buffer */ - skb = alloc_skb(priv->hw_params.rx_buf_size, priority); - if (!skb) { + page = alloc_pages(priority, priv->hw_params.rx_page_order); + if (!page) { if (net_ratelimit()) IWL_DEBUG_INFO(priv, "Failed to allocate SKB buffer.\n"); if ((rxq->free_count <= RX_LOW_WATERMARK) && @@ -1168,7 +1172,7 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) spin_lock_irqsave(&rxq->lock, flags); if (list_empty(&rxq->rx_used)) { spin_unlock_irqrestore(&rxq->lock, flags); - dev_kfree_skb_any(skb); + __free_pages(page, priv->hw_params.rx_page_order); return; } element = rxq->rx_used.next; @@ -1176,26 +1180,18 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) list_del(element); spin_unlock_irqrestore(&rxq->lock, flags); - rxb->skb = skb; - - /* If radiotap head is required, reserve some headroom here. - * The physical head count is a variable rx_stats->phy_count. - * We reserve 4 bytes here. Plus these extra bytes, the - * headroom of the physical head should be enough for the - * radiotap head that iwl3945 supported. See iwl3945_rt. 
- */ - skb_reserve(rxb->skb, 4); - + rxb->page = page; /* Get physical address of RB/SKB */ - rxb->real_dma_addr = pci_map_single(priv->pci_dev, - rxb->skb->data, - priv->hw_params.rx_buf_size, - PCI_DMA_FROMDEVICE); + rxb->page_dma = pci_map_page(priv->pci_dev, page, 0, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); spin_lock_irqsave(&rxq->lock, flags); + list_add_tail(&rxb->list, &rxq->rx_free); - priv->alloc_rxb_skb++; rxq->free_count++; + priv->alloc_rxb_page++; + spin_unlock_irqrestore(&rxq->lock, flags); } } @@ -1211,14 +1207,14 @@ void iwl3945_rx_queue_reset(struct iwl_priv *priv, struct iwl_rx_queue *rxq) for (i = 0; i < RX_FREE_BUFFERS + RX_QUEUE_SIZE; i++) { /* In the reset function, these buffers may have been allocated * to an SKB, so we need to unmap and free potential storage */ - if (rxq->pool[i].skb != NULL) { - pci_unmap_single(priv->pci_dev, - rxq->pool[i].real_dma_addr, - priv->hw_params.rx_buf_size, - PCI_DMA_FROMDEVICE); - priv->alloc_rxb_skb--; - dev_kfree_skb(rxq->pool[i].skb); - rxq->pool[i].skb = NULL; + if (rxq->pool[i].page != NULL) { + pci_unmap_page(priv->pci_dev, rxq->pool[i].page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + priv->alloc_rxb_page--; + __free_pages(rxq->pool[i].page, + priv->hw_params.rx_page_order); + rxq->pool[i].page = NULL; } list_add_tail(&rxq->pool[i].list, &rxq->rx_used); } @@ -1226,8 +1222,8 @@ void iwl3945_rx_queue_reset(struct iwl_priv *priv, struct iwl_rx_queue *rxq) /* Set us so that we have processed and used all buffers, but have * not restocked the Rx queue with fresh buffers */ rxq->read = rxq->write = 0; - rxq->free_count = 0; rxq->write_actual = 0; + rxq->free_count = 0; spin_unlock_irqrestore(&rxq->lock, flags); } @@ -1260,12 +1256,14 @@ static void iwl3945_rx_queue_free(struct iwl_priv *priv, struct iwl_rx_queue *rx { int i; for (i = 0; i < RX_QUEUE_SIZE + RX_FREE_BUFFERS; i++) { - if (rxq->pool[i].skb != NULL) { - pci_unmap_single(priv->pci_dev, - 
rxq->pool[i].real_dma_addr, - priv->hw_params.rx_buf_size, - PCI_DMA_FROMDEVICE); - dev_kfree_skb(rxq->pool[i].skb); + if (rxq->pool[i].page != NULL) { + pci_unmap_page(priv->pci_dev, rxq->pool[i].page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + __free_pages(rxq->pool[i].page, + priv->hw_params.rx_page_order); + rxq->pool[i].page = NULL; + priv->alloc_rxb_page--; } } @@ -1401,10 +1399,10 @@ static void iwl3945_rx_handle(struct iwl_priv *priv) rxq->queue[i] = NULL; - pci_unmap_single(priv->pci_dev, rxb->real_dma_addr, - priv->hw_params.rx_buf_size, - PCI_DMA_FROMDEVICE); - pkt = (struct iwl_rx_packet *)rxb->skb->data; + pci_unmap_page(priv->pci_dev, rxb->page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + pkt = rxb_addr(rxb); /* Reclaim a command buffer only if this packet is a response * to a (driver-originated) command. @@ -1426,16 +1424,17 @@ static void iwl3945_rx_handle(struct iwl_priv *priv) priv->isr_stats.rx_handlers[pkt->hdr.cmd]++; } else { /* No handling needed */ - IWL_DEBUG_RX(priv, "r %d i %d No handler needed for %s, 0x%02x\n", + IWL_DEBUG_RX(priv, + "r %d i %d No handler needed for %s, 0x%02x\n", r, i, get_cmd_string(pkt->hdr.cmd), pkt->hdr.cmd); } if (reclaim) { - /* Invoke any callbacks, transfer the skb to caller, and - * fire off the (possibly) blocking iwl_send_cmd() + /* Invoke any callbacks, transfer the buffer to caller, + * and fire off the (possibly) blocking iwl_send_cmd() * as we reclaim the driver command queue */ - if (rxb && rxb->skb) + if (rxb && rxb->page) iwl_tx_cmd_complete(priv, rxb); else IWL_WARN(priv, "Claim null rxb?\n"); @@ -1444,10 +1443,10 @@ static void iwl3945_rx_handle(struct iwl_priv *priv) /* For now we just don't re-use anything. 
We can tweak this * later to try and re-use notification packets and SKBs that * fail to Rx correctly */ - if (rxb->skb != NULL) { - priv->alloc_rxb_skb--; - dev_kfree_skb_any(rxb->skb); - rxb->skb = NULL; + if (rxb->page != NULL) { + priv->alloc_rxb_page--; + __free_pages(rxb->page, priv->hw_params.rx_page_order); + rxb->page = NULL; } spin_lock_irqsave(&rxq->lock, flags); @@ -1685,6 +1684,8 @@ static void iwl3945_irq_tasklet(struct iwl_priv *priv) } #endif + spin_unlock_irqrestore(&priv->lock, flags); + /* Since CSR_INT and CSR_FH_INT_STATUS reads and clears are not * atomic, make sure that inta covers all the interrupts that * we've discovered, even if FH interrupt came in just after @@ -1706,8 +1707,6 @@ static void iwl3945_irq_tasklet(struct iwl_priv *priv) handled |= CSR_INT_BIT_HW_ERR; - spin_unlock_irqrestore(&priv->lock, flags); - return; } @@ -1799,7 +1798,6 @@ static void iwl3945_irq_tasklet(struct iwl_priv *priv) "flags 0x%08lx\n", inta, inta_mask, inta_fh, flags); } #endif - spin_unlock_irqrestore(&priv->lock, flags); } static int iwl3945_get_channels_for_scan(struct iwl_priv *priv, -- 1.5.6.3

[-- Attachment #3: 0002-iwlwifi-fix-use-after-free-bug-for-paged-rx.patch --]
[-- Type: text/x-patch, Size: 8708 bytes --]

From 000c60eef9bf7a579c02ccb7deee447a2231d2b0 Mon Sep 17 00:00:00 2001
From: Zhu Yi <yi.zhu@intel.com>
Date: Thu, 15 Oct 2009 20:00:57 -0700
Subject: [PATCH 2/2] iwlwifi: fix use after free bug for paged rx

In the paged rx patch (4854fde2), I introduced a bug that could possibly touch an already freed page. It is fixed by avoiding the access in this patch. I've also added some comments so that other people touching the code won't make the same mistake. In the future, if we cannot avoid accessing the page after it has been handed to the upper layer, we can use get_page/put_page to handle it. For now, it's just not necessary.

This patch also fixes a debug message print bug reported by Stanislaw Gruszka <sgruszka@redhat.com>.
Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> --- drivers/net/wireless/iwlwifi/iwl-3945.c | 16 +++++++++++----- drivers/net/wireless/iwlwifi/iwl-agn.c | 11 +++++++++-- drivers/net/wireless/iwlwifi/iwl-rx.c | 21 ++++++++++++++------- drivers/net/wireless/iwlwifi/iwl3945-base.c | 18 +++++++++++++----- 4 files changed, 47 insertions(+), 19 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.c b/drivers/net/wireless/iwlwifi/iwl-3945.c index 7d5962d..4406650 100644 --- a/drivers/net/wireless/iwlwifi/iwl-3945.c +++ b/drivers/net/wireless/iwlwifi/iwl-3945.c @@ -552,6 +552,7 @@ static void iwl3945_pass_packet_to_mac80211(struct iwl_priv *priv, u16 len = le16_to_cpu(rx_hdr->len); struct sk_buff *skb; int ret; + __le16 fc = hdr->frame_control; /* We received data from the HW, so stop the watchdog */ if (unlikely(len + IWL39_RX_FRAME_SIZE > @@ -584,9 +585,9 @@ static void iwl3945_pass_packet_to_mac80211(struct iwl_priv *priv, /* mac80211 currently doesn't support paged SKB. Convert it to * linear SKB for management frame and data frame requires * software decryption or software defragementation. */ - if (ieee80211_is_mgmt(hdr->frame_control) || - ieee80211_has_protected(hdr->frame_control) || - ieee80211_has_morefrags(hdr->frame_control) || + if (ieee80211_is_mgmt(fc) || + ieee80211_has_protected(fc) || + ieee80211_has_morefrags(fc) || le16_to_cpu(hdr->seq_ctrl) & IEEE80211_SCTL_FRAG) ret = skb_linearize(skb); else @@ -598,11 +599,16 @@ static void iwl3945_pass_packet_to_mac80211(struct iwl_priv *priv, goto out; } + /* + * XXX: We cannot touch the page and its virtual memory (pkt) after + * here. It might have already been freed by the above skb change. 
+ */ + #ifdef CONFIG_IWLWIFI_LEDS - if (ieee80211_is_data(hdr->frame_control)) + if (ieee80211_is_data(fc)) priv->rxtxpackets += len; #endif - iwl_update_stats(priv, false, hdr->frame_control, len); + iwl_update_stats(priv, false, fc, len); memcpy(IEEE80211_SKB_RXCB(skb), stats, sizeof(*stats)); ieee80211_rx(priv->hw, skb); diff --git a/drivers/net/wireless/iwlwifi/iwl-agn.c b/drivers/net/wireless/iwlwifi/iwl-agn.c index c5ff7c0..475f677 100644 --- a/drivers/net/wireless/iwlwifi/iwl-agn.c +++ b/drivers/net/wireless/iwlwifi/iwl-agn.c @@ -808,8 +808,8 @@ void iwl_rx_handle(struct iwl_priv *priv) if (priv->rx_handlers[pkt->hdr.cmd]) { IWL_DEBUG_RX(priv, "r = %d, i = %d, %s, 0x%02x\n", r, i, get_cmd_string(pkt->hdr.cmd), pkt->hdr.cmd); - priv->rx_handlers[pkt->hdr.cmd] (priv, rxb); priv->isr_stats.rx_handlers[pkt->hdr.cmd]++; + priv->rx_handlers[pkt->hdr.cmd] (priv, rxb); } else { /* No handling needed */ IWL_DEBUG_RX(priv, @@ -818,11 +818,18 @@ void iwl_rx_handle(struct iwl_priv *priv) pkt->hdr.cmd); } + /* + * XXX: After here, we should always check rxb->page + * against NULL before touching it or its virtual + * memory (pkt). Because some rx_handler might have + * already taken or freed the pages. 
+ */ + if (reclaim) { /* Invoke any callbacks, transfer the buffer to caller, * and fire off the (possibly) blocking iwl_send_cmd() * as we reclaim the driver command queue */ - if (rxb && rxb->page) + if (rxb->page) iwl_tx_cmd_complete(priv, rxb); else IWL_WARN(priv, "Claim null rxb?\n"); diff --git a/drivers/net/wireless/iwlwifi/iwl-rx.c b/drivers/net/wireless/iwlwifi/iwl-rx.c index 5e56857..2663689 100644 --- a/drivers/net/wireless/iwlwifi/iwl-rx.c +++ b/drivers/net/wireless/iwlwifi/iwl-rx.c @@ -241,6 +241,7 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority) struct iwl_rx_mem_buffer *rxb; struct page *page; unsigned long flags; + gfp_t gfp_mask = priority; while (1) { spin_lock_irqsave(&rxq->lock, flags); @@ -251,13 +252,13 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority) spin_unlock_irqrestore(&rxq->lock, flags); if (rxq->free_count > RX_LOW_WATERMARK) - priority |= __GFP_NOWARN; + gfp_mask |= __GFP_NOWARN; if (priv->hw_params.rx_page_order > 0) - priority |= __GFP_COMP; + gfp_mask |= __GFP_COMP; /* Alloc a new receive buffer */ - page = alloc_pages(priority, priv->hw_params.rx_page_order); + page = alloc_pages(gfp_mask, priv->hw_params.rx_page_order); if (!page) { if (net_ratelimit()) IWL_DEBUG_INFO(priv, "alloc_pages failed, " @@ -884,6 +885,7 @@ static void iwl_pass_packet_to_mac80211(struct iwl_priv *priv, { struct sk_buff *skb; int ret = 0; + __le16 fc = hdr->frame_control; /* We only process data packets if the interface is open */ if (unlikely(!priv->is_open)) { @@ -908,9 +910,9 @@ static void iwl_pass_packet_to_mac80211(struct iwl_priv *priv, /* mac80211 currently doesn't support paged SKB. Convert it to * linear SKB for management frame and data frame requires * software decryption or software defragementation. 
*/ - if (ieee80211_is_mgmt(hdr->frame_control) || - ieee80211_has_protected(hdr->frame_control) || - ieee80211_has_morefrags(hdr->frame_control) || + if (ieee80211_is_mgmt(fc) || + ieee80211_has_protected(fc) || + ieee80211_has_morefrags(fc) || le16_to_cpu(hdr->seq_ctrl) & IEEE80211_SCTL_FRAG) ret = skb_linearize(skb); else @@ -922,7 +924,12 @@ static void iwl_pass_packet_to_mac80211(struct iwl_priv *priv, goto out; } - iwl_update_stats(priv, false, hdr->frame_control, len); + /* + * XXX: We cannot touch the page and its virtual memory (hdr) after + * here. It might have already been freed by the above skb change. + */ + + iwl_update_stats(priv, false, fc, len); memcpy(IEEE80211_SKB_RXCB(skb), stats, sizeof(*stats)); ieee80211_rx(priv->hw, skb); diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c index e20690d..5ae8698 100644 --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c @@ -1137,6 +1137,7 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) struct iwl_rx_mem_buffer *rxb; struct page *page; unsigned long flags; + gfp_t gfp_mask = priority; while (1) { spin_lock_irqsave(&rxq->lock, flags); @@ -1148,13 +1149,13 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) spin_unlock_irqrestore(&rxq->lock, flags); if (rxq->free_count > RX_LOW_WATERMARK) - priority |= __GFP_NOWARN; + gfp_mask |= __GFP_NOWARN; if (priv->hw_params.rx_page_order > 0) - priority |= __GFP_COMP; + gfp_mask |= __GFP_COMP; /* Alloc a new receive buffer */ - page = alloc_pages(priority, priv->hw_params.rx_page_order); + page = alloc_pages(gfp_mask, priv->hw_params.rx_page_order); if (!page) { if (net_ratelimit()) IWL_DEBUG_INFO(priv, "Failed to allocate SKB buffer.\n"); @@ -1420,8 +1421,8 @@ static void iwl3945_rx_handle(struct iwl_priv *priv) if (priv->rx_handlers[pkt->hdr.cmd]) { IWL_DEBUG_RX(priv, "r = %d, i = %d, %s, 0x%02x\n", r, i, 
get_cmd_string(pkt->hdr.cmd), pkt->hdr.cmd); - priv->rx_handlers[pkt->hdr.cmd] (priv, rxb); priv->isr_stats.rx_handlers[pkt->hdr.cmd]++; + priv->rx_handlers[pkt->hdr.cmd] (priv, rxb); } else { /* No handling needed */ IWL_DEBUG_RX(priv, @@ -1430,11 +1431,18 @@ static void iwl3945_rx_handle(struct iwl_priv *priv) pkt->hdr.cmd); } + /* + * XXX: After here, we should always check rxb->page + * against NULL before touching it or its virtual + * memory (pkt). Because some rx_handler might have + * already taken or freed the pages. + */ + if (reclaim) { /* Invoke any callbacks, transfer the buffer to caller, * and fire off the (possibly) blocking iwl_send_cmd() * as we reclaim the driver command queue */ - if (rxb && rxb->page) + if (rxb->page) iwl_tx_cmd_complete(priv, rxb); else IWL_WARN(priv, "Claim null rxb?\n"); -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 384+ messages in thread
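The core idea of the use-after-free fix above (save the frame control field before any step that may release the page, then use only the saved copy) can be sketched in user space. This is a minimal illustration and not driver code: `fake_hdr`, `fake_rxb`, `make_rxb`, `consume_page` and `handle_frame` are hypothetical names invented for the example, and `free()` merely stands in for whatever releases the page in the real Rx path.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are iwlwifi types. */
struct fake_hdr {
	uint16_t frame_control;
	uint16_t seq_ctrl;
};

struct fake_rxb {
	void *page;	/* receive buffer that holds the frame header */
};

/* Allocate a buffer whose header carries the given frame control value. */
struct fake_rxb make_rxb(uint16_t fc)
{
	struct fake_rxb rxb = { .page = calloc(1, sizeof(struct fake_hdr)) };

	if (rxb.page)
		((struct fake_hdr *)rxb.page)->frame_control = fc;
	return rxb;
}

/* Models a step that may release the page, as an skb conversion can. */
void consume_page(struct fake_rxb *rxb)
{
	free(rxb->page);
	rxb->page = NULL;
}

/* Returns the frame control value, or 0 if the buffer is already gone. */
uint16_t handle_frame(struct fake_rxb *rxb)
{
	const struct fake_hdr *hdr = rxb->page;
	uint16_t fc;

	if (!hdr)
		return 0;

	fc = hdr->frame_control;	/* copy while the page is still valid */
	consume_page(rxb);		/* hdr dangles from here on */

	return fc;			/* only the saved copy is used */
}
```

The second call path (a NULL `page`) mirrors the patch's advice to check `rxb->page` against NULL before touching the buffer once a handler may have taken or freed it.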
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
@ 2009-10-17 5:42 ` reinette chatre
  0 siblings, 0 replies; 384+ messages in thread
From: reinette chatre @ 2009-10-17 5:42 UTC (permalink / raw)
To: Frans Pop
Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg, Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed, John W. Linville, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

[-- Attachment #1: Type: text/plain, Size: 1102 bytes --]

Hi Frans,

On Thu, 2009-10-15 at 12:41 -0700, Frans Pop wrote:
> On Thursday 15 October 2009, reinette chatre wrote:
> > > The log file timestamps don't tell much as the logging gets delayed,
> > > so they all end up at the same time. Maybe I should enable the kernel
> > > timestamps so we can see how far apart these failures are.
> >
> > If you can get accurate timing it will be very useful. I am interested
> > to see how quickly it goes from "48 free buffers" to "0 free buffers".
>
> Attached the dmesg for three consecutive test runs (i.e. without
> rebooting). Note that the 2nd one includes only "0 free buffers" messages,
> even though the behavior (point where desktop freezes and music stops)
> looked similar.
>
> Not sure if you can tell all that much from the data.

Prompted by this thread we are in the process of moving allocation to paged skb. This will definitely reduce the allocation size (from order 2 to order 1) and hopefully help with this problem also.

Could you please try with the attached two patches? They are based on 2.6.32-rc4.
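Editorial sketch: the "order 2 to order 1" reduction comes down to page-order arithmetic. An 8K Rx buffer fits exactly in an order 1 allocation, but any trailing overhead (such as the skb_shared_info that a linear skb carries) pushes the request past 8K and into the next order. The helper below is a userspace stand-in for the kernel's `get_order()`, assuming 4 KiB pages. `SKETCH_SHINFO_OVERHEAD` is a made-up value, since the real size of `struct skb_shared_info` depends on kernel version and configuration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL		/* assumption: 4 KiB pages */

/* Hypothetical trailing overhead of a linear skb. For an 8K buffer, any
 * value from 1 byte up to 8 KiB leads to the same order. */
#define SKETCH_SHINFO_OVERHEAD 320UL

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size.
 * A stand-in for get_order(), not the kernel implementation. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Linear skb: buffer plus trailing overhead in one contiguous allocation. */
static unsigned int linear_skb_order(size_t rx_buf)
{
	return sketch_get_order(rx_buf + SKETCH_SHINFO_OVERHEAD);
}

/* Paged Rx: only the buffer itself is requested from the page allocator. */
static unsigned int paged_rx_order(size_t rx_buf)
{
	return sketch_get_order(rx_buf);
}
```

Under these assumptions, `linear_skb_order(8192)` evaluates to 2 (16K) while `paged_rx_order(8192)` evaluates to 1 (8K), which matches the reduction described above.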
Thank you very much

Reinette

[-- Attachment #2: 0001-iwlwifi-use-paged-Rx.patch --]
[-- Type: text/x-patch, Size: 50820 bytes --]

From d94fc37fb25aacec8a41e3d14ec333fa5a8f681e Mon Sep 17 00:00:00 2001
From: Zhu Yi <yi.zhu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Date: Fri, 9 Oct 2009 17:19:45 +0800
Subject: [PATCH 1/2] iwlwifi: use paged Rx

This switches the iwlwifi driver from linear skbs to paged skbs for the Rx buffer, which relieves some Rx buffer allocation pressure on the memory subsystem.

Currently iwlwifi requests 8K bytes (4K for 3945) for the Rx buffer. Due to the trailing skb_shared_info in skb->data, alloc_skb() will do the next order allocation, which is 16K bytes. This is suboptimal and more likely to fail when the system is under memory pressure. Switching to paged Rx skbs lets us allocate the RXB directly with alloc_pages(), so that only an order 1 allocation is required.

It also narrows the region protected by the spin_lock (with IRQs disabled) in the tasklet, because a tasklet is guaranteed to run on only one CPU at a time and the newly unprotected code can be preempted by the IRQ handler. This saves us from spawning another workqueue to make skb_linearize/__pskb_pull_tail happy (they cannot be called in hard irq context).

Finally, mac80211 doesn't support paged Rx yet. So we linearize the skb for all management frames, and for data frames that require software decryption or defragmentation, before handing them to mac80211. For all other frames, we __pskb_pull_tail 64 bytes into the linear area of the skb so that mac80211 can handle them properly.

Signed-off-by: Zhu Yi <yi.zhu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: John W.
Linville <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org> --- drivers/net/wireless/iwlwifi/iwl-3945.c | 67 ++++++++++----- drivers/net/wireless/iwlwifi/iwl-4965.c | 2 +- drivers/net/wireless/iwlwifi/iwl-5000.c | 4 +- drivers/net/wireless/iwlwifi/iwl-agn.c | 42 ++++----- drivers/net/wireless/iwlwifi/iwl-commands.h | 10 ++ drivers/net/wireless/iwlwifi/iwl-core.c | 13 ++-- drivers/net/wireless/iwlwifi/iwl-core.h | 2 +- drivers/net/wireless/iwlwifi/iwl-dev.h | 27 ++++-- drivers/net/wireless/iwlwifi/iwl-hcmd.c | 21 ++---- drivers/net/wireless/iwlwifi/iwl-rx.c | 122 +++++++++++++++++---------- drivers/net/wireless/iwlwifi/iwl-scan.c | 20 ++-- drivers/net/wireless/iwlwifi/iwl-spectrum.c | 2 +- drivers/net/wireless/iwlwifi/iwl-sta.c | 62 +++++-------- drivers/net/wireless/iwlwifi/iwl-tx.c | 10 +- drivers/net/wireless/iwlwifi/iwl3945-base.c | 120 +++++++++++++------------- 15 files changed, 284 insertions(+), 240 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.c b/drivers/net/wireless/iwlwifi/iwl-3945.c index f059b49..7d5962d 100644 --- a/drivers/net/wireless/iwlwifi/iwl-3945.c +++ b/drivers/net/wireless/iwlwifi/iwl-3945.c @@ -293,7 +293,7 @@ static void iwl3945_tx_queue_reclaim(struct iwl_priv *priv, static void iwl3945_rx_reply_tx(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u16 sequence = le16_to_cpu(pkt->hdr.sequence); int txq_id = SEQ_TO_QUEUE(sequence); int index = SEQ_TO_INDEX(sequence); @@ -353,7 +353,7 @@ static void iwl3945_rx_reply_tx(struct iwl_priv *priv, void iwl3945_hw_rx_statistics(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); IWL_DEBUG_RX(priv, "Statistics notification received (%d vs %d).\n", (int)sizeof(struct iwl3945_notif_statistics), le32_to_cpu(pkt->len_n_flags) & FH_RSCSR_FRAME_SIZE_MSK); @@ -545,14 +545,17 @@ static 
void iwl3945_pass_packet_to_mac80211(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb, struct ieee80211_rx_status *stats) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)IWL_RX_DATA(pkt); struct iwl3945_rx_frame_hdr *rx_hdr = IWL_RX_HDR(pkt); struct iwl3945_rx_frame_end *rx_end = IWL_RX_END(pkt); - short len = le16_to_cpu(rx_hdr->len); + u16 len = le16_to_cpu(rx_hdr->len); + struct sk_buff *skb; + int ret; /* We received data from the HW, so stop the watchdog */ - if (unlikely((len + IWL39_RX_FRAME_SIZE) > skb_tailroom(rxb->skb))) { + if (unlikely(len + IWL39_RX_FRAME_SIZE > + PAGE_SIZE << priv->hw_params.rx_page_order)) { IWL_DEBUG_DROP(priv, "Corruption detected!\n"); return; } @@ -564,24 +567,49 @@ static void iwl3945_pass_packet_to_mac80211(struct iwl_priv *priv, return; } - skb_reserve(rxb->skb, (void *)rx_hdr->payload - (void *)pkt); - /* Set the size of the skb to the size of the frame */ - skb_put(rxb->skb, le16_to_cpu(rx_hdr->len)); + skb = alloc_skb(IWL_LINK_HDR_MAX, GFP_ATOMIC); + if (!skb) { + IWL_ERR(priv, "alloc_skb failed\n"); + return; + } if (!iwl3945_mod_params.sw_crypto) iwl_set_decrypted_flag(priv, - (struct ieee80211_hdr *)rxb->skb->data, + (struct ieee80211_hdr *)rxb_addr(rxb), le32_to_cpu(rx_end->status), stats); + skb_add_rx_frag(skb, 0, rxb->page, + (void *)rx_hdr->payload - (void *)pkt, len); + + /* mac80211 currently doesn't support paged SKB. Convert it to + * linear SKB for management frame and data frame requires + * software decryption or software defragementation. */ + if (ieee80211_is_mgmt(hdr->frame_control) || + ieee80211_has_protected(hdr->frame_control) || + ieee80211_has_morefrags(hdr->frame_control) || + le16_to_cpu(hdr->seq_ctrl) & IEEE80211_SCTL_FRAG) + ret = skb_linearize(skb); + else + ret = __pskb_pull_tail(skb, min_t(u16, IWL_LINK_HDR_MAX, len)) ? 
+ 0 : -ENOMEM; + + if (ret) { + kfree_skb(skb); + goto out; + } + #ifdef CONFIG_IWLWIFI_LEDS if (ieee80211_is_data(hdr->frame_control)) priv->rxtxpackets += len; #endif iwl_update_stats(priv, false, hdr->frame_control, len); - memcpy(IEEE80211_SKB_RXCB(rxb->skb), stats, sizeof(*stats)); - ieee80211_rx_irqsafe(priv->hw, rxb->skb); - rxb->skb = NULL; + memcpy(IEEE80211_SKB_RXCB(skb), stats, sizeof(*stats)); + ieee80211_rx(priv->hw, skb); + + out: + priv->alloc_rxb_page--; + rxb->page = NULL; } #define IWL_DELAY_NEXT_SCAN_AFTER_ASSOC (HZ*6) @@ -591,7 +619,7 @@ static void iwl3945_rx_reply_rx(struct iwl_priv *priv, { struct ieee80211_hdr *header; struct ieee80211_rx_status rx_status; - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl3945_rx_frame_stats *rx_stats = IWL_RX_STATS(pkt); struct iwl3945_rx_frame_hdr *rx_hdr = IWL_RX_HDR(pkt); struct iwl3945_rx_frame_end *rx_end = IWL_RX_END(pkt); @@ -1858,7 +1886,7 @@ int iwl3945_hw_reg_set_txpower(struct iwl_priv *priv, s8 power) static int iwl3945_send_rxon_assoc(struct iwl_priv *priv) { int rc = 0; - struct iwl_rx_packet *res = NULL; + struct iwl_rx_packet *pkt; struct iwl3945_rxon_assoc_cmd rxon_assoc; struct iwl_host_cmd cmd = { .id = REPLY_RXON_ASSOC, @@ -1887,14 +1915,14 @@ static int iwl3945_send_rxon_assoc(struct iwl_priv *priv) if (rc) return rc; - res = (struct iwl_rx_packet *)cmd.reply_skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + pkt = (struct iwl_rx_packet *)cmd.reply_page; + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_RXON_ASSOC command\n"); rc = -EIO; } - priv->alloc_rxb_skb--; - dev_kfree_skb_any(cmd.reply_skb); + priv->alloc_rxb_page--; + free_pages(cmd.reply_page, priv->hw_params.rx_page_order); return rc; } @@ -2560,8 +2588,7 @@ int iwl3945_hw_set_hw_params(struct iwl_priv *priv) priv->hw_params.max_txq_num = IWL39_NUM_QUEUES; priv->hw_params.tfd_size = sizeof(struct iwl3945_tfd); - 
priv->hw_params.rx_buf_size = IWL_RX_BUF_SIZE_3K; - priv->hw_params.max_pkt_size = 2342; + priv->hw_params.rx_page_order = get_order(IWL_RX_BUF_SIZE_3K); priv->hw_params.max_rxq_size = RX_QUEUE_SIZE; priv->hw_params.max_rxq_log = RX_QUEUE_SIZE_LOG; priv->hw_params.max_stations = IWL3945_STATION_COUNT; diff --git a/drivers/net/wireless/iwlwifi/iwl-4965.c b/drivers/net/wireless/iwlwifi/iwl-4965.c index 6f703a0..e7c67d8 100644 --- a/drivers/net/wireless/iwlwifi/iwl-4965.c +++ b/drivers/net/wireless/iwlwifi/iwl-4965.c @@ -2078,7 +2078,7 @@ static int iwl4965_tx_status_reply_tx(struct iwl_priv *priv, static void iwl4965_rx_reply_tx(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u16 sequence = le16_to_cpu(pkt->hdr.sequence); int txq_id = SEQ_TO_QUEUE(sequence); int index = SEQ_TO_INDEX(sequence); diff --git a/drivers/net/wireless/iwlwifi/iwl-5000.c b/drivers/net/wireless/iwlwifi/iwl-5000.c index 6e6f516..29dfe27 100644 --- a/drivers/net/wireless/iwlwifi/iwl-5000.c +++ b/drivers/net/wireless/iwlwifi/iwl-5000.c @@ -493,7 +493,7 @@ static int iwl5000_send_calib_cfg(struct iwl_priv *priv) static void iwl5000_rx_calib_result(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_calib_hdr *hdr = (struct iwl_calib_hdr *)pkt->u.raw; int len = le32_to_cpu(pkt->len_n_flags) & FH_RSCSR_FRAME_SIZE_MSK; int index; @@ -1218,7 +1218,7 @@ static int iwl5000_tx_status_reply_tx(struct iwl_priv *priv, static void iwl5000_rx_reply_tx(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u16 sequence = le16_to_cpu(pkt->hdr.sequence); int txq_id = SEQ_TO_QUEUE(sequence); int index = SEQ_TO_INDEX(sequence); diff --git 
a/drivers/net/wireless/iwlwifi/iwl-agn.c b/drivers/net/wireless/iwlwifi/iwl-agn.c index eaafae0..c5ff7c0 100644 --- a/drivers/net/wireless/iwlwifi/iwl-agn.c +++ b/drivers/net/wireless/iwlwifi/iwl-agn.c @@ -521,7 +521,7 @@ int iwl_hw_tx_queue_init(struct iwl_priv *priv, static void iwl_rx_reply_alive(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_alive_resp *palive; struct delayed_work *pwork; @@ -607,7 +607,7 @@ static void iwl_rx_beacon_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl4965_beacon_notif *beacon = (struct iwl4965_beacon_notif *)pkt->u.raw; u8 rate = iwl_hw_get_rate(beacon->beacon_notify_hdr.rate_n_flags); @@ -631,7 +631,7 @@ static void iwl_rx_beacon_notif(struct iwl_priv *priv, static void iwl_rx_card_state_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u32 flags = le32_to_cpu(pkt->u.card_state_notif.flags); unsigned long status = priv->status; @@ -783,10 +783,10 @@ void iwl_rx_handle(struct iwl_priv *priv) rxq->queue[i] = NULL; - pci_unmap_single(priv->pci_dev, rxb->real_dma_addr, - priv->hw_params.rx_buf_size + 256, - PCI_DMA_FROMDEVICE); - pkt = (struct iwl_rx_packet *)rxb->skb->data; + pci_unmap_page(priv->pci_dev, rxb->page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + pkt = rxb_addr(rxb); /* Reclaim a command buffer only if this packet is a response * to a (driver-originated) command. 
@@ -819,10 +819,10 @@ void iwl_rx_handle(struct iwl_priv *priv) } if (reclaim) { - /* Invoke any callbacks, transfer the skb to caller, and - * fire off the (possibly) blocking iwl_send_cmd() + /* Invoke any callbacks, transfer the buffer to caller, + * and fire off the (possibly) blocking iwl_send_cmd() * as we reclaim the driver command queue */ - if (rxb && rxb->skb) + if (rxb && rxb->page) iwl_tx_cmd_complete(priv, rxb); else IWL_WARN(priv, "Claim null rxb?\n"); @@ -831,10 +831,10 @@ void iwl_rx_handle(struct iwl_priv *priv) /* For now we just don't re-use anything. We can tweak this * later to try and re-use notification packets and SKBs that * fail to Rx correctly */ - if (rxb->skb != NULL) { - priv->alloc_rxb_skb--; - dev_kfree_skb_any(rxb->skb); - rxb->skb = NULL; + if (rxb->page != NULL) { + priv->alloc_rxb_page--; + __free_pages(rxb->page, priv->hw_params.rx_page_order); + rxb->page = NULL; } spin_lock_irqsave(&rxq->lock, flags); @@ -901,6 +901,8 @@ static void iwl_irq_tasklet_legacy(struct iwl_priv *priv) } #endif + spin_unlock_irqrestore(&priv->lock, flags); + /* Since CSR_INT and CSR_FH_INT_STATUS reads and clears are not * atomic, make sure that inta covers all the interrupts that * we've discovered, even if FH interrupt came in just after @@ -922,8 +924,6 @@ static void iwl_irq_tasklet_legacy(struct iwl_priv *priv) handled |= CSR_INT_BIT_HW_ERR; - spin_unlock_irqrestore(&priv->lock, flags); - return; } @@ -1050,7 +1050,6 @@ static void iwl_irq_tasklet_legacy(struct iwl_priv *priv) "flags 0x%08lx\n", inta, inta_mask, inta_fh, flags); } #endif - spin_unlock_irqrestore(&priv->lock, flags); } /* tasklet for iwlagn interrupt */ @@ -1080,6 +1079,9 @@ static void iwl_irq_tasklet(struct iwl_priv *priv) inta, inta_mask); } #endif + + spin_unlock_irqrestore(&priv->lock, flags); + /* saved interrupt in inta variable now we can reset priv->inta */ priv->inta = 0; @@ -1095,8 +1097,6 @@ static void iwl_irq_tasklet(struct iwl_priv *priv) handled |= 
CSR_INT_BIT_HW_ERR; - spin_unlock_irqrestore(&priv->lock, flags); - return; } @@ -1236,14 +1236,10 @@ static void iwl_irq_tasklet(struct iwl_priv *priv) inta & ~priv->inta_mask); } - /* Re-enable all interrupts */ /* only Re-enable if diabled by irq */ if (test_bit(STATUS_INT_ENABLED, &priv->status)) iwl_enable_interrupts(priv); - - spin_unlock_irqrestore(&priv->lock, flags); - } diff --git a/drivers/net/wireless/iwlwifi/iwl-commands.h b/drivers/net/wireless/iwlwifi/iwl-commands.h index 4afaf77..dd54bf2 100644 --- a/drivers/net/wireless/iwlwifi/iwl-commands.h +++ b/drivers/net/wireless/iwlwifi/iwl-commands.h @@ -3495,6 +3495,16 @@ struct iwl_wimax_coex_cmd { *****************************************************************************/ struct iwl_rx_packet { + /* + * The first 4 bytes of the RX frame header contain both the RX frame + * size and some flags. + * Bit fields: + * 31: flag flush RB request + * 30: flag ignore TC (terminal counter) request + * 29: flag fast IRQ request + * 28-14: Reserved + * 13-00: RX frame size + */ __le32 len_n_flags; struct iwl_cmd_header hdr; union { diff --git a/drivers/net/wireless/iwlwifi/iwl-core.c b/drivers/net/wireless/iwlwifi/iwl-core.c index 2dc9287..bb9ff29 100644 --- a/drivers/net/wireless/iwlwifi/iwl-core.c +++ b/drivers/net/wireless/iwlwifi/iwl-core.c @@ -1281,7 +1281,7 @@ static void iwl_set_rate(struct iwl_priv *priv) void iwl_rx_csa(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_rxon_cmd *rxon = (void *)&priv->active_rxon; struct iwl_csa_notification *csa = &(pkt->u.csa_notif); IWL_DEBUG_11H(priv, "CSA notif: channel %d, status %d\n", @@ -1456,10 +1456,9 @@ int iwl_set_hw_params(struct iwl_priv *priv) priv->hw_params.max_rxq_size = RX_QUEUE_SIZE; priv->hw_params.max_rxq_log = RX_QUEUE_SIZE_LOG; if (priv->cfg->mod_params->amsdu_size_8K) - priv->hw_params.rx_buf_size = 
IWL_RX_BUF_SIZE_8K; + priv->hw_params.rx_page_order = get_order(IWL_RX_BUF_SIZE_8K); else - priv->hw_params.rx_buf_size = IWL_RX_BUF_SIZE_4K; - priv->hw_params.max_pkt_size = priv->hw_params.rx_buf_size - 256; + priv->hw_params.rx_page_order = get_order(IWL_RX_BUF_SIZE_4K); priv->hw_params.max_beacon_itrvl = IWL_MAX_UCODE_BEACON_INTERVAL; @@ -2143,7 +2142,7 @@ void iwl_rx_pm_sleep_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_sleep_notification *sleep = &(pkt->u.sleep_notif); IWL_DEBUG_RX(priv, "sleep mode: %d, src: %d\n", sleep->pm_sleep_mode, sleep->pm_wakeup_src); @@ -2154,7 +2153,7 @@ EXPORT_SYMBOL(iwl_rx_pm_sleep_notif); void iwl_rx_pm_debug_statistics_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u32 len = le32_to_cpu(pkt->len_n_flags) & FH_RSCSR_FRAME_SIZE_MSK; IWL_DEBUG_RADIO(priv, "Dumping %d bytes of unhandled " "notification for %s:\n", len, @@ -2166,7 +2165,7 @@ EXPORT_SYMBOL(iwl_rx_pm_debug_statistics_notif); void iwl_rx_reply_error(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); IWL_ERR(priv, "Error Reply type 0x%08X cmd %s (0x%02X) " "seq 0x%04X ser 0x%08X\n", diff --git a/drivers/net/wireless/iwlwifi/iwl-core.h b/drivers/net/wireless/iwlwifi/iwl-core.h index e50103a..d95674e 100644 --- a/drivers/net/wireless/iwlwifi/iwl-core.h +++ b/drivers/net/wireless/iwlwifi/iwl-core.h @@ -509,7 +509,7 @@ int iwl_send_cmd_pdu_async(struct iwl_priv *priv, u8 id, u16 len, const void *data, void (*callback)(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb)); + struct iwl_rx_packet *pkt)); int iwl_enqueue_hcmd(struct iwl_priv 
*priv, struct iwl_host_cmd *cmd); diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h index 028d505..7fb1688 100644 --- a/drivers/net/wireless/iwlwifi/iwl-dev.h +++ b/drivers/net/wireless/iwlwifi/iwl-dev.h @@ -144,12 +144,13 @@ extern void iwl5000_temperature(struct iwl_priv *priv); #define DEFAULT_LONG_RETRY_LIMIT 4U struct iwl_rx_mem_buffer { - dma_addr_t real_dma_addr; - dma_addr_t aligned_dma_addr; - struct sk_buff *skb; + dma_addr_t page_dma; + struct page *page; struct list_head list; }; +#define rxb_addr(r) page_address(r->page) + /* defined below */ struct iwl_device_cmd; @@ -165,7 +166,7 @@ struct iwl_cmd_meta { */ void (*callback)(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb); + struct iwl_rx_packet *pkt); /* The CMD_SIZE_HUGE flag bit indicates that the command * structure is stored at the end of the shared queue memory. */ @@ -358,6 +359,13 @@ enum { #define IWL_CMD_MAX_PAYLOAD 320 +/* + * IWL_LINK_HDR_MAX should include ieee80211_hdr, radiotap header, + * SNAP header and alignment. It should also be big enough for 802.11 + * control frames. 
+ */ +#define IWL_LINK_HDR_MAX 64 + /** * struct iwl_device_cmd * @@ -382,10 +390,10 @@ struct iwl_device_cmd { struct iwl_host_cmd { const void *data; - struct sk_buff *reply_skb; + unsigned long reply_page; void (*callback)(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb); + struct iwl_rx_packet *pkt); u32 flags; u16 len; u8 id; @@ -639,7 +647,7 @@ struct iwl_sensitivity_ranges { * @valid_tx/rx_ant: usable antennas * @max_rxq_size: Max # Rx frames in Rx queue (must be power-of-2) * @max_rxq_log: Log-base-2 of max_rxq_size - * @rx_buf_size: Rx buffer size + * @rx_page_order: Rx buffer page order * @rx_wrt_ptr_reg: FH{39}_RSCSR_CHNL0_WPTR * @max_stations: * @bcast_sta_id: @@ -662,9 +670,8 @@ struct iwl_hw_params { u8 valid_rx_ant; u16 max_rxq_size; u16 max_rxq_log; - u32 rx_buf_size; + u32 rx_page_order; u32 rx_wrt_ptr_reg; - u32 max_pkt_size; u8 max_stations; u8 bcast_sta_id; u8 ht40_channel; @@ -976,7 +983,7 @@ struct iwl_priv { int frames_count; enum ieee80211_band band; - int alloc_rxb_skb; + int alloc_rxb_page; void (*rx_handlers[REPLY_MAX])(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb); diff --git a/drivers/net/wireless/iwlwifi/iwl-hcmd.c b/drivers/net/wireless/iwlwifi/iwl-hcmd.c index a6856da..1bf17d2 100644 --- a/drivers/net/wireless/iwlwifi/iwl-hcmd.c +++ b/drivers/net/wireless/iwlwifi/iwl-hcmd.c @@ -104,17 +104,8 @@ EXPORT_SYMBOL(get_cmd_string); static void iwl_generic_cmd_callback(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb) + struct iwl_rx_packet *pkt) { - struct iwl_rx_packet *pkt = NULL; - - if (!skb) { - IWL_ERR(priv, "Error: Response NULL in %s.\n", - get_cmd_string(cmd->hdr.cmd)); - return; - } - - pkt = (struct iwl_rx_packet *)skb->data; if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from %s (0x%08X)\n", get_cmd_string(cmd->hdr.cmd), pkt->hdr.flags); @@ -216,7 +207,7 @@ int iwl_send_cmd_sync(struct iwl_priv *priv, struct iwl_host_cmd *cmd) ret = -EIO; goto fail; } 
- if ((cmd->flags & CMD_WANT_SKB) && !cmd->reply_skb) { + if ((cmd->flags & CMD_WANT_SKB) && !cmd->reply_page) { IWL_ERR(priv, "Error: Response NULL in '%s'\n", get_cmd_string(cmd->id)); ret = -EIO; @@ -238,9 +229,9 @@ cancel: ~CMD_WANT_SKB; } fail: - if (cmd->reply_skb) { - dev_kfree_skb_any(cmd->reply_skb); - cmd->reply_skb = NULL; + if (cmd->reply_page) { + free_pages(cmd->reply_page, priv->hw_params.rx_page_order); + cmd->reply_page = 0; } out: clear_bit(STATUS_HCMD_SYNC_ACTIVE, &priv->status); @@ -273,7 +264,7 @@ int iwl_send_cmd_pdu_async(struct iwl_priv *priv, u8 id, u16 len, const void *data, void (*callback)(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb)) + struct iwl_rx_packet *pkt)) { struct iwl_host_cmd cmd = { .id = id, diff --git a/drivers/net/wireless/iwlwifi/iwl-rx.c b/drivers/net/wireless/iwlwifi/iwl-rx.c index 493626b..5e56857 100644 --- a/drivers/net/wireless/iwlwifi/iwl-rx.c +++ b/drivers/net/wireless/iwlwifi/iwl-rx.c @@ -200,7 +200,7 @@ int iwl_rx_queue_restock(struct iwl_priv *priv) list_del(element); /* Point to Rx buffer via next RBD in circular buffer */ - rxq->bd[rxq->write] = iwl_dma_addr2rbd_ptr(priv, rxb->aligned_dma_addr); + rxq->bd[rxq->write] = iwl_dma_addr2rbd_ptr(priv, rxb->page_dma); rxq->queue[rxq->write] = rxb; rxq->write = (rxq->write + 1) & RX_QUEUE_MASK; rxq->free_count--; @@ -239,7 +239,7 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority) struct iwl_rx_queue *rxq = &priv->rxq; struct list_head *element; struct iwl_rx_mem_buffer *rxb; - struct sk_buff *skb; + struct page *page; unsigned long flags; while (1) { @@ -252,29 +252,34 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority) if (rxq->free_count > RX_LOW_WATERMARK) priority |= __GFP_NOWARN; - /* Alloc a new receive buffer */ - skb = alloc_skb(priv->hw_params.rx_buf_size + 256, - priority); - if (!skb) { + if (priv->hw_params.rx_page_order > 0) + priority |= __GFP_COMP; + + /* Alloc a new receive buffer */ + page = 
alloc_pages(priority, priv->hw_params.rx_page_order); + if (!page) { if (net_ratelimit()) - IWL_DEBUG_INFO(priv, "Failed to allocate SKB buffer.\n"); + IWL_DEBUG_INFO(priv, "alloc_pages failed, " + "order: %d\n", + priv->hw_params.rx_page_order); + if ((rxq->free_count <= RX_LOW_WATERMARK) && net_ratelimit()) - IWL_CRIT(priv, "Failed to allocate SKB buffer with %s. Only %u free buffers remaining.\n", + IWL_CRIT(priv, "Failed to alloc_pages with %s. Only %u free buffers remaining.\n", priority == GFP_ATOMIC ? "GFP_ATOMIC" : "GFP_KERNEL", rxq->free_count); /* We don't reschedule replenish work here -- we will * call the restock method and if it still needs * more buffers it will schedule replenish */ - break; + return; } spin_lock_irqsave(&rxq->lock, flags); if (list_empty(&rxq->rx_used)) { spin_unlock_irqrestore(&rxq->lock, flags); - dev_kfree_skb_any(skb); + __free_pages(page, priv->hw_params.rx_page_order); return; } element = rxq->rx_used.next; @@ -283,24 +288,21 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority) spin_unlock_irqrestore(&rxq->lock, flags); - rxb->skb = skb; - /* Get physical address of RB/SKB */ - rxb->real_dma_addr = pci_map_single( - priv->pci_dev, - rxb->skb->data, - priv->hw_params.rx_buf_size + 256, - PCI_DMA_FROMDEVICE); + rxb->page = page; + /* Get physical address of the RB */ + rxb->page_dma = pci_map_page(priv->pci_dev, page, 0, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); /* dma address must be no more than 36 bits */ - BUG_ON(rxb->real_dma_addr & ~DMA_BIT_MASK(36)); + BUG_ON(rxb->page_dma & ~DMA_BIT_MASK(36)); /* and also 256 byte aligned! 
*/ - rxb->aligned_dma_addr = ALIGN(rxb->real_dma_addr, 256); - skb_reserve(rxb->skb, rxb->aligned_dma_addr - rxb->real_dma_addr); + BUG_ON(rxb->page_dma & DMA_BIT_MASK(8)); spin_lock_irqsave(&rxq->lock, flags); list_add_tail(&rxb->list, &rxq->rx_free); rxq->free_count++; - priv->alloc_rxb_skb++; + priv->alloc_rxb_page++; spin_unlock_irqrestore(&rxq->lock, flags); } @@ -336,12 +338,14 @@ void iwl_rx_queue_free(struct iwl_priv *priv, struct iwl_rx_queue *rxq) { int i; for (i = 0; i < RX_QUEUE_SIZE + RX_FREE_BUFFERS; i++) { - if (rxq->pool[i].skb != NULL) { - pci_unmap_single(priv->pci_dev, - rxq->pool[i].real_dma_addr, - priv->hw_params.rx_buf_size + 256, - PCI_DMA_FROMDEVICE); - dev_kfree_skb(rxq->pool[i].skb); + if (rxq->pool[i].page != NULL) { + pci_unmap_page(priv->pci_dev, rxq->pool[i].page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + __free_pages(rxq->pool[i].page, + priv->hw_params.rx_page_order); + rxq->pool[i].page = NULL; + priv->alloc_rxb_page--; } } @@ -405,14 +409,14 @@ void iwl_rx_queue_reset(struct iwl_priv *priv, struct iwl_rx_queue *rxq) for (i = 0; i < RX_FREE_BUFFERS + RX_QUEUE_SIZE; i++) { /* In the reset function, these buffers may have been allocated * to an SKB, so we need to unmap and free potential storage */ - if (rxq->pool[i].skb != NULL) { - pci_unmap_single(priv->pci_dev, - rxq->pool[i].real_dma_addr, - priv->hw_params.rx_buf_size + 256, - PCI_DMA_FROMDEVICE); - priv->alloc_rxb_skb--; - dev_kfree_skb(rxq->pool[i].skb); - rxq->pool[i].skb = NULL; + if (rxq->pool[i].page != NULL) { + pci_unmap_page(priv->pci_dev, rxq->pool[i].page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + priv->alloc_rxb_page--; + __free_pages(rxq->pool[i].page, + priv->hw_params.rx_page_order); + rxq->pool[i].page = NULL; } list_add_tail(&rxq->pool[i].list, &rxq->rx_used); } @@ -491,7 +495,7 @@ void iwl_rx_missed_beacon_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt 
= (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_missed_beacon_notif *missed_beacon; missed_beacon = &pkt->u.missed_beacon; @@ -554,7 +558,7 @@ void iwl_rx_statistics(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { int change; - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); IWL_DEBUG_RX(priv, "Statistics notification received (%d vs %d).\n", (int)sizeof(priv->statistics), @@ -878,6 +882,9 @@ static void iwl_pass_packet_to_mac80211(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb, struct ieee80211_rx_status *stats) { + struct sk_buff *skb; + int ret = 0; + /* We only process data packets if the interface is open */ if (unlikely(!priv->is_open)) { IWL_DEBUG_DROP_LIMIT(priv, @@ -890,15 +897,38 @@ static void iwl_pass_packet_to_mac80211(struct iwl_priv *priv, iwl_set_decrypted_flag(priv, hdr, ampdu_status, stats)) return; - /* Resize SKB from mac header to end of packet */ - skb_reserve(rxb->skb, (void *)hdr - (void *)rxb->skb->data); - skb_put(rxb->skb, len); + skb = alloc_skb(IWL_LINK_HDR_MAX, GFP_ATOMIC); + if (!skb) { + IWL_ERR(priv, "alloc_skb failed\n"); + return; + } + + skb_add_rx_frag(skb, 0, rxb->page, (void *)hdr - rxb_addr(rxb), len); + + /* mac80211 currently doesn't support paged SKB. Convert it to + * linear SKB for management frame and data frame requires + * software decryption or software defragementation. */ + if (ieee80211_is_mgmt(hdr->frame_control) || + ieee80211_has_protected(hdr->frame_control) || + ieee80211_has_morefrags(hdr->frame_control) || + le16_to_cpu(hdr->seq_ctrl) & IEEE80211_SCTL_FRAG) + ret = skb_linearize(skb); + else + ret = __pskb_pull_tail(skb, min_t(u16, IWL_LINK_HDR_MAX, len)) ? 
+ 0 : -ENOMEM; + + if (ret) { + kfree_skb(skb); + goto out; + } iwl_update_stats(priv, false, hdr->frame_control, len); - memcpy(IEEE80211_SKB_RXCB(rxb->skb), stats, sizeof(*stats)); - ieee80211_rx_irqsafe(priv->hw, rxb->skb); - priv->alloc_rxb_skb--; - rxb->skb = NULL; + memcpy(IEEE80211_SKB_RXCB(skb), stats, sizeof(*stats)); + + ieee80211_rx(priv->hw, skb); + out: + priv->alloc_rxb_page--; + rxb->page = NULL; } /* This is necessary only for a number of statistics, see the caller. */ @@ -926,7 +956,7 @@ void iwl_rx_reply_rx(struct iwl_priv *priv, { struct ieee80211_hdr *header; struct ieee80211_rx_status rx_status; - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_rx_phy_res *phy_res; __le32 rx_pkt_status; struct iwl4965_rx_mpdu_res_start *amsdu; @@ -1087,7 +1117,7 @@ EXPORT_SYMBOL(iwl_rx_reply_rx); void iwl_rx_reply_rx_phy(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); priv->last_phy_res[0] = 1; memcpy(&priv->last_phy_res[1], &(pkt->u.raw[0]), sizeof(struct iwl_rx_phy_res)); diff --git a/drivers/net/wireless/iwlwifi/iwl-scan.c b/drivers/net/wireless/iwlwifi/iwl-scan.c index 4f3a108..bcccc6f 100644 --- a/drivers/net/wireless/iwlwifi/iwl-scan.c +++ b/drivers/net/wireless/iwlwifi/iwl-scan.c @@ -112,7 +112,7 @@ EXPORT_SYMBOL(iwl_scan_cancel_timeout); static int iwl_send_scan_abort(struct iwl_priv *priv) { int ret = 0; - struct iwl_rx_packet *res; + struct iwl_rx_packet *pkt; struct iwl_host_cmd cmd = { .id = REPLY_SCAN_ABORT_CMD, .flags = CMD_WANT_SKB, @@ -132,21 +132,21 @@ static int iwl_send_scan_abort(struct iwl_priv *priv) return ret; } - res = (struct iwl_rx_packet *)cmd.reply_skb->data; - if (res->u.status != CAN_ABORT_STATUS) { + pkt = (struct iwl_rx_packet *)cmd.reply_page; + if (pkt->u.status != CAN_ABORT_STATUS) { /* The scan abort will return 1 for success or 
* 2 for "failure". A failure condition can be * due to simply not being in an active scan which * can occur if we send the scan abort before we * the microcode has notified us that a scan is * completed. */ - IWL_DEBUG_INFO(priv, "SCAN_ABORT returned %d.\n", res->u.status); + IWL_DEBUG_INFO(priv, "SCAN_ABORT returned %d.\n", pkt->u.status); clear_bit(STATUS_SCAN_ABORTING, &priv->status); clear_bit(STATUS_SCAN_HW, &priv->status); } - priv->alloc_rxb_skb--; - dev_kfree_skb_any(cmd.reply_skb); + priv->alloc_rxb_page--; + free_pages(cmd.reply_page, priv->hw_params.rx_page_order); return ret; } @@ -156,7 +156,7 @@ static void iwl_rx_reply_scan(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_scanreq_notification *notif = (struct iwl_scanreq_notification *)pkt->u.raw; @@ -168,7 +168,7 @@ static void iwl_rx_reply_scan(struct iwl_priv *priv, static void iwl_rx_scan_start_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_scanstart_notification *notif = (struct iwl_scanstart_notification *)pkt->u.raw; priv->scan_start_tsf = le32_to_cpu(notif->tsf_low); @@ -187,7 +187,7 @@ static void iwl_rx_scan_results_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_scanresults_notification *notif = (struct iwl_scanresults_notification *)pkt->u.raw; @@ -214,7 +214,7 @@ static void iwl_rx_scan_complete_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct 
iwl_scancomplete_notification *scan_notif = (void *)pkt->u.raw; IWL_DEBUG_SCAN(priv, "Scan complete: %d channels (TSF 0x%08X:%08X) - %d\n", diff --git a/drivers/net/wireless/iwlwifi/iwl-spectrum.c b/drivers/net/wireless/iwlwifi/iwl-spectrum.c index 022bcf1..1ea5cd3 100644 --- a/drivers/net/wireless/iwlwifi/iwl-spectrum.c +++ b/drivers/net/wireless/iwlwifi/iwl-spectrum.c @@ -177,7 +177,7 @@ static int iwl_get_measurement(struct iwl_priv *priv, static void iwl_rx_spectrum_measure_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_spectrum_notification *report = &(pkt->u.spectrum_notif); if (!report->state) { diff --git a/drivers/net/wireless/iwlwifi/iwl-sta.c b/drivers/net/wireless/iwlwifi/iwl-sta.c index c6633fe..dc74c16 100644 --- a/drivers/net/wireless/iwlwifi/iwl-sta.c +++ b/drivers/net/wireless/iwlwifi/iwl-sta.c @@ -99,32 +99,25 @@ static void iwl_sta_ucode_activate(struct iwl_priv *priv, u8 sta_id) static void iwl_add_sta_callback(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb) + struct iwl_rx_packet *pkt) { - struct iwl_rx_packet *res = NULL; struct iwl_addsta_cmd *addsta = (struct iwl_addsta_cmd *)cmd->cmd.payload; u8 sta_id = addsta->sta.sta_id; - if (!skb) { - IWL_ERR(priv, "Error: Response NULL in REPLY_ADD_STA.\n"); - return; - } - - res = (struct iwl_rx_packet *)skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_ADD_STA (0x%08X)\n", - res->hdr.flags); + pkt->hdr.flags); return; } - switch (res->u.add_sta.status) { + switch (pkt->u.add_sta.status) { case ADD_STA_SUCCESS_MSK: iwl_sta_ucode_activate(priv, sta_id); /* fall through */ default: IWL_DEBUG_HC(priv, "Received REPLY_ADD_STA:(0x%08X)\n", - res->u.add_sta.status); + pkt->u.add_sta.status); break; } } @@ -132,7 +125,7 @@ static void 
iwl_add_sta_callback(struct iwl_priv *priv, int iwl_send_add_sta(struct iwl_priv *priv, struct iwl_addsta_cmd *sta, u8 flags) { - struct iwl_rx_packet *res = NULL; + struct iwl_rx_packet *pkt = NULL; int ret = 0; u8 data[sizeof(*sta)]; struct iwl_host_cmd cmd = { @@ -152,15 +145,15 @@ int iwl_send_add_sta(struct iwl_priv *priv, if (ret || (flags & CMD_ASYNC)) return ret; - res = (struct iwl_rx_packet *)cmd.reply_skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + pkt = (struct iwl_rx_packet *)cmd.reply_page; + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_ADD_STA (0x%08X)\n", - res->hdr.flags); + pkt->hdr.flags); ret = -EIO; } if (ret == 0) { - switch (res->u.add_sta.status) { + switch (pkt->u.add_sta.status) { case ADD_STA_SUCCESS_MSK: iwl_sta_ucode_activate(priv, sta->sta.sta_id); IWL_DEBUG_INFO(priv, "REPLY_ADD_STA PASSED\n"); @@ -172,8 +165,8 @@ int iwl_send_add_sta(struct iwl_priv *priv, } } - priv->alloc_rxb_skb--; - dev_kfree_skb_any(cmd.reply_skb); + priv->alloc_rxb_page--; + free_pages(cmd.reply_page, priv->hw_params.rx_page_order); return ret; } @@ -324,26 +317,19 @@ static void iwl_sta_ucode_deactivate(struct iwl_priv *priv, const char *addr) static void iwl_remove_sta_callback(struct iwl_priv *priv, struct iwl_device_cmd *cmd, - struct sk_buff *skb) + struct iwl_rx_packet *pkt) { - struct iwl_rx_packet *res = NULL; struct iwl_rem_sta_cmd *rm_sta = - (struct iwl_rem_sta_cmd *)cmd->cmd.payload; + (struct iwl_rem_sta_cmd *)cmd->cmd.payload; const char *addr = rm_sta->addr; - if (!skb) { - IWL_ERR(priv, "Error: Response NULL in REPLY_REMOVE_STA.\n"); - return; - } - - res = (struct iwl_rx_packet *)skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_REMOVE_STA (0x%08X)\n", - res->hdr.flags); + pkt->hdr.flags); return; } - switch (res->u.rem_sta.status) { + switch (pkt->u.rem_sta.status) { case REM_STA_SUCCESS_MSK: 
iwl_sta_ucode_deactivate(priv, addr); break; @@ -356,7 +342,7 @@ static void iwl_remove_sta_callback(struct iwl_priv *priv, static int iwl_send_remove_station(struct iwl_priv *priv, const u8 *addr, u8 flags) { - struct iwl_rx_packet *res = NULL; + struct iwl_rx_packet *pkt; int ret; struct iwl_rem_sta_cmd rm_sta_cmd; @@ -381,15 +367,15 @@ static int iwl_send_remove_station(struct iwl_priv *priv, const u8 *addr, if (ret || (flags & CMD_ASYNC)) return ret; - res = (struct iwl_rx_packet *)cmd.reply_skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + pkt = (struct iwl_rx_packet *)cmd.reply_page; + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_REMOVE_STA (0x%08X)\n", - res->hdr.flags); + pkt->hdr.flags); ret = -EIO; } if (!ret) { - switch (res->u.rem_sta.status) { + switch (pkt->u.rem_sta.status) { case REM_STA_SUCCESS_MSK: iwl_sta_ucode_deactivate(priv, addr); IWL_DEBUG_ASSOC(priv, "REPLY_REMOVE_STA PASSED\n"); @@ -401,8 +387,8 @@ static int iwl_send_remove_station(struct iwl_priv *priv, const u8 *addr, } } - priv->alloc_rxb_skb--; - dev_kfree_skb_any(cmd.reply_skb); + priv->alloc_rxb_page--; + free_pages(cmd.reply_page, priv->hw_params.rx_page_order); return ret; } diff --git a/drivers/net/wireless/iwlwifi/iwl-tx.c b/drivers/net/wireless/iwlwifi/iwl-tx.c index fb9bcfa..a98d60d 100644 --- a/drivers/net/wireless/iwlwifi/iwl-tx.c +++ b/drivers/net/wireless/iwlwifi/iwl-tx.c @@ -1132,7 +1132,7 @@ static void iwl_hcmd_queue_reclaim(struct iwl_priv *priv, int txq_id, */ void iwl_tx_cmd_complete(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u16 sequence = le16_to_cpu(pkt->hdr.sequence); int txq_id = SEQ_TO_QUEUE(sequence); int index = SEQ_TO_INDEX(sequence); @@ -1159,10 +1159,10 @@ void iwl_tx_cmd_complete(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) /* Input error checking is done when commands are added 
to queue. */ if (meta->flags & CMD_WANT_SKB) { - meta->source->reply_skb = rxb->skb; - rxb->skb = NULL; + meta->source->reply_page = (unsigned long)rxb_addr(rxb); + rxb->page = NULL; } else if (meta->callback) - meta->callback(priv, cmd, rxb->skb); + meta->callback(priv, cmd, pkt); iwl_hcmd_queue_reclaim(priv, txq_id, index, cmd_index); @@ -1421,7 +1421,7 @@ static int iwl_tx_status_reply_compressed_ba(struct iwl_priv *priv, void iwl_rx_reply_compressed_ba(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (struct iwl_rx_packet *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_compressed_ba_resp *ba_resp = &pkt->u.compressed_ba; struct iwl_tx_queue *txq = NULL; struct iwl_ht_agg *agg; diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c index d00a803..e20690d 100644 --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c @@ -758,7 +758,7 @@ static int iwl3945_get_measurement(struct iwl_priv *priv, u8 type) { struct iwl_spectrum_cmd spectrum; - struct iwl_rx_packet *res; + struct iwl_rx_packet *pkt; struct iwl_host_cmd cmd = { .id = REPLY_SPECTRUM_MEASUREMENT_CMD, .data = (void *)&spectrum, @@ -803,18 +803,18 @@ static int iwl3945_get_measurement(struct iwl_priv *priv, if (rc) return rc; - res = (struct iwl_rx_packet *)cmd.reply_skb->data; - if (res->hdr.flags & IWL_CMD_FAILED_MSK) { + pkt = (struct iwl_rx_packet *)cmd.reply_page; + if (pkt->hdr.flags & IWL_CMD_FAILED_MSK) { IWL_ERR(priv, "Bad return from REPLY_RX_ON_ASSOC command\n"); rc = -EIO; } - spectrum_resp_status = le16_to_cpu(res->u.spectrum.status); + spectrum_resp_status = le16_to_cpu(pkt->u.spectrum.status); switch (spectrum_resp_status) { case 0: /* Command will be handled */ - if (res->u.spectrum.id != 0xff) { + if (pkt->u.spectrum.id != 0xff) { IWL_DEBUG_INFO(priv, "Replaced existing measurement: %d\n", - res->u.spectrum.id); + pkt->u.spectrum.id); 
priv->measurement_status &= ~MEASUREMENT_READY; } priv->measurement_status |= MEASUREMENT_ACTIVE; @@ -826,7 +826,7 @@ static int iwl3945_get_measurement(struct iwl_priv *priv, break; } - dev_kfree_skb_any(cmd.reply_skb); + free_pages(cmd.reply_page, priv->hw_params.rx_page_order); return rc; } @@ -835,7 +835,7 @@ static int iwl3945_get_measurement(struct iwl_priv *priv, static void iwl3945_rx_reply_alive(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_alive_resp *palive; struct delayed_work *pwork; @@ -872,7 +872,7 @@ static void iwl3945_rx_reply_add_sta(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); #endif IWL_DEBUG_RX(priv, "Received REPLY_ADD_STA: 0x%02X\n", pkt->u.status); @@ -908,7 +908,7 @@ static void iwl3945_rx_beacon_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { #ifdef CONFIG_IWLWIFI_DEBUG - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl3945_beacon_notif *beacon = &(pkt->u.beacon_status); u8 rate = beacon->beacon_notify_hdr.rate; @@ -931,7 +931,7 @@ static void iwl3945_rx_beacon_notif(struct iwl_priv *priv, static void iwl3945_rx_card_state_notif(struct iwl_priv *priv, struct iwl_rx_mem_buffer *rxb) { - struct iwl_rx_packet *pkt = (void *)rxb->skb->data; + struct iwl_rx_packet *pkt = rxb_addr(rxb); u32 flags = le32_to_cpu(pkt->u.card_state_notif.flags); unsigned long status = priv->status; @@ -1095,7 +1095,7 @@ static int iwl3945_rx_queue_restock(struct iwl_priv *priv) list_del(element); /* Point to Rx buffer via next RBD in circular buffer */ - rxq->bd[rxq->write] = iwl3945_dma_addr2rbd_ptr(priv, rxb->real_dma_addr); + rxq->bd[rxq->write] = iwl3945_dma_addr2rbd_ptr(priv, rxb->page_dma); rxq->queue[rxq->write] = rxb; rxq->write = 
(rxq->write + 1) & RX_QUEUE_MASK; rxq->free_count--; @@ -1135,7 +1135,7 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) struct iwl_rx_queue *rxq = &priv->rxq; struct list_head *element; struct iwl_rx_mem_buffer *rxb; - struct sk_buff *skb; + struct page *page; unsigned long flags; while (1) { @@ -1149,9 +1149,13 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) if (rxq->free_count > RX_LOW_WATERMARK) priority |= __GFP_NOWARN; + + if (priv->hw_params.rx_page_order > 0) + priority |= __GFP_COMP; + /* Alloc a new receive buffer */ - skb = alloc_skb(priv->hw_params.rx_buf_size, priority); - if (!skb) { + page = alloc_pages(priority, priv->hw_params.rx_page_order); + if (!page) { if (net_ratelimit()) IWL_DEBUG_INFO(priv, "Failed to allocate SKB buffer.\n"); if ((rxq->free_count <= RX_LOW_WATERMARK) && @@ -1168,7 +1172,7 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) spin_lock_irqsave(&rxq->lock, flags); if (list_empty(&rxq->rx_used)) { spin_unlock_irqrestore(&rxq->lock, flags); - dev_kfree_skb_any(skb); + __free_pages(page, priv->hw_params.rx_page_order); return; } element = rxq->rx_used.next; @@ -1176,26 +1180,18 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) list_del(element); spin_unlock_irqrestore(&rxq->lock, flags); - rxb->skb = skb; - - /* If radiotap head is required, reserve some headroom here. - * The physical head count is a variable rx_stats->phy_count. - * We reserve 4 bytes here. Plus these extra bytes, the - * headroom of the physical head should be enough for the - * radiotap head that iwl3945 supported. See iwl3945_rt. 
- */ - skb_reserve(rxb->skb, 4); - + rxb->page = page; /* Get physical address of RB/SKB */ - rxb->real_dma_addr = pci_map_single(priv->pci_dev, - rxb->skb->data, - priv->hw_params.rx_buf_size, - PCI_DMA_FROMDEVICE); + rxb->page_dma = pci_map_page(priv->pci_dev, page, 0, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); spin_lock_irqsave(&rxq->lock, flags); + list_add_tail(&rxb->list, &rxq->rx_free); - priv->alloc_rxb_skb++; rxq->free_count++; + priv->alloc_rxb_page++; + spin_unlock_irqrestore(&rxq->lock, flags); } } @@ -1211,14 +1207,14 @@ void iwl3945_rx_queue_reset(struct iwl_priv *priv, struct iwl_rx_queue *rxq) for (i = 0; i < RX_FREE_BUFFERS + RX_QUEUE_SIZE; i++) { /* In the reset function, these buffers may have been allocated * to an SKB, so we need to unmap and free potential storage */ - if (rxq->pool[i].skb != NULL) { - pci_unmap_single(priv->pci_dev, - rxq->pool[i].real_dma_addr, - priv->hw_params.rx_buf_size, - PCI_DMA_FROMDEVICE); - priv->alloc_rxb_skb--; - dev_kfree_skb(rxq->pool[i].skb); - rxq->pool[i].skb = NULL; + if (rxq->pool[i].page != NULL) { + pci_unmap_page(priv->pci_dev, rxq->pool[i].page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + priv->alloc_rxb_page--; + __free_pages(rxq->pool[i].page, + priv->hw_params.rx_page_order); + rxq->pool[i].page = NULL; } list_add_tail(&rxq->pool[i].list, &rxq->rx_used); } @@ -1226,8 +1222,8 @@ void iwl3945_rx_queue_reset(struct iwl_priv *priv, struct iwl_rx_queue *rxq) /* Set us so that we have processed and used all buffers, but have * not restocked the Rx queue with fresh buffers */ rxq->read = rxq->write = 0; - rxq->free_count = 0; rxq->write_actual = 0; + rxq->free_count = 0; spin_unlock_irqrestore(&rxq->lock, flags); } @@ -1260,12 +1256,14 @@ static void iwl3945_rx_queue_free(struct iwl_priv *priv, struct iwl_rx_queue *rx { int i; for (i = 0; i < RX_QUEUE_SIZE + RX_FREE_BUFFERS; i++) { - if (rxq->pool[i].skb != NULL) { - pci_unmap_single(priv->pci_dev, - 
rxq->pool[i].real_dma_addr, - priv->hw_params.rx_buf_size, - PCI_DMA_FROMDEVICE); - dev_kfree_skb(rxq->pool[i].skb); + if (rxq->pool[i].page != NULL) { + pci_unmap_page(priv->pci_dev, rxq->pool[i].page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + __free_pages(rxq->pool[i].page, + priv->hw_params.rx_page_order); + rxq->pool[i].page = NULL; + priv->alloc_rxb_page--; } } @@ -1401,10 +1399,10 @@ static void iwl3945_rx_handle(struct iwl_priv *priv) rxq->queue[i] = NULL; - pci_unmap_single(priv->pci_dev, rxb->real_dma_addr, - priv->hw_params.rx_buf_size, - PCI_DMA_FROMDEVICE); - pkt = (struct iwl_rx_packet *)rxb->skb->data; + pci_unmap_page(priv->pci_dev, rxb->page_dma, + PAGE_SIZE << priv->hw_params.rx_page_order, + PCI_DMA_FROMDEVICE); + pkt = rxb_addr(rxb); /* Reclaim a command buffer only if this packet is a response * to a (driver-originated) command. @@ -1426,16 +1424,17 @@ static void iwl3945_rx_handle(struct iwl_priv *priv) priv->isr_stats.rx_handlers[pkt->hdr.cmd]++; } else { /* No handling needed */ - IWL_DEBUG_RX(priv, "r %d i %d No handler needed for %s, 0x%02x\n", + IWL_DEBUG_RX(priv, + "r %d i %d No handler needed for %s, 0x%02x\n", r, i, get_cmd_string(pkt->hdr.cmd), pkt->hdr.cmd); } if (reclaim) { - /* Invoke any callbacks, transfer the skb to caller, and - * fire off the (possibly) blocking iwl_send_cmd() + /* Invoke any callbacks, transfer the buffer to caller, + * and fire off the (possibly) blocking iwl_send_cmd() * as we reclaim the driver command queue */ - if (rxb && rxb->skb) + if (rxb && rxb->page) iwl_tx_cmd_complete(priv, rxb); else IWL_WARN(priv, "Claim null rxb?\n"); @@ -1444,10 +1443,10 @@ static void iwl3945_rx_handle(struct iwl_priv *priv) /* For now we just don't re-use anything. 
We can tweak this * later to try and re-use notification packets and SKBs that * fail to Rx correctly */ - if (rxb->skb != NULL) { - priv->alloc_rxb_skb--; - dev_kfree_skb_any(rxb->skb); - rxb->skb = NULL; + if (rxb->page != NULL) { + priv->alloc_rxb_page--; + __free_pages(rxb->page, priv->hw_params.rx_page_order); + rxb->page = NULL; } spin_lock_irqsave(&rxq->lock, flags); @@ -1685,6 +1684,8 @@ static void iwl3945_irq_tasklet(struct iwl_priv *priv) } #endif + spin_unlock_irqrestore(&priv->lock, flags); + /* Since CSR_INT and CSR_FH_INT_STATUS reads and clears are not * atomic, make sure that inta covers all the interrupts that * we've discovered, even if FH interrupt came in just after @@ -1706,8 +1707,6 @@ static void iwl3945_irq_tasklet(struct iwl_priv *priv) handled |= CSR_INT_BIT_HW_ERR; - spin_unlock_irqrestore(&priv->lock, flags); - return; } @@ -1799,7 +1798,6 @@ static void iwl3945_irq_tasklet(struct iwl_priv *priv) "flags 0x%08lx\n", inta, inta_mask, inta_fh, flags); } #endif - spin_unlock_irqrestore(&priv->lock, flags); } static int iwl3945_get_channels_for_scan(struct iwl_priv *priv, -- 1.5.6.3 [-- Attachment #3: 0002-iwlwifi-fix-use-after-free-bug-for-paged-rx.patch --] [-- Type: text/x-patch, Size: 8826 bytes --] From 000c60eef9bf7a579c02ccb7deee447a2231d2b0 Mon Sep 17 00:00:00 2001 From: Zhu Yi <yi.zhu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> Date: Thu, 15 Oct 2009 20:00:57 -0700 Subject: [PATCH 2/2] iwlwifi: fix use after free bug for paged rx In the paged rx patch (4854fde2), I introduced a bug that could possibly touch an already freed page. It is fixed by avoiding the access in this patch. I've also added some comments so that other people touching the code won't make the same mistake. In the future, if we cannot avoid access the page after being handled to the upper layer, we can use get_page/put_page to handle it. For now, it's just not necessary. 
It also fixed a debug message print bug reported by Stanislaw Gruszka <sgruszka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>. Signed-off-by: Zhu Yi <yi.zhu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> Signed-off-by: Reinette Chatre <reinette.chatre-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> --- drivers/net/wireless/iwlwifi/iwl-3945.c | 16 +++++++++++----- drivers/net/wireless/iwlwifi/iwl-agn.c | 11 +++++++++-- drivers/net/wireless/iwlwifi/iwl-rx.c | 21 ++++++++++++++------- drivers/net/wireless/iwlwifi/iwl3945-base.c | 18 +++++++++++++----- 4 files changed, 47 insertions(+), 19 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.c b/drivers/net/wireless/iwlwifi/iwl-3945.c index 7d5962d..4406650 100644 --- a/drivers/net/wireless/iwlwifi/iwl-3945.c +++ b/drivers/net/wireless/iwlwifi/iwl-3945.c @@ -552,6 +552,7 @@ static void iwl3945_pass_packet_to_mac80211(struct iwl_priv *priv, u16 len = le16_to_cpu(rx_hdr->len); struct sk_buff *skb; int ret; + __le16 fc = hdr->frame_control; /* We received data from the HW, so stop the watchdog */ if (unlikely(len + IWL39_RX_FRAME_SIZE > @@ -584,9 +585,9 @@ static void iwl3945_pass_packet_to_mac80211(struct iwl_priv *priv, /* mac80211 currently doesn't support paged SKB. Convert it to * linear SKB for management frame and data frame requires * software decryption or software defragementation. */ - if (ieee80211_is_mgmt(hdr->frame_control) || - ieee80211_has_protected(hdr->frame_control) || - ieee80211_has_morefrags(hdr->frame_control) || + if (ieee80211_is_mgmt(fc) || + ieee80211_has_protected(fc) || + ieee80211_has_morefrags(fc) || le16_to_cpu(hdr->seq_ctrl) & IEEE80211_SCTL_FRAG) ret = skb_linearize(skb); else @@ -598,11 +599,16 @@ static void iwl3945_pass_packet_to_mac80211(struct iwl_priv *priv, goto out; } + /* + * XXX: We cannot touch the page and its virtual memory (pkt) after + * here. It might have already been freed by the above skb change. 
+ */ + #ifdef CONFIG_IWLWIFI_LEDS - if (ieee80211_is_data(hdr->frame_control)) + if (ieee80211_is_data(fc)) priv->rxtxpackets += len; #endif - iwl_update_stats(priv, false, hdr->frame_control, len); + iwl_update_stats(priv, false, fc, len); memcpy(IEEE80211_SKB_RXCB(skb), stats, sizeof(*stats)); ieee80211_rx(priv->hw, skb); diff --git a/drivers/net/wireless/iwlwifi/iwl-agn.c b/drivers/net/wireless/iwlwifi/iwl-agn.c index c5ff7c0..475f677 100644 --- a/drivers/net/wireless/iwlwifi/iwl-agn.c +++ b/drivers/net/wireless/iwlwifi/iwl-agn.c @@ -808,8 +808,8 @@ void iwl_rx_handle(struct iwl_priv *priv) if (priv->rx_handlers[pkt->hdr.cmd]) { IWL_DEBUG_RX(priv, "r = %d, i = %d, %s, 0x%02x\n", r, i, get_cmd_string(pkt->hdr.cmd), pkt->hdr.cmd); - priv->rx_handlers[pkt->hdr.cmd] (priv, rxb); priv->isr_stats.rx_handlers[pkt->hdr.cmd]++; + priv->rx_handlers[pkt->hdr.cmd] (priv, rxb); } else { /* No handling needed */ IWL_DEBUG_RX(priv, @@ -818,11 +818,18 @@ void iwl_rx_handle(struct iwl_priv *priv) pkt->hdr.cmd); } + /* + * XXX: After here, we should always check rxb->page + * against NULL before touching it or its virtual + * memory (pkt). Because some rx_handler might have + * already taken or freed the pages. 
+ */ + if (reclaim) { /* Invoke any callbacks, transfer the buffer to caller, * and fire off the (possibly) blocking iwl_send_cmd() * as we reclaim the driver command queue */ - if (rxb && rxb->page) + if (rxb->page) iwl_tx_cmd_complete(priv, rxb); else IWL_WARN(priv, "Claim null rxb?\n"); diff --git a/drivers/net/wireless/iwlwifi/iwl-rx.c b/drivers/net/wireless/iwlwifi/iwl-rx.c index 5e56857..2663689 100644 --- a/drivers/net/wireless/iwlwifi/iwl-rx.c +++ b/drivers/net/wireless/iwlwifi/iwl-rx.c @@ -241,6 +241,7 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority) struct iwl_rx_mem_buffer *rxb; struct page *page; unsigned long flags; + gfp_t gfp_mask = priority; while (1) { spin_lock_irqsave(&rxq->lock, flags); @@ -251,13 +252,13 @@ void iwl_rx_allocate(struct iwl_priv *priv, gfp_t priority) spin_unlock_irqrestore(&rxq->lock, flags); if (rxq->free_count > RX_LOW_WATERMARK) - priority |= __GFP_NOWARN; + gfp_mask |= __GFP_NOWARN; if (priv->hw_params.rx_page_order > 0) - priority |= __GFP_COMP; + gfp_mask |= __GFP_COMP; /* Alloc a new receive buffer */ - page = alloc_pages(priority, priv->hw_params.rx_page_order); + page = alloc_pages(gfp_mask, priv->hw_params.rx_page_order); if (!page) { if (net_ratelimit()) IWL_DEBUG_INFO(priv, "alloc_pages failed, " @@ -884,6 +885,7 @@ static void iwl_pass_packet_to_mac80211(struct iwl_priv *priv, { struct sk_buff *skb; int ret = 0; + __le16 fc = hdr->frame_control; /* We only process data packets if the interface is open */ if (unlikely(!priv->is_open)) { @@ -908,9 +910,9 @@ static void iwl_pass_packet_to_mac80211(struct iwl_priv *priv, /* mac80211 currently doesn't support paged SKB. Convert it to * linear SKB for management frame and data frame requires * software decryption or software defragementation. 
*/ - if (ieee80211_is_mgmt(hdr->frame_control) || - ieee80211_has_protected(hdr->frame_control) || - ieee80211_has_morefrags(hdr->frame_control) || + if (ieee80211_is_mgmt(fc) || + ieee80211_has_protected(fc) || + ieee80211_has_morefrags(fc) || le16_to_cpu(hdr->seq_ctrl) & IEEE80211_SCTL_FRAG) ret = skb_linearize(skb); else @@ -922,7 +924,12 @@ static void iwl_pass_packet_to_mac80211(struct iwl_priv *priv, goto out; } - iwl_update_stats(priv, false, hdr->frame_control, len); + /* + * XXX: We cannot touch the page and its virtual memory (hdr) after + * here. It might have already been freed by the above skb change. + */ + + iwl_update_stats(priv, false, fc, len); memcpy(IEEE80211_SKB_RXCB(skb), stats, sizeof(*stats)); ieee80211_rx(priv->hw, skb); diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c index e20690d..5ae8698 100644 --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c @@ -1137,6 +1137,7 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) struct iwl_rx_mem_buffer *rxb; struct page *page; unsigned long flags; + gfp_t gfp_mask = priority; while (1) { spin_lock_irqsave(&rxq->lock, flags); @@ -1148,13 +1149,13 @@ static void iwl3945_rx_allocate(struct iwl_priv *priv, gfp_t priority) spin_unlock_irqrestore(&rxq->lock, flags); if (rxq->free_count > RX_LOW_WATERMARK) - priority |= __GFP_NOWARN; + gfp_mask |= __GFP_NOWARN; if (priv->hw_params.rx_page_order > 0) - priority |= __GFP_COMP; + gfp_mask |= __GFP_COMP; /* Alloc a new receive buffer */ - page = alloc_pages(priority, priv->hw_params.rx_page_order); + page = alloc_pages(gfp_mask, priv->hw_params.rx_page_order); if (!page) { if (net_ratelimit()) IWL_DEBUG_INFO(priv, "Failed to allocate SKB buffer.\n"); @@ -1420,8 +1421,8 @@ static void iwl3945_rx_handle(struct iwl_priv *priv) if (priv->rx_handlers[pkt->hdr.cmd]) { IWL_DEBUG_RX(priv, "r = %d, i = %d, %s, 0x%02x\n", r, i, 
get_cmd_string(pkt->hdr.cmd), pkt->hdr.cmd); - priv->rx_handlers[pkt->hdr.cmd] (priv, rxb); priv->isr_stats.rx_handlers[pkt->hdr.cmd]++; + priv->rx_handlers[pkt->hdr.cmd] (priv, rxb); } else { /* No handling needed */ IWL_DEBUG_RX(priv, @@ -1430,11 +1431,18 @@ static void iwl3945_rx_handle(struct iwl_priv *priv) pkt->hdr.cmd); } + /* + * XXX: After here, we should always check rxb->page + * against NULL before touching it or its virtual + * memory (pkt). Because some rx_handler might have + * already taken or freed the pages. + */ + if (reclaim) { /* Invoke any callbacks, transfer the buffer to caller, * and fire off the (possibly) blocking iwl_send_cmd() * as we reclaim the driver command queue */ - if (rxb && rxb->page) + if (rxb->page) iwl_tx_cmd_complete(priv, rxb); else IWL_WARN(priv, "Claim null rxb?\n"); -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 384+ messages in thread
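The use-after-free fix in the patch above comes down to one rule: copy whatever you still need out of the receive page (here `frame_control`) before the step that may release the page, and touch only the local copy afterwards. Below is a minimal userspace sketch of that pattern. It is not the driver code: every `sketch_*` name and the `SKETCH_FC_DATA` bit are hypothetical, and `free()` stands in for the page being released once the skb is handed on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a frame header living inside the rx page. */
struct sketch_hdr {
	uint16_t frame_control;
};

/* Hypothetical statistics, loosely modelled on what iwl_update_stats tracks. */
struct sketch_stats {
	unsigned long data_frames;
	unsigned long bytes;
};

/* Hypothetical flag bit; not the real IEEE 802.11 frame-control layout. */
#define SKETCH_FC_DATA 0x0008u

/*
 * Stand-in for the step that may release the receive page. After this
 * returns, the memory behind *hdr must not be read again.
 */
void sketch_hand_off(struct sketch_hdr **hdr)
{
	free(*hdr);
	*hdr = NULL;
}

void sketch_pass_packet(struct sketch_stats *st, struct sketch_hdr **hdr,
			uint16_t len)
{
	/* Copy the field while the page is still owned by this function. */
	uint16_t fc = (*hdr)->frame_control;

	sketch_hand_off(hdr);

	/* From here on, only locals are used; *hdr is gone. */
	if (fc & SKETCH_FC_DATA)
		st->data_frames++;
	st->bytes += len;
}
```

A caller would allocate a `struct sketch_hdr`, set `frame_control`, and pass its address to `sketch_pass_packet()`; on return the pointer is NULL and the statistics have been updated from the cached value only.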
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
@ 2009-10-17  5:42 ` reinette chatre
  0 siblings, 0 replies; 384+ messages in thread
From: reinette chatre @ 2009-10-17 5:42 UTC (permalink / raw)
  To: Frans Pop
  Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
      John W. Linville, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1102 bytes --]

Hi Frans,

On Thu, 2009-10-15 at 12:41 -0700, Frans Pop wrote:
> On Thursday 15 October 2009, reinette chatre wrote:
> > > The log file timestamps don't tell much as the logging gets delayed,
> > > so they all end up at the same time. Maybe I should enable the kernel
> > > timestamps so we can see how far apart these failures are.
> >
> > If you can get accurate timing it will be very useful. I am interested
> > to see how quickly it goes from "48 free buffers" to "0 free buffers".
>
> Attached the dmesg for three consecutive test runs (i.e. without
> rebooting). Not that the 2nd one includes only "0 free buffers" messages,
> even though the behavior (point where desktop freezes and music stops)
> looked similar.
>
> Not sure if you can tell all that much from the data.
>

Prompted by this thread we are in process of moving allocation to paged
skb. This will definitely reduce the allocation size (from order 2 to
order 1) and hopefully help with this problem also. Could you please try
with the attached two patches? They are based on 2.6.32-rc4.

Thank you very much

Reinette

[-- Attachment #2: 0001-iwlwifi-use-paged-Rx.patch --]
[-- Type: text/x-patch, Size: 0 bytes --]

^ permalink raw reply	[flat|nested] 384+ messages in thread
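For readers unfamiliar with allocation orders: an order-n request asks the page allocator for 2^n physically contiguous pages, which is why the patches in this thread size their DMA mappings as `PAGE_SIZE << rx_page_order`. The sketch below only works through that arithmetic. It assumes 4 KiB base pages (typical on x86, but not universal), and the `SKETCH_PAGE_SIZE`, `rx_buf_bytes` and `rx_buf_pages` names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages; other architectures may use other sizes. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Bytes covered by one allocation of the given order: PAGE_SIZE << order. */
size_t rx_buf_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/* Number of physically contiguous pages the allocator has to find. */
unsigned int rx_buf_pages(unsigned int order)
{
	return 1u << order;
}
```

Under that assumption, going from order 2 to order 1 halves each receive buffer request from four contiguous pages (16 KiB) to two (8 KiB), which should be easier to satisfy when memory is fragmented.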
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-17  5:42 ` reinette chatre
@ 2009-10-27 11:10   ` Frans Pop
  -1 siblings, 0 replies; 384+ messages in thread
From: Frans Pop @ 2009-10-27 11:10 UTC (permalink / raw)
  To: reinette chatre
  Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
      John W. Linville, linux-mm

Sorry for the delay in replying.

On Saturday 17 October 2009, reinette chatre wrote:
> Prompted by this thread we are in process of moving allocation to paged
> skb. This will definitely reduce the allocation size (from order 2 to
> order 1) and hopefully help with this problem also. Could you please try
> with the attached two patches? They are based on 2.6.32-rc4.

Looks very good! With these patches I no longer get any SKB allocation
errors, even during the heaviest freezes while gitk is loading. I do still
get (long) music skips during the freezes, but that's not unexpected.
AFAICT the wireless connection is stable.

Tested on top of current mainline git: v2.6.32-rc5-81-g964fe08.

Please add, if you feel it's appropriate, my:
Reported-and-tested-by: Frans Pop <elendil@planet.nl>

Cheers,
FJP

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14141] order 2 page allocation failures in iwlagn
  2009-10-27 11:10   ` Frans Pop
  (?)
@ 2009-10-27 16:15   ` reinette chatre
  -1 siblings, 0 replies; 384+ messages in thread
From: reinette chatre @ 2009-10-27 16:15 UTC (permalink / raw)
  To: Frans Pop
  Cc: Mel Gorman, David Rientjes, KOSAKI Motohiro, Rafael J. Wysocki,
      Linux Kernel Mailing List, Kernel Testers List, Pekka Enberg,
      Bartlomiej Zolnierkiewicz, Karol Lewandowski, Abbas, Mohamed,
      John W. Linville, linux-mm

Hi Frans,

On Tue, 2009-10-27 at 04:10 -0700, Frans Pop wrote:
> Sorry for the delay in replying.
>
> On Saturday 17 October 2009, reinette chatre wrote:
> > Prompted by this thread we are in process of moving allocation to paged
> > skb. This will definitely reduce the allocation size (from order 2 to
> > order 1) and hopefully help with this problem also. Could you please try
> > with the attached two patches? They are based on 2.6.32-rc4.
>
> Looks very good! With these patches I no longer get any SKB allocation
> errors, even during the heaviest freezes while gitk is loading. I do still
> get (long) music skips during the freezes, but that's not unexpected.
> AFAICT the wireless connection is stable.
>
> Tested on top of current mainline git: v2.6.32-rc5-81-g964fe08.
>
> Please add, if you feel it's appropriate, my:
> Reported-and-tested-by: Frans Pop <elendil@planet.nl>

Thank you very much for testing these patches so thoroughly. They are both
on their way upstream already so I am not able to add your signature at
this time. Since these are pretty big changes these patches will be in
2.6.33.

Reinette

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14185] Oops in driversbasefirmware_class
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
@ 2009-10-01 19:55 ` Rafael J. Wysocki
  2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, David Woodhouse, lars_ericsson

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14185
Subject         : Oops in driversbasefirmware_class
Submitter       : <lars_ericsson@telia.com>
Date            : 2009-09-17 05:09 (15 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6e03a201bbe8137487f340d26aa662110e324b20

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14204] MCE prevent booting on my computer(pentium iii @500Mhz)
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
                   ` (26 preceding siblings ...)
  2009-10-01 19:55 ` Rafael J. Wysocki
@ 2009-10-01 19:55 ` Rafael J. Wysocki
  2009-10-01 19:55 ` [Bug #14205] Intel DX58SO mainboard - powering off takes really long Rafael J. Wysocki
                   ` (20 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, GNUtoo, Ingo Molnar

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14204
Subject         : MCE prevent booting on my computer(pentium iii @500Mhz)
Submitter       : GNUtoo <GNUtoo@no-log.org>
Date            : 2009-09-21 20:36 (11 days old)

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14205] Intel DX58SO mainboard - powering off takes really long
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
                   ` (27 preceding siblings ...)
  2009-10-01 19:55 ` [Bug #14204] MCE prevent booting on my computer(pentium iii @500Mhz) Rafael J. Wysocki
@ 2009-10-01 19:55 ` Rafael J. Wysocki
  2009-10-01 19:56 ` Rafael J. Wysocki
                   ` (19 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:55 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Tomasz Chmielewski

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14205
Subject         : Intel DX58SO mainboard - powering off takes really long
Submitter       : Tomasz Chmielewski <tch@wpkg.org>
Date            : 2009-09-22 10:14 (10 days old)

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14249] BUG: oops in gss_validate on 2.6.31
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
@ 2009-10-01 19:56 ` Rafael J. Wysocki
  2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Bastian Blank, Trond Myklebust

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14249
Subject         : BUG: oops in gss_validate on 2.6.31
Submitter       : Bastian Blank <bastian@waldi.eu.org>
Date            : 2009-09-16 10:29 (16 days old)
References      : http://marc.info/?l=linux-kernel&m=125309700417283&w=4
Handled-By      : Trond Myklebust <trond.myklebust@fys.uio.no>

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14248] 2.6.31 wireless: WARNING: at net/wireless/ibss.c:34
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
@ 2009-10-01 19:56 ` Rafael J. Wysocki
  2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Jurriaan

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14248
Subject         : 2.6.31 wireless: WARNING: at net/wireless/ibss.c:34
Submitter       : Jurriaan <thunder8@xs4all.nl>
Date            : 2009-09-13 7:32 (19 days old)
References      : http://marc.info/?l=linux-kernel&m=125282721113553&w=4

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14222] Hibernation oopses for the 2nd time with 2.6.31 (won't fit the screen)
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
                   ` (30 preceding siblings ...)
  2009-10-01 19:56 ` Rafael J. Wysocki
@ 2009-10-01 19:56 ` Rafael J. Wysocki
  2009-10-01 19:56 ` Rafael J. Wysocki
                   ` (16 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Magnus Damm, Magnus Damm, Ondrej Zary, Thomas Gleixner

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14222
Subject         : Hibernation oopses for the 2nd time with 2.6.31 (won't fit the screen)
Submitter       : Ondrej Zary <linux@rainbow-software.org>
Date            : 2009-09-24 14:07 (8 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c7121843685de2bf7f3afd3ae1d6a146010bf1fc

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14252] WARNING: at include/linux/skbuff.h:1382 w/ e1000
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
@ 2009-10-01 19:56 ` Rafael J. Wysocki
  2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Stephan von Krawczynski

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14252
Subject         : WARNING: at include/linux/skbuff.h:1382 w/ e1000
Submitter       : Stephan von Krawczynski <skraw@ithnet.com>
Date            : 2009-09-20 11:26 (12 days old)
References      : http://marc.info/?l=linux-kernel&m=125344599006033&w=4

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14253] Oops in driversbasefirmware_class
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
@ 2009-10-01 19:56 ` Rafael J. Wysocki
  2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Frederik Deweerdt, Lars Ericsson

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14253
Subject         : Oops in driversbasefirmware_class
Submitter       : Lars Ericsson <Lars_Ericsson@telia.com>
Date            : 2009-09-16 20:44 (16 days old)
References      : http://lkml.org/lkml/2009/9/16/461
Handled-By      : Frederik Deweerdt <frederik.deweerdt@xprog.eu>
Patch           : http://patchwork.kernel.org/patch/49914/

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14254] Hibernation broken by clocksource: Save mult_orig in clocksource_disable()
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
@ 2009-10-01 19:56 ` Rafael J. Wysocki
  2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Magnus Damm, Magnus Damm, Ondrej Zary, Thomas Gleixner

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14254
Subject         : Hibernation broken by clocksource: Save mult_orig in clocksource_disable()
Submitter       : Ondrej Zary <linux@rainbow-software.org>
Date            : 2009-09-19 19:55 (13 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c7121843685de2bf7f3afd3ae1d6a146010bf1fc
References      : http://marc.info/?l=linux-kernel&m=125339012527719&w=4

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14251] 2.6.31: no login prompt
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
@ 2009-10-01 19:56 ` Rafael J. Wysocki
  2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Frédéric L. W. Meunier

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14251
Subject         : 2.6.31: no login prompt
Submitter       : Frédéric L. W. Meunier <fredlwm@gmail.com>
Date            : 2009-09-19 22:43 (13 days old)
References      : http://marc.info/?l=linux-kernel&m=125340020804711&w=4

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14255] WARNING: at drivers/char/tty_io.c:1267
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
@ 2009-10-01 19:56 ` Rafael J. Wysocki
  2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Heinz Diehl, Ingo Molnar, Linus Torvalds

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14255
Subject         : WARNING: at drivers/char/tty_io.c:1267
Submitter       : Heinz Diehl <htd@fancy-poultry.org>
Date            : 2009-09-20 11:37 (12 days old)
References      : http://marc.info/?l=linux-kernel&m=125344629506309&w=4
                  http://lkml.org/lkml/2009/9/8/393
Handled-By      : Linus Torvalds <torvalds@linux-foundation.org>

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14255] WARNING: at drivers/char/tty_io.c:1267
  2009-10-01 19:56 ` Rafael J. Wysocki
@ 2009-10-02  0:05 ` Linus Torvalds
  -1 siblings, 0 replies; 384+ messages in thread
From: Linus Torvalds @ 2009-10-02 0:05 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Heinz Diehl, Ingo Molnar

On Thu, 1 Oct 2009, Rafael J. Wysocki wrote:
>
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=14255
> Subject	: WARNING: at drivers/char/tty_io.c:1267
> Submitter	: Heinz Diehl <htd@fancy-poultry.org>
> Date		: 2009-09-20 11:37 (12 days old)
> References	: http://marc.info/?l=linux-kernel&m=125344629506309&w=4
> 		  http://lkml.org/lkml/2009/9/8/393
> Handled-By	: Linus Torvalds <torvalds@linux-foundation.org>

So the real problem here is that horrible workqueue deadlock, but it turns
out that I think that we should be able to safely do a

	cancel_delayed_work_sync(&tty->buf.work);

in tty_ldisc_halt(), because cancel_delayed_work_sync() should never wait
for any other work than the exact work in question.

And the buf.work thing is flush_to_ldisc(), so waiting for _that_ is safe
- the problematic thing was always waiting for the (unrelated)
tty->hangup_work, which can (and does) take the semaphore for
do_tty_hangup.

So doing that synchronous version of the delayed work cancel means that we
can now rest easy after tty_ldisc_halt(), and we don't need to worry about
buf.work still being pending.

We _do_ in general need to worry about hangup_work, which will call
do_tty_hangup, which will call tty_ldisc_hangup, but that's actually the
routine we are in right now, so for the _very_ special case of
tty_ldisc_hangup that is a non-issue too.

Did that sound subtle to you? It should. It's subtle as hell, and I don't
like it, but I think that the two subtle rules above mean that the
following two-liner patch is safe - it can't cause any new deadlocks, and
getting rid of the flush_scheduled_work() is safe in this one case.

So please give it a whirl. I'm not happy about the subtlety, but I also
hope that we'll get rid of that in the long run, so as a short-term hack
this looks acceptable.

To recap:

 - tty_ldisc_halt() _can_ be called under the ldisc_mutex, because while
   it waits for the work, it never waits for _other_ work, and buf.work
   itself doesn't need the ldisc_mutex. So no deadlock.

 - The flush_scheduled_work() after tty_ldisc_halt() is normally needed to
   not just flush the buf.work (which is now done by tty_ldisc_halt()
   itself), but to also make sure that there isn't any hangup work
   pending. So we can't remove that in general, and the other cases will
   still need to flush all scheduled work (and worry about deadlocks with
   ldisc_mutex).

   HOWEVER, in the special case of tty_ldisc_hangup() we know that we are
   inside the hangup work, and thus don't need to wait for ourselves, so
   we can just get rid of it there - just nowhere else.

 - The other cases of dropping the ldisc_mutex in the middle are
   long-standing, and have that TTY_LDISC_CHANGING vs TTY_HUPPED hackery
   to take care of the races that it opens. I'd love to get rid of that
   too, but they all seem to work. And they have never apparently
   triggered the WARN_ON in this bugzilla.

I'm not proud of this patch, and I'm not signing off on it until the
people who have seen this warning have tried it and report that it seems
to work..

		Linus

---
 drivers/char/tty_ldisc.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
index aafdbae..feb5507 100644
--- a/drivers/char/tty_ldisc.c
+++ b/drivers/char/tty_ldisc.c
@@ -518,7 +518,7 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
 static int tty_ldisc_halt(struct tty_struct *tty)
 {
 	clear_bit(TTY_LDISC, &tty->flags);
-	return cancel_delayed_work(&tty->buf.work);
+	return cancel_delayed_work_sync(&tty->buf.work);
 }
 
 /**
@@ -756,12 +756,9 @@ void tty_ldisc_hangup(struct tty_struct *tty)
 	 * N_TTY.
 	 */
 	if (tty->driver->flags & TTY_DRIVER_RESET_TERMIOS) {
-		/* Make sure the old ldisc is quiescent */
-		tty_ldisc_halt(tty);
-		flush_scheduled_work();
-
 		/* Avoid racing set_ldisc or tty_ldisc_release */
 		mutex_lock(&tty->ldisc_mutex);
+		tty_ldisc_halt(tty);
 		if (tty->ldisc) {	/* Not yet closed */
 			/* Switch back to N_TTY */
 			tty_ldisc_reinit(tty);

^ permalink raw reply related	[flat|nested] 384+ messages in thread
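[Editor's illustration, not part of the original message.] The rule in the recap above, that waiting for one specific work item is safe under a lock while flushing the whole queue may not be, can be sketched as a toy userspace model. This is not kernel code and does not call the workqueue API; struct sketch_work and both functions are hypothetical names invented for this note, and the model assumes the only way to deadlock is to wait, while holding the mutex, for a pending item that itself needs that mutex.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a queued work item. */
struct sketch_work {
	const char *name;       /* label only, e.g. "buf.work" */
	bool pending;           /* still queued or running */
	bool takes_ldisc_mutex; /* would block on the mutex the caller may hold */
};

/* Models a "cancel this one item and wait for it" call: the caller only
 * ever waits for the single item it names. */
static bool sketch_wait_one_would_deadlock(const struct sketch_work *w,
					   bool caller_holds_mutex)
{
	return caller_holds_mutex && w->pending && w->takes_ldisc_mutex;
}

/* Models a "flush everything scheduled" call: the caller waits for every
 * pending item, including ones unrelated to what it cares about. */
static bool sketch_flush_all_would_deadlock(const struct sketch_work *queue,
					    size_t n, bool caller_holds_mutex)
{
	for (size_t i = 0; i < n; i++)
		if (sketch_wait_one_would_deadlock(&queue[i], caller_holds_mutex))
			return true;
	return false;
}
```

With a queue holding a pending buf.work item (which does not take the mutex) and a pending hangup item (which does), the single-item wait on buf.work is reported as safe while holding the mutex, whereas the flush-all wait is reported as a deadlock. That matches the reasoning for moving tty_ldisc_halt() under ldisc_mutex while dropping the flush in that one caller.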
* [Bug #14258] Memory leak in SCSI initialization
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
                   ` (36 preceding siblings ...)
  2009-10-01 19:56 ` Rafael J. Wysocki
@ 2009-10-01 19:56 ` Rafael J. Wysocki
  2009-10-02 12:58   ` Tetsuo Handa
  2009-10-01 19:56 ` [Bug #14257] Not able to boot on 32 bit System Rafael J. Wysocki
                   ` (10 subsequent siblings)
  48 siblings, 1 reply; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Michael Ellerman, Tetsuo Handa

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14258
Subject         : Memory leak in SCSI initialization
Submitter       : Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date            : 2009-09-22 4:18 (10 days old)
References      : http://marc.info/?l=linux-kernel&m=125359311312243&w=4
Handled-By      : Michael Ellerman <michael@ellerman.id.au>
Patch           : http://patchwork.kernel.org/patch/49258/

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14258] Memory leak in SCSI initialization 2009-10-01 19:56 ` [Bug #14258] Memory leak in SCSI initialization Rafael J. Wysocki @ 2009-10-02 12:58 ` Tetsuo Handa 2009-10-02 17:26 ` Rafael J. Wysocki 0 siblings, 1 reply; 384+ messages in thread From: Tetsuo Handa @ 2009-10-02 12:58 UTC (permalink / raw) To: rjw, James.Bottomley; +Cc: michael, linux-kernel, kernel-testers Rafael J. Wysocki wrote: > This message has been generated automatically as a part of a report > of regressions introduced between 2.6.30 and 2.6.31. This memory leak might exist in all releases since 23 Sep 2005. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6f3a20242db2597312c50abc11f1e747c5d2326a > > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still should > be listed and let me know (either way). As of now, the patch is not yet merged into Linus's tree. It still should be listed. > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14258 > Subject : Memory leak in SCSI initialization > Submitter : Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> > Date : 2009-09-22 4:18 (10 days old) > References : http://marc.info/?l=linux-kernel&m=125359311312243&w=4 > Handled-By : Michael Ellerman <michael@ellerman.id.au> > Patch : http://patchwork.kernel.org/patch/49258/ > Regards. ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14258] Memory leak in SCSI initialization 2009-10-02 12:58 ` Tetsuo Handa @ 2009-10-02 17:26 ` Rafael J. Wysocki 2009-10-07 14:04 ` Tetsuo Handa 0 siblings, 1 reply; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-02 17:26 UTC (permalink / raw) To: Tetsuo Handa; +Cc: James.Bottomley, michael, linux-kernel, kernel-testers On Friday 02 October 2009, Tetsuo Handa wrote: > Rafael J. Wysocki wrote: > > This message has been generated automatically as a part of a report > > of regressions introduced between 2.6.30 and 2.6.31. > This memory leak might exist in all releases since 23 Sep 2005. > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6f3a20242db2597312c50abc11f1e747c5d2326a > > > > The following bug entry is on the current list of known regressions > > introduced between 2.6.30 and 2.6.31. Please verify if it still should > > be listed and let me know (either way). > As of now, the patch is not yet merged into Linus's tree. > It still should be listed. > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14258 > > Subject : Memory leak in SCSI initialization > > Submitter : Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> > > Date : 2009-09-22 4:18 (10 days old) > > References : http://marc.info/?l=linux-kernel&m=125359311312243&w=4 > > Handled-By : Michael Ellerman <michael@ellerman.id.au> > > Patch : http://patchwork.kernel.org/patch/49258/ Thanks for the update. Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14258] Memory leak in SCSI initialization 2009-10-02 17:26 ` Rafael J. Wysocki @ 2009-10-07 14:04 ` Tetsuo Handa 2009-10-07 20:24 ` Rafael J. Wysocki 0 siblings, 1 reply; 384+ messages in thread From: Tetsuo Handa @ 2009-10-07 14:04 UTC (permalink / raw) To: rjw, kernel-testers; +Cc: James.Bottomley, michael, linux-kernel Rafael J. Wysocki wrote: > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14258 > > > Subject : Memory leak in SCSI initialization > > > Submitter : Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> > > > Date : 2009-09-22 4:18 (10 days old) > > > References : http://marc.info/?l=linux-kernel&m=125359311312243&w=4 > > > Handled-By : Michael Ellerman <michael@ellerman.id.au> > > > Patch : http://patchwork.kernel.org/patch/49258/ > > Thanks for the update. http://patchwork.kernel.org/patch/49258/ would be replaced by an updated patch at http://lkml.org/lkml/2009/10/2/335 Regards. ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14258] Memory leak in SCSI initialization 2009-10-07 14:04 ` Tetsuo Handa @ 2009-10-07 20:24 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-07 20:24 UTC (permalink / raw) To: Tetsuo Handa; +Cc: kernel-testers, James.Bottomley, michael, linux-kernel On Wednesday 07 October 2009, Tetsuo Handa wrote: > Rafael J. Wysocki wrote: > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14258 > > > > Subject : Memory leak in SCSI initialization > > > > Submitter : Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> > > > > Date : 2009-09-22 4:18 (10 days old) > > > > References : http://marc.info/?l=linux-kernel&m=125359311312243&w=4 > > > > Handled-By : Michael Ellerman <michael@ellerman.id.au> > > > > Patch : http://patchwork.kernel.org/patch/49258/ > > > > Thanks for the update. > http://patchwork.kernel.org/patch/49258/ would be replaced by > an updated patch at http://lkml.org/lkml/2009/10/2/335 Thanks, updated. Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14257] Not able to boot on 32 bit System 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (37 preceding siblings ...) 2009-10-01 19:56 ` [Bug #14258] Memory leak in SCSI initialization Rafael J. Wysocki @ 2009-10-01 19:56 ` Rafael J. Wysocki 2009-10-01 19:56 ` [Bug #14256] kernel BUG at fs/ext3/super.c:435 Rafael J. Wysocki ` (9 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Rishikesh This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14257 Subject : Not able to boot on 32 bit System Submitter : Rishikesh <risrajak@linux.vnet.ibm.com> Date : 2009-09-21 15:25 (11 days old) References : http://marc.info/?l=linux-kernel&m=125354604314412&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14256] kernel BUG at fs/ext3/super.c:435 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (38 preceding siblings ...) 2009-10-01 19:56 ` [Bug #14257] Not able to boot on 32 bit System Rafael J. Wysocki @ 2009-10-01 19:56 ` Rafael J. Wysocki 2009-10-04 17:38 ` Mikael Pettersson 2009-10-01 19:56 ` Rafael J. Wysocki ` (8 subsequent siblings) 48 siblings, 1 reply; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Mikael Pettersson This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14256 Subject : kernel BUG at fs/ext3/super.c:435 Submitter : Mikael Pettersson <mikpe@it.uu.se> Date : 2009-09-21 7:29 (11 days old) References : http://marc.info/?l=linux-kernel&m=125351816109264&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14256] kernel BUG at fs/ext3/super.c:435 2009-10-01 19:56 ` [Bug #14256] kernel BUG at fs/ext3/super.c:435 Rafael J. Wysocki @ 2009-10-04 17:38 ` Mikael Pettersson 0 siblings, 0 replies; 384+ messages in thread From: Mikael Pettersson @ 2009-10-04 17:38 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Mikael Pettersson Rafael J. Wysocki wrote: > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still should > be listed and let me know (either way). > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14256 > Subject : kernel BUG at fs/ext3/super.c:435 > Submitter : Mikael Pettersson <mikpe@it.uu.se> > Date : 2009-09-21 7:29 (11 days old) > References : http://marc.info/?l=linux-kernel&m=125351816109264&w=4 The exact same bug (same cause, same symptom) just hit me again in 2.6.32-rc1. ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14256] kernel BUG at fs/ext3/super.c:435 2009-10-04 17:38 ` Mikael Pettersson (?) @ 2009-10-04 20:49 ` Rafael J. Wysocki 2009-10-04 23:04 ` Mikael Pettersson -1 siblings, 1 reply; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-04 20:49 UTC (permalink / raw) To: Mikael Pettersson; +Cc: Linux Kernel Mailing List, Kernel Testers List On Sunday 04 October 2009, Mikael Pettersson wrote: > Rafael J. Wysocki wrote: > > The following bug entry is on the current list of known regressions > > introduced between 2.6.30 and 2.6.31. Please verify if it still should > > be listed and let me know (either way). > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14256 > > Subject : kernel BUG at fs/ext3/super.c:435 > > Submitter : Mikael Pettersson <mikpe@it.uu.se> > > Date : 2009-09-21 7:29 (11 days old) > > References : http://marc.info/?l=linux-kernel&m=125351816109264&w=4 > > The exact same bug (same cause, same symptom) just hit me again in 2.6.32-rc1. Thanks for the update. Could you check the current Linus' tree, please? There are some known regression fixes in there. Best, Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14256] kernel BUG at fs/ext3/super.c:435 @ 2009-10-04 23:04 ` Mikael Pettersson 0 siblings, 0 replies; 384+ messages in thread From: Mikael Pettersson @ 2009-10-04 23:04 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Mikael Pettersson, Linux Kernel Mailing List, Kernel Testers List Rafael J. Wysocki writes: > On Sunday 04 October 2009, Mikael Pettersson wrote: > > Rafael J. Wysocki wrote: > > > The following bug entry is on the current list of known regressions > > > introduced between 2.6.30 and 2.6.31. Please verify if it still should > > > be listed and let me know (either way). > > > > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14256 > > > Subject : kernel BUG at fs/ext3/super.c:435 > > > Submitter : Mikael Pettersson <mikpe@it.uu.se> > > > Date : 2009-09-21 7:29 (11 days old) > > > References : http://marc.info/?l=linux-kernel&m=125351816109264&w=4 > > > > The exact same bug (same cause, same symptom) just hit me again in 2.6.32-rc1. > > Thanks for the update. > > Could you check the current Linus' tree, please? There are some known > regression fixes in there. I tried simplified versions of the bug trigger on two machines running 2.6.32-rc1-git6, and neither triggered the kernel bug. The original recipe involved doing a glibc rebuild, run its test suite, install it, and reboot. Today however machine 1 was already doing a rebuild so after the rebuild it did a reboot into the new kernel before the install. The second machine booted the new kernel directly to install the binary packages from the first machine. I'll re-run the full bug trigger recipe on a third machine later next week (it must rebuild glibc itself anyway due to arch differences). ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14256] kernel BUG at fs/ext3/super.c:435 @ 2009-10-09 16:40 ` Mikael Pettersson 0 siblings, 0 replies; 384+ messages in thread From: Mikael Pettersson @ 2009-10-09 16:40 UTC (permalink / raw) To: Mikael Pettersson Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List Mikael Pettersson writes: > Rafael J. Wysocki writes: > > On Sunday 04 October 2009, Mikael Pettersson wrote: > > > Rafael J. Wysocki wrote: > > > > The following bug entry is on the current list of known regressions > > > > introduced between 2.6.30 and 2.6.31. Please verify if it still should > > > > be listed and let me know (either way). > > > > > > > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14256 > > > > Subject : kernel BUG at fs/ext3/super.c:435 > > > > Submitter : Mikael Pettersson <mikpe@it.uu.se> > > > > Date : 2009-09-21 7:29 (11 days old) > > > > References : http://marc.info/?l=linux-kernel&m=125351816109264&w=4 > > > > > > The exact same bug (same cause, same symptom) just hit me again in 2.6.32-rc1. > > > > Thanks for the update. > > > > Could you check the current Linus' tree, please? There are some known > > regression fixes in there. > > I tried simplified versions of the bug trigger on two machines > running 2.6.32-rc1-git6, and neither triggered the kernel bug. > > The original recipe involved doing a glibc rebuild, run its test > suite, install it, and reboot. Today however machine 1 was already > doing a rebuild so after the rebuild it did a reboot into the new > kernel before the install. The second machine booted the new kernel > directly to install the binary packages from the first machine. > > I'll re-run the full bug trigger recipe on a third machine later next > week (it must rebuild glibc itself anyway due to arch differences). Not fixed in 2.6.32-rc3. A glibc rebuild + install triggered the exact same bug on the third machine. /Mikael ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14256] kernel BUG at fs/ext3/super.c:435 @ 2009-10-09 22:03 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-09 22:03 UTC (permalink / raw) To: Mikael Pettersson; +Cc: Linux Kernel Mailing List, Kernel Testers List On Friday 09 October 2009, Mikael Pettersson wrote: > Mikael Pettersson writes: > > Rafael J. Wysocki writes: > > > On Sunday 04 October 2009, Mikael Pettersson wrote: > > > > Rafael J. Wysocki wrote: > > > > > The following bug entry is on the current list of known regressions > > > > > introduced between 2.6.30 and 2.6.31. Please verify if it still should > > > > > be listed and let me know (either way). > > > > > > > > > > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14256 > > > > > Subject : kernel BUG at fs/ext3/super.c:435 > > > > > Submitter : Mikael Pettersson <mikpe@it.uu.se> > > > > > Date : 2009-09-21 7:29 (11 days old) > > > > > References : http://marc.info/?l=linux-kernel&m=125351816109264&w=4 > > > > > > > > The exact same bug (same cause, same symptom) just hit me again in 2.6.32-rc1. > > > > > > Thanks for the update. > > > > > > Could you check the current Linus' tree, please? There are some known > > > regression fixes in there. > > > > I tried simplified versions of the bug trigger on two machines > > running 2.6.32-rc1-git6, and neither triggered the kernel bug. > > > > The original recipe involved doing a glibc rebuild, run its test > > suite, install it, and reboot. Today however machine 1 was already > > doing a rebuild so after the rebuild it did a reboot into the new > > kernel before the install. The second machine booted the new kernel > > directly to install the binary packages from the first machine. > > > > I'll re-run the full bug trigger recipe on a third machine later next > > week (it must rebuild glibc itself anyway due to arch differences). > > Not fixed in 2.6.32-rc3. 
A glibc rebuild + install triggered the > exact same bug on the third machine. Thanks for the update. Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14261] e1000e jumbo frames no longer work: 'Unsupported MTU setting' 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:56 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Alexander Duyck, Nix This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14261 Subject : e1000e jumbo frames no longer work: 'Unsupported MTU setting' Submitter : Nix <nix@esperi.org.uk> Date : 2009-09-26 11:16 (6 days old) References : http://marc.info/?l=linux-kernel&m=125396433321342&w=4 Handled-By : Alexander Duyck <alexander.duyck@gmail.com> Patch : http://patchwork.kernel.org/patch/50277/ ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14261] e1000e jumbo frames no longer work: 'Unsupported MTU setting' 2009-10-01 19:56 ` Rafael J. Wysocki (?) @ 2009-10-02 20:33 ` Nix 2009-10-02 21:31 ` Rafael J. Wysocki -1 siblings, 1 reply; 384+ messages in thread From: Nix @ 2009-10-02 20:33 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Alexander Duyck On 1 Oct 2009, Rafael J. Wysocki stated: > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still should > be listed and let me know (either way). The patch fixes it. > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14261 > Subject : e1000e jumbo frames no longer work: 'Unsupported MTU setting' > Submitter : Nix <nix@esperi.org.uk> > Date : 2009-09-26 11:16 (6 days old) > References : http://marc.info/?l=linux-kernel&m=125396433321342&w=4 > Handled-By : Alexander Duyck <alexander.duyck@gmail.com> > Patch : http://patchwork.kernel.org/patch/50277/ (Possibly a stable candidate? It's not in 2.6.31.2-to-be, perhaps the only patch that isn't. ;) ) ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14261] e1000e jumbo frames no longer work: 'Unsupported MTU setting' @ 2009-10-02 21:31 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-02 21:31 UTC (permalink / raw) To: Nix Cc: Linux Kernel Mailing List, Kernel Testers List, Alexander Duyck, Network Development, Jeff Kirsher, Jesse Brandeburg, e1000-devel On Friday 02 October 2009, Nix wrote: > On 1 Oct 2009, Rafael J. Wysocki stated: > > > The following bug entry is on the current list of known regressions > > introduced between 2.6.30 and 2.6.31. Please verify if it still should > > be listed and let me know (either way). > > The patch fixes it. > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14261 > > Subject : e1000e jumbo frames no longer work: 'Unsupported MTU setting' > > Submitter : Nix <nix@esperi.org.uk> > > Date : 2009-09-26 11:16 (6 days old) > > References : http://marc.info/?l=linux-kernel&m=125396433321342&w=4 > > Handled-By : Alexander Duyck <alexander.duyck@gmail.com> > > Patch : http://patchwork.kernel.org/patch/50277/ > > (Possibly a stable candidate? It's not in 2.6.31.2-to-be, perhaps the > only patch that isn't. ;) ) Most likely because it's not in the Linus' tree yet. [e1000e maintainers, we have a regression fix to merge, please.] Thanks, Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14261] e1000e jumbo frames no longer work: 'Unsupported MTU setting' 2009-10-02 21:31 ` Rafael J. Wysocki @ 2009-10-02 22:13 ` Jeff Kirsher -1 siblings, 0 replies; 384+ messages in thread From: Jeff Kirsher @ 2009-10-02 22:13 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Nix, Linux Kernel Mailing List, Kernel Testers List, Alexander Duyck, Network Development, Jesse Brandeburg, e1000-devel On Fri, Oct 2, 2009 at 14:31, Rafael J. Wysocki <rjw@sisk.pl> wrote: > On Friday 02 October 2009, Nix wrote: >> On 1 Oct 2009, Rafael J. Wysocki stated: >> >> > The following bug entry is on the current list of known regressions >> > introduced between 2.6.30 and 2.6.31. Please verify if it still should >> > be listed and let me know (either way). >> >> The patch fixes it. >> >> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14261 >> > Subject : e1000e jumbo frames no longer work: 'Unsupported MTU setting' >> > Submitter : Nix <nix@esperi.org.uk> >> > Date : 2009-09-26 11:16 (6 days old) >> > References : http://marc.info/?l=linux-kernel&m=125396433321342&w=4 >> > Handled-By : Alexander Duyck <alexander.duyck@gmail.com> >> > Patch : http://patchwork.kernel.org/patch/50277/ >> >> (Possibly a stable candidate? It's not in 2.6.31.2-to-be, perhaps the >> only patch that isn't. ;) ) > > Most likely because it's not in the Linus' tree yet. > > [e1000e maintainers, we have a regression fix to merge, please.] > > Thanks, > Rafael Sorry, I forgot to send this patch out last night. I will send it now. -- Cheers, Jeff ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14261] e1000e jumbo frames no longer work: 'Unsupported MTU setting' 2009-10-02 22:13 ` Jeff Kirsher @ 2009-10-07 18:34 ` Theodore Tso -1 siblings, 0 replies; 384+ messages in thread From: Theodore Tso @ 2009-10-07 18:34 UTC (permalink / raw) To: Jeff Kirsher Cc: Rafael J. Wysocki, Nix, Linux Kernel Mailing List, Kernel Testers List, Alexander Duyck, Network Development, Jesse Brandeburg, e1000-devel On Fri, Oct 02, 2009 at 03:13:07PM -0700, Jeff Kirsher wrote: > >> > Patch : http://patchwork.kernel.org/patch/50277/ > >> > > Most likely because it's not in the Linus' tree yet. > > > > [e1000e maintainers, we have a regression fix to merge, please.] > > Sorry, I forgot to send this patch out last night. I will send it now. Do we have a status on the progress of this patch to mainline? Thanks, - Ted ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14261] e1000e jumbo frames no longer work: 'Unsupported MTU setting' 2009-10-07 18:34 ` Theodore Tso ` (2 preceding siblings ...) (?) @ 2009-10-07 19:12 ` Jeff Kirsher -1 siblings, 0 replies; 384+ messages in thread From: Jeff Kirsher @ 2009-10-07 19:12 UTC (permalink / raw) To: Theodore Tso, Jeff Kirsher, Rafael J. Wysocki, Nix, Linux Kernel Mailing List, Kernel Testers List, Alexander Duyck, Network Development, Jesse Brandeburg, e1000-devel On Wed, Oct 7, 2009 at 11:34, Theodore Tso <tytso@mit.edu> wrote: > On Fri, Oct 02, 2009 at 03:13:07PM -0700, Jeff Kirsher wrote: >> >> > Patch : http://patchwork.kernel.org/patch/50277/ >> >> >> > Most likely because it's not in the Linus' tree yet. >> > >> > [e1000e maintainers, we have a regression fix to merge, please.] >> >> Sorry, I forgot to send this patch out last night. I will send it now. > > Do we have a status on this progress of this patch to mainline? Thanks, > > - Ted The patch has been submitted and accepted into David Miller's net-2.6 tree. I will submit the patch for 2.6.31 stable tree once it makes it into Linus's tree later this week. -- Cheers, Jeff ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14264] ehci problem - mouse dead on scroll 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (40 preceding siblings ...) 2009-10-01 19:56 ` Rafael J. Wysocki @ 2009-10-01 19:56 ` Rafael J. Wysocki 2009-10-01 19:56 ` Rafael J. Wysocki ` (6 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Alan Stern, Oliver Neukum, Volker Armin Hemmann This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14264 Subject : ehci problem - mouse dead on scroll Submitter : Volker Armin Hemmann <volkerarmin@googlemail.com> Date : 2009-09-12 7:46 (20 days old) References : http://marc.info/?l=linux-kernel&m=125274202707893&w=4 Handled-By : Alan Stern <stern@rowland.harvard.edu> ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14270] Cannot boot on a PIII Celeron 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki @ 2009-10-01 19:56 ` Rafael J. Wysocki 2009-10-01 19:55 ` [Bug #13733] 2.6.31-rc2: irq 16: nobody cared Rafael J. Wysocki ` (47 subsequent siblings) 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Cyrill Gorcunov, Michael Tokarev This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14270 Subject : Cannot boot on a PIII Celeron Submitter : Michael Tokarev <mjt@tls.msk.ru> Date : 2009-09-28 15:26 (4 days old) References : http://marc.info/?l=linux-kernel&m=125415160524110&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14270] Cannot boot on a PIII Celeron 2009-10-01 19:56 ` Rafael J. Wysocki @ 2009-10-02 8:30 ` Cyrill Gorcunov -1 siblings, 0 replies; 384+ messages in thread From: Cyrill Gorcunov @ 2009-10-02 8:30 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Michael Tokarev On 10/1/09, Rafael J. Wysocki <rjw@sisk.pl> wrote: > This message has been generated automatically as a part of a report > of regressions introduced between 2.6.30 and 2.6.31. > > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still should > be listed and let me know (either way). > Michael has been asked to bisect it (if possible). Unfortunately, I can't reproduce it in kvm. > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14270 > Subject : Cannot boot on a PIII Celeron > Submitter : Michael Tokarev <mjt@tls.msk.ru> > Date : 2009-09-28 15:26 (4 days old) > References : http://marc.info/?l=linux-kernel&m=125415160524110&w=4 > > > ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14270] Cannot boot on a PIII Celeron @ 2009-10-02 9:13 ` Michael Tokarev 0 siblings, 0 replies; 384+ messages in thread From: Michael Tokarev @ 2009-10-02 9:13 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List Cyrill Gorcunov wrote: > On 10/1/09, Rafael J. Wysocki <rjw@sisk.pl> wrote: >> This message has been generated automatically as a part of a report >> of regressions introduced between 2.6.30 and 2.6.31. >> >> The following bug entry is on the current list of known regressions >> introduced between 2.6.30 and 2.6.31. Please verify if it still should >> be listed and let me know (either way). > > Michael has been asked to bisect it (if possible). I cant reproduce it > in kvm unfortunately. Yes, and that's what I'll be trying to do shortly. I had other issues to sort out and wasn't able to get to it in few last days. Also I've a few other suspects. For example, in this .31 config I changed from bzip to lzma compression - and that's where (or near) kernel is rebooting. /mjt >> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14270 >> Subject : Cannot boot on a PIII Celeron >> Submitter : Michael Tokarev <mjt@tls.msk.ru> >> Date : 2009-09-28 15:26 (4 days old) >> References : http://marc.info/?l=linux-kernel&m=125415160524110&w=4 >> >> >> ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14270] Cannot boot on a PIII Celeron @ 2009-10-02 10:38 ` Michael Tokarev 0 siblings, 0 replies; 384+ messages in thread From: Michael Tokarev @ 2009-10-02 10:38 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List Michael Tokarev wrote: > Cyrill Gorcunov wrote: >> On 10/1/09, Rafael J. Wysocki <rjw@sisk.pl> wrote: >>> This message has been generated automatically as a part of a report >>> of regressions introduced between 2.6.30 and 2.6.31. >>> >>> The following bug entry is on the current list of known regressions >>> introduced between 2.6.30 and 2.6.31. Please verify if it still should >>> be listed and let me know (either way). >> >> Michael has been asked to bisect it (if possible). I cant reproduce it >> in kvm unfortunately. > > Yes, and that's what I'll be trying to do shortly. > I had other issues to sort out and wasn't able to > get to it in few last days. > > Also I've a few other suspects. For example, in this .31 > config I changed from bzip to lzma compression - and that's > where (or near) kernel is rebooting. And that was the problem. After switching from LZMA to BZIP2 kernel boots again. Dunno if it can be treated as a regression, but it's definitely a bug. /mjt >>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14270 >>> Subject : Cannot boot on a PIII Celeron >>> Submitter : Michael Tokarev <mjt@tls.msk.ru> >>> Date : 2009-09-28 15:26 (4 days old) >>> References : http://marc.info/?l=linux-kernel&m=125415160524110&w=4 >>> >>> >>> > ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14270] Cannot boot on a PIII Celeron 2009-10-02 10:38 ` Michael Tokarev (?) @ 2009-10-02 10:55 ` Cyrill Gorcunov 2009-10-02 10:59 ` Michael Tokarev -1 siblings, 1 reply; 384+ messages in thread From: Cyrill Gorcunov @ 2009-10-02 10:55 UTC (permalink / raw) To: Michael Tokarev Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List On 10/2/09, Michael Tokarev <mjt@tls.msk.ru> wrote: > Michael Tokarev wrote: >> Cyrill Gorcunov wrote: >>> On 10/1/09, Rafael J. Wysocki <rjw@sisk.pl> wrote: >>>> This message has been generated automatically as a part of a report >>>> of regressions introduced between 2.6.30 and 2.6.31. >>>> >>>> The following bug entry is on the current list of known regressions >>>> introduced between 2.6.30 and 2.6.31. Please verify if it still should >>>> be listed and let me know (either way). >>> >>> Michael has been asked to bisect it (if possible). I cant reproduce it >>> in kvm unfortunately. >> >> Yes, and that's what I'll be trying to do shortly. >> I had other issues to sort out and wasn't able to >> get to it in few last days. >> >> Also I've a few other suspects. For example, in this .31 >> config I changed from bzip to lzma compression - and that's >> where (or near) kernel is rebooting. > > And that was the problem. After switching from LZMA > to BZIP2 kernel boots again. > > Dunno if it can be treated as a regression, but it's > definitely a bug. > > /mjt thanks for tracking it down Michael! Rafael, who is responsible for LZMA now? Cc him please. > >>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14270 >>>> Subject : Cannot boot on a PIII Celeron >>>> Submitter : Michael Tokarev <mjt@tls.msk.ru> >>>> Date : 2009-09-28 15:26 (4 days old) >>>> References : http://marc.info/?l=linux-kernel&m=125415160524110&w=4 >>>> >>>> >>>> >> > > ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14270] Cannot boot on a PIII Celeron @ 2009-10-02 10:59 ` Michael Tokarev 0 siblings, 0 replies; 384+ messages in thread From: Michael Tokarev @ 2009-10-02 10:59 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List Cyrill Gorcunov wrote: > On 10/2/09, Michael Tokarev <mjt@tls.msk.ru> wrote: >> Michael Tokarev wrote: >>> Cyrill Gorcunov wrote: >>>> On 10/1/09, Rafael J. Wysocki <rjw@sisk.pl> wrote: >>>>> This message has been generated automatically as a part of a report >>>>> of regressions introduced between 2.6.30 and 2.6.31. >>>>> >>>>> The following bug entry is on the current list of known regressions >>>>> introduced between 2.6.30 and 2.6.31. Please verify if it still should >>>>> be listed and let me know (either way). >>>> Michael has been asked to bisect it (if possible). I cant reproduce it >>>> in kvm unfortunately. >>> Yes, and that's what I'll be trying to do shortly. >>> I had other issues to sort out and wasn't able to >>> get to it in few last days. >>> >>> Also I've a few other suspects. For example, in this .31 >>> config I changed from bzip to lzma compression - and that's >>> where (or near) kernel is rebooting. >> And that was the problem. After switching from LZMA >> to BZIP2 kernel boots again. >> >> Dunno if it can be treated as a regression, but it's >> definitely a bug. > > thanks for tracking it down Michael! > Rafael, who is responsible for LZMA now? > Cc him please. Please hold on for a while. I switched to BZIP2, it booted fine. I switched back to LZMA - and that one now boots too. Original bzImage, which were built by the same compiler from the same source using the same options reboots. So um... I'm now trying to reproduce it ;) /mjt ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14270] Cannot boot on a PIII Celeron 2009-10-02 10:59 ` Michael Tokarev (?) @ 2009-10-02 14:05 ` Cyrill Gorcunov -1 siblings, 0 replies; 384+ messages in thread From: Cyrill Gorcunov @ 2009-10-02 14:05 UTC (permalink / raw) To: Michael Tokarev Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List [Michael Tokarev - Fri, Oct 02, 2009 at 02:59:09PM +0400] ... > > Please hold on for a while. > > I switched to BZIP2, it booted fine. I switched back to LZMA - > and that one now boots too. Original bzImage, which were built > by the same compiler from the same source using the same > options reboots. > > So um... I'm now trying to reproduce it ;) > > /mjt > ok, perhaps it was indirect error or cosmic rays -- Cyrill ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14270] Cannot boot on a PIII Celeron 2009-10-02 10:59 ` Michael Tokarev (?) (?) @ 2009-10-04 12:14 ` Michael Tokarev 2009-10-04 12:43 ` Cyrill Gorcunov -1 siblings, 1 reply; 384+ messages in thread From: Michael Tokarev @ 2009-10-04 12:14 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List Michael Tokarev wrote: > Cyrill Gorcunov wrote: >> On 10/2/09, Michael Tokarev <mjt@tls.msk.ru> wrote: >>> Michael Tokarev wrote: >>>> Cyrill Gorcunov wrote: >>>>> On 10/1/09, Rafael J. Wysocki <rjw@sisk.pl> wrote: >>>>>> This message has been generated automatically as a part of a report >>>>>> of regressions introduced between 2.6.30 and 2.6.31. >>>>>> >>>>>> The following bug entry is on the current list of known regressions >>>>>> introduced between 2.6.30 and 2.6.31. Please verify if it still >>>>>> should >>>>>> be listed and let me know (either way). >>>>> Michael has been asked to bisect it (if possible). I cant reproduce it >>>>> in kvm unfortunately. >>>> Yes, and that's what I'll be trying to do shortly. >>>> I had other issues to sort out and wasn't able to >>>> get to it in few last days. >>>> >>>> Also I've a few other suspects. For example, in this .31 >>>> config I changed from bzip to lzma compression - and that's >>>> where (or near) kernel is rebooting. >>> And that was the problem. After switching from LZMA >>> to BZIP2 kernel boots again. >>> >>> Dunno if it can be treated as a regression, but it's >>> definitely a bug. >> >> thanks for tracking it down Michael! >> Rafael, who is responsible for LZMA now? >> Cc him please. > > Please hold on for a while. > > I switched to BZIP2, it booted fine. I switched back to LZMA - > and that one now boots too. Original bzImage, which were built > by the same compiler from the same source using the same > options reboots. > > So um... I'm now trying to reproduce it ;) I performed about 20 kernel recompiles, and finally have some "statistics". 
The problem is almost reproducible, in a sense that I was able to get 6 more cases behaving the same way (rebooting right at early boot on a cel). And all 3 "non-working" cases were with ccache. I.e., about half out of ~25 compiles done with ccache, and 7 of the resulting kernels are buggy. No single failure without ccache so far. Maybe it's some stale .o file cached by ccache (and it indeed looks like that) -- I didn't try to remove the cache yet (but my guess is that I won't be able to reproduce the issue with clean cache anymore). What puzzles me most is the "failure mode". The difference between the two processors is minimal. Having a corrupt .o file and almost-working kernel is almost impossible on its own. And hitting this difference with a corrupt .o file is... unbelievable. So I'm declaring it's a false alarm for now, and closing the bug. /mjt ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14270] Cannot boot on a PIII Celeron 2009-10-04 12:14 ` Michael Tokarev @ 2009-10-04 12:43 ` Cyrill Gorcunov 0 siblings, 0 replies; 384+ messages in thread From: Cyrill Gorcunov @ 2009-10-04 12:43 UTC (permalink / raw) To: Michael Tokarev Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List [Michael Tokarev - Sun, Oct 04, 2009 at 04:14:44PM +0400] ... >> >> I switched to BZIP2, it booted fine. I switched back to LZMA - >> and that one now boots too. Original bzImage, which were built >> by the same compiler from the same source using the same >> options reboots. >> >> So um... I'm now trying to reproduce it ;) > > I performed about 20 kernel recompiles, and finally have some "statistics". > The problem is almost reproduceable, in a sense that I was able to get 6 > more cases behaving the same way (rebooting right at early boot on a cel). > And all 3 "non-working" cases were with ccache. Ie, about half out of ~25 > compiles done with ccache, and 7 of the resulting kernels are buggy. No > single failure without ccache so far. > > Maybe it's some stale .o file cached by ccache (and it indeed looks like > that) -- I didn't try to remove the cache yet (but my guess is that I > wont be able to reproduce the issue with clean cache anymore). > > What puzzles me most is the "failure mode". The difference between the > two processors is minimal. Having a corrupt .o file and almost-working > kernel is almost impossible by its own. And hitting this difference with > a corrupt .o file is.. unbelievable. > > So I'm declaring it's a false alarm for now, and closing the bug. ok, thanks for hard work on this Michael! > > /mjt > -- Cyrill ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (42 preceding siblings ...) 2009-10-01 19:56 ` Rafael J. Wysocki @ 2009-10-01 19:56 ` Rafael J. Wysocki 2009-10-21 20:04 ` Karol Lewandowski 2009-10-01 19:56 ` [Bug #14266] regression in page writeback Rafael J. Wysocki ` (4 subsequent siblings) 48 siblings, 1 reply; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Karol Lewandowski, Mel Gorman This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14265 Subject : ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100 Submitter : Karol Lewandowski <karol.k.lewandowski@gmail.com> Date : 2009-09-15 12:05 (17 days old) References : http://marc.info/?l=linux-kernel&m=125301636509517&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* [PATCH] SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab() (was: Re: [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100) 2009-10-01 19:56 ` [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100 Rafael J. Wysocki @ 2009-10-21 20:04 ` Karol Lewandowski 0 siblings, 0 replies; 384+ messages in thread From: Karol Lewandowski @ 2009-10-21 20:04 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Karol Lewandowski, Mel Gorman, Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro, Reinette Chatre, Bartlomiej Zolnierkiewicz, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe, Tobias Oetiker

On Thu, Oct 01, 2009 at 09:56:04PM +0200, Rafael J. Wysocki wrote:
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14265
> Subject : ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100
> Submitter : Karol Lewandowski <karol.k.lewandowski@gmail.com>
> Date : 2009-09-15 12:05 (17 days old)
> References : http://marc.info/?l=linux-kernel&m=125301636509517&w=4

Guys, could anyone check if patch below helps? I think I've finally
found culprit of all allocation failures (but I might be wrong
too... ;-)

Thanks.

commit d6849591e042bceb66f1b4513a1df6740d2ad762
Author: Karol Lewandowski <karol.k.lewandowski@gmail.com>
Date:   Wed Oct 21 21:01:20 2009 +0200

    SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab()

    Commit ba52270d18fb17ce2cf176b35419dab1e43fe4a3 unconditionally
    cleared __GFP_NOFAIL flag on all allocations.

    Preserve this flag on second attempt to allocate page (with
    possibly decreased order).

    This should help with bugs #14265, #14141 and similar.

    Signed-off-by: Karol Lewandowski <karol.k.lewandowski@gmail.com>

diff --git a/mm/slub.c b/mm/slub.c
index b627675..ac5db65 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1084,7 +1084,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
 	struct page *page;
 	struct kmem_cache_order_objects oo = s->oo;
-	gfp_t alloc_gfp;
+	gfp_t alloc_gfp, nofail;
 
 	flags |= s->allocflags;
 
@@ -1092,6 +1092,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	 * Let the initial higher-order allocation fail under memory pressure
 	 * so we fall-back to the minimum order allocation.
 	 */
+	nofail = flags & __GFP_NOFAIL;
 	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
 
 	page = alloc_slab_page(alloc_gfp, node, oo);
@@ -1100,8 +1101,10 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 		/*
 		 * Allocation may have failed due to fragmentation.
 		 * Try a lower order alloc if possible
+		 *
+		 * Preserve __GFP_NOFAIL flag if previous allocation failed.
 		 */
-		page = alloc_slab_page(flags, node, oo);
+		page = alloc_slab_page(flags | nofail, node, oo);
 		if (!page)
 			return NULL;
 
^ permalink raw reply related [flat|nested] 384+ messages in thread
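The flag handling touched by the patch above can be looked at in isolation. What follows is only a user-space sketch, not kernel code: the `gfp_t` typedef and the bit values of `GFP_NOWARN`, `GFP_NORETRY`, `GFP_NOFAIL` and `GFP_OTHER` are placeholders made up for illustration (the kernel's real `__GFP_*` definitions differ), and `first_attempt_flags()`, `second_attempt_flags()` and `show_attempts()` are hypothetical helper names. It shows which bits each of the two allocation attempts in `allocate_slab()` would be given, assuming the expressions quoted in the diff.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder flag bits for illustration only; the kernel's real
 * __GFP_* values are defined elsewhere and are not reproduced here. */
typedef unsigned int gfp_t;
#define GFP_NOWARN  0x1u
#define GFP_NORETRY 0x2u
#define GFP_NOFAIL  0x4u
#define GFP_OTHER   0x8u	/* stands in for any unrelated caller flag */

/* First (higher-order) attempt: it is allowed to fail quietly, so
 * NOFAIL is masked out and NOWARN/NORETRY are added. */
static gfp_t first_attempt_flags(gfp_t flags)
{
	return (flags | GFP_NOWARN | GFP_NORETRY) & ~GFP_NOFAIL;
}

/* Second (minimum-order) attempt, using the expression from the patch:
 * the saved NOFAIL bit is OR-ed back into the caller's flags. */
static gfp_t second_attempt_flags(gfp_t flags)
{
	gfp_t nofail = flags & GFP_NOFAIL;

	return flags | nofail;
}

/* Print both masks for one caller-supplied flag set. */
static void show_attempts(gfp_t flags)
{
	/* The first attempt must never carry NOFAIL. */
	assert((first_attempt_flags(flags) & GFP_NOFAIL) == 0);

	printf("caller=%#x first=%#x second=%#x\n",
	       flags, first_attempt_flags(flags), second_attempt_flags(flags));
}
```

Calling `show_attempts(GFP_OTHER | GFP_NOFAIL)` from a `main()` prints the two masks side by side. In this sketch `second_attempt_flags(f)` returns `f` for every input, because `nofail` only holds bits that are already set in `flags`; the two attempts differ only through the masking applied in the first one.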
* Re: [PATCH] SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab() (was: Re: [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100)
  2009-10-21 20:04   ` Karol Lewandowski
  (?)
@ 2009-10-21 21:06   ` David Rientjes
  -1 siblings, 0 replies; 384+ messages in thread
From: David Rientjes @ 2009-10-21 21:06 UTC (permalink / raw)
  To: Karol Lewandowski
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Mel Gorman, Frans Pop, Pekka Enberg, KOSAKI Motohiro, Reinette Chatre, Bartlomiej Zolnierkiewicz, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe, Tobias Oetiker

On Wed, 21 Oct 2009, Karol Lewandowski wrote:

> commit d6849591e042bceb66f1b4513a1df6740d2ad762
> Author: Karol Lewandowski <karol.k.lewandowski@gmail.com>
> Date:   Wed Oct 21 21:01:20 2009 +0200
>
>     SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab()
>
>     Commit ba52270d18fb17ce2cf176b35419dab1e43fe4a3 unconditionally
>     cleared __GFP_NOFAIL flag on all allocations.
>

No, it clears __GFP_NOFAIL from the first allocation of oo_order(s->oo).
If that fails (and it's easy to fail, it has __GFP_NORETRY), another
allocation is attempted with oo_order(s->min), for which __GFP_NOFAIL
would be preserved if that's the slab cache's allocflags.

>     Preserve this flag on second attempt to allocate page (with possibly
>     decreased order).
>
>     This should help with bugs #14265, #14141 and similar.
>
>     Signed-off-by: Karol Lewandowski <karol.k.lewandowski@gmail.com>
>
> diff --git a/mm/slub.c b/mm/slub.c
> index b627675..ac5db65 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1084,7 +1084,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  {
>  	struct page *page;
>  	struct kmem_cache_order_objects oo = s->oo;
> -	gfp_t alloc_gfp;
> +	gfp_t alloc_gfp, nofail;
>
>  	flags |= s->allocflags;
>
> @@ -1092,6 +1092,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  	 * Let the initial higher-order allocation fail under memory pressure
>  	 * so we fall-back to the minimum order allocation.
>  	 */
> +	nofail = flags & __GFP_NOFAIL;
>  	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
>
>  	page = alloc_slab_page(alloc_gfp, node, oo);
> @@ -1100,8 +1101,10 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  		/*
>  		 * Allocation may have failed due to fragmentation.
>  		 * Try a lower order alloc if possible
> +		 *
> +		 * Preserve __GFP_NOFAIL flag if previous allocation failed.
>  		 */
> -		page = alloc_slab_page(flags, node, oo);
> +		page = alloc_slab_page(flags | nofail, node, oo);
>  		if (!page)
>  			return NULL;
>
>

This does nothing.  You may have missed that the lower order allocation is
passing 'flags' (which is a union of the gfp flags passed to
allocate_slab() based on the allocation context and the cache's
allocflags), and not alloc_gfp where __GFP_NOFAIL is masked.

Nack.

Note: slub isn't going to be a culprit in order 5 allocation failures
since they have kmalloc passthrough to the page allocator.

^ permalink raw reply	[flat|nested] 384+ messages in thread
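The objection above comes down to bit arithmetic: OR-ing `flags` with a subset of its own bits gives back `flags` unchanged, so the fallback call receives the same value with or without the patch. A minimal sketch of that point follows. The `demo_` type, the flag values, and the helper names are all assumptions made for illustration; they are not the kernel's definitions.

```c
#include <stdint.h>

/*
 * Illustrative stand-ins only. The real gfp_t type and __GFP_* values are
 * defined by the kernel headers; the numbers below are assumptions chosen
 * for this demo.
 */
typedef uint32_t demo_gfp_t;

#define DEMO_GFP_NOWARN  ((demo_gfp_t)0x0200u)
#define DEMO_GFP_NOFAIL  ((demo_gfp_t)0x0800u)
#define DEMO_GFP_NORETRY ((demo_gfp_t)0x1000u)

/* First, higher-order attempt: allowed to fail, so NOFAIL is masked off. */
demo_gfp_t demo_first_attempt(demo_gfp_t flags)
{
	return (flags | DEMO_GFP_NOWARN | DEMO_GFP_NORETRY) & ~DEMO_GFP_NOFAIL;
}

/* Fallback attempt before the patch: the caller's flags, untouched. */
demo_gfp_t demo_fallback_unpatched(demo_gfp_t flags)
{
	return flags;
}

/* Fallback attempt with the patch: nofail is already a subset of flags. */
demo_gfp_t demo_fallback_patched(demo_gfp_t flags)
{
	demo_gfp_t nofail = flags & DEMO_GFP_NOFAIL;

	return flags | nofail;
}
```

For any input, `demo_fallback_patched()` and `demo_fallback_unpatched()` return the same value; only `demo_first_attempt()` ever drops the no-fail bit, and its result is never used for the fallback.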
* Re: [PATCH] SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab() (was: Re: [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100)
  2009-10-21 21:06   ` David Rientjes
  (?)
@ 2009-10-21 21:20   ` Karol Lewandowski
  -1 siblings, 0 replies; 384+ messages in thread
From: Karol Lewandowski @ 2009-10-21 21:20 UTC (permalink / raw)
  To: David Rientjes
  Cc: Karol Lewandowski, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Mel Gorman, Frans Pop, Pekka Enberg, KOSAKI Motohiro, Reinette Chatre, Bartlomiej Zolnierkiewicz, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe, Tobias Oetiker

On Wed, Oct 21, 2009 at 02:06:41PM -0700, David Rientjes wrote:
> On Wed, 21 Oct 2009, Karol Lewandowski wrote:
>
> > commit d6849591e042bceb66f1b4513a1df6740d2ad762
> > Author: Karol Lewandowski <karol.k.lewandowski@gmail.com>
> > Date:   Wed Oct 21 21:01:20 2009 +0200
> >
> >     SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab()
> >
> >     Commit ba52270d18fb17ce2cf176b35419dab1e43fe4a3 unconditionally
> >     cleared __GFP_NOFAIL flag on all allocations.
> >
>
> No, it clears __GFP_NOFAIL from the first allocation of oo_order(s->oo).
> If that fails (and it's easy to fail, it has __GFP_NORETRY), another
> allocation is attempted with oo_order(s->min), for which __GFP_NOFAIL
> would be preserved if that's the slab cache's allocflags.

Right, patch is junk.

However, I haven't been able to trigger failures since I've switched
to SLAB allocator.  That patch seemed related (and wrong), but it
wasn't.

> >  		 */
> > -		page = alloc_slab_page(flags, node, oo);
> > +		page = alloc_slab_page(flags | nofail, node, oo);
> >  		if (!page)
> >  			return NULL;
> >
> >
>
> This does nothing.  You may have missed that the lower order allocation is
> passing 'flags' (which is a union of the gfp flags passed to
> allocate_slab() based on the allocation context and the cache's
> allocflags), and not alloc_gfp where __GFP_NOFAIL is masked.

Right, I missed that.

> Nack.
>
> Note: slub isn't going to be a culprit in order 5 allocation failures
> since they have kmalloc passthrough to the page allocator.

However, it might change fragmentation somewhat I guess.  This might
make problem more/less visible.

^ permalink raw reply	[flat|nested] 384+ messages in thread
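For scale, an order-n request asks the page allocator for 2^n physically contiguous pages, which is why fragmentation matters for the order:5 failure discussed here. A minimal sketch of that arithmetic follows, assuming 4 KiB pages (the thread does not state the page size) and using hypothetical helper names:

```c
/* Assumed page size for illustration; the thread does not state it. */
#define DEMO_PAGE_SIZE 4096UL

/* Number of contiguous pages in an order-n allocation: 2^n. */
unsigned long demo_order_to_pages(unsigned int order)
{
	return 1UL << order;
}

/* Size in bytes of an order-n allocation under the assumed page size. */
unsigned long demo_order_to_bytes(unsigned int order)
{
	return demo_order_to_pages(order) * DEMO_PAGE_SIZE;
}
```

Under that assumption, order 5 means 32 contiguous pages, or 128 KiB, in a single block.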
* Re: [PATCH] SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab() (was: Re: [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100)
  2009-10-21 21:20   ` Karol Lewandowski
@ 2009-10-22 10:20   ` Mel Gorman
  -1 siblings, 0 replies; 384+ messages in thread
From: Mel Gorman @ 2009-10-22 10:20 UTC (permalink / raw)
  To: Karol Lewandowski
  Cc: David Rientjes, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Frans Pop, Pekka Enberg, KOSAKI Motohiro, Reinette Chatre, Bartlomiej Zolnierkiewicz, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe, Tobias Oetiker

On Wed, Oct 21, 2009 at 11:20:34PM +0200, Karol Lewandowski wrote:
> On Wed, Oct 21, 2009 at 02:06:41PM -0700, David Rientjes wrote:
> > On Wed, 21 Oct 2009, Karol Lewandowski wrote:
> >
> > > commit d6849591e042bceb66f1b4513a1df6740d2ad762
> > > Author: Karol Lewandowski <karol.k.lewandowski@gmail.com>
> > > Date:   Wed Oct 21 21:01:20 2009 +0200
> > >
> > >     SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab()
> > >
> > >     Commit ba52270d18fb17ce2cf176b35419dab1e43fe4a3 unconditionally
> > >     cleared __GFP_NOFAIL flag on all allocations.
> > >
> >
> > No, it clears __GFP_NOFAIL from the first allocation of oo_order(s->oo).
> > If that fails (and it's easy to fail, it has __GFP_NORETRY), another
> > allocation is attempted with oo_order(s->min), for which __GFP_NOFAIL
> > would be preserved if that's the slab cache's allocflags.
>
> Right, patch is junk.
>
> However, I haven't been able to trigger failures since I've switched
> to SLAB allocator.  That patch seemed related (and wrong), but it
> wasn't.
>

Interesting. Pekka, I looked for SLUB commits in the 2.6.30..2.6.31
range for patches that might affect what order of pages SLUB allocates
but didn't spot anything obvious. Can you think of any changes that
might have altered how SLUB uses memory?

> > >  		 */
> > > -		page = alloc_slab_page(flags, node, oo);
> > > +		page = alloc_slab_page(flags | nofail, node, oo);
> > >  		if (!page)
> > >  			return NULL;
> > >
> > >
> >
> > This does nothing.  You may have missed that the lower order allocation is
> > passing 'flags' (which is a union of the gfp flags passed to
> > allocate_slab() based on the allocation context and the cache's
> > allocflags), and not alloc_gfp where __GFP_NOFAIL is masked.
>
> Right, I missed that.
>
> > Nack.
> >
> > Note: slub isn't going to be a culprit in order 5 allocation failures
> > since they have kmalloc passthrough to the page allocator.
>
> However, it might change fragmentation somewhat I guess.  This might
> make problem more/less visible.
>

Did you have CONFIG_KMEMCHECK set by any chance?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [PATCH] SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab() (was: Re: [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100)
  2009-10-22 10:20   ` Mel Gorman
  (?)
@ 2009-10-22 21:33   ` Karol Lewandowski
  -1 siblings, 0 replies; 384+ messages in thread
From: Karol Lewandowski @ 2009-10-22 21:33 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Karol Lewandowski, David Rientjes, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List, Frans Pop, Pekka Enberg, KOSAKI Motohiro, Reinette Chatre, Bartlomiej Zolnierkiewicz, Mohamed Abbas, John W. Linville, linux-mm, jens.axboe, Tobias Oetiker

On Thu, Oct 22, 2009 at 11:20:14AM +0100, Mel Gorman wrote:
> On Wed, Oct 21, 2009 at 11:20:34PM +0200, Karol Lewandowski wrote:
> > > Note: slub isn't going to be a culprit in order 5 allocation failures
> > > since they have kmalloc passthrough to the page allocator.
> >
> > However, it might change fragmentation somewhat I guess.  This might
> > make problem more/less visible.
> >
>
> Did you have CONFIG_KMEMCHECK set by any chance?

No, kmemcheck (and kmemleak) was always disabled.

It's likely that's possible to trigger allocation failures with slab,
I just haven't been successful at it.  Lack of good testcase is really
problem here -- even if I can't trigger failures I can never be sure
that these won't appear in some strange moment.

BTW I'll test your patches (from another thread) shortly.

Thanks.

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14266] regression in page writeback
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
                   ` (43 preceding siblings ...)
  2009-10-01 19:56 ` [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100 Rafael J. Wysocki
@ 2009-10-01 19:56 ` Rafael J. Wysocki
  2009-10-01 19:56 ` [Bug #14267] Disassociating atheros wlan Rafael J. Wysocki
                   ` (3 subsequent siblings)
  48 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Andrew Morton, Chris Mason, Christoph Hellwig, Dave Chinner, Linus Torvalds, Peter Zijlstra, Richard Kennedy, Shaohua Li, Theodore Tso, Wu Fengguang

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=14266
Subject		: regression in page writeback
Submitter	: Shaohua Li <shaohua.li@intel.com>
Date		: 2009-09-22 5:49 (10 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d7831a0bdf06b9f722b947bb0c205ff7d77cebd8
References	: http://marc.info/?l=linux-kernel&m=125359858117176&w=4
Handled-By	: Wu Fengguang <fengguang.wu@intel.com>

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #14267] Disassociating atheros wlan
  2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki
                   ` (44 preceding siblings ...)
  2009-10-01 19:56 ` [Bug #14266] regression in page writeback Rafael J. Wysocki
@ 2009-10-01 19:56 ` Rafael J. Wysocki
  2009-10-05  0:34   ` Justin Mattock
  2009-10-01 19:56 ` [Bug #14275] kernel>=2.6.31: ahci.c: do not force unconditionally sb600 to 32bit dma any more? Rafael J. Wysocki
                   ` (2 subsequent siblings)
  48 siblings, 1 reply; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Justin P. Mattock, Kristoffer Ericson

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=14267
Subject		: Disassociating atheros wlan
Submitter	: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Date		: 2009-09-24 10:16 (8 days old)
References	: http://marc.info/?l=linux-kernel&m=125378723723384&w=4

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14267] Disassociating atheros wlan 2009-10-01 19:56 ` [Bug #14267] Disassociating atheros wlan Rafael J. Wysocki @ 2009-10-05 0:34 ` Justin Mattock 0 siblings, 0 replies; 384+ messages in thread From: Justin Mattock @ 2009-10-05 0:34 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux Kernel Mailing List, Kernel Testers List, Kristoffer Ericson On Thu, Oct 1, 2009 at 12:56 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote: > This message has been generated automatically as a part of a report > of regressions introduced between 2.6.30 and 2.6.31. > > The following bug entry is on the current list of known regressions > introduced between 2.6.30 and 2.6.31. Please verify if it still should > be listed and let me know (either way). > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14267 > Subject : Disassociating atheros wlan > Submitter : Kristoffer Ericson <kristoffer.ericson@gmail.com> > Date : 2009-09-24 10:16 (8 days old) > References : http://marc.info/?l=linux-kernel&m=125378723723384&w=4 > > > Sorry for the delay (spent some time in bodie). yes it should be still open. -- Justin P. Mattock ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14267] Disassociating atheros wlan 2009-10-05 0:34 ` Justin Mattock (?) @ 2009-10-05 20:09 ` Rafael J. Wysocki -1 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-05 20:09 UTC (permalink / raw) To: Justin Mattock Cc: Linux Kernel Mailing List, Kernel Testers List, Kristoffer Ericson On Monday 05 October 2009, Justin Mattock wrote: > On Thu, Oct 1, 2009 at 12:56 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote: > > This message has been generated automatically as a part of a report > > of regressions introduced between 2.6.30 and 2.6.31. > > > > The following bug entry is on the current list of known regressions > > introduced between 2.6.30 and 2.6.31. Please verify if it still should > > be listed and let me know (either way). > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14267 > > Subject : Disassociating atheros wlan > > Submitter : Kristoffer Ericson <kristoffer.ericson@gmail.com> > > Date : 2009-09-24 10:16 (8 days old) > > References : http://marc.info/?l=linux-kernel&m=125378723723384&w=4 > > > > > > > > Sorry for the delay > (spent some time in bodie). > yes it should be still open. Thanks for the update. Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14275] kernel>=2.6.31: ahci.c: do not force unconditionally sb600 to 32bit dma any more? 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (45 preceding siblings ...) 2009-10-01 19:56 ` [Bug #14267] Disassociating atheros wlan Rafael J. Wysocki @ 2009-10-01 19:56 ` Rafael J. Wysocki 2009-10-01 19:56 ` [Bug #14294] kernel BUG at drivers/ide/ide-disk.c:187 Rafael J. Wysocki 2009-10-01 19:56 ` [Bug #14301] WARNING: at net/ipv4/af_inet.c:154 Rafael J. Wysocki 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: Kernel Testers List, gabriele balducci This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14275 Subject : kernel>=2.6.31: ahci.c: do not force unconditionally sb600 to 32bit dma any more? Submitter : gabriele balducci <balducci@units.it> Date : 2009-09-30 15:02 (2 days old) Patch : http://bugzilla.kernel.org/show_bug.cgi?id=14275#c0 ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14294] kernel BUG at drivers/ide/ide-disk.c:187 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (46 preceding siblings ...) 2009-10-01 19:56 ` [Bug #14275] kernel>=2.6.31: ahci.c: do not force unconditionally sb600 to 32bit dma any more? Rafael J. Wysocki @ 2009-10-01 19:56 ` Rafael J. Wysocki 2009-10-01 19:56 ` [Bug #14301] WARNING: at net/ipv4/af_inet.c:154 Rafael J. Wysocki 48 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Bartlomiej Zolnierkiewicz, David Miller, Santiago Garcia Mantinan This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14294 Subject : kernel BUG at drivers/ide/ide-disk.c:187 Submitter : Santiago Garcia Mantinan <manty@manty.net> Date : 2009-09-30 11:05 (2 days old) References : http://marc.info/?l=linux-kernel&m=125430926311466&w=4 Handled-By : David Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #14301] WARNING: at net/ipv4/af_inet.c:154 2009-10-01 19:53 2.6.32-rc1-git2: Reported regressions 2.6.30 -> 2.6.31 Rafael J. Wysocki ` (47 preceding siblings ...) 2009-10-01 19:56 ` [Bug #14294] kernel BUG at drivers/ide/ide-disk.c:187 Rafael J. Wysocki @ 2009-10-01 19:56 ` Rafael J. Wysocki 2009-10-03 8:36 ` Eric Dumazet 48 siblings, 1 reply; 384+ messages in thread From: Rafael J. Wysocki @ 2009-10-01 19:56 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Eric Dumazet, Ralf Hildebrandt This message has been generated automatically as a part of a report of regressions introduced between 2.6.30 and 2.6.31. The following bug entry is on the current list of known regressions introduced between 2.6.30 and 2.6.31. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14301 Subject : WARNING: at net/ipv4/af_inet.c:154 Submitter : Ralf Hildebrandt <Ralf.Hildebrandt@charite.de> Date : 2009-09-30 12:24 (2 days old) References : http://marc.info/?l=linux-kernel&m=125431350218137&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #14301] WARNING: at net/ipv4/af_inet.c:154
  2009-10-01 19:56 ` [Bug #14301] WARNING: at net/ipv4/af_inet.c:154 Rafael J. Wysocki
  2009-10-03  8:36   ` Eric Dumazet
@ 2009-10-03  8:36   ` Eric Dumazet
  0 siblings, 0 replies; 384+ messages in thread
From: Eric Dumazet @ 2009-10-03  8:36 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ralf Hildebrandt
  Cc: Linux Kernel Mailing List, Kernel Testers List, Herbert Xu,
	Linux Netdev List, Wei Yongjun, David S. Miller

Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.30 and 2.6.31.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.30 and 2.6.31. Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14301
> Subject    : WARNING: at net/ipv4/af_inet.c:154
> Submitter  : Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
> Date       : 2009-09-30 12:24 (2 days old)
> References : http://marc.info/?l=linux-kernel&m=125431350218137&w=4
>
>

If commit d99927f4d93f36553699573b279e0ff98ad7dea6
(net: Fix sock_wfree() race) doesn't fix this problem, then
maybe we should take a look at an old patch.

< data mining... running... output results to lkml/netdev >

Random guesses

1) : commit d55d87fdff8252d0e2f7c28c2d443aee17e9d70f
(net: Move rx skb_orphan call to where needed)

A similar problem on SCTP was fixed by commit
1bc4ee4088c9a502db0e9c87f675e61e57fa1734
(sctp: fix warning at inet_sock_destruct() while release sctp socket)

2) CORK and UDP sockets
It seems we can leave a UDP socket with a frame in sk_write_queue.
Purge of this queue is done by udp_flush_pending_frames().
This calls ip_flush_pending_frames().
But this function only calls kfree_skb(), not sk_wmem_free_skb()...


Could you try the following patch?

Thanks

[PATCH] net: UDP should not use ip_flush_pending_frames()

Now that xmit UDP messages are charged, we must take care to call the
right skb freeing function.

If close() is performed on a socket where a CORKED frame is still
queued in sk_write_queue, calling ip_flush_pending_frames() leads to
an sk_forward_alloc leak.

Reported-by: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h  |   10 ++++++++++
 include/net/tcp.h   |   10 ----------
 net/ipv4/tcp.c      |    2 +-
 net/ipv4/tcp_ipv4.c |    2 +-
 net/ipv4/udp.c      |    2 +-
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 1621935..7c80fec 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -882,6 +882,16 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 	__kfree_skb(skb);
 }
 
+/* write queue abstraction */
+static inline void sk_write_queue_purge(struct sock *sk)
+{
+	struct sk_buff *skb;
+
+	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
+		sk_wmem_free_skb(sk, skb);
+	sk_mem_reclaim(sk);
+}
+
 /* Used by processes to "lock" a socket state, so that
  * interrupts and bottom half handlers won't change it
  * from under us. It essentially blocks any incoming
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..4c7036a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1220,16 +1220,6 @@ static inline void tcp_put_md5sig_pool(void)
 	put_cpu();
 }
 
-/* write queue abstraction */
-static inline void tcp_write_queue_purge(struct sock *sk)
-{
-	struct sk_buff *skb;
-
-	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
-		sk_wmem_free_skb(sk, skb);
-	sk_mem_reclaim(sk);
-}
-
 static inline struct sk_buff *tcp_write_queue_head(struct sock *sk)
 {
 	return skb_peek(&sk->sk_write_queue);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 64d0af6..0124f5b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1992,7 +1992,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 
 	tcp_clear_xmit_timers(sk);
 	__skb_queue_purge(&sk->sk_receive_queue);
-	tcp_write_queue_purge(sk);
+	sk_write_queue_purge(sk);
 	__skb_queue_purge(&tp->out_of_order_queue);
 #ifdef CONFIG_NET_DMA
 	__skb_queue_purge(&sk->sk_async_wait_queue);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..76e59df 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1845,7 +1845,7 @@ void tcp_v4_destroy_sock(struct sock *sk)
 	tcp_cleanup_congestion_control(sk);
 
 	/* Cleanup up the write buffer. */
-	tcp_write_queue_purge(sk);
+	sk_write_queue_purge(sk);
 
 	/* Cleans up our, hopefully empty, out_of_order_queue. */
 	__skb_queue_purge(&tp->out_of_order_queue);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6ec6a8a..58007d1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -464,7 +464,7 @@ void udp_flush_pending_frames(struct sock *sk)
 	if (up->pending) {
 		up->len = 0;
 		up->pending = 0;
-		ip_flush_pending_frames(sk);
+		sk_write_queue_purge(sk);
 	}
 }
 EXPORT_SYMBOL(udp_flush_pending_frames);

^ permalink raw reply related [flat|nested] 384+ messages in thread
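[Editorial sketch, not part of the thread] The difference between the two freeing paths discussed in the message above can be modeled in userspace. Everything below is a toy: the toy_* names, the struct layouts and the 4096-byte quantum are assumptions made for illustration and are not kernel definitions. The model only tries to show why a purge that frees buffers without uncharging them could leave nonzero accounting behind, which is the situation the patch description calls an sk_forward_alloc leak.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MEM_QUANTUM 4096	/* assumed reservation granularity for this model */

/* Toy stand-in for struct sk_buff; not the kernel layout. */
struct toy_skb {
	struct toy_skb *next;
	int truesize;
};

/* Toy stand-in for the accounting fields of struct sock. */
struct toy_sock {
	struct toy_skb *write_queue;	/* models sk_write_queue */
	int wmem_queued;		/* models sk_wmem_queued */
	int forward_alloc;		/* models sk_forward_alloc */
};

/* Queue one buffer and charge it to the socket. Returns 0 or -1. */
int toy_queue_skb(struct toy_sock *sk, int truesize)
{
	struct toy_skb *skb;

	assert(truesize > 0);
	skb = malloc(sizeof(*skb));
	if (!skb)
		return -1;

	/* Reserve whole quanta until the charge fits. */
	while (sk->forward_alloc < truesize)
		sk->forward_alloc += TOY_MEM_QUANTUM;

	sk->forward_alloc -= truesize;
	sk->wmem_queued += truesize;

	skb->truesize = truesize;
	skb->next = sk->write_queue;
	sk->write_queue = skb;
	return 0;
}

/* Purge that only frees buffers (the kfree_skb()-style path). */
void toy_purge_plain(struct toy_sock *sk)
{
	struct toy_skb *skb;

	while ((skb = sk->write_queue) != NULL) {
		sk->write_queue = skb->next;
		free(skb);
	}
}

/* Purge that also uncharges (the sk_wmem_free_skb()-style path). */
void toy_purge_accounted(struct toy_sock *sk)
{
	struct toy_skb *skb;

	while ((skb = sk->write_queue) != NULL) {
		sk->write_queue = skb->next;
		sk->wmem_queued -= skb->truesize;
		sk->forward_alloc += skb->truesize;
		free(skb);
	}
	/* Hand whole quanta back, keeping only the remainder. */
	sk->forward_alloc %= TOY_MEM_QUANTUM;
}

/* Nonzero when nothing is left charged or reserved. */
int toy_sock_is_balanced(const struct toy_sock *sk)
{
	return sk->wmem_queued == 0 && sk->forward_alloc == 0;
}
```

Queuing one 1500-byte buffer on a zeroed toy_sock and then calling toy_purge_plain() leaves wmem_queued and forward_alloc nonzero, while toy_purge_accounted() brings both back to zero. A destructor-time sanity check on those counters would only pass in the second case.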
* Re: [Bug #14301] WARNING: at net/ipv4/af_inet.c:154
@ 2009-10-03  8:52   ` Eric Dumazet
  0 siblings, 0 replies; 384+ messages in thread
From: Eric Dumazet @ 2009-10-03  8:52 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ralf Hildebrandt
  Cc: Linux Kernel Mailing List, Kernel Testers List, Herbert Xu,
	Linux Netdev List, Wei Yongjun, David S. Miller

Eric Dumazet wrote:
> Rafael J. Wysocki wrote:
>> This message has been generated automatically as a part of a report
>> of regressions introduced between 2.6.30 and 2.6.31.
>>
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.30 and 2.6.31. Please verify if it still should
>> be listed and let me know (either way).
>>
>>
>> Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14301
>> Subject    : WARNING: at net/ipv4/af_inet.c:154
>> Submitter  : Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
>> Date       : 2009-09-30 12:24 (2 days old)
>> References : http://marc.info/?l=linux-kernel&m=125431350218137&w=4
>>
>>
>
> If commit d99927f4d93f36553699573b279e0ff98ad7dea6
> (net: Fix sock_wfree() race) doesn't fix this problem, then
> maybe we should take a look at an old patch.
>
> < data mining... running... output results to lkml/netdev >
>
> Random guesses
>
> 1) : commit d55d87fdff8252d0e2f7c28c2d443aee17e9d70f
> (net: Move rx skb_orphan call to where needed)
>
> A similar problem on SCTP was fixed by commit
> 1bc4ee4088c9a502db0e9c87f675e61e57fa1734
> (sctp: fix warning at inet_sock_destruct() while release sctp socket)
>
> 2) CORK and UDP sockets
> It seems we can leave a UDP socket with a frame in sk_write_queue.
> Purge of this queue is done by udp_flush_pending_frames().
> This calls ip_flush_pending_frames().
> But this function only calls kfree_skb(), not sk_wmem_free_skb()...
>
>
> Could you try the following patch?
>

Hmm, I missed the ip_cork_release(), here is an updated version.

[PATCH] net: UDP should not use ip_flush_pending_frames()

Now that xmit UDP messages are charged, we must take care to call the
right skb freeing function.

If close() is performed on a socket where a CORKED frame is still
queued in sk_write_queue, calling ip_flush_pending_frames() leads to
an sk_forward_alloc leak.

Fix this by calling sk_write_queue_purge() and ip_cork_release() instead.

Reported-by: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/ip.h    |    1 +
 include/net/sock.h  |   10 ++++++++++
 include/net/tcp.h   |   10 ----------
 net/ipv4/tcp.c      |    2 +-
 net/ipv4/tcp_ipv4.c |    2 +-
 net/ipv4/udp.c      |    3 ++-
 6 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 2f47e54..c8d8828 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -117,6 +117,7 @@ extern int ip_generic_getfrag(void *from, char *to, int offset, int len, int od
 extern ssize_t		ip_append_page(struct sock *sk, struct page *page,
 				int offset, size_t size, int flags);
 extern int		ip_push_pending_frames(struct sock *sk);
+extern void		ip_cork_release(struct inet_sock *);
 extern void		ip_flush_pending_frames(struct sock *sk);
 
 /* datagram.c */
diff --git a/include/net/sock.h b/include/net/sock.h
index 1621935..7c80fec 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -882,6 +882,16 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 	__kfree_skb(skb);
 }
 
+/* write queue abstraction */
+static inline void sk_write_queue_purge(struct sock *sk)
+{
+	struct sk_buff *skb;
+
+	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
+		sk_wmem_free_skb(sk, skb);
+	sk_mem_reclaim(sk);
+}
+
 /* Used by processes to "lock" a socket state, so that
  * interrupts and bottom half handlers won't change it
  * from under us. It essentially blocks any incoming
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..4c7036a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1220,16 +1220,6 @@ static inline void tcp_put_md5sig_pool(void)
 	put_cpu();
 }
 
-/* write queue abstraction */
-static inline void tcp_write_queue_purge(struct sock *sk)
-{
-	struct sk_buff *skb;
-
-	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
-		sk_wmem_free_skb(sk, skb);
-	sk_mem_reclaim(sk);
-}
-
 static inline struct sk_buff *tcp_write_queue_head(struct sock *sk)
 {
 	return skb_peek(&sk->sk_write_queue);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 64d0af6..0124f5b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1992,7 +1992,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 
 	tcp_clear_xmit_timers(sk);
 	__skb_queue_purge(&sk->sk_receive_queue);
-	tcp_write_queue_purge(sk);
+	sk_write_queue_purge(sk);
 	__skb_queue_purge(&tp->out_of_order_queue);
 #ifdef CONFIG_NET_DMA
 	__skb_queue_purge(&sk->sk_async_wait_queue);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..76e59df 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1845,7 +1845,7 @@ void tcp_v4_destroy_sock(struct sock *sk)
 	tcp_cleanup_congestion_control(sk);
 
 	/* Cleanup up the write buffer. */
-	tcp_write_queue_purge(sk);
+	sk_write_queue_purge(sk);
 
 	/* Cleans up our, hopefully empty, out_of_order_queue. */
 	__skb_queue_purge(&tp->out_of_order_queue);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6ec6a8a..b6370d0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -464,7 +464,8 @@ void udp_flush_pending_frames(struct sock *sk)
 	if (up->pending) {
 		up->len = 0;
 		up->pending = 0;
-		ip_flush_pending_frames(sk);
+		sk_write_queue_purge(sk);
+		ip_cork_release(inet_sk(sk));
 	}
 }
 EXPORT_SYMBOL(udp_flush_pending_frames);

^ permalink raw reply related [flat|nested] 384+ messages in thread
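[Editorial sketch, not part of the thread] The scenario named in the patch description, a close() on a UDP socket that still holds a corked frame, might be exercised from userspace roughly as follows. This is a guess at a trigger rather than a confirmed reproducer: the thread does not say how the reporter's workload reached this path, and the helper name, the loopback destination and port 9 are arbitrary choices made here. UDP_CORK is Linux-specific.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/udp.h>	/* UDP_CORK (Linux-specific) */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the thread: cork a UDP socket,
 * queue one datagram, then close() without ever uncorking.
 * Returns 0 if every step succeeded, -1 otherwise.
 */
int close_with_corked_frame(void)
{
	static const char payload[] = "corked";
	struct sockaddr_in dst;
	int on = 1;
	int rc = 0;
	int fd;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* While corked, sent data is accumulated instead of transmitted. */
	if (setsockopt(fd, IPPROTO_UDP, UDP_CORK, &on, sizeof(on)) < 0)
		rc = -1;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(9);	/* arbitrary destination port */
	dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	if (rc == 0 &&
	    sendto(fd, payload, sizeof(payload), 0,
		   (struct sockaddr *)&dst, sizeof(dst)) < 0)
		rc = -1;

	/* Close with the frame still pending; the kernel has to flush it. */
	if (close(fd) < 0)
		rc = -1;

	return rc;
}
```

On an affected kernel one would then look at dmesg for the warning from the bug entry. Whether this sequence actually produces it was not established in the messages above.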
* Re: [Bug #14301] WARNING: at net/ipv4/af_inet.c:154
  2009-10-03  8:52 ` Eric Dumazet
@ 2009-10-03 17:53   ` Eric Dumazet
  -1 siblings, 0 replies; 384+ messages in thread
From: Eric Dumazet @ 2009-10-03 17:53 UTC (permalink / raw)
  Cc: Rafael J. Wysocki, Ralf Hildebrandt, Linux Kernel Mailing List,
	Kernel Testers List, Herbert Xu, Linux Netdev List, Wei Yongjun,
	David S. Miller

Eric Dumazet a écrit :
> Eric Dumazet a écrit :
>> Rafael J. Wysocki a écrit :
>>> This message has been generated automatically as a part of a report
>>> of regressions introduced between 2.6.30 and 2.6.31.
>>>
>>> The following bug entry is on the current list of known regressions
>>> introduced between 2.6.30 and 2.6.31. Please verify if it still should
>>> be listed and let me know (either way).
>>>
>>>
>>> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14301
>>> Subject         : WARNING: at net/ipv4/af_inet.c:154
>>> Submitter       : Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
>>> Date            : 2009-09-30 12:24 (2 days old)
>>> References      : http://marc.info/?l=linux-kernel&m=125431350218137&w=4
>>>
>>>
>> If commit d99927f4d93f36553699573b279e0ff98ad7dea6
>> (net: Fix sock_wfree() race) doesnt fix this problem, then
>> maybe we should take a look at an old patch.
>>
>> < data mining... running... output results to lkml/netdev >
>>
>> Random guesses
>>
>> 1) : commit d55d87fdff8252d0e2f7c28c2d443aee17e9d70f
>>      (net: Move rx skb_orphan call to where needed)
>>
>>    A similar problem on SCTP was fixed by commit
>>    1bc4ee4088c9a502db0e9c87f675e61e57fa1734
>>    (sctp: fix warning at inet_sock_destruct() while release sctp socket)
>>
>> 2) CORK and UDP sockets
>>    It seems we can leave an UDP socket with a frame in sk_write_queue
>>    Purge of this queue is done by udp_flush_pending_frames()
>>    This calls ip_flush_pending_frames()
>>    But this function only calls kfree_skb(), not sk_wmem_free_skb()...
>>
>>
>> Could you try following patch ?
>>
>
> Hmm, I missed the ip_cork_release(), here is an updated version.
>

Please ignore this patch, I was wrong, sk_forward_alloc is not used on xmit side
for udp, only receive side.

CORK/UDP should be fine

Investigation still needed...

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #14301] WARNING: at net/ipv4/af_inet.c:154
@ 2009-10-07 15:41     ` Eric Dumazet
  0 siblings, 0 replies; 384+ messages in thread
From: Eric Dumazet @ 2009-10-07 15:41 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller
  Cc: Rafael J. Wysocki, Ralf Hildebrandt, Linux Kernel Mailing List,
	Kernel Testers List, Linux Netdev List, Wei Yongjun,
	Takahiro Yasui, Hideo Aoki

Eric Dumazet a écrit :
> Eric Dumazet a écrit :
>> Eric Dumazet a écrit :
>>> Rafael J. Wysocki a écrit :
>>>> This message has been generated automatically as a part of a report
>>>> of regressions introduced between 2.6.30 and 2.6.31.
>>>>
>>>> The following bug entry is on the current list of known regressions
>>>> introduced between 2.6.30 and 2.6.31. Please verify if it still should
>>>> be listed and let me know (either way).
>>>>
>>>>
>>>> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14301
>>>> Subject         : WARNING: at net/ipv4/af_inet.c:154
>>>> Submitter       : Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
>>>> Date            : 2009-09-30 12:24 (2 days old)
>>>> References      : http://marc.info/?l=linux-kernel&m=125431350218137&w=4
>>>>
>
> Investigation still needed...
>

OK, my last (buggy ???) feeling is about commit 95766fff6b9a78d1

[UDP]: Add memory accounting.

(Its a two years old patch, oh well...)

Problem is the udp_poll() :

We check the first frame to be dequeued from sk_receive_queue has a good checksum.

If it doesnt, we drop the frame ( calling kfree_skb(skb); )

Problem is now we perform memory accounting on UDP, this kfree_skb()
should be done with socket locked, but are we allowed to
call lock_sock() from this udp_poll() context ?

unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
{
	unsigned int mask = datagram_poll(file, sock, wait);
	struct sock *sk = sock->sk;
	int is_lite = IS_UDPLITE(sk);

	/* Check for false positives due to checksum errors */
	if ((mask & POLLRDNORM) &&
	    !(file->f_flags & O_NONBLOCK) &&
	    !(sk->sk_shutdown & RCV_SHUTDOWN)) {
		struct sk_buff_head *rcvq = &sk->sk_receive_queue;
		struct sk_buff *skb;

		spin_lock_bh(&rcvq->lock);
		while ((skb = skb_peek(rcvq)) != NULL &&
		       udp_lib_checksum_complete(skb)) {
			UDP_INC_STATS_BH(sock_net(sk),
					UDP_MIB_INERRORS, is_lite);
			__skb_unlink(skb, rcvq);
<<HERE>>		kfree_skb(skb);
		}
		spin_unlock_bh(&rcvq->lock);


David, Herbert, any idea how to solve this problem ?

1) Allow false positives

Or

2) Maybe we should finally convert sk_forward_alloc to an atomic_t after all...
   This would make things easier, and speedup UDP (no more need to lock_sock())

Or

3) ???

^ permalink raw reply	[flat|nested] 384+ messages in thread
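The "good checksum" test mentioned in the message above rests on the Internet checksum arithmetic of RFC 1071. A minimal userspace sketch of that arithmetic follows, as an assumption-laden illustration: `inet_checksum()` and `toy_frame_ok()` are hypothetical helper names, and the kernel's udp_lib_checksum_complete() does more than this (it also covers the UDP pseudo-header, among other things). The sketch only shows why a frame that carries its own checksum sums to zero when intact and to something else when a bit is flipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Ones'-complement sum of 16-bit big-endian words, folded to 16 bits and
 * inverted (RFC 1071). Intended for datagram-sized buffers (< 64 KiB), so
 * the 32-bit accumulator cannot overflow.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)			/* pad an odd trailing byte with zero */
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)		/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Hypothetical helper: a frame that embeds its own checksum verifies
 * when the checksum over the whole frame comes out as zero.
 */
static int toy_frame_ok(const uint8_t *frame, size_t len)
{
	return inet_checksum(frame, len) == 0;
}
```

With the sample bytes from RFC 1071 (00 01 f2 03 f4 f5 f6 f7) the checksum is 0x220d; appending those two bytes makes `toy_frame_ok()` succeed, and flipping any single bit makes it fail.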
* [PATCH] udp: Fix udp_poll() and ioctl()
  2009-10-07 15:41     ` Eric Dumazet
  (?)
@ 2009-10-09 14:43     ` Eric Dumazet
  2009-10-13 10:18         ` David Miller
  -1 siblings, 1 reply; 384+ messages in thread
From: Eric Dumazet @ 2009-10-09 14:43 UTC (permalink / raw)
  To: David S. Miller
  Cc: Herbert Xu, Rafael J. Wysocki, Ralf Hildebrandt,
	Linux Kernel Mailing List, Kernel Testers List, Linux Netdev List,
	Wei Yongjun, Takahiro Yasui, Hideo Aoki

Eric Dumazet a écrit :
> Eric Dumazet a écrit :
>> Eric Dumazet a écrit :
>>> Eric Dumazet a écrit :
>>>> Rafael J. Wysocki a écrit :
>>>>> This message has been generated automatically as a part of a report
>>>>> of regressions introduced between 2.6.30 and 2.6.31.
>>>>>
>>>>> The following bug entry is on the current list of known regressions
>>>>> introduced between 2.6.30 and 2.6.31. Please verify if it still should
>>>>> be listed and let me know (either way).
>>>>>
>>>>>
>>>>> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14301
>>>>> Subject         : WARNING: at net/ipv4/af_inet.c:154
>>>>> Submitter       : Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
>>>>> Date            : 2009-09-30 12:24 (2 days old)
>>>>> References      : http://marc.info/?l=linux-kernel&m=125431350218137&w=4
>>>>>
>> Investigation still needed...
>>
>
> OK, my last (buggy ???) feeling is about commit 95766fff6b9a78d1
>
> [UDP]: Add memory accounting.
>
> (Its a two years old patch, oh well...)
>
> Problem is the udp_poll() :
>
> We check the first frame to be dequeued from sk_receive_queue has a good checksum.
>
> If it doesnt, we drop the frame ( calling kfree_skb(skb); )
>
> Problem is now we perform memory accounting on UDP, this kfree_skb()
> should be done with socket locked, but are we allowed to
> call lock_sock() from this udp_poll() context ?
>

It seems we can lock_sock() from udp_poll() context, so here is a patch.

[PATCH] udp: Fix udp_poll()

udp_poll() can in some circumstances drop frames with incorrect checksums.

Problem is we now have to lock the socket while dropping frames, or risk
sk_forward corruption.

This bug is present since commit 95766fff6b9a78d1
([UDP]: Add memory accounting.)

While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/udp.c |   73 +++++++++++++++++++++++++++--------------------
 1 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6ec6a8a..d0d436d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -841,6 +841,42 @@ out:
 	return ret;
 }
 
+
+/**
+ *	first_packet_length	- return length of first packet in receive queue
+ *	@sk: socket
+ *
+ *	Drops all bad checksum frames, until a valid one is found.
+ *	Returns the length of found skb, or 0 if none is found.
+ */
+static unsigned int first_packet_length(struct sock *sk)
+{
+	struct sk_buff_head list_kill, *rcvq = &sk->sk_receive_queue;
+	struct sk_buff *skb;
+	unsigned int res;
+
+	__skb_queue_head_init(&list_kill);
+
+	spin_lock_bh(&rcvq->lock);
+	while ((skb = skb_peek(rcvq)) != NULL &&
+		udp_lib_checksum_complete(skb)) {
+		UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_INERRORS,
+				 IS_UDPLITE(sk));
+		__skb_unlink(skb, rcvq);
+		__skb_queue_tail(&list_kill, skb);
+	}
+	res = skb ? skb->len : 0;
+	spin_unlock_bh(&rcvq->lock);
+
+	if (!skb_queue_empty(&list_kill)) {
+		lock_sock(sk);
+		__skb_queue_purge(&list_kill);
+		sk_mem_reclaim_partial(sk);
+		release_sock(sk);
+	}
+	return res;
+}
+
 /*
  *	IOCTL requests applicable to the UDP protocol
  */
@@ -857,21 +893,16 @@ int udp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 
 	case SIOCINQ:
 	{
-		struct sk_buff *skb;
-		unsigned long amount;
+		unsigned int amount = first_packet_length(sk);
 
-		amount = 0;
-		spin_lock_bh(&sk->sk_receive_queue.lock);
-		skb = skb_peek(&sk->sk_receive_queue);
-		if (skb != NULL) {
+		if (amount)
 			/*
 			 * We will only return the amount
 			 * of this packet since that is all
 			 * that will be read.
 			 */
-			amount = skb->len - sizeof(struct udphdr);
-		}
-		spin_unlock_bh(&sk->sk_receive_queue.lock);
+			amount -= sizeof(struct udphdr);
+
 		return put_user(amount, (int __user *)arg);
 	}
 
@@ -1540,29 +1571,11 @@ unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
 {
 	unsigned int mask = datagram_poll(file, sock, wait);
 	struct sock *sk = sock->sk;
-	int 	is_lite = IS_UDPLITE(sk);
 
 	/* Check for false positives due to checksum errors */
-	if ((mask & POLLRDNORM) &&
-	    !(file->f_flags & O_NONBLOCK) &&
-	    !(sk->sk_shutdown & RCV_SHUTDOWN)) {
-		struct sk_buff_head *rcvq = &sk->sk_receive_queue;
-		struct sk_buff *skb;
-
-		spin_lock_bh(&rcvq->lock);
-		while ((skb = skb_peek(rcvq)) != NULL &&
-		       udp_lib_checksum_complete(skb)) {
-			UDP_INC_STATS_BH(sock_net(sk),
-					UDP_MIB_INERRORS, is_lite);
-			__skb_unlink(skb, rcvq);
-			kfree_skb(skb);
-		}
-		spin_unlock_bh(&rcvq->lock);
-
-		/* nothing to see, move along */
-		if (skb == NULL)
-			mask &= ~(POLLIN | POLLRDNORM);
-	}
+	if ((mask & POLLRDNORM) && !(file->f_flags & O_NONBLOCK) &&
+	    !(sk->sk_shutdown & RCV_SHUTDOWN) && !first_packet_length(sk))
+		mask &= ~(POLLIN | POLLRDNORM);
 
 	return mask;
 

^ permalink raw reply related	[flat|nested] 384+ messages in thread
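The shape of first_packet_length() in the patch above (unlink bad frames onto a private list under the queue lock, then free them under the socket lock) can be mimicked in userspace. The following is a sketch under stated assumptions, not the kernel implementation: `toy_sock`, `toy_pkt`, `toy_enqueue()` and `toy_first_packet_length()` are hypothetical names, the two pthread mutexes merely play the roles of the receive-queue spinlock and lock_sock(), and `rmem` is an invented stand-in for the memory accounting.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; none of these are kernel types. */
struct toy_pkt {
	struct toy_pkt *next;
	unsigned int len;
	int bad_csum;			/* 1 if the pretend checksum check failed */
};

struct toy_sock {
	pthread_mutex_t queue_lock;	/* plays the role of rcvq->lock */
	pthread_mutex_t sock_lock;	/* plays the role of lock_sock() */
	struct toy_pkt *head, *tail;	/* receive queue, oldest first */
	unsigned int rmem;		/* bytes accounted to the socket */
	unsigned int dropped;		/* bad frames discarded so far */
};

static void toy_sock_init(struct toy_sock *sk)
{
	pthread_mutex_init(&sk->queue_lock, NULL);
	pthread_mutex_init(&sk->sock_lock, NULL);
	sk->head = sk->tail = NULL;
	sk->rmem = 0;
	sk->dropped = 0;
}

/* Account a new packet, then append it to the receive queue. */
static int toy_enqueue(struct toy_sock *sk, unsigned int len, int bad_csum)
{
	struct toy_pkt *pkt = malloc(sizeof(*pkt));

	if (!pkt)
		return -1;
	pkt->next = NULL;
	pkt->len = len;
	pkt->bad_csum = bad_csum;

	pthread_mutex_lock(&sk->sock_lock);
	sk->rmem += len;
	pthread_mutex_unlock(&sk->sock_lock);

	pthread_mutex_lock(&sk->queue_lock);
	if (sk->tail)
		sk->tail->next = pkt;
	else
		sk->head = pkt;
	sk->tail = pkt;
	pthread_mutex_unlock(&sk->queue_lock);
	return 0;
}

/* Drop leading bad packets; return the length of the first good one, or 0. */
static unsigned int toy_first_packet_length(struct toy_sock *sk)
{
	struct toy_pkt *kill = NULL, *pkt;
	unsigned int res;

	/* Phase 1: under the queue lock, only unlink onto a private list. */
	pthread_mutex_lock(&sk->queue_lock);
	while ((pkt = sk->head) != NULL && pkt->bad_csum) {
		sk->head = pkt->next;
		if (!sk->head)
			sk->tail = NULL;
		pkt->next = kill;
		kill = pkt;
	}
	res = pkt ? pkt->len : 0;
	pthread_mutex_unlock(&sk->queue_lock);

	/* Phase 2: free and un-account under the socket lock. */
	if (kill) {
		pthread_mutex_lock(&sk->sock_lock);
		while ((pkt = kill) != NULL) {
			kill = pkt->next;
			assert(sk->rmem >= pkt->len);
			sk->rmem -= pkt->len;
			sk->dropped++;
			free(pkt);
		}
		pthread_mutex_unlock(&sk->sock_lock);
	}
	return res;
}
```

Splitting the work this way keeps the queue lock hold time short and ensures the accounting is only touched while the socket lock is held, which is the property the patch description asks for. Depending on the toolchain, building this sketch may require `-pthread`.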
* Re: [PATCH] udp: Fix udp_poll() and ioctl()
@ 2009-10-13 10:18         ` David Miller
  0 siblings, 0 replies; 384+ messages in thread
From: David Miller @ 2009-10-13 10:18 UTC (permalink / raw)
  To: eric.dumazet
  Cc: herbert, rjw, Ralf.Hildebrandt, linux-kernel, kernel-testers,
	netdev, yjwei, tyasui, haoki

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 09 Oct 2009 16:43:40 +0200

> [PATCH] udp: Fix udp_poll()
>
> udp_poll() can in some circumstances drop frames with incorrect checksums.
>
> Problem is we now have to lock the socket while dropping frames, or risk
> sk_forward corruption.
>
> This bug is present since commit 95766fff6b9a78d1
> ([UDP]: Add memory accounting.)
>
> While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good, applied, thanks Eric!

^ permalink raw reply	[flat|nested] 384+ messages in thread
* 2.6.31-rc9: Reported regressions from 2.6.30
@ 2009-09-06 17:15 Rafael J. Wysocki
  2009-09-06 17:24 ` Rafael J. Wysocki
  0 siblings, 1 reply; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-09-06 17:15 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich,
	Kernel Testers List, Network Development, Linux ACPI,
	Linux PM List, Linux SCSI List, Linux Wireless List, DRI

This message contains a list of some regressions from 2.6.30, for which there
are no fixes in the mainline I know of. If any of them have been fixed already,
please let me know.

If you know of any other unresolved regressions from 2.6.30, please let me know
either and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2009-09-06      123       34          27
  2009-08-26      108       33          26
  2009-08-20      102       32          29
  2009-08-10       89       27          24
  2009-08-02       76       36          28
  2009-07-27       70       51          43
  2009-07-07       35       25          21
  2009-06-29       22       22          15


Unresolved regressions
----------------------

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14141
Subject         : order 2 page allocation failures
Submitter       : Frans Pop <elendil@planet.nl>
Date            : 2009-09-06 7:40 (1 days old)
References      : http://marc.info/?l=linux-kernel&m=125222287419691&w=4
Handled-By      : Pekka Enberg <penberg@cs.helsinki.fi>


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14139
Subject         : Output to external monitor is broken
Submitter       : Carlos R. Mafra <crmafra2@gmail.com>
Date            : 2009-09-06 14:22 (1 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f8aed700c6ec46ddade6570004ce25332283b306
References      : http://marc.info/?l=linux-kernel&m=125224701520738&w=4


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14135
Subject         : NULL pointer dereference in ima_counts_put
Submitter       : Ciprian Docan <docan@eden.rutgers.edu>
Date            : 2009-09-02 13:49 (5 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=94e5d714f604d4cb4cb13163f01ede278e69258b
References      : http://marc.info/?l=linux-kernel&m=125190146028116&w=4


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14133
Subject         : WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule
Submitter       : Jens Axboe <jens.axboe@oracle.com>
Date            : 2009-08-31 20:43 (7 days old)
References      : http://marc.info/?l=linux-kernel&m=125175143918050&w=4


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14114
Subject         : Tuning a saa7134 based card is broken in kernel 2.6.31-rc7
Submitter       : Tsvety Petrov <Tsvetoslav.Petrov@itron.com>
Date            : 2009-09-03 21:06 (4 days old)


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14103
Subject         : cdc_acm gives I/O error
Submitter       : Paul Martin <pm@debian.org>
Date            : 2009-09-01 13:30 (6 days old)
Handled-By      : Oliver Neukum <oliver@neukum.org>


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14095
Subject         : Asus EeePC 1005HA-M: Suspend hangs and disables the wireless
Submitter       : Karsten Jaeger <lists@oss42.com>
Date            : 2009-08-31 10:14 (7 days old)
References      : http://lists.alioth.debian.org/pipermail/debian-eeepc-devel/2009-August/002513.html


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14070
Subject         : lockdep warning triggered by dup_fd
Submitter       : Bart Van Assche <bart.vanassche@gmail.com>
Date            : 2009-08-23 09:36 (15 days old)
References      : http://lkml.org/lkml/2009/8/23/8


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14058
Subject         : Oops in fsnotify
Submitter       : Grant Wilson <grant.wilson@zen.co.uk>
Date            : 2009-08-20 15:48 (18 days old)
References      : http://marc.info/?l=linux-kernel&m=125078450923133&w=4


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14043
Subject         : System sometimes hangs during boot
Submitter       : Bart Van Assche <bart.vanassche@gmail.com>
Date            : 2009-08-23 18:04 (15 days old)


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14018
Subject         : kernel freezes, inotify problem
Submitter       : Christoph Thielecke <christoph.thielecke@gmx.de>
Date            : 2009-08-19 12:48 (19 days old)
References      : http://marc.info/?l=linux-kernel&m=125068616818353&w=4
Handled-By      : Eric Paris <eparis@parisplace.org>


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14013
Subject         : hd don't show up
Submitter       : Tim Blechmann <tim@klingt.org>
Date            : 2009-08-14 8:26 (24 days old)
References      : http://marc.info/?l=linux-kernel&m=125023842514480&w=4
Handled-By      : Tejun Heo <tj@kernel.org>


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13987
Subject         : Received NMI interrupt at resume
Submitter       : Christian Casteyde <casteyde.christian@free.fr>
Date            : 2009-08-15 07:55 (23 days old)


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13950
Subject         : Oops when USB Serial disconnected while in use
Submitter       : Bruno Prémont <bonbons@linux-vserver.org>
Date            : 2009-08-08 17:47 (30 days old)
References      : http://marc.info/?l=linux-kernel&m=124975432900466&w=4
Handled-By      : Alan Stern <stern@rowland.harvard.edu>


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13943
Subject         : WARNING: at net/mac80211/mlme.c:2292 with ath5k
Submitter       : Fabio Comolli <fabio.comolli@gmail.com>
Date            : 2009-08-06 20:15 (32 days old)
References      : http://marc.info/?l=linux-kernel&m=124958978600600&w=4


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13942
Subject         : Troubles with AoE and uninitialized object
Submitter       : Bruno Prémont <bonbons@linux-vserver.org>
Date            : 2009-08-04 10:12 (34 days old)
References      : http://marc.info/?l=linux-kernel&m=124938117104811&w=4


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13941
Subject         : x86 Geode issue
Submitter       : Martin-Éric Racine <q-funk@iki.fi>
Date            : 2009-08-03 12:58 (35 days old)
References      : http://marc.info/?l=linux-kernel&m=124930434732481&w=4


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13940
Subject         : iwlagn and sky2 stopped working, ACPI-related
Submitter       : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
Date            : 2009-08-07 22:33 (31 days old)
References      : http://marc.info/?l=linux-kernel&m=124968457731107&w=4


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13935
Subject         : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
Submitter       : Adrian Ulrich <kernel@blinkenlights.ch>
Date            : 2009-08-08 22:08 (30 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13906
Subject         : Huawei E169 GPRS connection causes Ooops
Submitter       : Clemens Eisserer <linuxhippy@gmail.com>
Date            : 2009-08-04 09:02 (34 days old)


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13869
Subject         : Radeon framebuffer (w/o KMS) corruption at boot.
Submitter       : Duncan <1i5t5.duncan@cox.net>
Date            : 2009-07-29 16:44 (40 days old)


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject         : suspend script fails, related to stdout?
Submitter       : Tomas M. <tmezzadra@gmail.com>
Date            : 2009-07-17 21:24 (52 days old)
References      : http://marc.info/?l=linux-kernel&m=124785853811667&w=4


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject         : system freeze when switching to console
Submitter       : Reinette Chatre <reinette.chatre@intel.com>
Date            : 2009-07-23 17:57 (46 days old)


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject         : oprofile: possible circular locking dependency detected
Submitter       : Jerome Marchand <jmarchan@redhat.com>
Date            : 2009-07-22 13:35 (47 days old)


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13740
Subject         : X server crashes with 2.6.31-rc2 when options are changed
Submitter       : Michael S. Tsirkin <m.s.tsirkin@gmail.com>
Date            : 2009-07-07 15:19 (62 days old)


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject         : 2.6.31-rc2: irq 16: nobody cared
Submitter       : Niel Lambrechts <niel.lambrechts@gmail.com>
Date            : 2009-07-06 18:32 (63 days old)
References      : http://marc.info/?l=linux-kernel&m=124690524027166&w=4


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject         : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter       : poornima nayak <mpnayak@linux.vnet.ibm.com>
Date            : 2009-06-17 17:56 (82 days old)
References      : http://lkml.org/lkml/2009/6/17/194


Regressions with patches
------------------------

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14140
Subject         : 2.6.31-rc9 breaks gianfar
Submitter       : Michael Guntsche <mike@it-loops.com>
Date            : 2009-09-06 7:27 (1 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=38bddf04bcfe661fbdab94888c3b72c32f6873b3
References      : http://marc.info/?l=linux-kernel&m=125222206218784&w=4
Handled-By      : David Miller <davem@davemloft.net>
Patch           : http://patchwork.kernel.org/patch/45965/


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14138
Subject         : Regression in suspend to ram
Submitter       : Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date            : 2009-08-31 11:51 (7 days old)
References      : http://marc.info/?l=linux-kernel&m=125171952817851&w=4
Handled-By      : OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Patch           : http://patchwork.kernel.org/patch/45945/


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14137
Subject         : usb console regressions
Submitter       : Jason Wessel <jason.wessel@windriver.com>
Date            : 2009-09-05 21:08 (2 days old)
References      : http://marc.info/?l=linux-kernel&m=125218501310512&w=4
Handled-By      : Jason Wessel <jason.wessel@windriver.com>
Patch           : http://patchwork.kernel.org/patch/45953/
                  http://patchwork.kernel.org/patch/45952/


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14136
Subject         : readcd Oops
Submitter       : Bob Tracy <rct@gherkin.frus.com>
Date            : 2009-09-03 3:39 (4 days old)
References      : http://marc.info/?l=linux-kernel&m=125195043617418&w=4
Handled-By      : Michal Schmidt <mschmidt@redhat.com>
Patch           : http://patchwork.kernel.org/patch/45347/


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14017
Subject         : _end symbol missing from Symbol.map
Submitter       : Hannes Reinecke <hare@suse.de>
Date            : 2009-08-13 6:45 (25 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6
References      : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Handled-By      : Hannes Reinecke <hare@suse.de>
Patch           : http://marc.info/?l=linux-kernel&m=125014649102253&w=4


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13948
Subject         : ath5k broken after suspend-to-ram
Submitter       : Johannes Stezenbach <js@sig21.net>
Date            : 2009-08-07 21:51 (31 days old)
References      : http://marc.info/?l=linux-kernel&m=124968192727854&w=4
Handled-By      : Nick Kossifidis <mickflemm@gmail.com>
Patch           : http://patchwork.kernel.org/patch/38550/


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13947
Subject         : Libertas: Association request to the driver failed
Submitter       : Daniel Mack <daniel@caiaq.de>
Date            : 2009-08-07 19:11 (31 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57921c312e8cef72ba35a4cfe870b376da0b1b87
References      : http://marc.info/?l=linux-kernel&m=124967234311481&w=4
Handled-By      : Roel Kluin <roel.kluin@gmail.com>
                  Dan Williams <dcbw@redhat.com>
Patch           : http://patchwork.kernel.org/patch/43114/


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.30,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=13615

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 384+ messages in thread
* [Bug #13940] iwlagn and sky2 stopped working, ACPI-related
  2009-09-06 17:15 2.6.31-rc9: Reported regressions from 2.6.30 Rafael J. Wysocki
@ 2009-09-06 17:24 ` Rafael J. Wysocki
  0 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-09-06 17:24 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ricardo Jorge da Fonseca Marques Ferreira

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).


Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13940
Subject         : iwlagn and sky2 stopped working, ACPI-related
Submitter       : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
Date            : 2009-08-07 22:33 (31 days old)
References      : http://marc.info/?l=linux-kernel&m=124968457731107&w=4

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #13940] iwlagn and sky2 stopped working, ACPI-related
  2009-09-06 17:24 ` Rafael J. Wysocki
@ 2009-09-06 20:55   ` Ricardo Jorge da Fonseca Marques Ferreira
  -1 siblings, 0 replies; 384+ messages in thread
From: Ricardo Jorge da Fonseca Marques Ferreira @ 2009-09-06 20:55 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Kernel Testers List

On Sunday 06 September 2009, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).

Yes, the regression is still present.

^ permalink raw reply	[flat|nested] 384+ messages in thread
* Re: [Bug #13940] iwlagn and sky2 stopped working, ACPI-related 2009-09-06 20:55 ` Ricardo Jorge da Fonseca Marques Ferreira (?) @ 2009-09-06 21:11 ` Rafael J. Wysocki -1 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-09-06 21:11 UTC (permalink / raw) To: Ricardo Jorge da Fonseca Marques Ferreira Cc: Linux Kernel Mailing List, Kernel Testers List On Sunday 06 September 2009, Ricardo Jorge da Fonseca Marques Ferreira wrote: > On Sunday 06 September 2009, Rafael J. Wysocki wrote: > > This message has been generated automatically as a part of a report > > of recent regressions. > > > > The following bug entry is on the current list of known regressions > > from 2.6.30. Please verify if it still should be listed and let me know > > (either way). > > Yes, the regression is still present. Thanks for the update. Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* 2.6.31-rc7-git2: Reported regressions from 2.6.30 @ 2009-08-25 20:00 Rafael J. Wysocki 2009-08-25 20:34 ` Rafael J. Wysocki 0 siblings, 1 reply; 384+ messages in thread From: Rafael J. Wysocki @ 2009-08-25 20:00 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich, Kernel Testers List, Network Development, Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List, DRI This message contains a list of some regressions from 2.6.30, for which there are no fixes in the mainline I know of. If any of them have been fixed already, please let me know. If you know of any other unresolved regressions from 2.6.30, please let me know either and I'll add them to the list. Also, please let me know if any of the entries below are invalid. Each entry from the list will be sent additionally in an automatic reply to this message with CCs to the people involved in reporting and handling the issue. Listed regressions statistics: Date Total Pending Unresolved ---------------------------------------- 2009-08-26 108 33 26 2009-08-20 102 32 29 2009-08-10 89 27 24 2009-08-02 76 36 28 2009-07-27 70 51 43 2009-07-07 35 25 21 2009-06-29 22 22 15 Unresolved regressions ---------------------- Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14060 Subject : oops: sysfs_remove_link and i915 Submitter : Dominik Brodowski <linux@dominikbrodowski.net> Date : 2009-08-22 5:48 (4 days old) References : http://marc.info/?l=linux-kernel&m=125092139113955&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14058 Subject : Oops in fsnotify Submitter : Grant Wilson <grant.wilson@zen.co.uk> Date : 2009-08-20 15:48 (6 days old) References : http://marc.info/?l=linux-kernel&m=125078450923133&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14057 Subject : Strange network timeouts w/ e100 Submitter : Walt Holman <walt@holmansrus.com> Date : 2009-08-20 0:21 (6 days old) References : 
http://marc.info/?l=linux-kernel&m=125072831831443&w=4 Handled-By : Krzysztof Halasa <khc@pm.waw.pl> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14031 Subject : dvb_usb_af9015: Oops on hotplugging Submitter : Stefan Lippers-Hollmann <s.L-H@gmx.de> Date : 2009-08-05 20:32 (21 days old) References : http://marc.info/?l=linux-kernel&m=124949716608828&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14018 Subject : kernel freezes, inotify problem Submitter : Christoph Thielecke <christoph.thielecke@gmx.de> Date : 2009-08-19 12:48 (7 days old) References : http://marc.info/?l=linux-kernel&m=125068616818353&w=4 Handled-By : Eric Paris <eparis@parisplace.org> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14016 Subject : mm/ipw2200 regression Submitter : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Date : 2009-08-15 16:56 (11 days old) References : http://marc.info/?l=linux-kernel&m=125036437221408&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015 Subject : pty regressed again, breaking expect and gcc's testsuite Submitter : Mikael Pettersson <mikpe@it.uu.se> Date : 2009-08-14 23:41 (12 days old) References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14013 Subject : hd don't show up Submitter : Tim Blechmann <tim@klingt.org> Date : 2009-08-14 8:26 (12 days old) References : http://marc.info/?l=linux-kernel&m=125023842514480&w=4 Handled-By : Tejun Heo <tj@kernel.org> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012 Subject : latest git fried my x86_64 imac Submitter : Justin P. 
Mattock <justinmattock@gmail.com> Date : 2009-08-13 07:20 (13 days old) References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14011 Subject : Kernel paging request failed in kmem_cache_alloc Submitter : Matthias Dahl <ml_kernel@mortal-soul.de> Date : 2009-08-10 22:26 (16 days old) References : http://marc.info/?l=linux-kernel&m=124993603825082&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13987 Subject : Received NMI interrupt at resume Submitter : Christian Casteyde <casteyde.christian@free.fr> Date : 2009-08-15 07:55 (11 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950 Subject : Oops when USB Serial disconnected while in use Submitter : Bruno Prémont <bonbons@linux-vserver.org> Date : 2009-08-08 17:47 (18 days old) References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943 Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k Submitter : Fabio Comolli <fabio.comolli@gmail.com> Date : 2009-08-06 20:15 (20 days old) References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942 Subject : Troubles with AoE and uninitialized object Submitter : Bruno Prémont <bonbons@linux-vserver.org> Date : 2009-08-04 10:12 (22 days old) References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941 Subject : x86 Geode issue Submitter : Martin-Éric Racine <q-funk@iki.fi> Date : 2009-08-03 12:58 (23 days old) References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940 Subject : iwlagn and sky2 stopped working, ACPI-related Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net> Date : 2009-08-07 22:33 (19 days old) References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4 Bug-Entry : 
http://bugzilla.kernel.org/show_bug.cgi?id=13935 Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version) Submitter : Adrian Ulrich <kernel@blinkenlights.ch> Date : 2009-08-08 22:08 (18 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13906 Subject : Huawei E169 GPRS connection causes Ooops Submitter : Clemens Eisserer <linuxhippy@gmail.com> Date : 2009-08-04 09:02 (22 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869 Subject : Radeon framebuffer (w/o KMS) corruption at boot. Submitter : Duncan <1i5t5.duncan@cox.net> Date : 2009-07-29 16:44 (28 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13848 Subject : iwlwifi (4965) regression since 2.6.30 Submitter : Lukas Hejtmanek <xhejtman@ics.muni.cz> Date : 2009-07-26 7:57 (31 days old) References : http://marc.info/?l=linux-kernel&m=124859658502866&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836 Subject : suspend script fails, related to stdout? Submitter : Tomas M. <tmezzadra@gmail.com> Date : 2009-07-17 21:24 (40 days old) References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819 Subject : system freeze when switching to console Submitter : Reinette Chatre <reinette.chatre@intel.com> Date : 2009-07-23 17:57 (34 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809 Subject : oprofile: possible circular locking dependency detected Submitter : Jerome Marchand <jmarchan@redhat.com> Date : 2009-07-22 13:35 (35 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740 Subject : X server crashes with 2.6.31-rc2 when options are changed Submitter : Michael S. 
Tsirkin <m.s.tsirkin@gmail.com> Date : 2009-07-07 15:19 (50 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733 Subject : 2.6.31-rc2: irq 16: nobody cared Submitter : Niel Lambrechts <niel.lambrechts@gmail.com> Date : 2009-07-06 18:32 (51 days old) References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645 Subject : NULL pointer dereference at (null) (level2_spare_pgt) Submitter : poornima nayak <mpnayak@linux.vnet.ibm.com> Date : 2009-06-17 17:56 (70 days old) References : http://lkml.org/lkml/2009/6/17/194 Regressions with patches ------------------------ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14062 Subject : Failure to boot as xen guest Submitter : Arnd Hannemann <hannemann@nets.rwth-aachen.de> Date : 2009-08-25 15:48 (1 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=83b519e8b9572c319c8e0c615ee5dd7272856090 References : http://marc.info/?l=linux-kernel&m=125121534229538&w=4 Handled-By : Jeremy Fitzhardinge <jeremy@goop.org> Patch : http://patchwork.kernel.org/patch/43799/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14061 Subject : Crash due to buggy flat_phys_pkg_id Submitter : Ravikiran G Thirumalai <kiran@scalex86.org> Date : 2009-08-24 18:26 (2 days old) References : http://marc.info/?l=linux-kernel&m=125114085701508&w=4 Handled-By : Yinghai Lu <yinghai@kernel.org> Patch : http://patchwork.kernel.org/patch/43806/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14030 Subject : Kernel NULL pointer dereference at 0000000000000008, pty-related Submitter : Eric W. 
Biederman <ebiederm@xmission.com> Date : 2009-08-20 5:46 (6 days old) References : http://marc.info/?l=linux-kernel&m=125074724623423&w=4 Handled-By : Linus Torvalds <torvalds@linux-foundation.org> Patch : http://patchwork.kernel.org/patch/43679/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14017 Subject : _end symbol missing from Symbol.map Submitter : Hannes Reinecke <hare@suse.de> Date : 2009-08-13 6:45 (13 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6 References : http://marc.info/?l=linux-kernel&m=125014649102253&w=4 Handled-By : Hannes Reinecke <hare@suse.de> Patch : http://marc.info/?l=linux-kernel&m=125014649102253&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13960 Subject : rtl8187 not connect to wifi Submitter : okias <d.okias@gmail.com> Date : 2009-08-10 19:16 (16 days old) Handled-By : Larry Finger <Larry.Finger@lwfinger.net> Patch : http://bugzilla.kernel.org/attachment.cgi?id=22798 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948 Subject : ath5k broken after suspend-to-ram Submitter : Johannes Stezenbach <js@sig21.net> Date : 2009-08-07 21:51 (19 days old) References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4 Handled-By : Nick Kossifidis <mickflemm@gmail.com> Patch : http://patchwork.kernel.org/patch/38550/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13947 Subject : Libertas: Association request to the driver failed Submitter : Daniel Mack <daniel@caiaq.de> Date : 2009-08-07 19:11 (19 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57921c312e8cef72ba35a4cfe870b376da0b1b87 References : http://marc.info/?l=linux-kernel&m=124967234311481&w=4 Handled-By : Roel Kluin <roel.kluin@gmail.com> Dan Williams <dcbw@redhat.com> Patch : http://patchwork.kernel.org/patch/43114/ For details, please visit the bug entries and follow the links given in 
references. As you can see, there is a Bugzilla entry for each of the listed regressions. There also is a Bugzilla entry used for tracking the regressions from 2.6.30, unresolved as well as resolved, at: http://bugzilla.kernel.org/show_bug.cgi?id=13615 Please let me know if there are any Bugzilla entries that should be added to the list in there. Thanks, Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #13940] iwlagn and sky2 stopped working, ACPI-related 2009-08-25 20:00 2.6.31-rc7-git2: Reported regressions from 2.6.30 Rafael J. Wysocki @ 2009-08-25 20:34 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Ricardo Jorge da Fonseca Marques Ferreira This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.30. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940 Subject : iwlagn and sky2 stopped working, ACPI-related Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net> Date : 2009-08-07 22:33 (19 days old) References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13940] iwlagn and sky2 stopped working, ACPI-related 2009-08-25 20:34 ` Rafael J. Wysocki @ 2009-08-26 0:00 ` Ricardo Jorge da Fonseca Marques Ferreira -1 siblings, 0 replies; 384+ messages in thread
From: Ricardo Jorge da Fonseca Marques Ferreira @ 2009-08-26 0:00 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Kernel Testers List

A patch that fixes the problem has been proposed in the bug report, so if the
patch is committed, the regression is fixed for me. I don't think the patch has
been committed yet.

On Tuesday 25 August 2009, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
> Subject : iwlagn and sky2 stopped working, ACPI-related
> Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
> Date : 2009-08-07 22:33 (19 days old)
> References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4
>

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13940] iwlagn and sky2 stopped working, ACPI-related @ 2009-08-26 20:58 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread
From: Rafael J. Wysocki @ 2009-08-26 20:58 UTC (permalink / raw)
To: Ricardo Jorge da Fonseca Marques Ferreira
Cc: Linux Kernel Mailing List, Kernel Testers List

On Wednesday 26 August 2009, Ricardo Jorge da Fonseca Marques Ferreira wrote:
> A patch that fixes the problem has been proposed in the bug report, so if the
> patch is committed, the regression is fixed for me. I don't think the patch has
> been committed yet.

Well, honestly, it doesn't seem it will be applied.

Thanks for the update anyway.

Rafael

> On Tuesday 25 August 2009, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
> > Subject : iwlagn and sky2 stopped working, ACPI-related
> > Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
> > Date : 2009-08-07 22:33 (19 days old)
> > References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4

^ permalink raw reply [flat|nested] 384+ messages in thread
[parent not found: <bug-13940-13546@http.bugzilla.kernel.org/>]
[parent not found: <200908251555.n7PFt7Wt015763@demeter.kernel.org>]
* Re: [Bug 13940] iwlagn and sky2 stopped working, ACPI-related [not found] ` <200908251555.n7PFt7Wt015763@demeter.kernel.org> @ 2009-08-25 17:56 ` Yinghai Lu 2009-08-25 18:42 ` Linus Torvalds 2009-08-26 17:44 ` Yinghai Lu 0 siblings, 2 replies; 384+ messages in thread
From: Yinghai Lu @ 2009-08-25 17:56 UTC (permalink / raw)
To: bugzilla-daemon, Ingo Molnar, Linus Torvalds, Jesse Barnes, Ricardo Jorge da Fonseca Marques Ferreira, cebbert
Cc: linux-kernel

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13940
>
>
> Chuck Ebbert <cebbert@redhat.com> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |yinghai@kernel.org
>
>
>
>

[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
[ 0.000000] BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000d2000 - 00000000000d4000 (reserved)
[ 0.000000] BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000b5aa1000 (usable)
[ 0.000000] BIOS-e820: 00000000b5aa1000 - 00000000b5aa7000 (reserved)
[ 0.000000] BIOS-e820: 00000000b5aa7000 - 00000000b5bba000 (usable)
[ 0.000000] BIOS-e820: 00000000b5bba000 - 00000000b5c0f000 (reserved)
[ 0.000000] BIOS-e820: 00000000b5c0f000 - 00000000b5d08000 (usable)
[ 0.000000] BIOS-e820: 00000000b5d08000 - 00000000b5f0f000 (reserved)
[ 0.000000] BIOS-e820: 00000000b5f0f000 - 00000000b5f18000 (usable)
[ 0.000000] BIOS-e820: 00000000b5f18000 - 00000000b5f1f000 (reserved)
[ 0.000000] BIOS-e820: 00000000b5f1f000 - 00000000b5f65000 (usable)
[ 0.000000] BIOS-e820: 00000000b5f65000 - 00000000b5f9f000 (ACPI NVS)
[ 0.000000] BIOS-e820: 00000000b5f9f000 - 00000000b5fe1000 (usable)
[ 0.000000] BIOS-e820: 00000000b5fe1000 - 00000000b5fff000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000b5fff000 - 00000000b6000000 (usable)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000140000000 (usable)

Some devices don't get resources allocated by the BIOS:

[ 0.819921] pci 0000:00:1f.3: reg 10 64bit mmio: [0x000000-0x0000ff]
[ 0.819939] pci 0000:00:1f.3: reg 20 io port: [0x1c00-0x1c1f]
[ 0.820029] pci 0000:00:1c.0: bridge io port: [0x00-0xfff]
[ 0.820033] pci 0000:00:1c.0: bridge 32bit mmio: [0x000000-0x0fffff]
[ 0.820041] pci 0000:00:1c.0: bridge 64bit mmio pref: [0x000000-0x0fffff]
[ 0.820113] pci 0000:07:00.0: reg 10 64bit mmio: [0x000000-0x003fff]
[ 0.820123] pci 0000:07:00.0: reg 18 io port: [0x00-0xff]
[ 0.820203] pci 0000:07:00.0: supports D1 D2
[ 0.820204] pci 0000:07:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 0.820213] pci 0000:07:00.0: PME# disabled
[ 0.820289] pci 0000:00:1c.4: bridge io port: [0x00-0xfff]
[ 0.820294] pci 0000:00:1c.4: bridge 32bit mmio: [0x000000-0x0fffff]
[ 0.820301] pci 0000:00:1c.4: bridge 64bit mmio pref: [0x000000-0x0fffff]
[ 0.820388] pci 0000:08:00.0: reg 10 64bit mmio: [0x000000-0x001fff]
[ 0.820501] pci 0000:08:00.0: PME# supported from D0 D3hot D3cold
[ 0.820510] pci 0000:08:00.0: PME# disabled
[ 0.820593] pci 0000:00:1c.5: bridge io port: [0x00-0xfff]
[ 0.820598] pci 0000:00:1c.5: bridge 32bit mmio: [0x000000-0x0fffff]
[ 0.820605] pci 0000:00:1c.5: bridge 64bit mmio pref: [0x000000-0x0fffff]

so they will be placed in the gap:

[ 0.000000] Allocating PCI resources starting at b6000000 (gap: b6000000:4a000000)

[ 7.878413] sky2 driver version 1.23
[ 7.884402] sky2 0000:07:00.0: enabling device (0000 -> 0003)
[ 7.889483] sky2 0000:07:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 7.894502] sky2 0000:07:00.0: setting latency timer to 64
[ 7.894555] sky2 0000:07:00.0: unsupported chip type 0xff
[ 7.899554] sky2 0000:07:00.0: PCI INT A disabled
[ 7.904379] sky2: probe of 0000:07:00.0 failed with error -95
[ 8.857709] iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, 1.3.27kds
[ 8.863357] iwlagn: Copyright(c) 2003-2009 Intel Corporation
[ 8.875763] iwlagn 0000:08:00.0: enabling device (0000 -> 0002)
[ 8.881477] iwlagn 0000:08:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[ 8.887083] iwlagn 0000:08:00.0: setting latency timer to 64
[ 8.887128] iwlagn 0000:08:00.0: Detected Intel Wireless WiFi Link 5100AGN REV=0xFDFFFFFF
[ 8.892723] alloc irq_desc for 22 on node 0
[ 8.892726] alloc kstat_irqs on node 0
[ 9.073995] iwlagn 0000:08:00.0: Failed, HW not ready
[ 9.080292] iwlagn 0000:08:00.0: PCI INT A disabled

07:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8040 PCI-E Fast Ethernet Controller (rev 12)
	Subsystem: Toshiba America Info Systems Device ff50
	...
	Memory at b6000000 (64-bit, non-prefetchable) [size=16K]
	I/O ports at 2000 [size=256]
	...
08:00.0 Network controller: Intel Corporation Wireless WiFi Link 5100
	...
	Memory at b6100000 (64-bit, non-prefetchable) [size=8K]
	...

Please try the attached patch, which will increase the alignment from 32M to 64M.

---
 arch/x86/kernel/e820.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1378,8 +1378,8 @@ static unsigned long ram_alignment(resou
 	if (mb < 16)
 		return 1024*1024;
 
-	/* To 32MB for anything above that */
-	return 32*1024*1024;
+	/* To 64MB for anything above that */
+	return 64*1024*1024;
 }
 
 #define MAX_RESOURCE_SIZE ((resource_size_t)-1)

^ permalink raw reply [flat|nested] 384+ messages in thread
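To see why changing 32MB to 64MB would move the start of the PCI gap on this machine, here is a minimal sketch of the arithmetic only. It assumes the value returned by ram_alignment() is used to round the end of RAM upward; the helper name round_up_pow2 and the constant ram_end are hypothetical and are not taken from the kernel source. The end-of-RAM value b6000000 comes from the last usable e820 range below 4GB in the log above.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* Hypothetical helper: round pos up to the next multiple of align.
 * align must be a non-zero power of two. */
uint64_t round_up_pow2(uint64_t pos, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (pos + align - 1) & ~(align - 1);
}

/* End of the last usable e820 range below 4GB on the reporter's machine. */
const uint64_t ram_end = 0xb6000000ULL;
```

Under these assumptions, round_up_pow2(ram_end, 32 * MB) leaves b6000000 unchanged because that address is already a multiple of 32MB, while round_up_pow2(ram_end, 64 * MB) gives b8000000, the address at which the working 2.6.30 kernel started its PCI allocations.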
* Re: [Bug 13940] iwlagn and sky2 stopped working, ACPI-related 2009-08-25 17:56 ` [Bug 13940] " Yinghai Lu 2009-08-25 18:42 ` Linus Torvalds 2009-08-25 19:00 ` Yinghai Lu 2009-08-26 17:44 ` Yinghai Lu 1 sibling, 1 reply; 384+ messages in thread
From: Linus Torvalds @ 2009-08-25 18:42 UTC (permalink / raw)
To: Yinghai Lu
Cc: bugzilla-daemon, Ingo Molnar, Jesse Barnes, Ricardo Jorge da Fonseca Marques Ferreira, cebbert, linux-kernel

On Tue, 25 Aug 2009, Yinghai Lu wrote:
>
> Please try the attached patch, which will increase the alignment from 32M to 64M.

Hmm. That may indeed fix the problem, because we have:

 - working-2.6.30.log:

	Allocating PCI resources starting at b8000000 (gap: b6000000:4a000000)

 - not-working-2.6.31.log:

	Allocating PCI resources starting at b6000000 (gap: b6000000:4a000000)

HOWEVER. We also have:

 - working-2.6.31_acpi=off.log:

	Allocating PCI resources starting at b6000000 (gap: b6000000:4a000000)

ie it really does seem to be ACPI-related somehow: starting PCI
allocations at that b6000000 address works perfectly fine if ACPI is not
enabled.

In the not-working version, we end up getting:

	[ 1.408588] pci 0000:00:1c.4: PCI bridge, secondary bus 0000:07
	[ 1.408593] pci 0000:00:1c.4: IO window: 0x2000-0x2fff
	[ 1.408600] pci 0000:00:1c.4: MEM window: 0xb6000000-0xb60fffff
	[ 1.408606] pci 0000:00:1c.4: PREFETCH window: disabled
	[ 1.408623] pci 0000:00:1c.5: PCI bridge, secondary bus 0000:08
	[ 1.408626] pci 0000:00:1c.5: IO window: disabled
	[ 1.408633] pci 0000:00:1c.5: MEM window: 0xb6100000-0xb61fffff
	[ 1.408639] pci 0000:00:1c.5: PREFETCH window: disabled

while in the working version we have:

 - ACPI off - looks like a BIOS allocated memory window:

	[ 0.290854] pci 0000:00:1c.4: PCI bridge, secondary bus 0000:07
	[ 0.290854] pci 0000:00:1c.4: IO window: 0x3000-0x3fff
	[ 0.290854] pci 0000:00:1c.4: MEM window: 0xf4500000-0xf45fffff
	[ 0.290854] pci 0000:00:1c.4: PREFETCH window: disabled
	[ 0.290854] pci 0000:00:1c.5: PCI bridge, secondary bus 0000:08
	[ 0.290854] pci 0000:00:1c.5: IO window: disabled
	[ 0.290854] pci 0000:00:1c.5: MEM window: 0xf4600000-0xf46fffff
	[ 0.290854] pci 0000:00:1c.5: PREFETCH window: disabled

 - ACPI on - we allocated the memory window, but at 0xb8000000+, rather
   than directly after end-of-RAM:

	[ 0.842970] pci 0000:00:1c.4: PCI bridge, secondary bus 0000:07
	[ 0.842975] pci 0000:00:1c.4: IO window: 0x2000-0x2fff
	[ 0.842983] pci 0000:00:1c.4: MEM window: 0xb8000000-0xb80fffff
	[ 0.842989] pci 0000:00:1c.4: PREFETCH window: disabled
	[ 0.843012] pci 0000:00:1c.5: PCI bridge, secondary bus 0000:08
	[ 0.843016] pci 0000:00:1c.5: IO window: disabled
	[ 0.843023] pci 0000:00:1c.5: MEM window: 0xb8100000-0xb81fffff
	[ 0.843029] pci 0000:00:1c.5: PREFETCH window: disabled

ie for some reason ACPI caused that bus to be re-allocated, and
re-allocating it right after the memory window doesn't work.

Crazy.

I wonder what is hiding at that 0xb6000000 address. And while I think that
in this case rounding up to 64MB will fix it, I worry that our old model
(of never starting directly after RAM, even if it was aligned) may not
have been safer.

		Linus

^ permalink raw reply [flat|nested] 384+ messages in thread
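The "old model" mentioned above, never starting directly after RAM even when the end of RAM is already aligned, can be illustrated with a small sketch. This is only one way to obtain that behaviour and is not claimed to be the code that 2.6.30 used; the function name start_past_ram and the choice of granule are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pick the first PCI address so that it is always
 * strictly above ram_end, even when ram_end is already aligned.
 * granule must be a non-zero power of two. */
uint64_t start_past_ram(uint64_t ram_end, uint64_t granule)
{
	assert(granule != 0 && (granule & (granule - 1)) == 0);
	/* Adding a full granule before masking skips at least one byte
	 * and at most one whole granule past ram_end. */
	return (ram_end + granule) & ~(granule - 1);
}
```

With ram_end = 0xb6000000 and a 32MB granule (0x2000000), this yields 0xb8000000, so the region immediately after RAM is left unused. A plain round-up with the same granule would have returned 0xb6000000.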
* Re: [Bug 13940] iwlagn and sky2 stopped working, ACPI-related 2009-08-25 18:42 ` Linus Torvalds @ 2009-08-25 19:00 ` Yinghai Lu 0 siblings, 0 replies; 384+ messages in thread From: Yinghai Lu @ 2009-08-25 19:00 UTC (permalink / raw) To: Linus Torvalds Cc: bugzilla-daemon, Ingo Molnar, Jesse Barnes, Ricardo Jorge da Fonseca Marques Ferreira, cebbert, linux-kernel Linus Torvalds wrote: > > On Tue, 25 Aug 2009, Yinghai Lu wrote: >> please try to attached patch, that will increae alignment from 32M to 64M. > > Hmm. That may indeed fix the problem, because we have: > > - working-2.6.30.log: > > Allocating PCI resources starting at b8000000 (gap: b6000000:4a000000) > > - not-working-2.6.31.log: > > Allocating PCI resources starting at b6000000 (gap: b6000000:4a000000) > > HOWEVER. We also have: > > - working-2.6.31_acpi=off.log: > > Allocating PCI resources starting at b6000000 (gap: b6000000:4a000000) > > ie it really does seem to be ACPI-related somehow: starting PCI > allocations at that b6000000 address works perfectly fine if ACPI is not > enabled. 
> > In the not-working version, we end up getting: > > [ 1.408588] pci 0000:00:1c.4: PCI bridge, secondary bus 0000:07 > [ 1.408593] pci 0000:00:1c.4: IO window: 0x2000-0x2fff > [ 1.408600] pci 0000:00:1c.4: MEM window: 0xb6000000-0xb60fffff > [ 1.408606] pci 0000:00:1c.4: PREFETCH window: disabled > [ 1.408623] pci 0000:00:1c.5: PCI bridge, secondary bus 0000:08 > [ 1.408626] pci 0000:00:1c.5: IO window: disabled > [ 1.408633] pci 0000:00:1c.5: MEM window: 0xb6100000-0xb61fffff > [ 1.408639] pci 0000:00:1c.5: PREFETCH window: disabled > > > while in the working version we have: > > - ACPI off - looks like a BIOS allocated memory window: > > [ 0.290854] pci 0000:00:1c.4: PCI bridge, secondary bus 0000:07 > [ 0.290854] pci 0000:00:1c.4: IO window: 0x3000-0x3fff > [ 0.290854] pci 0000:00:1c.4: MEM window: 0xf4500000-0xf45fffff > [ 0.290854] pci 0000:00:1c.4: PREFETCH window: disabled > [ 0.290854] pci 0000:00:1c.5: PCI bridge, secondary bus 0000:08 > [ 0.290854] pci 0000:00:1c.5: IO window: disabled > [ 0.290854] pci 0000:00:1c.5: MEM window: 0xf4600000-0xf46fffff > [ 0.290854] pci 0000:00:1c.5: PREFETCH window: disabled interesting, when acpi=off, BIOS does allocate resource for them [ 0.261960] pci 0000:07:00.0: reg 10 64bit mmio: [0xf4500000-0xf4503fff] [ 0.261970] pci 0000:07:00.0: reg 18 io port: [0x3000-0x30ff] [ 0.262049] pci 0000:07:00.0: supports D1 D2 [ 0.262051] pci 0000:07:00.0: PME# supported from D0 D1 D2 D3hot D3cold [ 0.262058] pci 0000:07:00.0: PME# disabled [ 0.272117] pci 0000:00:1c.4: bridge io port: [0x3000-0x3fff] [ 0.272122] pci 0000:00:1c.4: bridge 32bit mmio: [0xf4500000-0xf45fffff] [ 0.272212] pci 0000:08:00.0: reg 10 64bit mmio: [0xf4600000-0xf4601fff] [ 0.272321] pci 0000:08:00.0: PME# supported from D0 D3hot D3cold [ 0.272330] pci 0000:08:00.0: PME# disabled [ 0.280128] pci 0000:00:1c.5: bridge 32bit mmio: [0xf4600000-0xf46fffff] > > - ACPI on - we allocated the memory window, but at 0xb8000000+, rather > than directly after end-of-RAM: > 
> [ 0.842970] pci 0000:00:1c.4: PCI bridge, secondary bus 0000:07
> [ 0.842975] pci 0000:00:1c.4: IO window: 0x2000-0x2fff
> [ 0.842983] pci 0000:00:1c.4: MEM window: 0xb8000000-0xb80fffff
> [ 0.842989] pci 0000:00:1c.4: PREFETCH window: disabled
> [ 0.843012] pci 0000:00:1c.5: PCI bridge, secondary bus 0000:08
> [ 0.843016] pci 0000:00:1c.5: IO window: disabled
> [ 0.843023] pci 0000:00:1c.5: MEM window: 0xb8100000-0xb81fffff
> [ 0.843029] pci 0000:00:1c.5: PREFETCH window: disabled
>
> ie for some reason ACPI caused that bus to be re-allocated, and
> re-allocating it right after the memory window doesn't work.
>
> Crazy.
>
> I wonder what is hiding at that 0xb6000000 address. And while I think that
> in this case rounding up to 64MB will fix it, I worry that our old model
> (of never starting directly after RAM, even if it was aligned) may not
> have been safer.
>

ACPI does clear something.

YH

^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug 13940] iwlagn and sky2 stopped working, ACPI-related
  2009-08-25 17:56 ` [Bug 13940] " Yinghai Lu
  2009-08-25 18:42   ` Linus Torvalds
@ 2009-08-26 17:44   ` Yinghai Lu
  1 sibling, 0 replies; 384+ messages in thread
From: Yinghai Lu @ 2009-08-26 17:44 UTC (permalink / raw)
  To: bugzilla-daemon, Ricardo Jorge da Fonseca Marques Ferreira
  Cc: Ingo Molnar, Linus Torvalds, Jesse Barnes, cebbert, linux-kernel

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13940
>
> --- Comment #12 from Ricardo Ferreira <bugzillas@sys49152.net> 2009-08-26 16:54:49 ---
> Created an attachment (id=22864)
> --> (http://bugzilla.kernel.org/attachment.cgi?id=22864)
> dmesg of not working kernel booted with "debug pci=earlydump"
>
> This is of the 2.6.31-rc6 kernel with neither the patch applied nor the commit
> reverted.
>

[ 0.000000] pci 0000:07:00.0 config space:
[ 0.000000] 00: ab 11 55 43 07 00 10 00 12 00 00 02 10 00 00 00
[ 0.000000] 10: 04 00 50 f4 00 00 00 00 01 30 00 00 00 00 00 00
[ 0.000000] 20: 00 00 00 00 00 00 00 00 00 00 00 00 79 11 50 ff
[ 0.000000] 30: 00 00 00 00 48 00 00 00 00 00 00 00 0b 01 00 00
[ 0.000000] 40: 00 00 b0 84 09 c0 a0 01 01 5c 03 fe 00 20 00 13
[ 0.000000] 50: 03 5c 00 80 00 00 00 00 00 00 04 00 05 c0 80 00
[ 0.000000] 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 70: 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 80: 00 00 00 00 00 30 00 00 00 00 00 00 82 a8 e8 00
[ 0.000000] 90: 00 00 00 00 00 00 00 00 a0 25 26 00 00 00 00 00
[ 0.000000] a0: f6 00 00 ff 40 00 08 01 0c 31 33 40 04 0a 10 44
[ 0.000000] b0: 00 00 00 05 00 00 60 20 fa 00 00 00 00 00 00 00
[ 0.000000] c0: 10 00 12 00 c0 8f 04 05 00 20 19 00 11 ac 07 00
[ 0.000000] d0: 48 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] e0: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] pci 0000:08:00.0 config space:
[ 0.000000] 00: 86 80 32 42 06 00 10 00 00 00 80 02 10 00 00 00
[ 0.000000] 10: 04 00 60 f4 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 01 12
[ 0.000000] 30: 00 00 00 00 c8 00 00 00 00 00 00 00 05 01 00 00
[ 0.000000] 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] c0: 00 00 00 00 00 00 00 00 01 d0 23 c8 00 00 00 0d
[ 0.000000] d0: 05 e0 80 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.000000] e0: 10 00 01 00 c0 8e 00 10 10 08 19 00 11 9c 06 00
[ 0.000000] f0: 40 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00

[ 0.815933] pci 0000:07:00.0: reg 10 64bit mmio: [0x000000-0x003fff]
[ 0.815946] pci 0000:07:00.0: reg 18 io port: [0x00-0xff]
[ 0.816029] pci 0000:07:00.0: supports D1 D2
[ 0.816033] pci 0000:07:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 0.816041] pci 0000:07:00.0: PME# disabled
[ 0.816223] pci 0000:08:00.0: reg 10 64bit mmio: [0x000000-0x001fff]
[ 0.816339] pci 0000:08:00.0: PME# supported from D0 D3hot D3cold
[ 0.816348] pci 0000:08:00.0: PME# disabled

Not sure whether this is caused by ACPI or by mmconf... please try to boot with pci=nommconf.

YH

^ permalink raw reply [flat|nested] 384+ messages in thread
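[Editorial illustration] The early config-space dump above can be decoded by hand to see what the BIOS programmed into the BARs before the kernel touched them. The sketch below follows the standard PCI BAR layout (bit 0 selects I/O space, bits 2:1 give the memory type, bit 3 is the prefetchable flag); the helper names `parse_hex_row` and `decode_bar` are hypothetical and not part of any kernel or tool API. Applied to the `10:` rows of the dump, it yields 0xf4500000 and I/O 0x3000 for 07:00.0 and 0xf4600000 for 08:00.0, the same addresses reported in the acpi=off boot, which suggests the zero-based `reg 10` ranges printed later reflect BARs that were cleared after the early dump was taken.

```python
def parse_hex_row(row: str) -> bytes:
    """Turn one dump row such as '10: 04 00 50 f4 ...' into raw bytes."""
    _, _, data = row.partition(":")
    return bytes.fromhex("".join(data.split()))


def decode_bar(raw: bytes, offset: int) -> dict:
    """Decode the BAR starting at 'offset' within 'raw' (little-endian)."""
    low = int.from_bytes(raw[offset:offset + 4], "little")
    if low & 0x1:
        # I/O space BAR: the two low bits are flags, the rest is the port base.
        return {"kind": "io", "base": low & ~0x3}
    is_64bit = ((low >> 1) & 0x3) == 0x2
    base = low & ~0xF  # the low four bits of a memory BAR are flags
    if is_64bit:
        # A 64-bit BAR takes its upper half from the next dword.
        high = int.from_bytes(raw[offset + 4:offset + 8], "little")
        base |= high << 32
    return {
        "kind": "mem64" if is_64bit else "mem32",
        "base": base,
        "prefetchable": bool(low & 0x8),
    }


# Rows at config offset 0x10, copied from the dump above.
row_07 = parse_hex_row("10: 04 00 50 f4 00 00 00 00 01 30 00 00 00 00 00 00")
row_08 = parse_hex_row("10: 04 00 60 f4 00 00 00 00 00 00 00 00 00 00 00 00")

bar0_07 = decode_bar(row_07, 0)  # config offset 0x10 ("reg 10")
bar2_07 = decode_bar(row_07, 8)  # config offset 0x18 ("reg 18")
bar0_08 = decode_bar(row_08, 0)  # config offset 0x10 ("reg 10")

for name, bar in (("07:00.0 reg 10", bar0_07),
                  ("07:00.0 reg 18", bar2_07),
                  ("08:00.0 reg 10", bar0_08)):
    print(name, bar["kind"], hex(bar["base"]))
```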
* 2.6.31-rc6-git5: Reported regressions from 2.6.30 @ 2009-08-19 20:20 Rafael J. Wysocki 2009-08-19 20:26 ` [Bug #13940] iwlagn and sky2 stopped working, ACPI-related Rafael J. Wysocki 0 siblings, 1 reply; 384+ messages in thread From: Rafael J. Wysocki @ 2009-08-19 20:20 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich, Kernel Testers List, Network Development, Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List, DRI This message contains a list of some regressions from 2.6.30 for which there are no fixes in the mainline that I know of. If any of them have been fixed already, please let me know. If you know of any other unresolved regressions from 2.6.30, please let me know as well and I'll add them to the list. Also, please let me know if any of the entries below are invalid. Each entry from the list will additionally be sent in an automatic reply to this message, with CCs to the people involved in reporting and handling the issue. 
Listed regressions statistics: Date Total Pending Unresolved ---------------------------------------- 2009-08-20 102 32 29 2009-08-10 89 27 24 2009-08-02 76 36 28 2009-07-27 70 51 43 2009-07-07 35 25 21 2009-06-29 22 22 15 Unresolved regressions ---------------------- Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14018 Subject : kernel freezes, inotify problem Submitter : Christoph Thielecke <christoph.thielecke@gmx.de> Date : 2009-08-19 12:48 (1 days old) References : http://marc.info/?l=linux-kernel&m=125068616818353&w=4 Handled-By : Eric Paris <eparis@parisplace.org> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14016 Subject : mm/ipw2200 regression Submitter : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Date : 2009-08-15 16:56 (5 days old) References : http://marc.info/?l=linux-kernel&m=125036437221408&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015 Subject : pty regressed again, breaking expect and gcc's testsuite Submitter : Mikael Pettersson <mikpe@it.uu.se> Date : 2009-08-14 23:41 (6 days old) References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14014 Subject : kernel bug at shut down Submitter : Norbert Preining <preining@logic.at> Date : 2009-08-14 9:11 (6 days old) References : http://marc.info/?l=linux-kernel&m=125024112418870&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14013 Subject : hd don't show up Submitter : Tim Blechmann <tim@klingt.org> Date : 2009-08-14 8:26 (6 days old) References : http://marc.info/?l=linux-kernel&m=125023842514480&w=4 Handled-By : Tejun Heo <tj@kernel.org> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012 Subject : latest git fried my x86_64 imac Submitter : Justin P. 
Mattock <justinmattock@gmail.com> Date : 2009-08-13 07:20 (7 days old) References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14011 Subject : Kernel paging request failed in kmem_cache_alloc Submitter : Matthias Dahl <ml_kernel@mortal-soul.de> Date : 2009-08-10 22:26 (10 days old) References : http://marc.info/?l=linux-kernel&m=124993603825082&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14003 Subject : Infinite loop on bootup while handling DMAR Submitter : Bernhard Rosenkraenzer <bero@arklinux.org> Date : 2009-08-18 14:54 (2 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14002 Subject : WARNING: at net/ipv4/af_inet.c:154 inet_sock_destruct+0x164/0x1c0() Submitter : Ralf Hildebrandt <ralf.hildebrandt@charite.de> Date : 2009-08-18 12:37 (2 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13987 Subject : Received NMI interrupt at resume Submitter : Christian Casteyde <casteyde.christian@free.fr> Date : 2009-08-15 07:55 (5 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13960 Subject : rtl8187 not connect to wifi Submitter : okias <d.okias@gmail.com> Date : 2009-08-10 19:16 (10 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950 Subject : Oops when USB Serial disconnected while in use Submitter : Bruno Prémont <bonbons@linux-vserver.org> Date : 2009-08-08 17:47 (12 days old) References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13947 Subject : Libertas: Association request to the driver failed Submitter : Daniel Mack <daniel@caiaq.de> Date : 2009-08-07 19:11 (13 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57921c312e8cef72ba35a4cfe870b376da0b1b87 References : http://marc.info/?l=linux-kernel&m=124967234311481&w=4 Handled-By : Roel Kluin <roel.kluin@gmail.com> Bug-Entry : 
http://bugzilla.kernel.org/show_bug.cgi?id=13943 Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k Submitter : Fabio Comolli <fabio.comolli@gmail.com> Date : 2009-08-06 20:15 (14 days old) References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942 Subject : Troubles with AoE and uninitialized object Submitter : Bruno Prémont <bonbons@linux-vserver.org> Date : 2009-08-04 10:12 (16 days old) References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941 Subject : x86 Geode issue Submitter : Martin-Éric Racine <q-funk@iki.fi> Date : 2009-08-03 12:58 (17 days old) References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940 Subject : iwlagn and sky2 stopped working, ACPI-related Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net> Date : 2009-08-07 22:33 (13 days old) References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13935 Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version) Submitter : Adrian Ulrich <kernel@blinkenlights.ch> Date : 2009-08-08 22:08 (12 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13914 Subject : e1000e reports invalid NVM Checksum on 82566DM-2 (bisected) Submitter : <jsbronder@gentoo.org> Date : 2009-08-04 18:06 (16 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13906 Subject : Huawei E169 GPRS connection causes Ooops Submitter : Clemens Eisserer <linuxhippy@gmail.com> Date : 2009-08-04 09:02 (16 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13899 Subject : Oops from tar, 2.6.31-rc5, 32 bit on quad core phenom. 
Submitter : Gene Heskett <gene.heskett@verizon.net> Date : 2009-08-01 13:04 (19 days old) References : http://marc.info/?l=linux-kernel&m=124913190304149&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869 Subject : Radeon framebuffer (w/o KMS) corruption at boot. Submitter : Duncan <1i5t5.duncan@cox.net> Date : 2009-07-29 16:44 (22 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13848 Subject : iwlwifi (4965) regression since 2.6.30 Submitter : Lukas Hejtmanek <xhejtman@ics.muni.cz> Date : 2009-07-26 7:57 (25 days old) References : http://marc.info/?l=linux-kernel&m=124859658502866&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836 Subject : suspend script fails, related to stdout? Submitter : Tomas M. <tmezzadra@gmail.com> Date : 2009-07-17 21:24 (34 days old) References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819 Subject : system freeze when switching to console Submitter : Reinette Chatre <reinette.chatre@intel.com> Date : 2009-07-23 17:57 (28 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809 Subject : oprofile: possible circular locking dependency detected Submitter : Jerome Marchand <jmarchan@redhat.com> Date : 2009-07-22 13:35 (29 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740 Subject : X server crashes with 2.6.31-rc2 when options are changed Submitter : Michael S. 
Tsirkin <m.s.tsirkin@gmail.com> Date : 2009-07-07 15:19 (44 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733 Subject : 2.6.31-rc2: irq 16: nobody cared Submitter : Niel Lambrechts <niel.lambrechts@gmail.com> Date : 2009-07-06 18:32 (45 days old) References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645 Subject : NULL pointer dereference at (null) (level2_spare_pgt) Submitter : poornima nayak <mpnayak@linux.vnet.ibm.com> Date : 2009-06-17 17:56 (64 days old) References : http://lkml.org/lkml/2009/6/17/194 Regressions with patches ------------------------ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14017 Subject : _end symbol missing from Symbol.map Submitter : Hannes Reinecke <hare@suse.de> Date : 2009-08-13 6:45 (7 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6 References : http://marc.info/?l=linux-kernel&m=125014649102253&w=4 Handled-By : Hannes Reinecke <hare@suse.de> Patch : http://marc.info/?l=linux-kernel&m=125014649102253&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948 Subject : ath5k broken after suspend-to-ram Submitter : Johannes Stezenbach <js@sig21.net> Date : 2009-08-07 21:51 (13 days old) References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4 Handled-By : Nick Kossifidis <mickflemm@gmail.com> Patch : http://patchwork.kernel.org/patch/38550/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13946 Subject : x86 MCE malfunction on Thinkpad T42p Submitter : Johannes Stezenbach <js@sig21.net> Date : 2009-08-07 17:09 (13 days old) References : http://marc.info/?l=linux-kernel&m=124966500232399&w=4 Handled-By : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Patch : http://patchwork.kernel.org/patch/37908/ For details, please visit the bug entries and follow the links given in references. 
As you can see, there is a Bugzilla entry for each of the listed regressions. There also is a Bugzilla entry used for tracking the regressions from 2.6.30, unresolved as well as resolved, at: http://bugzilla.kernel.org/show_bug.cgi?id=13615 Please let me know if there are any Bugzilla entries that should be added to the list in there. Thanks, Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #13940] iwlagn and sky2 stopped working, ACPI-related 2009-08-19 20:20 2.6.31-rc6-git5: Reported regressions from 2.6.30 Rafael J. Wysocki @ 2009-08-19 20:26 ` Rafael J. Wysocki 2009-08-19 23:54 ` Ricardo Jorge da Fonseca Marques Ferreira 0 siblings, 1 reply; 384+ messages in thread From: Rafael J. Wysocki @ 2009-08-19 20:26 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Ricardo Jorge da Fonseca Marques Ferreira This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.30. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940 Subject : iwlagn and sky2 stopped working, ACPI-related Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net> Date : 2009-08-07 22:33 (13 days old) References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13940] iwlagn and sky2 stopped working, ACPI-related 2009-08-19 20:26 ` [Bug #13940] iwlagn and sky2 stopped working, ACPI-related Rafael J. Wysocki @ 2009-08-19 23:54 ` Ricardo Jorge da Fonseca Marques Ferreira 0 siblings, 0 replies; 384+ messages in thread From: Ricardo Jorge da Fonseca Marques Ferreira @ 2009-08-19 23:54 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Kernel Testers List Yes, 2.6.31-rc6 still has the bug. On Wednesday 19 August 2009, Rafael J. Wysocki wrote: > This message has been generated automatically as a part of a report > of recent regressions. > > The following bug entry is on the current list of known regressions > from 2.6.30. Please verify if it still should be listed and let me know > (either way). > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940 > Subject : iwlagn and sky2 stopped working, ACPI-related > Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net> > Date : 2009-08-07 22:33 (13 days old) > References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4 > ^ permalink raw reply [flat|nested] 384+ messages in thread
* Re: [Bug #13940] iwlagn and sky2 stopped working, ACPI-related @ 2009-08-20 14:59 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-08-20 14:59 UTC (permalink / raw) To: Ricardo Jorge da Fonseca Marques Ferreira Cc: Linux Kernel Mailing List, Kernel Testers List, ACPI Devel Maling List On Thursday 20 August 2009, Ricardo Jorge da Fonseca Marques Ferreira wrote: > Yes, 2.6.31-rc6 still has the bug. Thanks for the update. Rafael > On Wednesday 19 August 2009, Rafael J. Wysocki wrote: > > This message has been generated automatically as a part of a report > > of recent regressions. > > > > The following bug entry is on the current list of known regressions > > from 2.6.30. Please verify if it still should be listed and let me know > > (either way). > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940 > > Subject : iwlagn and sky2 stopped working, ACPI-related > > Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net> > > Date : 2009-08-07 22:33 (13 days old) > > References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread
* 2.6.31-rc5-git5: Reported regressions from 2.6.30 @ 2009-08-09 20:36 Rafael J. Wysocki 2009-08-09 20:44 ` [Bug #13940] iwlagn and sky2 stopped working, ACPI-related Rafael J. Wysocki 0 siblings, 1 reply; 384+ messages in thread From: Rafael J. Wysocki @ 2009-08-09 20:36 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich, Kernel Testers List, Network Development, Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List, DRI This message contains a list of some regressions from 2.6.30 for which there are no fixes in the mainline that I know of. If any of them have been fixed already, please let me know. If you know of any other unresolved regressions from 2.6.30, please let me know as well and I'll add them to the list. Also, please let me know if any of the entries below are invalid. Each entry from the list will additionally be sent in an automatic reply to this message, with CCs to the people involved in reporting and handling the issue. 
Listed regressions statistics: Date Total Pending Unresolved ---------------------------------------- 2009-08-10 89 27 24 2009-08-02 76 36 28 2009-07-27 70 51 43 2009-07-07 35 25 21 2009-06-29 22 22 15 Unresolved regressions ---------------------- Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950 Subject : Oops when USB Serial disconnected while in use Submitter : Bruno Prémont <bonbons@linux-vserver.org> Date : 2009-08-08 17:47 (2 days old) References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13947 Subject : Libertas: Association request to the driver failed Submitter : Daniel Mack <daniel@caiaq.de> Date : 2009-08-07 19:11 (3 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57921c312e8cef72ba35a4cfe870b376da0b1b87 References : http://marc.info/?l=linux-kernel&m=124967234311481&w=4 Handled-By : Roel Kluin <roel.kluin@gmail.com> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943 Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k Submitter : Fabio Comolli <fabio.comolli@gmail.com> Date : 2009-08-06 20:15 (4 days old) References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942 Subject : Troubles with AoE and uninitialized object Submitter : Bruno Prémont <bonbons@linux-vserver.org> Date : 2009-08-04 10:12 (6 days old) References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941 Subject : x86 Geode issue Submitter : Martin-Éric Racine <q-funk@iki.fi> Date : 2009-08-03 12:58 (7 days old) References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940 Subject : iwlagn and sky2 stopped working, ACPI-related Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net> Date : 2009-08-07 22:33 (3 days old) 
References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13935 Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version) Submitter : Adrian Ulrich <kernel@blinkenlights.ch> Date : 2009-08-08 22:08 (2 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13914 Subject : e1000e reports invalid NVM Checksum on 82566DM-2 (bisected) Submitter : <jsbronder@gentoo.org> Date : 2009-08-04 18:06 (6 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13906 Subject : Huawei E169 GPRS connection causes Ooops Submitter : Clemens Eisserer <linuxhippy@gmail.com> Date : 2009-08-04 09:02 (6 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13899 Subject : Oops from tar, 2.6.31-rc5, 32 bit on quad core phenom. Submitter : Gene Heskett <gene.heskett@verizon.net> Date : 2009-08-01 13:04 (9 days old) References : http://marc.info/?l=linux-kernel&m=124913190304149&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13895 Subject : 2.6.31-rc4 - slab entry tak_delay_info leaking ??? Submitter : Paul Rolland <rol@as2917.net> Date : 2009-07-29 08:20 (12 days old) References : http://marc.info/?l=linux-kernel&m=124884847925375&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869 Subject : Radeon framebuffer (w/o KMS) corruption at boot. 
Submitter : Duncan <1i5t5.duncan@cox.net> Date : 2009-07-29 16:44 (12 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13848 Subject : iwlwifi (4965) regression since 2.6.30 Submitter : Lukas Hejtmanek <xhejtman@ics.muni.cz> Date : 2009-07-26 7:57 (15 days old) References : http://marc.info/?l=linux-kernel&m=124859658502866&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13846 Subject : LEDs switched off permanently by power saving with rt61pci driver Submitter : Chris Clayton <chris2553@googlemail.com> Date : 2009-07-13 8:27 (28 days old) References : http://marc.info/?l=linux-kernel&m=124747418828398&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13837 Subject : Input : regression - touchpad not detected Submitter : Dave Young <hidave.darkstar@gmail.com> Date : 2009-07-17 07:13 (24 days old) References : http://marc.info/?l=linux-kernel&m=124780763701571&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836 Subject : suspend script fails, related to stdout? Submitter : Tomas M. 
<tmezzadra@gmail.com> Date : 2009-07-17 21:24 (24 days old) References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13833 Subject : Kernel Oops when trying to suspend with ubifs mounted on block2mtd mtd device Submitter : Tobias Diedrich <ranma@tdiedrich.de> Date : 2009-07-15 14:20 (26 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=15bce40cb3133bcc07d548013df97e4653d363c1 References : http://marc.info/?l=linux-kernel&m=124766049207807&w=4 http://marc.info/?l=linux-kernel&m=124704927819769&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819 Subject : system freeze when switching to console Submitter : Reinette Chatre <reinette.chatre@intel.com> Date : 2009-07-23 17:57 (18 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809 Subject : oprofile: possible circular locking dependency detected Submitter : Jerome Marchand <jmarchan@redhat.com> Date : 2009-07-22 13:35 (19 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740 Subject : X server crashes with 2.6.31-rc2 when options are changed Submitter : Michael S. 
Tsirkin <m.s.tsirkin@gmail.com> Date : 2009-07-07 15:19 (34 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733 Subject : 2.6.31-rc2: irq 16: nobody cared Submitter : Niel Lambrechts <niel.lambrechts@gmail.com> Date : 2009-07-06 18:32 (35 days old) References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13716 Subject : The AIC-7892P controller does not work any more Submitter : Andrej Podzimek <andrej@podzimek.org> Date : 2009-07-05 19:23 (36 days old) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13713 Subject : [drm/i915] Possible regression due to commit "Change GEM throttling to be 20ms (...)" Submitter : <kazikcz@gmail.com> Date : 2009-07-05 10:49 (36 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b962442e46a9340bdbc6711982c59ff0cc2b5afb Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645 Subject : NULL pointer dereference at (null) (level2_spare_pgt) Submitter : poornima nayak <mpnayak@linux.vnet.ibm.com> Date : 2009-06-17 17:56 (54 days old) References : http://lkml.org/lkml/2009/6/17/194 Regressions with patches ------------------------ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948 Subject : ath5k broken after suspend-to-ram Submitter : Johannes Stezenbach <js@sig21.net> Date : 2009-08-07 21:51 (3 days old) References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4 Handled-By : Nick Kossifidis <mickflemm@gmail.com> Patch : http://patchwork.kernel.org/patch/38550/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13946 Subject : x86 MCE malfunction on Thinkpad T42p Submitter : Johannes Stezenbach <js@sig21.net> Date : 2009-08-07 17:09 (3 days old) References : http://marc.info/?l=linux-kernel&m=124966500232399&w=4 Handled-By : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Patch : http://patchwork.kernel.org/patch/37908/ Bug-Entry : 
http://bugzilla.kernel.org/show_bug.cgi?id=13944 Subject : MD raid regression Submitter : Mike Snitzer <snitzer@redhat.com> Date : 2009-08-05 15:06 (5 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=449aad3e25358812c43afc60918c5ad3819488e7 References : http://marc.info/?l=linux-kernel&m=124948481218857&w=4 Handled-By : NeilBrown <neilb@suse.de> Patch : http://patchwork.kernel.org/patch/39521/ For details, please visit the bug entries and follow the links given in references. As you can see, there is a Bugzilla entry for each of the listed regressions. There also is a Bugzilla entry used for tracking the regressions from 2.6.30, unresolved as well as resolved, at: http://bugzilla.kernel.org/show_bug.cgi?id=13615 Please let me know if there are any Bugzilla entries that should be added to the list in there. Thanks, Rafael ^ permalink raw reply [flat|nested] 384+ messages in thread
* [Bug #13940] iwlagn and sky2 stopped working, ACPI-related 2009-08-09 20:36 2.6.31-rc5-git5: Reported regressions from 2.6.30 Rafael J. Wysocki @ 2009-08-09 20:44 ` Rafael J. Wysocki 0 siblings, 0 replies; 384+ messages in thread From: Rafael J. Wysocki @ 2009-08-09 20:44 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Kernel Testers List, Ricardo Jorge da Fonseca Marques Ferreira This message has been generated automatically as a part of a report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.30. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940 Subject : iwlagn and sky2 stopped working, ACPI-related Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net> Date : 2009-08-07 22:33 (3 days old) References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4 ^ permalink raw reply [flat|nested] 384+ messages in thread